Introduction
Have you ever wondered what a class
file that IntelliJ runs actually is? Or the process of actually running it? You’ll explore the class
format
and learn how Java executes programs by writing your own JVM in this project!
Please be advised: At the end of this project, an arbitrary subset of students will be asked to complete an oral interview explaining their code as part of the assignment. Students could be selected for any reason, including to ensure understanding or because course staff thinks the code is really cool.
These interviews are expected to last about 15 minutes, and your code will be available to you throughout. If you are called to interview, course staff will contact you to schedule a time, and your grade will be held until the interview is completed.
Setup
Like usual, you should register for the assignment by going to https://grinch.caltech.edu/register which will create you a repository on gitlab with the starter code.
The labradoodle.caltech.edu
Machine
As we’ve discussed in class, labradoodle.caltech.edu
has an x86-64 architecture which means it knows how to execute x86-64 machine code. The standard
executable format for Linux (which contains x86-64 machine code) is called ELF
. clang
outputs ELF
files which can
then be run directly from the command line. For this class, you will primarily be using the remote machine you set up in the pre-test which can be
ssh
ed to at labradoodle.caltech.edu
. This machine is a standard x86-64 Ubuntu
installation. For the duration of the course, we will
generally assume you are using this machine. As such, our explanations may only apply to labradoodle.caltech.edu
(and not Windows or OS X).
The Scenario
Recall that C code is compiled directly to machine code; it’s the job of a compiler like clang
to take in text and output bytes. In contrast, Java is interpreted.
Instead of being run directly on the machine, a program called the JVM (for “Java Virtual Machine”) interprets the program.
In this project, we will examine exactly how compiled Java classes are executed by the JVM by writing our own.
Compiling Java Manually
In CS 2, IntelliJ did the compiling and running for you. Under the hood, it was compiling using a program called javac
and executing with a program called java
(the JVM) in a pipeline like
this:
graph LR;
java[Hello.java]
cls[Hello.class]
java--javac Hello.java-->cls
Task 0.
In the task0
folder, there is a Java program called Test.java
. For this set of questions, we will ask about various aspects of this program.
Begin by cd
ing into the task0
directory, ls
ing to see what’s in the directory, and compiling Test.java
with the command javac Test.java
.
Originally, there was just a Test.java
file and a picture, but, now, the Java compiler has created Test.class
for us. This class
file can be run by the JVM.
Make sure to git add task0/Test.class
to get it in the repository!
Submit Responses For Credit
This is all fine, but why does Java use a virtual machine at all?
Virtual Machines
Using a virtual machine allows Java to be more portable than machine code. You can run the same Java class on multiple systems as long as there is a JVM implementation for that system. Additionally, using a JVM allows Java to be safe. We can always check if an instruction is okay before executing it rather than just letting the hardware run wild.
Submit Responses For Credit
Your Main Task
In this project, you will implement a (simplified) JVM which can handle all the Java bytecode integer instructions, and thus, you will be able to run Java programs that compute over the integers on your TeenyJVM.
The JVM
The JVM is a stack machine (in contrast with a register machine like x86-64). This means that all the operations implicitly happen on an operand stack (made up of 32-bit signed integers) popping arguments and pushing the result. The callee is responsible for setting up the operand stack.
The Operand Stack
This is very abstract; so, we’ll look at an example. In the following Java Bytecode snippet, we load constants 1 and 2 and then add them. The operand stack is pictured after each instruction.
# | Instruction | Operand Stack |
---|---|---|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{c} \hline \end{array}\end{array}\) | ||
0 | bipush 1 | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 1 \\\hline \end{array}\end{array}\) |
1 | bipush 2 | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 2 \\\hline 1 \\\hline \end{array}\end{array}\) |
2 | iadd | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 3 \\\hline \end{array}\end{array}\) |
Instructions and Opcodes
Class files do not actually contain the names of the instructions like we showed above; instead, they use pre-defined “opcodes” which are bytes that the JVM specification defines
to correspond to the instruction “mnemonics”. These are defined in jvm.h
. Please do not use magic numbers everywhere. We will start with these three instructions.
Opcode | Mnemonic | Effect |
---|---|---|
0x10 b |
bipush b
|
Pushes b onto the operand stack. |
0x60 | iadd | Pops the top two values from the operand stack, adds them, and pushes the result. |
0xb1 | return | Exits the current function without returning any value |
The Program Counter
Every method is represented in the class file as an array of bytes of code. For example, for the code snippet
bipush 1; bipush 2; iadd
, the array of bytes would be [0x10, 0x01, 0x10, 0x02, 0x60]
.
At all times during execution, the JVM needs to know what instruction it is at. The pc
or “program counter” is an index into the array of code for the current method.
During normal execution, the pc
increments after every instruction is run. Notice that some instructions (like bipush
) take up more than one byte in the code array.
Furthermore, there are some “jump” or “goto” instructions that can change the pc
in other ways, which you will implement later.
Every method ends in a return
instruction, so the program counter should never reach the end of the code array.
When implementing bytecode instructions, make sure you are using switch
statements to avoid stack overflow errors.
If you use if-else chains, you may cause stack overflows due to debug stack frames in the later tasks.
Consider reading the switch tips.
Task 1.
Our understanding of the JVM is now rich enough to implement the three instructions bipush
, iadd
, and return
in the execute
function of jvm.c
. Go ahead and implement
these (making sure to update the program counter correctly in all cases).
Note that the argument to bipush
is a signed byte.
The compiler guarantees that the bytecode tells us how big the operand stack needs to be. You can use the
max_stack
field of the code_t
struct to access this information. You will notice that the execute
function takes in four parameters. For now, you should ignore
all of them except method
; we will specify as the others become relevant.
After implementing these instructions, run make test1
; your program should not output anything, but it should execute successfully.
Keep in mind that the tests for later stages may expose subtle bugs in your code.
There are debugging hints in the appendix which you may find useful.
If you want to use print debugging, read how to get prints to show up in CS 24.
Printing 1 + 2
As you likely noticed in part 1, we were unable to actually see the results we were computing.
In this part, you will make System.out.println
work. Since we will not be implementing Object
s, we will need to execute printing in a (very) incorrect way.
Opcode | Mnemonic | Effect |
---|---|---|
0xb2 b1 b2
|
getstatic b1 b2
|
Moves the program counter past b2 (i.e., increment it by three). |
0xb6 b1 b2
|
invokevirtual b1 b2
|
Pops and prints the top value of the operand stack followed by a newline character. Then, moves the program counter past b2 (i.e., increments it by three). |
Task 2.
Implement the getstatic
and invokevirtual
exactly to our specification.
After implementing these instructions, run make test2
; if your code is correct, the tester will output a success message.
More Constant Instructions
To reduce the bytecode size, the JVM has specific instructions to load common constants. When implementing these, you must avoid code duplication. We will take off a significant number of points if you repeat the same line repeatedly. (Hint: The opcodes are consecutive numbers, see switch tips)
Here is a listing of these instructions:
Opcode | Mnemonic | Operand Stack |
---|---|---|
0x02 | iconst_m1 | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline -1 \\\hline ... \\\hline \end{array}\end{array}\) |
0x03 | iconst_0 | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 0 \\\hline ... \\\hline \end{array}\end{array}\) |
0x04 | iconst_1 | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 1 \\\hline ... \\\hline \end{array}\end{array}\) |
0x05 | iconst_2 | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 2 \\\hline ... \\\hline \end{array}\end{array}\) |
0x06 | iconst_3 | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 3 \\\hline ... \\\hline \end{array}\end{array}\) |
0x07 | iconst_4 | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 4 \\\hline ... \\\hline \end{array}\end{array}\) |
0x08 | iconst_5 | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 5 \\\hline ... \\\hline \end{array}\end{array}\) |
0x11 b1 b2
|
sipush b1 b2
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{(b1 << 8) | b2} \\\hline ... \\\hline \end{array}\end{array}\) |
Task 3.
Implement the iconst_*
and sipush
instructions.
Note that the argument to sipush
is a signed short.
After implementing these instructions, run make test3
; if your code is correct, the tester will output a success message.
Arithmetic and Bitwise Operations
Next, we will implement the integer operations. All operations pop their argument(s) from the operand stack and push the result.
Here is a listing of the instructions:
Opcode | Mnemonic | Operand Stack |
---|---|---|
0x60 | iadd | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a + b} \\\hline ... \\\hline \end{array}\\\\\end{array}\) |
0x64 | isub | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a - b} \\\hline ... \\\hline \end{array}\\\\\end{array}\) |
0x68 | imul | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a * b} \\\hline ... \\\hline \end{array}\\\\\end{array}\) |
0x6c | idiv | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a / b} \\\hline ... \\\hline \end{array}\\\\\end{array}\) |
0x70 | irem | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a % b} \\\hline ... \\\hline \end{array}\\\\\end{array}\) |
0x74 | ineg | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{-a} \\\hline ... \\\hline \end{array}\end{array}\) |
0x78 | ishl | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a << b} \\\hline ... \\\hline \end{array}\\\\\end{array}\) |
0x7a | ishr | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a >> b} \\\hline ... \\\hline \end{array}\\\\\end{array}\) |
0x7c | iushr | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{((unsigned) a) >> b} \\\hline ... \\\hline \end{array}\\\\\end{array}\) |
0x7e | iand | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a & b} \\\hline ... \\\hline \end{array}\\\\\end{array}\) |
0x80 | ior | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a | b} \\\hline ... \\\hline \end{array}\\\\\end{array}\) |
0x82 | ixor | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a ^ b} \\\hline ... \\\hline \end{array}\\\\\end{array}\) |
Task 4. Implement these instructions.
You may assume for the shift instructions that b
is a valid shift (i.e. b
is nonnegative and less than the number of bits in an int
.)
After implementing them, run make test4
; if your code is correct, the tester will output a success message.
Using Local Variables
In addition to the operand stack, every method call has an array of local variables which it can read from and write to. The locals array is 0-indexed, and the arguments to the method call are stored at the start of the locals array. The caller of a method is responsible for setting up the locals array. This means you do not need to allocate any locals arrays right now. We will come back to this later when we implement method calls.
Consider the following snippet of Java code which uses arguments and local variables:
1
2
3
4
5
public static void locals(int a, int b) {
int c = a + b;
int d = c / 2;
int e = d * 2;
}
Here are the corresponding (annotated) Java bytecode instructions:
Instruction | Operand stack | Locals |
---|---|---|
[] |
[a, b, _, _, _] |
|
iload_0 |
[a] |
[a, b, _, _, _] |
iload_1 |
[a, b] |
[a, b, _, _, _] |
iadd |
[a + b] |
[a, b, _, _, _] |
istore_2 |
[] |
[a, b, a + b, _, _] |
iload_2 |
[a + b] |
[a, b, a + b, _, _] |
iconst_2 |
[a + b, 2] |
[a, b, a + b, _, _] |
idiv |
[(a + b) / 2] |
[a, b, a + b, _, _] |
istore_3 |
[] |
[a, b, a + b, (a + b) / 2, _] |
iload_3 |
[(a + b) / 2] |
[a, b, a + b, (a + b) / 2, _] |
iconst_2 |
[(a + b) / 2, 2] |
[a, b, a + b, (a + b) / 2, _] |
imul |
[(a + b) / 2 * 2] |
[a, b, a + b, (a + b) / 2, _] |
istore 4 |
[] |
[a, b, a + b, (a + b) / 2, (a + b) / 2 * 2] |
return |
Here are the instructions that interact with the local variables.
Like the iconst_*
instructions, there are several shorthand instructions for iload i
and istore i
.
Opcode | Mnemonic | Operand Stack | Locals Array |
---|---|---|---|
0x15 i
|
iload i
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{locals[i]} \\\hline ... \\\hline \end{array}\end{array}\) | |
0x36 i
|
istore i
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) | locals[i] = a |
0x84 i b
|
iinc i b
|
locals[i] += b | |
0x1a | iload_0 | Same as iload 0
|
|
0x1b | iload_1 | Same as iload 1
|
|
0x1c | iload_2 | Same as iload 2
|
|
0x1d | iload_3 | Same as iload 3
|
|
0x3b | istore_0 | Same as istore 0
|
|
0x3c | istore_1 | Same as istore 1
|
|
0x3d | istore_2 | Same as istore 2
|
|
0x3e | istore_3 | Same as istore 3
|
Task 5. Implement the arithmetic instructions as above.
Note that i
is an unsigned byte and b
is a signed byte.
After implementing these instructions, run make test5
; if your code is correct, the tester will output a success message.
Using The Constant Pool
You may have noticed that up until now, we haven’t dealt with any large numbers. This is because these (along with method references) are stored in yet another area called the constant pool.
Consider this sample program:
1
2
3
4
5
6
7
public class Test {
public static int addLargeNumbers() {
int x = 24242424;
int y = 123456789;
return x + y;
}
}
This is the corresponding constant pool:
1
2
3
4
5
6
7
8
#1 = Methodref #4.#7 // Test.addLargeNumbers:()I
#2 = Integer 24242424
#3 = Integer 123456789
#4 = Class #8 // Test
#5 = Utf8 "addLargeNumbers"
#6 = Utf8 "()I"
#7 = NameAndType #5:#6 // addLargeNumbers:()I
#8 = Utf8 "Test"
Notice that in addition to integers, it stores other kinds of values (such as method references).
We will use these later to resolve methods when they are called.
To load a constant from the constant pool, programs use the ldc
instruction, which specifies its index in the constant pool:
1
2
3
4
5
6
7
8
9
10
public static int addLargeNumbers():
ldc #2 // int 24242424
istore_0
ldc #3 // int 123456789
istore_1
iload_0
iload_1
iadd
ireturn
Opcode | Mnemonic | Operand Stack |
---|---|---|
0x12 b
|
ldc b
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{constant_pool[b - 1]} \\\hline ... \\\hline \end{array}\end{array}\) |
Task 6.
Implement ldc
as above.
Note that b
is an unsigned byte and 1-indexed, but class->constant_pool
is 0-indexed.
After implementing this instruction, run make test6
; if your code is correct, the tester will output a success message.
Jumps
The Java compiler transforms loops and if
statements into various forms of “jumps”.
Some of these are “conditional jumps” (i.e., they change the program counter if a condition is met, and otherwise continue to the next instruction).
Note that the jump can go forward or backward in the code array relative to the jump instruction!
Opcode | Mnemonic | Operand Stack | Program Counter |
---|---|---|---|
0x99 b1 b2
|
ifeq b1 b2
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) | if (a == 0) pc += (b1 << 8) | b2 |
0x9a b1 b2
|
ifne b1 b2
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) | if (a != 0) pc += (b1 << 8) | b2 |
0x9b b1 b2
|
iflt b1 b2
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) | if (a < 0) pc += (b1 << 8) | b2 |
0x9c b1 b2
|
ifge b1 b2
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) | if (a >= 0) pc += (b1 << 8) | b2 |
0x9d b1 b2
|
ifgt b1 b2
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) | if (a > 0) pc += (b1 << 8) | b2 |
0x9e b1 b2
|
ifle b1 b2
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) | if (a <= 0) pc += (b1 << 8) | b2 |
Opcode | Mnemonic | Operand Stack | Program Counter |
---|---|---|---|
0x9f b1 b2
|
if_icmpeq b1 b2
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\\\end{array}\) | if (a == b) pc += (b1 << 8) | b2 |
0xa0 b1 b2
|
if_icmpne b1 b2
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\\\end{array}\) | if (a != b) pc += (b1 << 8) | b2 |
0xa1 b1 b2
|
if_icmplt b1 b2
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\\\end{array}\) | if (a < b) pc += (b1 << 8) | b2 |
0xa2 b1 b2
|
if_icmpge b1 b2
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\\\end{array}\) | if (a >= b) pc += (b1 << 8) | b2 |
0xa3 b1 b2
|
if_icmpgt b1 b2
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\\\end{array}\) | if (a > b) pc += (b1 << 8) | b2 |
0xa4 b1 b2
|
if_icmple b1 b2
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\\\end{array}\) | if (a <= b) pc += (b1 << 8) | b2 |
Opcode | Mnemonic | Program Counter |
---|---|---|
0xa7 b1 b2
|
goto b1 b2
|
pc += (b1 << 8) | b2 |
Task 7. Implement the jump instructions as above.
Note that the offsets for the jumps are signed values relative to the index of the jump instruction in the code array.
After implementing these instructions, run make test7
; if your code is correct, the tester will output a success message.
Static Method Calls
The second to last feature we have to implement for our TeenyJVM is method calls and returns.
These use the invokestatic
and ireturn
/return
instructions, respectively.
Opcode | Mnemonic | Operand Stack | Effect |
---|---|---|---|
0xac | ireturn | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) | return a from the current function |
0xb1 | return | return from the current function | |
0xb8 b1 b2
|
invokestatic b1 b2
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{argN} \\\hline \texttt{...} \\\hline \texttt{arg2} \\\hline \texttt{arg1} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{result} \\\hline ... \\\hline \end{array}\\\\\\\\\end{array}\) | Calls a method (see below) |
Conceptually, each method call has two pieces of context: its locals array and its operand stack. When one method calls another, the following steps happen:
- The caller resolves the provided unsigned constant pool index (
(b1 << 8) | b2
) into amethod_t
object using the providedfind_method_from_index
function. - The caller allocates and initializes the locals array of the callee with the arguments to the function as the first few entries.
The arguments are popped from the operand stack of the caller.
(Hint: Remember that you are transferring the arguments from a stack to a queue. What does this mean about the ordering?)
You might find the
get_number_of_parameters
function helpful here. - The caller (recursively) executes the
method_t
object corresponding to the callee. - The callee executes and
return
svoid
orireturn
s anint
from its operand stack. - The caller pushes the return value (if there is one) onto its own operand stack.
Task 8. Implement the method call instructions as above.
After implementing these instructions, run make test8
; if your code is correct, the tester will output a success message.
Integer Arrays
The last feature we are implementing for TeenyJVM is integer arrays, or int[].
First, here are some miscellaneous instructions to implement:
Opcode | Mnemonic | Operand Stack | Effect |
---|---|---|---|
0x00 | nop | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\end{array}\) | Does nothing |
0x59 | dup | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array} \\ \\ \end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline \texttt{a} \\ \hline ... \\\hline \end{array}\\\end{array}\) | Duplicate the top operand stack value |
The instructions that works with arrays are listed as follows:
Opcode | Mnemonic | Operand Stack | Effect |
---|---|---|---|
0xbc b
|
newarray b
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{count} \\ \hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{ref} \\ \hline ... \\\hline \end{array}\\\end{array}\) | Create a new array with count elements |
0xbe | arraylength | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{ref} \\ \hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{count} \\ \hline ... \\\hline \end{array}\\\end{array}\) | Get length of array |
0xb0 | areturn | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{ref} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) | return ref from the current function |
0x4f | iastore | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{value}\\ \hline \texttt{index}\\\hline \texttt{ref} \\ \hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\\\\\end{array}\) | Store into an int array |
0x2e | iaload | \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{index}\\\hline \texttt{ref} \\ \hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{value}\\ \hline ... \\\hline \end{array}\\\\\end{array}\) | Load int from an array |
The C analogue to Java’s int is int32_t, so the analogue to int[] is int32_t *. Instead of being left uninitialized, arrays in Java are initialized to their type’s default values. A Java int[] array is thus simply a block of contiguous memory allocated in the heap. In C, we can interact with integer arrays through pointers, but Java does not have a concept of a pointer–it instead has references.
-
For simplification, we treat references as indexes into a fake memory array that we call heap_t. These reference indexes are also represented with int32_t. After allocating an int32_t * array, you can use the function
heap_add
inheap.h
to store an allocated int array pointer and retrieve a reference. You can then use the functionheap_get
inheap.h
to dereference a reference. -
For example, newarray
int
would allocate an int32_t * array with an appropriate capacity, put it in heap_t to get a reference, and put the reference on the stack. iastore would dereference ref into array, and then set array[index] = value.
Arrays can also be stored in local variables, and they have their own associated instructions for doing so. They are very similar to the iload
and istore
instructions. Since our simplified references are just integer indices into a fake memory array, they are implemented in the exact same way as iload
and istore
.
Opcode | Mnemonic | Operand Stack | Locals Array |
---|---|---|---|
0x19 i
|
aload i
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{locals[i]} \\\hline ... \\\hline \end{array}\end{array}\) | |
0x3a i
|
astore i
|
\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{ref} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) | locals[i] = ref |
0x2a | aload_0 | Same as aload 0
|
|
0x2b | aload_1 | Same as aload 1
|
|
0x2c | aload_2 | Same as aload 2
|
|
0x2d | aload_3 | Same as aload 3
|
|
0x4b | astore_0 | Same as astore 0
|
|
0x4c | astore_1 | Same as astore 1
|
|
0x4d | astore_2 | Same as astore 2
|
|
0x4e | astore_3 | Same as astore 3
|
Task 9. Implement the integer array instructions as above.
You may assume that b
in newarray b
is 10, which is the type code for int[] (just read and discard a byte). You must store the array in such a way that you are able to retrieve the array length. You may also assume that all indexing operation are valid. Java arrays are indexed by int values.
After implementing these instructions, run make test9
; if your code is correct, the tester will output a success message.
Reflection
Take a moment to think about what you’ve just implemented. The tests are real Java programs. This is how all the programs you wrote in CS 2 were executed. TeenyJVM is (for the most part) implemented the same way as the real JVM.
Appendix: Switch Tips
- You can’t declare a variable right after a
case
label. Instead, add braces or a semicolon:
1
2
3
4
5
6
7
8
9
switch (x) {
case 5: {
int five = x;
break;
}
case 6:; // note the ';'
int six = x;
break;
}
-
enum
variants (including, for instance,i_bipush
) compile into integers that you can do arithmetic with! For example:
1
2
3
4
5
6
7
8
typedef enum {
op_add,
op_sub,
op_mul,
op_div
} arith_op_t;
printf("%d", op_div - op_sub); // prints "2"
-
case
labels can accept a ranges, which is equivalent to using fallthrough:
1
2
3
4
5
6
7
8
9
10
11
12
13
switch (x) {
case op_sub:
case op_mul:
case op_div:
do_thing();
break;
}
// equivalently
switch (x) {
case op_sub ... op_div:
do_thing();
break;
}
- If the
switch
‘ed value doen’t match anycase
’s, nothing happens by default. This is almost never what you want, because we should account for all possible inputs to our program. Add an explicitdefault
, which can simply crash:
1
2
3
4
5
6
7
switch (x) {
// ...
default:
fprintf(stderr, "Unhandled value! %d\n", x);
exit(99);
break;
}
When might you want the default
block to do nothing? When might you want it to
crash? Which is more correct for your interpreter loop? Which makes debugging easier?
Appendix: Debugging tests
make test
provides a simple way to verify that your code passes all the tests.
However, you will likely find it useful to run a single test at a time when debugging.
Here is an overview of how the tests are run:
- The tests are all Java programs, represented as
.java
or precompiled.class
files in thetests
directory, for exampletests/Arithmetic.java
. Each.java
file is compiled withjavac
into a.class
file, for exampletests/Arithmetic.class
. - The “expected” output of each test is computed by running it through the standard
java
interpreter. This is stored in the fileTEST_NAME-expected.txt
, for exampletests/Arithmetic-expected.txt
. - The “actual” output of each test is computed by running it through your
./jvm
executable. This is stored in the fileTEST_NAME-actual.txt
, for exampletests/Arithmetic-actual.txt
. - The two output files are compared to each other using the
diff
command, which reports any differences between the files.
To run a test, you can use the command make TEST_NAME-result
.
For example:
1
2
3
4
5
6
7
8
9
$ make Arithmetic-result
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer -c jvm.c -o jvm.o
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer -c read_class.c -o read_class.o
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer jvm.o read_class.o -o jvm
javac tests/Arithmetic.java
java -cp tests Arithmetic > tests/Arithmetic-expected.txt
./jvm tests/Arithmetic.class > tests/Arithmetic-actual.txt
diff -u tests/Arithmetic-expected.txt tests/Arithmetic-actual.txt ...
PASSED test Arithmetic.
If you don’t want to see the diff
output, you can just build the TEST_NAME-actual.txt
file:
1
2
3
4
5
6
$ make tests/Arithmetic-actual.txt
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer -c jvm.c -o jvm.o
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer -c read_class.c -o read_class.o
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer jvm.o read_class.o -o jvm
javac tests/Arithmetic.java
./jvm tests/Arithmetic.class > tests/Arithmetic-actual.txt
Note that the command to generate TEST_NAME-actual.txt
redirects the output of ./jvm
.
This means that anything printed using printf()
will be written to TEST_NAME-actual.txt
instead of the terminal.
If you want to print some debugging information, use fprintf(stderr, ...)
instead.
You can also use the javap
tool to show the constant pool and bytecode instructions in a .class
file.