Introduction

Have you ever wondered what a class file that IntelliJ runs actually is? Or the process of actually running it? You’ll explore the class format and learn how Java executes programs by writing your own JVM in this project!

Please be advised: At the end of this project, an arbitrary subset of students will be asked to complete an oral interview explaining their code as part of the assignment. Students could be selected for any reason, including to ensure understanding or because course staff thinks the code is really cool.

These interviews are expected to last about 15 minutes, and your code will be available to you throughout. If you are called to interview, course staff will contact you to schedule a time, and your grade will be held until the interview is completed.

Setup

Like usual, you should register for the assignment by going to https://grinch.caltech.edu/register which will create you a repository on gitlab with the starter code.

The `labradoodle.caltech.edu` Machine

As we’ve discussed in class, labradoodle.caltech.edu has an x86-64 architecture which means it knows how to execute x86-64 machine code. The standard executable format for Linux (which contains x86-64 machine code) is called ELF. clang outputs ELF files which can then be run directly from the command line. For this class, you will primarily be using the remote machine you set up in the pre-test which can be sshed to at labradoodle.caltech.edu. This machine is a standard x86-64 Ubuntu installation. For the duration of the course, we will generally assume you are using this machine. As such, our explanations may only apply to labradoodle.caltech.edu (and not Windows or OS X).

The Scenario

Recall that C code is compiled directly to machine code; it’s the job of a compiler like clang to take in text and output bytes. In contrast, Java is interpreted. Instead of being run directly on the machine, a program called the JVM (for “Java Virtual Machine”) interprets the program. In this project, we will examine exactly how compiled Java classes are executed by the JVM by writing our own.

Compiling Java Manually

In CS 2, IntelliJ did the compiling and running for you. Under the hood, it was compiling using a program called javac and executing with a program called java (the JVM) in a pipeline like this:

graph LR;
    java[Hello.java]
    cls[Hello.class]
    java--javac Hello.java-->cls

Task 0. In the task0 folder, there is a Java program called Test.java. For this set of questions, we will ask about various aspects of this program.

Begin by cding into the task0 directory, lsing to see what’s in the directory, and compiling Test.java with the command javac Test.java. Originally, there was just a Test.java file and a picture, but, now, the Java compiler has created Test.class for us. This class file can be run by the JVM. Make sure to git add task0/Test.class to get it in the repository!

Question. Magic Numbers and Class Files

To explore what the class actually looks like, we’re going to use a tool called xxd which prints out a file as hex. Run the command xxd Test.class. The output should have a few hex digits that can be read as words at the front (specifically eight of them). What are those hex digits?

Those eight hex digits are known as a “file magic number”; they allow your system to know what type of file something is. Determine what the file magic number for a png is. Hint: There’s already a png in the folder!

What is the file magic number for pngs (in hex)?

Another way to interpret the bytes in the file is as Java Bytecode; there is a command called javap which reads a class file and shows what instructions get executed when it is eventually run. Run the command javap -v Test.class. This command spits out a lot of information which we will begin to understand later. For now, find the main method and determine how many local variables it uses by reading the output.

The number of local variables in main is .

Check Answer

Submit Responses For Credit

This is all fine, but why does Java use a virtual machine at all?

Virtual Machines

Using a virtual machine allows Java to be more portable than machine code. You can run the same Java class on multiple systems as long as there is a JVM implementation for that system. Additionally, using a JVM allows Java to be safe. We can always check if an instruction is okay before executing it rather than just letting the hardware run wild.

Submit Responses For Credit

Your Main Task

In this project, you will implement a (simplified) JVM which can handle all the Java bytecode integer instructions, and thus, you will be able to run Java programs that compute over the integers on your TeenyJVM.

The JVM

The JVM is a stack machine (in contrast with a register machine like x86-64). This means that all the operations implicitly happen on an operand stack (made up of 32-bit signed integers) popping arguments and pushing the result. The callee is responsible for setting up the operand stack.

The Operand Stack

This is very abstract; so, we’ll look at an example. In the following Java Bytecode snippet, we load constants 1 and 2 and then add them. The operand stack is pictured after each instruction.

#	Instruction	Operand Stack
		\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{c} \hline \end{array}\end{array}\)
0	bipush 1	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline 1 \\\hline \end{array}\end{array}\)
1	bipush 2	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline 2 \\\hline 1 \\\hline \end{array}\end{array}\)
2	iadd	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline 3 \\\hline \end{array}\end{array}\)

Instructions and Opcodes

Class files do not actually contain the names of the instructions like we showed above; instead, they use pre-defined “opcodes” which are bytes that the JVM specification defines to correspond to the instruction “mnemonics”. These are defined in jvm.h. Please do not use magic numbers everywhere. We will start with these three instructions.

Opcode	Mnemonic	Effect
`0x10 b`	`bipush` `b`	Pushes `b` onto the operand stack.
`0x60`	`iadd`	Pops the top two values from the operand stack, adds them, and pushes the result.
`0xb1`	`return`	Exits the current function without returning any value

The Program Counter

Every method is represented in the class file as an array of bytes of code. For example, for the code snippet bipush 1; bipush 2; iadd, the array of bytes would be [0x10, 0x01, 0x10, 0x02, 0x60].

At all times during execution, the JVM needs to know what instruction it is at. The pc or “program counter” is an index into the array of code for the current method. During normal execution, the pc increments after every instruction is run. Notice that some instructions (like bipush) take up more than one byte in the code array. Furthermore, there are some “jump” or “goto” instructions that can change the pc in other ways, which you will implement later. Every method ends in a return instruction, so the program counter should never reach the end of the code array.

When implementing bytecode instructions, make sure you are using switch statements to avoid stack overflow errors. If you use if-else chains, you may cause stack overflows due to debug stack frames in the later tasks. Consider reading the switch tips.

Task 1. Our understanding of the JVM is now rich enough to implement the three instructions bipush, iadd, and return in the execute function of jvm.c. Go ahead and implement these (making sure to update the program counter correctly in all cases).

Note that the argument to bipush is a signed byte.

The compiler guarantees that the bytecode tells us how big the operand stack needs to be. You can use the max_stack field of the code_t struct to access this information. You will notice that the execute function takes in four parameters. For now, you should ignore all of them except method; we will specify as the others become relevant.

After implementing these instructions, run make test1; your program should not output anything, but it should execute successfully. Keep in mind that the tests for later stages may expose subtle bugs in your code. There are debugging hints in the appendix which you may find useful.

If you want to use print debugging, read how to get prints to show up in CS 24.

Printing 1 + 2

As you likely noticed in part 1, we were unable to actually see the results we were computing. In this part, you will make System.out.println work. Since we will not be implementing Objects, we will need to execute printing in a (very) incorrect way.

Opcode	Mnemonic	Effect
`0xb2` `b1 b2`	`getstatic` `b1 b2`	Moves the program counter past `b2` (i.e., increment it by three).
`0xb6` `b1 b2`	`invokevirtual` `b1 b2`	Pops and prints the top value of the operand stack followed by a newline character. Then, moves the program counter past `b2` (i.e., increments it by three).

Task 2. Implement the getstatic and invokevirtual exactly to our specification.

After implementing these instructions, run make test2; if your code is correct, the tester will output a success message.

More Constant Instructions

To reduce the bytecode size, the JVM has specific instructions to load common constants. When implementing these, you must avoid code duplication. We will take off a significant number of points if you repeat the same line repeatedly. (Hint: The opcodes are consecutive numbers, see switch tips)

Here is a listing of these instructions:

Opcode	Mnemonic	Operand Stack
`0x02`	`iconst_m1`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline -1 \\\hline ... \\\hline \end{array}\end{array}\)
`0x03`	`iconst_0`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline 0 \\\hline ... \\\hline \end{array}\end{array}\)
`0x04`	`iconst_1`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline 1 \\\hline ... \\\hline \end{array}\end{array}\)
`0x05`	`iconst_2`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline 2 \\\hline ... \\\hline \end{array}\end{array}\)
`0x06`	`iconst_3`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline 3 \\\hline ... \\\hline \end{array}\end{array}\)
`0x07`	`iconst_4`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline 4 \\\hline ... \\\hline \end{array}\end{array}\)
`0x08`	`iconst_5`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline 5 \\\hline ... \\\hline \end{array}\end{array}\)
`0x11` `b1 b2`	`sipush` `b1 b2`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{(b1 << 8) \| b2} \\\hline ... \\\hline \end{array}\end{array}\)

Task 3. Implement the iconst_* and sipush instructions.

Note that the argument to sipush is a signed short.

After implementing these instructions, run make test3; if your code is correct, the tester will output a success message.

Arithmetic and Bitwise Operations

Next, we will implement the integer operations. All operations pop their argument(s) from the operand stack and push the result.

Here is a listing of the instructions:

Opcode	Mnemonic	Operand Stack
`0x60`	`iadd`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a + b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
`0x64`	`isub`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a - b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
`0x68`	`imul`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a * b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
`0x6c`	`idiv`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a / b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
`0x70`	`irem`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a % b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
`0x74`	`ineg`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{-a} \\\hline ... \\\hline \end{array}\end{array}\)
`0x78`	`ishl`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a << b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
`0x7a`	`ishr`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a >> b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
`0x7c`	`iushr`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{((unsigned) a) >> b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
`0x7e`	`iand`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a & b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
`0x80`	`ior`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a \| b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
`0x82`	`ixor`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a ^ b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)

Task 4. Implement these instructions.

You may assume for the shift instructions that b is a valid shift (i.e. b is nonnegative and less than the number of bits in an int.)

After implementing them, run make test4; if your code is correct, the tester will output a success message.

Using Local Variables

In addition to the operand stack, every method call has an array of local variables which it can read from and write to. The locals array is 0-indexed, and the arguments to the method call are stored at the start of the locals array. The caller of a method is responsible for setting up the locals array. This means you do not need to allocate any locals arrays right now. We will come back to this later when we implement method calls.

Consider the following snippet of Java code which uses arguments and local variables:

public static void locals(int a, int b) {
    int c = a + b;
    int d = c / 2;
    int e = d * 2;
}

Here are the corresponding (annotated) Java bytecode instructions:

Instruction	Operand stack	Locals
	`[]`	`[a, b, _, _, _]`
`iload_0`	`[a]`	`[a, b, _, _, _]`
`iload_1`	`[a, b]`	`[a, b, _, _, _]`
`iadd`	`[a + b]`	`[a, b, _, _, _]`
`istore_2`	`[]`	`[a, b, a + b, _, _]`
`iload_2`	`[a + b]`	`[a, b, a + b, _, _]`
`iconst_2`	`[a + b, 2]`	`[a, b, a + b, _, _]`
`idiv`	`[(a + b) / 2]`	`[a, b, a + b, _, _]`
`istore_3`	`[]`	`[a, b, a + b, (a + b) / 2, _]`
`iload_3`	`[(a + b) / 2]`	`[a, b, a + b, (a + b) / 2, _]`
`iconst_2`	`[(a + b) / 2, 2]`	`[a, b, a + b, (a + b) / 2, _]`
`imul`	`[(a + b) / 2 * 2]`	`[a, b, a + b, (a + b) / 2, _]`
`istore 4`	`[]`	`[a, b, a + b, (a + b) / 2, (a + b) / 2 * 2]`
`return`

Here are the instructions that interact with the local variables. Like the iconst_* instructions, there are several shorthand instructions for iload i and istore i.

Opcode	Mnemonic	Operand Stack	Locals Array
`0x15` `i`	`iload` `i`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{locals[i]} \\\hline ... \\\hline \end{array}\end{array}\)
`0x36` `i`	`istore` `i`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array}\)	`locals[i] = a`
`0x84` `i b`	`iinc` `i b`		`locals[i] += b`
`0x1a`	`iload_0`	Same as `iload` `0`
`0x1b`	`iload_1`	Same as `iload` `1`
`0x1c`	`iload_2`	Same as `iload` `2`
`0x1d`	`iload_3`	Same as `iload` `3`
`0x3b`	`istore_0`	Same as `istore` `0`
`0x3c`	`istore_1`	Same as `istore` `1`
`0x3d`	`istore_2`	Same as `istore` `2`
`0x3e`	`istore_3`	Same as `istore` `3`

Task 5. Implement the arithmetic instructions as above.

Note that i is an unsigned byte and b is a signed byte.

After implementing these instructions, run make test5; if your code is correct, the tester will output a success message.

Using The Constant Pool

You may have noticed that up until now, we haven’t dealt with any large numbers. This is because these (along with method references) are stored in yet another area called the constant pool.

Consider this sample program:

public class Test {
    public static int addLargeNumbers() {
        int x = 24242424;
        int y = 123456789;
        return x + y;
    }
}

This is the corresponding constant pool:

#1 = Methodref   #4.#7             // Test.addLargeNumbers:()I
#2 = Integer     24242424
#3 = Integer     123456789
#4 = Class       #8                // Test
#5 = Utf8        "addLargeNumbers"
#6 = Utf8        "()I"
#7 = NameAndType #5:#6             // addLargeNumbers:()I
#8 = Utf8        "Test"

Notice that in addition to integers, it stores other kinds of values (such as method references). We will use these later to resolve methods when they are called. To load a constant from the constant pool, programs use the ldc instruction, which specifies its index in the constant pool:

public static int addLargeNumbers():

ldc #2   // int 24242424
istore_0
ldc #3   // int 123456789
istore_1
iload_0
iload_1
iadd
ireturn

Opcode	Mnemonic	Operand Stack
`0x12` `b`	`ldc` `b`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{constant_pool[b - 1]} \\\hline ... \\\hline \end{array}\end{array}\)

Task 6. Implement ldc as above.

Note that b is an unsigned byte and 1-indexed, but class->constant_pool is 0-indexed.

After implementing this instruction, run make test6; if your code is correct, the tester will output a success message.

Jumps

The Java compiler transforms loops and if statements into various forms of “jumps”. Some of these are “conditional jumps” (i.e., they change the program counter if a condition is met, and otherwise continue to the next instruction). Note that the jump can go forward or backward in the code array relative to the jump instruction!

Opcode	Mnemonic	Operand Stack	Program Counter
`0x99` `b1 b2`	`ifeq` `b1 b2`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array}\)	`if (a == 0) pc += (b1 << 8) \| b2`
`0x9a` `b1 b2`	`ifne` `b1 b2`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array}\)	`if (a != 0) pc += (b1 << 8) \| b2`
`0x9b` `b1 b2`	`iflt` `b1 b2`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array}\)	`if (a < 0) pc += (b1 << 8) \| b2`
`0x9c` `b1 b2`	`ifge` `b1 b2`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array}\)	`if (a >= 0) pc += (b1 << 8) \| b2`
`0x9d` `b1 b2`	`ifgt` `b1 b2`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array}\)	`if (a > 0) pc += (b1 << 8) \| b2`
`0x9e` `b1 b2`	`ifle` `b1 b2`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array}\)	`if (a <= 0) pc += (b1 << 8) \| b2`

Opcode	Mnemonic	Operand Stack	Program Counter
`0x9f` `b1 b2`	`if_icmpeq` `b1 b2`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\\\end{array}\)	`if (a == b) pc += (b1 << 8) \| b2`
`0xa0` `b1 b2`	`if_icmpne` `b1 b2`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\\\end{array}\)	`if (a != b) pc += (b1 << 8) \| b2`
`0xa1` `b1 b2`	`if_icmplt` `b1 b2`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\\\end{array}\)	`if (a < b) pc += (b1 << 8) \| b2`
`0xa2` `b1 b2`	`if_icmpge` `b1 b2`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\\\end{array}\)	`if (a >= b) pc += (b1 << 8) \| b2`
`0xa3` `b1 b2`	`if_icmpgt` `b1 b2`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\\\end{array}\)	`if (a > b) pc += (b1 << 8) \| b2`
`0xa4` `b1 b2`	`if_icmple` `b1 b2`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\\\end{array}\)	`if (a <= b) pc += (b1 << 8) \| b2`

Opcode	Mnemonic	Program Counter
`0xa7` `b1 b2`	`goto` `b1 b2`	`pc += (b1 << 8) \| b2`

Task 7. Implement the jump instructions as above.

Note that the offsets for the jumps are signed values relative to the index of the jump instruction in the code array.

After implementing these instructions, run make test7; if your code is correct, the tester will output a success message.

Static Method Calls

The second to last feature we have to implement for our TeenyJVM is method calls and returns. These use the invokestatic and ireturn/return instructions, respectively.

Opcode	Mnemonic	Operand Stack	Effect
`0xac`	`ireturn`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array}\)	`return a` from the current function
`0xb1`	`return`		`return` from the current function
`0xb8` `b1 b2`	`invokestatic` `b1 b2`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{argN} \\\hline \texttt{...} \\\hline \texttt{arg2} \\\hline \texttt{arg1} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{result} \\\hline ... \\\hline \end{array}\\\\\\\\\end{array}\)	Calls a method (see below)

Conceptually, each method call has two pieces of context: its locals array and its operand stack. When one method calls another, the following steps happen:

The caller resolves the provided unsigned constant pool index ((b1 << 8) | b2) into a method_t object using the provided find_method_from_index function.
The caller allocates and initializes the locals array of the callee with the arguments to the function as the first few entries. The arguments are popped from the operand stack of the caller. (Hint: Remember that you are transferring the arguments from a stack to a queue. What does this mean about the ordering?) You might find the get_number_of_parameters function helpful here.
The caller (recursively) executes the method_t object corresponding to the callee.
The callee executes and returns void or ireturns an int from its operand stack.
The caller pushes the return value (if there is one) onto its own operand stack.

Task 8. Implement the method call instructions as above.

After implementing these instructions, run make test8; if your code is correct, the tester will output a success message.

Integer Arrays

The last feature we are implementing for TeenyJVM is integer arrays, or int[].

First, here are some miscellaneous instructions to implement:

Opcode	Mnemonic	Operand Stack	Effect
`0x00`	`nop`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\end{array}\)	Does nothing
`0x59`	`dup`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a} \\\hline ... \\\hline \end{array} \\ \\ \end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{a} \\\hline \texttt{a} \\ \hline ... \\\hline \end{array}\\\end{array}\)	Duplicate the top operand stack value

The instructions that works with arrays are listed as follows:

Opcode	Mnemonic	Operand Stack	Effect
`0xbc` `b`	`newarray` `b`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{count} \\ \hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{ref} \\ \hline ... \\\hline \end{array}\\\end{array}\)	Create a new array with `count` elements
`0xbe`	`arraylength`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{ref} \\ \hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{count} \\ \hline ... \\\hline \end{array}\\\end{array}\)	Get length of array
`0xb0`	`areturn`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{ref} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array}\)	`return ref` from the current function
`0x4f`	`iastore`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{value}\\ \hline \texttt{index}\\\hline \texttt{ref} \\ \hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\\\\\end{array}\)	Store into an `int` array
`0x2e`	`iaload`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{index}\\\hline \texttt{ref} \\ \hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{value}\\ \hline ... \\\hline \end{array}\\\\\end{array}\)	Load `int` from an array

The C analogue to Java’s int is int32_t, so the analogue to int[] is int32_t *. Instead of being left uninitialized, arrays in Java are initialized to their type’s default values. A Java int[] array is thus simply a block of contiguous memory allocated in the heap. In C, we can interact with integer arrays through pointers, but Java does not have a concept of a pointer–it instead has references.

For simplification, we treat references as indexes into a fake memory array that we call heap_t. These reference indexes are also represented with int32_t. After allocating an int32_t * array, you can use the function heap_add in heap.h to store an allocated int array pointer and retrieve a reference. You can then use the function heap_get in heap.h to dereference a reference.
For example, newarray int would allocate an int32_t * array with an appropriate capacity, put it in heap_t to get a reference, and put the reference on the stack. iastore would dereference ref into array, and then set array[index] = value.

Arrays can also be stored in local variables, and they have their own associated instructions for doing so. They are very similar to the iload and istore instructions. Since our simplified references are just integer indices into a fake memory array, they are implemented in the exact same way as iload and istore.

Opcode	Mnemonic	Operand Stack	Locals Array
`0x19` `i`	`aload` `i`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{locals[i]} \\\hline ... \\\hline \end{array}\end{array}\)
`0x3a` `i`	`astore` `i`	\(\begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline \texttt{ref} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{\|c\|} \hline ... \\\hline \end{array}\\\\\end{array}\)	`locals[i] = ref`
`0x2a`	`aload_0`	Same as `aload` `0`
`0x2b`	`aload_1`	Same as `aload` `1`
`0x2c`	`aload_2`	Same as `aload` `2`
`0x2d`	`aload_3`	Same as `aload` `3`
`0x4b`	`astore_0`	Same as `astore` `0`
`0x4c`	`astore_1`	Same as `astore` `1`
`0x4d`	`astore_2`	Same as `astore` `2`
`0x4e`	`astore_3`	Same as `astore` `3`

Task 9. Implement the integer array instructions as above.

You may assume that b in newarray b is 10, which is the type code for int[] (just read and discard a byte). You must store the array in such a way that you are able to retrieve the array length. You may also assume that all indexing operation are valid. Java arrays are indexed by int values.

After implementing these instructions, run make test9; if your code is correct, the tester will output a success message.

Reflection

Take a moment to think about what you’ve just implemented. The tests are real Java programs. This is how all the programs you wrote in CS 2 were executed. TeenyJVM is (for the most part) implemented the same way as the real JVM.

Appendix: Switch Tips

You can’t declare a variable right after a case label. Instead, add braces or a semicolon:

switch (x) {
  case 5: {
    int five = x;
    break;
  }
  case 6:; // note the ';'
    int six = x;
    break;
}

enum variants (including, for instance, i_bipush) compile into integers that you can do arithmetic with! For example:

typedef enum {
  op_add,
  op_sub,
  op_mul,
  op_div
} arith_op_t;

printf("%d", op_div - op_sub); // prints "2"

case labels can accept a ranges, which is equivalent to using fallthrough:

switch (x) {
  case op_sub:
  case op_mul:
  case op_div:
    do_thing();
    break;
}
// equivalently
switch (x) {
  case op_sub ... op_div:
    do_thing();
    break;
}

If the switch‘ed value doen’t match any case’s, nothing happens by default. This is almost never what you want, because we should account for all possible inputs to our program. Add an explicit default, which can simply crash:

switch (x) {
  // ...
  default:
    fprintf(stderr, "Unhandled value! %d\n", x);
    exit(99);
    break;
}

When might you want the default block to do nothing? When might you want it to crash? Which is more correct for your interpreter loop? Which makes debugging easier?

Appendix: Debugging tests

make test provides a simple way to verify that your code passes all the tests. However, you will likely find it useful to run a single test at a time when debugging.

Here is an overview of how the tests are run:

The tests are all Java programs, represented as .java or precompiled .class files in the tests directory, for example tests/Arithmetic.java. Each .java file is compiled with javac into a .class file, for example tests/Arithmetic.class.
The “expected” output of each test is computed by running it through the standard java interpreter. This is stored in the file TEST_NAME-expected.txt, for example tests/Arithmetic-expected.txt.
The “actual” output of each test is computed by running it through your ./jvm executable. This is stored in the file TEST_NAME-actual.txt, for example tests/Arithmetic-actual.txt.
The two output files are compared to each other using the diff command, which reports any differences between the files.

To run a test, you can use the command make TEST_NAME-result. For example:

$ make Arithmetic-result
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer -c jvm.c -o jvm.o
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer -c read_class.c -o read_class.o
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer jvm.o read_class.o -o jvm
javac tests/Arithmetic.java
java -cp tests Arithmetic > tests/Arithmetic-expected.txt
./jvm tests/Arithmetic.class > tests/Arithmetic-actual.txt
diff -u tests/Arithmetic-expected.txt tests/Arithmetic-actual.txt ...
PASSED test Arithmetic.

If you don’t want to see the diff output, you can just build the TEST_NAME-actual.txt file:

$ make tests/Arithmetic-actual.txt
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer -c jvm.c -o jvm.o
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer -c read_class.c -o read_class.o
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer jvm.o read_class.o -o jvm
javac tests/Arithmetic.java
./jvm tests/Arithmetic.class > tests/Arithmetic-actual.txt

Note that the command to generate TEST_NAME-actual.txt redirects the output of ./jvm. This means that anything printed using printf() will be written to TEST_NAME-actual.txt instead of the terminal. If you want to print some debugging information, use fprintf(stderr, ...) instead.

You can also use the javap tool to show the constant pool and bytecode instructions in a .class file.

cs24-24fa Project 01: TeenyJVM

Introduction to Computing Systems (Fall 2024)

Introduction

Setup

The `labradoodle.caltech.edu` Machine

The Scenario

Compiling Java Manually

Virtual Machines

Your Main Task

The JVM

The Operand Stack

Instructions and Opcodes

The Program Counter

Printing 1 + 2

More Constant Instructions

Arithmetic and Bitwise Operations

Using Local Variables

Using The Constant Pool

Jumps

Static Method Calls

Integer Arrays

Reflection

Appendix: Switch Tips

Appendix: Debugging tests

Introduction

Setup

The labradoodle.caltech.edu Machine

The Scenario

Compiling Java Manually

Virtual Machines

Your Main Task

The JVM

The Operand Stack

Instructions and Opcodes

The Program Counter

Printing 1 + 2

More Constant Instructions

Arithmetic and Bitwise Operations

Using Local Variables

Using The Constant Pool

Jumps

Static Method Calls

Integer Arrays

Reflection

Appendix: Switch Tips

Appendix: Debugging tests

The `labradoodle.caltech.edu` Machine