cs24-24fa Project 01: TeenyJVM

Introduction to Computing Systems (Fall 2024)

Introduction

Have you ever wondered what a class file that IntelliJ runs actually is? Or the process of actually running it? You’ll explore the class format and learn how Java executes programs by writing your own JVM in this project!

Setup

Like usual, you should register for the assignment by going to https://grinch.caltech.edu/register which will create you a repository on gitlab with the starter code.

The labradoodle.caltech.edu Machine

As we’ve discussed in class, labradoodle.caltech.edu has an x86-64 architecture which means it knows how to execute x86-64 machine code. The standard executable format for Linux (which contains x86-64 machine code) is called ELF. clang outputs ELF files which can then be run directly from the command line. For this class, you will primarily be using the remote machine you set up in the pre-test which can be sshed to at labradoodle.caltech.edu. This machine is a standard x86-64 Ubuntu installation. For the duration of the course, we will generally assume you are using this machine. As such, our explanations may only apply to labradoodle.caltech.edu (and not Windows or OS X).

The Scenario

Recall that C code is compiled directly to machine code; it’s the job of a compiler like clang to take in text and output bytes. In contrast, Java is interpreted. Instead of being run directly on the machine, a program called the JVM (for “Java Virtual Machine”) interprets the program. In this project, we will examine exactly how compiled Java classes are executed by the JVM by writing our own.

Compiling Java Manually

In CS 2, IntelliJ did the compiling and running for you. Under the hood, it was compiling using a program called javac and executing with a program called java (the JVM) in a pipeline like this:

graph LR;
    java[Hello.java]
    cls[Hello.class]
    java--javac Hello.java-->cls

Submit Responses For Credit

This is all fine, but why does Java use a virtual machine at all?

Virtual Machines

Using a virtual machine allows Java to be more portable than machine code. You can run the same Java class on multiple systems as long as there is a JVM implementation for that system. Additionally, using a JVM allows Java to be safe. We can always check if an instruction is okay before executing it rather than just letting the hardware run wild.

Submit Responses For Credit

Your Main Task

In this project, you will implement a (simplified) JVM which can handle all the Java bytecode integer instructions, and thus, you will be able to run Java programs that compute over the integers on your TeenyJVM.

The JVM

The JVM is a stack machine (in contrast with a register machine like x86-64). This means that all the operations implicitly happen on an operand stack (made up of 32-bit signed integers) popping arguments and pushing the result. The callee is responsible for setting up the operand stack.

The Operand Stack

This is very abstract; so, we’ll look at an example. In the following Java Bytecode snippet, we load constants 1 and 2 and then add them. The operand stack is pictured after each instruction.

# Instruction Operand Stack
    \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{c} \hline \end{array}\end{array}\)
0 bipush 1 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 1 \\\hline \end{array}\end{array}\)
1 bipush 2 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 2 \\\hline 1 \\\hline \end{array}\end{array}\)
2 iadd \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 3 \\\hline \end{array}\end{array}\)

Instructions and Opcodes

Class files do not actually contain the names of the instructions like we showed above; instead, they use pre-defined “opcodes” which are bytes that the JVM specification defines to correspond to the instruction “mnemonics”. These are defined in jvm.h. Please do not use magic numbers everywhere. We will start with these three instructions.

Opcode Mnemonic Effect
0x10 b bipush b Pushes b onto the operand stack.
0x60 iadd Pops the top two values from the operand stack, adds them, and pushes the result.
0xb1 return Exits the current function without returning any value

The Program Counter

Every method is represented in the class file as an array of bytes of code. For example, for the code snippet bipush 1; bipush 2; iadd, the array of bytes would be [0x10, 0x01, 0x10, 0x02, 0x60].

At all times during execution, the JVM needs to know what instruction it is at. The pc or “program counter” is an index into the array of code for the current method. During normal execution, the pc increments after every instruction is run. Notice that some instructions (like bipush) take up more than one byte in the code array. Furthermore, there are some “jump” or “goto” instructions that can change the pc in other ways, which you will implement later. Every method ends in a return instruction, so the program counter should never reach the end of the code array.

Printing 1 + 2

As you likely noticed in part 1, we were unable to actually see the results we were computing. In this part, you will make System.out.println work. Since we will not be implementing Objects, we will need to execute printing in a (very) incorrect way.

Opcode Mnemonic Effect
0xb2 b1 b2 getstatic b1 b2 Moves the program counter past b2 (i.e., increment it by three).
0xb6 b1 b2 invokevirtual b1 b2 Pops and prints the top value of the operand stack followed by a newline character. Then, moves the program counter past b2 (i.e., increments it by three).

More Constant Instructions

To reduce the bytecode size, the JVM has specific instructions to load common constants. When implementing these, you must avoid code duplication. We will take off a significant number of points if you repeat the same line repeatedly. (Hint: The opcodes are consecutive numbers, see switch tips)

Here is a listing of these instructions:

Opcode Mnemonic Operand Stack
0x02 iconst_m1 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline -1 \\\hline ... \\\hline \end{array}\end{array}\)
0x03 iconst_0 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 0 \\\hline ... \\\hline \end{array}\end{array}\)
0x04 iconst_1 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 1 \\\hline ... \\\hline \end{array}\end{array}\)
0x05 iconst_2 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 2 \\\hline ... \\\hline \end{array}\end{array}\)
0x06 iconst_3 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 3 \\\hline ... \\\hline \end{array}\end{array}\)
0x07 iconst_4 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 4 \\\hline ... \\\hline \end{array}\end{array}\)
0x08 iconst_5 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline 5 \\\hline ... \\\hline \end{array}\end{array}\)
0x11 b1 b2 sipush b1 b2 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{(b1 << 8) | b2} \\\hline ... \\\hline \end{array}\end{array}\)

Arithmetic and Bitwise Operations

Next, we will implement the integer operations. All operations pop their argument(s) from the operand stack and push the result.

Here is a listing of the instructions:

Opcode Mnemonic Operand Stack
0x60 iadd \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a + b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
0x64 isub \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a - b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
0x68 imul \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a * b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
0x6c idiv \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a / b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
0x70 irem \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a % b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
0x74 ineg \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{-a} \\\hline ... \\\hline \end{array}\end{array}\)
0x78 ishl \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a << b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
0x7a ishr \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a >> b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
0x7c iushr \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{((unsigned) a) >> b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
0x7e iand \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a & b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
0x80 ior \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a | b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)
0x82 ixor \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a ^ b} \\\hline ... \\\hline \end{array}\\\\\end{array}\)

Using Local Variables

In addition to the operand stack, every method call has an array of local variables which it can read from and write to. The locals array is 0-indexed, and the arguments to the method call are stored at the start of the locals array. The caller of a method is responsible for setting up the locals array. This means you do not need to allocate any locals arrays right now. We will come back to this later when we implement method calls.

Consider the following snippet of Java code which uses arguments and local variables:

1
2
3
4
5
public static void locals(int a, int b) {
    int c = a + b;
    int d = c / 2;
    int e = d * 2;
}

Here are the corresponding (annotated) Java bytecode instructions:

Instruction Operand stack Locals
  [] [a, b, _, _, _]
iload_0 [a] [a, b, _, _, _]
iload_1 [a, b] [a, b, _, _, _]
iadd [a + b] [a, b, _, _, _]
istore_2 [] [a, b, a + b, _, _]
iload_2 [a + b] [a, b, a + b, _, _]
iconst_2 [a + b, 2] [a, b, a + b, _, _]
idiv [(a + b) / 2] [a, b, a + b, _, _]
istore_3 [] [a, b, a + b, (a + b) / 2, _]
iload_3 [(a + b) / 2] [a, b, a + b, (a + b) / 2, _]
iconst_2 [(a + b) / 2, 2] [a, b, a + b, (a + b) / 2, _]
imul [(a + b) / 2 * 2] [a, b, a + b, (a + b) / 2, _]
istore 4 [] [a, b, a + b, (a + b) / 2, (a + b) / 2 * 2]
return    

Here are the instructions that interact with the local variables. Like the iconst_* instructions, there are several shorthand instructions for iload i and istore i.

Opcode Mnemonic Operand Stack Locals Array
0x15 i iload i \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{locals[i]} \\\hline ... \\\hline \end{array}\end{array}\)  
0x36 i istore i \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) locals[i] = a
0x84 i b iinc i b   locals[i] += b
0x1a iload_0 Same as iload 0  
0x1b iload_1 Same as iload 1  
0x1c iload_2 Same as iload 2  
0x1d iload_3 Same as iload 3  
0x3b istore_0 Same as istore 0  
0x3c istore_1 Same as istore 1  
0x3d istore_2 Same as istore 2  
0x3e istore_3 Same as istore 3  

Using The Constant Pool

You may have noticed that up until now, we haven’t dealt with any large numbers. This is because these (along with method references) are stored in yet another area called the constant pool.

Consider this sample program:

1
2
3
4
5
6
7
public class Test {
    public static int addLargeNumbers() {
        int x = 24242424;
        int y = 123456789;
        return x + y;
    }
}

This is the corresponding constant pool:

1
2
3
4
5
6
7
8
#1 = Methodref   #4.#7             // Test.addLargeNumbers:()I
#2 = Integer     24242424
#3 = Integer     123456789
#4 = Class       #8                // Test
#5 = Utf8        "addLargeNumbers"
#6 = Utf8        "()I"
#7 = NameAndType #5:#6             // addLargeNumbers:()I
#8 = Utf8        "Test"

Notice that in addition to integers, it stores other kinds of values (such as method references). We will use these later to resolve methods when they are called. To load a constant from the constant pool, programs use the ldc instruction, which specifies its index in the constant pool:

1
2
3
4
5
6
7
8
9
10
public static int addLargeNumbers():

ldc #2   // int 24242424
istore_0
ldc #3   // int 123456789
istore_1
iload_0
iload_1
iadd
ireturn
Opcode Mnemonic Operand Stack
0x12 b ldc b \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{constant_pool[b - 1]} \\\hline ... \\\hline \end{array}\end{array}\)

Jumps

The Java compiler transforms loops and if statements into various forms of “jumps”. Some of these are “conditional jumps” (i.e., they change the program counter if a condition is met, and otherwise continue to the next instruction). Note that the jump can go forward or backward in the code array relative to the jump instruction!

Opcode Mnemonic Operand Stack Program Counter
0x99 b1 b2 ifeq b1 b2 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) if (a == 0) pc += (b1 << 8) | b2
0x9a b1 b2 ifne b1 b2 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) if (a != 0) pc += (b1 << 8) | b2
0x9b b1 b2 iflt b1 b2 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) if (a < 0) pc += (b1 << 8) | b2
0x9c b1 b2 ifge b1 b2 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) if (a >= 0) pc += (b1 << 8) | b2
0x9d b1 b2 ifgt b1 b2 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) if (a > 0) pc += (b1 << 8) | b2
0x9e b1 b2 ifle b1 b2 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) if (a <= 0) pc += (b1 << 8) | b2
Opcode Mnemonic Operand Stack Program Counter
0x9f b1 b2 if_icmpeq b1 b2 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\\\end{array}\) if (a == b) pc += (b1 << 8) | b2
0xa0 b1 b2 if_icmpne b1 b2 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\\\end{array}\) if (a != b) pc += (b1 << 8) | b2
0xa1 b1 b2 if_icmplt b1 b2 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\\\end{array}\) if (a < b) pc += (b1 << 8) | b2
0xa2 b1 b2 if_icmpge b1 b2 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\\\end{array}\) if (a >= b) pc += (b1 << 8) | b2
0xa3 b1 b2 if_icmpgt b1 b2 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\\\end{array}\) if (a > b) pc += (b1 << 8) | b2
0xa4 b1 b2 if_icmple b1 b2 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{b} \\\hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\\\end{array}\) if (a <= b) pc += (b1 << 8) | b2
Opcode Mnemonic Program Counter
0xa7 b1 b2 goto b1 b2 pc += (b1 << 8) | b2

Static Method Calls

The second to last feature we have to implement for our TeenyJVM is method calls and returns. These use the invokestatic and ireturn/return instructions, respectively.

Opcode Mnemonic Operand Stack Effect
0xac ireturn \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) return a from the current function
0xb1 return   return from the current function
0xb8 b1 b2 invokestatic b1 b2 \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{argN} \\\hline \texttt{...} \\\hline \texttt{arg2} \\\hline \texttt{arg1} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{result} \\\hline ... \\\hline \end{array}\\\\\\\\\end{array}\) Calls a method (see below)

Conceptually, each method call has two pieces of context: its locals array and its operand stack. When one method calls another, the following steps happen:

  1. The caller resolves the provided unsigned constant pool index ((b1 << 8) | b2) into a method_t object using the provided find_method_from_index function.
  2. The caller allocates and initializes the locals array of the callee with the arguments to the function as the first few entries. The arguments are popped from the operand stack of the caller. (Hint: Remember that you are transferring the arguments from a stack to a queue. What does this mean about the ordering?) You might find the get_number_of_parameters function helpful here.
  3. The caller (recursively) executes the method_t object corresponding to the callee.
  4. The callee executes and returns void or ireturns an int from its operand stack.
  5. The caller pushes the return value (if there is one) onto its own operand stack.

Integer Arrays

The last feature we are implementing for TeenyJVM is integer arrays, or int[].

First, here are some miscellaneous instructions to implement:

Opcode Mnemonic Operand Stack Effect
0x00 nop \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\end{array}\) Does nothing
0x59 dup \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline ... \\\hline \end{array} \\ \\ \end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{a} \\\hline \texttt{a} \\ \hline ... \\\hline \end{array}\\\end{array}\) Duplicate the top operand stack value

The instructions that works with arrays are listed as follows:

Opcode Mnemonic Operand Stack Effect
0xbc b newarray b \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{count} \\ \hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{ref} \\ \hline ... \\\hline \end{array}\\\end{array}\) Create a new array with count elements
0xbe arraylength \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{ref} \\ \hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{count} \\ \hline ... \\\hline \end{array}\\\end{array}\) Get length of array
0xb0 areturn \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{ref} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) return ref from the current function
0x4f iastore \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{value}\\ \hline \texttt{index}\\\hline \texttt{ref} \\ \hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\\\\\end{array}\) Store into an int array
0x2e iaload \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{index}\\\hline \texttt{ref} \\ \hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{value}\\ \hline ... \\\hline \end{array}\\\\\end{array}\) Load int from an array

The C analogue to Java’s int is int32_t, so the analogue to int[] is int32_t *. Instead of being left uninitialized, arrays in Java are initialized to their type’s default values. A Java int[] array is thus simply a block of contiguous memory allocated in the heap. In C, we can interact with integer arrays through pointers, but Java does not have a concept of a pointer–it instead has references.

Arrays can also be stored in local variables, and they have their own associated instructions for doing so. They are very similar to the iload and istore instructions. Since our simplified references are just integer indices into a fake memory array, they are implemented in the exact same way as iload and istore.

Opcode Mnemonic Operand Stack Locals Array
0x19 i aload i \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{locals[i]} \\\hline ... \\\hline \end{array}\end{array}\)  
0x3a i astore i \(\begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline \texttt{ref} \\\hline ... \\\hline \end{array}\end{array} \longmapsto \begin{array}{c}\uparrow \downarrow \\\begin{array}{|c|} \hline ... \\\hline \end{array}\\\\\end{array}\) locals[i] = ref
0x2a aload_0 Same as aload 0  
0x2b aload_1 Same as aload 1  
0x2c aload_2 Same as aload 2  
0x2d aload_3 Same as aload 3  
0x4b astore_0 Same as astore 0  
0x4c astore_1 Same as astore 1  
0x4d astore_2 Same as astore 2  
0x4e astore_3 Same as astore 3  

Reflection

Take a moment to think about what you’ve just implemented. The tests are real Java programs. This is how all the programs you wrote in CS 2 were executed. TeenyJVM is (for the most part) implemented the same way as the real JVM.


Appendix: Switch Tips

1
2
3
4
5
6
7
8
9
switch (x) {
  case 5: {
    int five = x;
    break;
  }
  case 6:; // note the ';'
    int six = x;
    break;
}
1
2
3
4
5
6
7
8
typedef enum {
  op_add,
  op_sub,
  op_mul,
  op_div
} arith_op_t;

printf("%d", op_div - op_sub); // prints "2"
1
2
3
4
5
6
7
8
9
10
11
12
13
switch (x) {
  case op_sub:
  case op_mul:
  case op_div:
    do_thing();
    break;
}
// equivalently
switch (x) {
  case op_sub ... op_div:
    do_thing();
    break;
}
1
2
3
4
5
6
7
switch (x) {
  // ...
  default:
    fprintf(stderr, "Unhandled value! %d\n", x);
    exit(99);
    break;
}

When might you want the default block to do nothing? When might you want it to crash? Which is more correct for your interpreter loop? Which makes debugging easier?

Appendix: Debugging tests

make test provides a simple way to verify that your code passes all the tests. However, you will likely find it useful to run a single test at a time when debugging.

Here is an overview of how the tests are run:

To run a test, you can use the command make TEST_NAME-result. For example:

1
2
3
4
5
6
7
8
9
$ make Arithmetic-result
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer -c jvm.c -o jvm.o
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer -c read_class.c -o read_class.o
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer jvm.o read_class.o -o jvm
javac tests/Arithmetic.java
java -cp tests Arithmetic > tests/Arithmetic-expected.txt
./jvm tests/Arithmetic.class > tests/Arithmetic-actual.txt
diff -u tests/Arithmetic-expected.txt tests/Arithmetic-actual.txt ...
PASSED test Arithmetic.

If you don’t want to see the diff output, you can just build the TEST_NAME-actual.txt file:

1
2
3
4
5
6
$ make tests/Arithmetic-actual.txt
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer -c jvm.c -o jvm.o
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer -c read_class.c -o read_class.o
clang-with-asan -Wall -Wextra -Werror -fno-sanitize=integer jvm.o read_class.o -o jvm
javac tests/Arithmetic.java
./jvm tests/Arithmetic.class > tests/Arithmetic-actual.txt

Note that the command to generate TEST_NAME-actual.txt redirects the output of ./jvm. This means that anything printed using printf() will be written to TEST_NAME-actual.txt instead of the terminal. If you want to print some debugging information, use fprintf(stderr, ...) instead.

You can also use the javap tool to show the constant pool and bytecode instructions in a .class file.