cs24-25fa Project 02: asmgen

Introduction to Computing Systems (Fall 2025)

In this project, you will build the code generation part of a BASIC-to-ASM compiler.

TeenyBASIC (Our Source Language)

The BASIC programming language was invented in 1964 as an attempt to make programming more wide-spread across fields besides STEM. It is quite primitive and we’ve made it even simpler for this assignment. Nonetheless, the language you will compile (called “TeenyBASIC”) is a full programming language capable of complex computations.

Syntax and Semantics

Example Program

To give you an idea of what the language looks like, the following TeenyBASIC program prints out the first 1000 prime numbers.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
LET I = 0
LET P = 2
WHILE I < 1000
    LET A = 1
    LET T = 2
    WHILE T * T < P + 1
        IF P / T * T = P
            LET A = 0
            LET T = P
        END IF
        LET T = T + 1
    END WHILE
    IF A = 1
        PRINT P
        LET I = I + 1
    END IF
    LET P = P + 1
END WHILE

The Compilation Process

There are three types of files involved in compiling TeenyBASIC into an executable. First, the programmer writes TeenyBASIC source code (.bas files in the progs directory). Then, the parser (which we have written) converts the source code into a tree of nodes (called an “abstract syntax tree”) which is consumed by your compiler. Your compiler will output ARM64 assembly code (.s files in the out directory). Finally, the LLVM assembler will convert your assembly files into binaries (in the bin directory).

You can visualize this process as the graph below:

graph LR;
    bas["BASIC Code (.bas)"]
    AST[Abstract Syntax Tree]
    asm["ASM Code (.s)"]
    bin[Executable]
    bas--Parser-->AST
    AST--Compiler-->asm
    asm--LLVM assembler-->bin

Writing the Compiler (70 points)

Reading the Existing Code and Development Strategy

In previous courses, it may have been possible for you to get away without reading and understanding the provided code. This is VERY not true for this course. You absolutely need to understand how the AST works and what the structs look like. We have provided an example function at the bottom of ast.c which prints out an entire AST. We strongly recommend that you read and understand this code as well as ast.h before beginning to code yourself. Your code will likely mirror the recursive structure of the print code; so, it will be extremely valuable to understand how it interacts with the nodes.

Most of the code you will be writing will be in compile.c. If you need to edit another file, we will tell you so in that particular stage.

The primary function you will be writing is compile_ast which returns a bool. This function returns false in the case of a compilation error.

Stage 1.1: Function Prologue and Epilogue in ARM64

For your convenience, the reference guide is linked here.

This stage is conceptually the hardest one in the assignment. Remember that unlike in other assembly languages which automatically store old pc values on function call in a confusing manner, ARM64 makes use of register x30, aliased as lr, to store the old pc. Because nested function calls will inherently overwrite the value stored in this register, handling this correctly is crucial to working assembly code. To fix this problem, the standard solution is to manually store/load this value along with the frame pointer by using the stack at the beginning and end of each function/program. Additionally, you will need to update the frame pointer after storing the old one to properly setup the stack frame, which will become important in Stage 4.

Stage 1.2: Printing Numbers

We have provided a function called print_int (defined in runtime/print_int.s) which prints a TeenyBASIC value. In this stage, you will have to compile the NUM, PRINT, and SEQUENCE node types (read the provided code to see what we mean by these). We recommend you start out simply (just to get a program that runs) which means you will likely eventually have to re-write these clauses. The easiest thing to do is stick the number directly in a register (in the NUM clause) and call print_int in the PRINT clause. The SEQUENCE clause is needed to handle multiple print statements, but you can implement it after the single-print tests are passing.

For now, you should use the constant loading variant of the ldr instruction. In the first optimization, we will change how constant loading is handled.

After you’ve completed this section, running make compile1 should compile and run several test programs. If the tests seem to “hang” or never stop running, your 1.1 implementation is likely the culprit. Make sure the tests output PASSED before moving on. A useful debugging strategy is to check the .s files your compiler produces (in the out directory).

Stage 2: Adding Numbers

In this stage, you will need to handle the '+' operator of the BINARY_OP clause. For now, return false for all other binary operators; you’ll fill them in in the next stage. To implement addition, you will need to choose some invariants for how your compiler outputs code. In particular, you can store computed values either in a register (we’d recommend x0) or on the stack. You will, however, need to store intermediary results on the stack (use str and ldr) because of potential recursion. At a high-level, the addition clause involves five steps:

Remember that the stack must be 16 byte aligned.

After you’ve completed this section, running make compile2 should compile and run several test programs. Make sure the tests output PASSED before moving on.

Stage 3: Multiplication, Subtraction, and Division of Numbers

This stage should be very similar to the previous one. Multiplication, subtraction, and division are nearly identical to addition with the exception of the actual instruction used. (You should use mul, sub, sdiv.) As usual, run make compile3 to make sure your code passes all the tests.

Stage 4: Reading and Writing Variables

Real compilers attempt to minimize the usage of the stack by choosing registers for each variable (this is called “register allocation”), but we won’t worry about this yet. Instead, we will allocate space on the stack for all 26 possible variables (this is called “spilling”) at the beginning of basic_main by setting up a stack frame. Then, we will assume the variables are in alphabetical order on the stack and load and store from those memory locations as necessary.

In particular, you will need to add lines to modify and restore sp–taking special care to allocate enough bytes on the stack for exactly 26 64-bit integer variables. Remember that sp must remain 16 byte aligned. After you write code to allocate and deallocate the stack frame, you should be able to fill in the VAR and LET clauses.

After you’re finished, don’t forget to run make compile4 to make sure your code passes all the tests.

Stage 5: Conditionals

Now, you will implement IF statements. An IF statement in TeenyBASIC looks like IF [expr1] [op] [expr2] [statement_true] ELSE [statement_false] END IF where [statement_true] should be executed when [expr1] [op] [expr2] evaluates to true, and [statement_false] should be executed otherwise. ELSE [statement_false] can be omitted, in which case nothing should happen if [expr1] [op] [expr2] evaluates to false. [op] will always be one of equality (=), less than (<), or greater than (>).

You will likely find yourself needing a distinct label or two for each IF statement. You may use a global variable as a counter to accomplish this. We recommend prefixing your labels with IF.

After you’re finished, don’t forget to run make compile5 to make sure your code passes all the tests.

Stage 6: Loops

Finally, you will implement WHILE loops. A WHILE statement looks like WHILE [expr1] [op] [expr2] [statements] END WHILE, where [statements] should be executed while [expr1] [op] [expr2] evaluates to true. Like for IF statements, you can use labels and jumps to translate WHILE loops. We recommend prefixing your labels with WHILE. The main difference is that you will need to jump backwards to run the loop again.

After you’re finished, don’t forget to run make compile6 to make sure your code passes all the tests.

Optimizing the Generated Assembly (30 points)

There are a ton of optimizations that one can make to the code we’re generating. To get a taste, we’d like you to implement the following three optimizations. The tests for this part will check the speed-up compared to the reference solution on tests that are particularly amenable to these optimizations.

Loading Constants

In stage 1, part of the task involved loading 64 bit integers into a register for later use. At the time, we used the ldr instruction, but now we want to make use of the mov and movk instructions to load the constants instead. While in some cases it might be possible to make use of a single mov xd, #imm16, its important your code meets the requirements to support 64 bit integers, which this instruction alone doesn’t. Unfortunately, ARM64 only allows loading of 16 bits at a time, meaning 4 instructions to load a 64 bit constant into a register.

As a hint, a load of a constant follows four instructions:

Bitwise operations will do a lot of the work for you in this step. It’s crucial that you load into the correct register using a zeroing move instruction for the first load, to clear all data currently in the register. For the other 0-3 loads, use a logical shift move instruction with no zeroing procedure so that previous loads don’t get overwritten.

While you implement this optimization, it’s important to note that performing operations with a 64 bit register, all 64 bits are used as part of the two’s-complement representation even if less than 64 bits were loaded in. The number of instructions needed to load each constant will change because of this, and you should use exactly the least number of mov’s needed to correctly represent each number.

Run make opt0 to test your implementation.

Strength Reduction

As discussed in class, strength reduction replaces multiplications by powers-of-two with much faster shifts. Specifically, we’d like you to check if a node is of the form [expr] * k where k is a power of 2 and replace the multiplication emitted by your compiler with a left-shift.

Run make opt1 to test your optimized assembly’s speed improvement over the reference solution.

Constant Folding

Sometimes, the AST (or part of it) is easier to just evaluate at compile time. In particular, if the operations we’re compiling are all constant arithmetic operations, we can just evaluate the entire sub-tree into a number. For this task, recursively find arithmetic operations that consist of only constants and evaluate them into a constant. Then, instead of emitting code to calculate the constant, emit the constant directly.

Run make opt2 to test your optimized assembly’s speed improvement over the reference solution.


Extra Credit: Register Allocation (+15 points)

Above, we recommended using the stack to store variables and intermediate values computed by expressions. However, reading and writing values on the stack requires memory accesses, which are much slower than accessing registers. Because registers are so fast, real-world compilers store as many values in registers as possible. Algorithms for assigning values to registers can get quite complicated, but you can implement a basic version of register allocation for extra credit.

You should store both variables and intermediate values in registers while there are registers available. Additional values should be “spilled” onto the stack (either to a pre-allocated stackframe or by extending the bottom of the stack). Make sure to consider the distinction between caller-save and callee-save registers. Which ones can store variables and which can store expression values?

Although it is possible to keep track of the stack height as you compile expressions, your compiler will probably need to know how many variables the program uses before it starts printing the assembly. You can “pre-traverse” the AST to find all the variables that are read/written.

Even this simple heurestic should give a 10% to 100% speed improvement on almost all the stage6 tests. We haven’t included automated tests, so let Ethan know if you have implemented this extra credit.