This midterm is nocollab. You may not discuss anything about it with other students. You may get conceptual help from course staff at office hours. TAs will not look at your laptop or discuss specific code with you. Please do not ask them to.

The code required for this midterm is a smallish number of lines, but the conceptual understanding is very important. We are happy to help significantly, but we do not want to be pressured to look at code.

Introduction

Over the past few weeks, we have discussed representation, assembly, the stack, and buffer overflows. In this midterm, you will put all of this together to hack into one of our servers.

Please be advised: At the end of this midterm, an arbitrary subset of students will be asked to complete an oral interview explaining their code as part of the assignment. Students could be selected for any reason, including to ensure understanding or because course staff thinks the code is really cool.

These interviews are expected to last about 15 minutes, and your code will be available to you throughout. If you are called to interview, course staff will contact you to schedule a time, and your grade will be held until the interview is completed.

Setup

As usual, you should register for the assignment by going to https://grinch.caltech.edu/register, which will create a repository for you on GitLab with the starter code.

The Scenario

John is a recently graduated CS major who is working for Adventures Incorporated (tm). He was tasked with creating their website which he finally got running here:

http://adventure.com.puter.systems/index.html

Unfortunately, John didn’t go to Caltech or learn about systems; so, he used a small HTTP server that he found on the internet called tiny: https://gitlab.caltech.edu/john/adventures-tiny-server

Even more unfortunately, John was having problems getting the “save progress” feature working. So, he wrote some debugging code and accidentally left it in production…

Your Task

Your task for this midterm is to write several buffer overflow exploits in tiny to get various forms of your password on the server. By the end of this midterm, you will have arbitrary code execution on the remote machine. Please remember the honor code here: you will be able to delete other students’ progress, but we’re expecting you to not do this. In an effort to make this as real as possible, we are going to rely heavily on the honor code for this one. Please don’t take advantage.

Orienting Yourself

The first thing you’ll have to do for this assignment is explore the webserver you’ll be hacking. tiny is set up to give a directory listing if you access a page that matches the structure of the files on the server. This is a standard web server feature that should usually be turned off, but not on John’s website! If you access the root (the website without index.html), it will list the root directory. Several directories are “forbidden” (namely, config and tokens), and these are the ones you will be hacking.

The webserver has been run from the /hackme/tiny directory. For example, if you run ls on the config directory, you’d get the following:

john@adventure.com.puter.systems:~$ ls /hackme/tiny/config

admin_token

All the other files in /hackme/tiny are also useful, but we’ll let you explore them yourself. In particular, you might find it useful to download the binary itself (consider using wget URL_FOR_TINY) which John helpfully called tiny and left in the root directory of the website. This binary should run and be gdb-able on labradoodle.

Throughout this midterm, we will refer to your “password”. All passwords for this midterm are 10 characters long.

There are three “stages” to this midterm, and we strongly recommend you complete them in order:

1. Put a token in the /hackme/tiny/tokens folder with no contents and a name computed by sha256(“adamtoken=PASSWORD”).
2. Put a token in the /hackme/tiny/tokens folder with no contents and a name computed by sha256(ADMIN_PASSWORD + “token=PASSWORD”), where ADMIN_PASSWORD is the contents of the /hackme/tiny/config/admin_token file.
3. Put a token in the /hackme/tiny/tokens folder with the contents computed by sha384(“YOUR_USERNAME”) and a name of PASSWORD.
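
To make the token names in the first two stages concrete, here is a minimal sketch of how they could be computed locally with Python's standard hashlib module. The function names are hypothetical, and the sketch assumes that the name is the lowercase hex digest and that passwords are plain ASCII; neither detail is confirmed by this specification.

```python
import hashlib

def stage1_token_name(password: bytes) -> str:
    # sha256("adamtoken=PASSWORD"); hex rendering is an assumption.
    return hashlib.sha256(b"adamtoken=" + password).hexdigest()

def stage2_token_name(admin_password: bytes, password: bytes) -> str:
    # sha256(ADMIN_PASSWORD + "token=PASSWORD"); hex rendering is an assumption.
    return hashlib.sha256(admin_password + b"token=" + password).hexdigest()

if __name__ == "__main__":
    # "0123456789" is a made-up 10-character password, not a real one.
    print(stage1_token_name(b"0123456789"))
```

Note that the first formula is simply the second one with the admin password fixed to "adam".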

The attacks necessary to get these results get increasingly difficult. We will, however, explain the exploits you should be writing for each stage. Determining what code is exploitable is not a major goal of this midterm; instead, we will focus on actually writing the attacks once an exploit has been identified.

HTTP Requests and Responses

When you type a URL into your browser, it communicates with the remote server via an “HTTP request” which looks like this example:

GET / HTTP/1.1
Host: www.caltech.edu
Connection: close

This request is made up of several sections:

The “request line” (the first line), which is itself made up of three pieces:
1. The “method” (or type) of the request. We will only be dealing with GET requests, which are what Chrome sends when you type in a URL.
2. The “URI” (or path) of the request. This tells the server what to serve back.
3. The protocol name. We will be using exclusively HTTP/1.1.
A list of “headers” that provide information to the server about how to serve the content. The HTTP specification requires a Host line, but tiny will ignore all headers anyway; so, there is no reason to send one.
An empty line that indicates the end of the headers.
Arbitrary data that is used as the “body” of the request. For our purposes, this is where the password will be read from.

As a raw string, the HTTP request above looks like:

b"GET / HTTP/1.1\r\nHost: www.caltech.edu\r\nConnection: close\r\n\r\n"

Notice the \r\n after the first line, the \r\n between headers, and the trailing \r\n\r\n before any data (of which this request has none). Also notice the b in front of the string; this indicates that it is a byte string. For our purposes, all you really need to know is (1) we should always use byte strings when sending over a connection, and (2) you can concatenate them like regular strings.
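
As a way to check your understanding of these sections, the sketch below takes the raw request string apart again. The helper name split_request is hypothetical and is not part of tiny or the starter code; it only assumes the layout described above.

```python
def split_request(raw: bytes):
    # Everything before the first blank line is the first line plus headers;
    # everything after it is the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    # The first line holds the method, the URI, and the protocol name.
    method, uri, protocol = lines[0].split(b" ")
    headers = lines[1:]
    return method, uri, protocol, headers, body

raw = b"GET / HTTP/1.1\r\nHost: www.caltech.edu\r\nConnection: close\r\n\r\n"
print(split_request(raw))
```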

This midterm will require manipulating raw HTTP requests. So, we have drafted a little script in python3 for you which sends a request to www.caltech.edu and prints out the data sent back. Note that this script isn’t actually complete. You need to put the pieces of the request together on the line that assigns request.

example-request.py

import socket

METHOD = b"GET"
URL = b"/"
HEADERS = b"Host: www.caltech.edu"

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# By default, web browsers request unencrypted websites on port 80.
# We want to simulate a "default" request here.
client.connect(("caltech.edu", 80))
request = b"PUT TOGETHER THE PIECES HERE. DON'T FORGET TO USE \r\n's!"
print(request)
client.send(request)
response = client.recv(4096)
print(response.decode())

Task 0. Start by completing the example-request.py script (given above). Then, try running it on labradoodle to see what the output looks like, because you will be looking at a lot of output like it…

Note that if your output says “HTTP Version Not Supported”, you’ve done something wrong!
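
One thing to keep in mind when reading output: a single recv(4096) call returns at most 4096 bytes and may return fewer, so a long response can appear cut off. The sketch below shows one way to keep reading until the server closes the connection. The helper name recv_all is hypothetical, and the loop assumes the server closes the connection when it is done; if it does not, this will block.

```python
import socket

def recv_all(sock, chunk_size=4096):
    # Keep reading until recv returns b"", which means the peer closed.
    chunks = []
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)
```

In the script above, this would be used as response = recv_all(client) in place of the single recv call.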

Uh Oh, Buffer Overflow…

Consider the following snippet of code which can be found in eat.c (part of the tiny webserver). Note that HEADERS_SIZE is a constant defined elsewhere.

eat.c

int eat_headers() {
	char garbage[HEADERS_SIZE];
	i = 0;
	garbage[0] = getc(stream);
	i++;
	garbage[1] = getc(stream);
	i++;
	garbage[2] = getc(stream);
	i++;
	garbage[3] = getc(stream);

	while (garbage[i - 3] != '\r' || garbage[i - 2] != '\n' ||
               garbage[i - 1] != '\r' || garbage[i]     != '\n') {
		i++;
		garbage[i] = getc(stream);
	}
	return 1;
}

As advertised, this function “eats” (reads and discards) the HTTP headers of a connection called stream. tiny is not written very well: stream and i are both global variables, and the loop never checks i against HEADERS_SIZE. Unfortunately (or fortunately for us!), this means that headers longer than the buffer are written past the end of garbage, which makes the code susceptible to a buffer overflow attack!

Converting an Address Into Bytes

When writing exploit strings, it is often very important to convert between a numerical address and the corresponding string made up of raw bytes.

Hex Number:	`0xdeadbeef`
Raw String:	`b'\xef\xbe\xad\xde\x00\x00\x00\x00'`

Notably, the order of the bytes in the raw string is backwards, because we’re working on a little-endian machine. This can be very annoying to do by hand; luckily, python3 gives us a function that makes it easy:

def to_bytes(i, l=8):
    return int.to_bytes(i, length=l, byteorder='little')
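
A quick sanity check of this helper, repeating its definition so the snippet runs on its own. The second address is a made-up value chosen only so that every byte is different.

```python
def to_bytes(i, l=8):
    return int.to_bytes(i, length=l, byteorder='little')

# The example from the table above: least significant byte comes first.
raw = to_bytes(0xdeadbeef)
assert raw == b'\xef\xbe\xad\xde\x00\x00\x00\x00'

# Converting back recovers the original number.
assert int.from_bytes(raw, byteorder='little') == 0xdeadbeef

# A made-up 8-byte value makes the reversal easy to see.
assert to_bytes(0x1122334455667788) == b'\x88\x77\x66\x55\x44\x33\x22\x11'
print(raw)
```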

We strongly recommend that any python scripts you write or use for this midterm are executed with python3 NOT python2!

From HTTP Request to Exploit

All three attacks you will write will involve preparing exploit strings of varying complexities. Usually, they will consist of two pieces: (1) padding bytes and (2) a return address. We will send these in the headers of the HTTP requests, because eat_headers has the bug we’re exploiting. All together, the core of the script you’ll be writing looks like this:

sploit.py

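import socket

def to_bytes(i, l=8):
    # Little-endian conversion helper from "Converting an Address Into Bytes"
    return int.to_bytes(i, length=l, byteorder='little')

# Method should always be GET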
METHOD = b"GET"

# We don't care what URL we're grabbing; they all have the possibility of exploit...
URL = b"/"

# Figure out how many bytes we need to pad until we get to the return address
# on the stack. (Hint: It's not 10...)
N = 10
PADDING = b"\xff" * N

# Fill this in with the address you actually want instead of `0xdeadbeef`
ADDRESS = to_bytes(0xdeadbeef)

# The "exploit string" is what we send in as the headers
HEADERS = PADDING + ADDRESS

# The functions we call will often look in the request's data for a password.
# So, we send it here.
DATA = b"YOUR_PASSWORD"

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("adventure.com.puter.systems", 80))
request = b"PUT TOGETHER THE PIECES HERE. DON'T FORGET TO USE \r\n's!"
print(request)
client.send(request)
response = client.recv(4096)
print(response.decode())

Stage 1: Return to a function (20 points)

In this exploit, you will change a return address on the stack to make tiny execute a different function of your choice.

Task 1. Your goal in this first exploit is to put a token in the /hackme/tiny/tokens folder with no contents and a name computed by sha256(“adamtoken=PASSWORD”). Luckily, there is a function in the source code of tiny (utils.c, specifically) that does precisely this.

Somehow get the source file utils.c, and find a function that does what you want (Hint: It’s called update_stage1_grade). Then, download the tiny binary and figure out (1) where the function you want to call is located, and (2) how big the buffer you’re exploiting is by using a tool like cs24-dasm. Then, create an exploit string which overwrites the return pointer with the address of the function you want to call and send it to the server.

tiny will read the “body” of the request as your password; so, make sure to include the DATA variable at the very end of your request (after the \r\n\r\n).

The “padding” in your exploit may not be what it initially looks like it should be. When trying to figure out how much padding to use, we’d recommend looking at the first location on the stack that the code reads into.

If you’ve successfully made a correct exploit, the server will respond with a message indicating so.

After you think you’ve sent it correctly, go to http://adventure.com.puter.systems/debug.html and enter your password to check that the system registered it.

Stage 1.5: Running Your Own `tiny` (0 points)

We will need to get some information about tiny as it executes; so, we’ll have to get our own instance of tiny running on labradoodle.

Making `tiny` Executable

In the last stage, you downloaded tiny, but if you tried to run it you might have gotten an error message (-bash: ./tiny: Permission denied). This is because we need to mark the file as “executable”. To do this, run chmod +x tiny. Now, you’ll be able to run tiny on your own. Note that tiny takes an argument which is the port to run it on. Choose (and remember!) some number larger than 9000 as this argument. However, to capture debugging information, we’ll need to use a debugger.

Using `gdb` to inspect `tiny` while it runs

We recommend reading our short gdb tutorial, which can be found here, before continuing. Here is an example gdb session during which we run tiny and determine what address the clientaddr variable in the accept_connection function is stored at:

blank@labradoodle:~$ gdb ./tiny

GNU gdb (GDB; openSUSE Leap 15.0) 8.3.1

...

>>> b accept_connection

Breakpoint 1 at 0x400fd7: file junk.c, line 69.

>>> run PORT

>>> p &clientaddr

$1 = (struct sockaddr_in *) ADDRESS

Why Bother?

In the next stage, we will need the address of the garbage buffer in eat_headers. The process for getting the address of this buffer is very similar to our example. You should follow approximately the same steps with the following exceptions:

Set a breakpoint on eat_headers instead of accept_connection
After running tiny in gdb by typing run PORT, it will hang, because it’s waiting for a connection. You can create a connection by opening a new terminal (don’t close the old one!) and running the command wget labradoodle.caltech.edu:PORT
After execution stops, you will be in the eat_headers function and ready to print out the address of the garbage buffer.

Stage 2: Return to your own code (45 points)

In the first stage, we caused execution to jump to an existing function and run it. This is already really powerful, but we can do better! What if, instead of jumping to code that exists, we jumped to our own code? In this exploit, you will cause tiny to jump to code of your own design which will call several existing functions to complete your goal.

Shellcode

At its core, code is just executable bytes at a memory location. Remember that eat_headers helpfully reads the bytes we send onto the stack. So, is there any reason we can’t execute bytes on the stack? In the first attack, we replaced the return address with the address of a particular function, but this time, we’ll replace it with the address where our code will be…on the stack. Just like you had to carefully count bytes to calculate where to put the address, this time you will need to do a similar count to figure out where you’re putting your bytes that represent code. This will be some kind of offset from garbage’s address as calculated in the previous stage.

The “code” that we write as an exploit is usually called “shellcode”; it’s named that way, because the goal of the shellcode is usually to start a shell. In stage 3, we will do exactly this!

(N.B. In the real world, a new protection (an “NX bit”) was put in place to avoid exactly what we’re about to do, but it’s turned off for the purposes of this midterm.)

Absolute Calls

You will likely find yourself trying to callq a particular address in this exploit. Unfortunately, if you try this naively, like so:

bad-call.s

callq *$0xdeadbeef

You get the following output:

blank@labradoodle:~$ as bad-call.s

Error: immediate operand illegal with absolute jump

That is, you cannot directly jump to an immediate address. Instead, you must put it in a register first, like this:

better-call.s

movq $0xdeadbeef, %rax
callq *%rax

Note that you should use cs24-dasm or gdb to get the actual bytes of the shellcode exploit you write after compiling it with as.

With a careful application of these concepts, you have enough to write this exploit!

Task 2. Put a token in the /hackme/tiny/tokens folder with no contents and a name computed by sha256(ADMIN_PASSWORD + “token=PASSWORD”), where ADMIN_PASSWORD is the contents of the /hackme/tiny/config/admin_token file.

To do this, you will first find a function that gets the admin token, and then feed it to another function that takes the sha256 of its arguments. Once again, you should look in utils.c for the requisite functions. Then, once you’ve found the functions, write assembly to perform the correct actions, compile it with as, put the exploit together, and send it to the server.

Stage 3: Execute a shell command (25 points)

Finally, you will implement the holy grail. In this exploit, you will force tiny to run your shellcode to load and execute a shell with a bash script of your own design. In reality, this script could do anything you wanted. For the purposes of this midterm, it will create a file on the file system.

System Calls and the `syscall` Instruction

A system call is a request to the operating system (specifically, the kernel) to complete an operation for the user program. You’ve used many system calls without knowing it; all the file operations are, at their core, system calls, for example. We’ll talk much more about these later, but, for now, it suffices to know that if we want to execute a program, we can ask the OS to do it for us using the execve system call.

Because we’re writing shellcode, we need to know how to request a system call from assembly. To do this, one uses the syscall instruction which behaves very similarly to callq except that it takes arguments in different registers. Every syscall has a “number” that tells the OS which operation to complete. A full list can be found at https://filippo.io/linux-syscall-table/. For our purposes, we will only care about exit (which is #60) and execve (which is #59).

Use the following registers to complete a syscall:

Syscall #	Param 1	Param 2	Param 3	Param 4	Param 5	Param 6
`%rax`	`%rdi`	`%rsi`	`%rdx`	`%r10`	`%r8`	`%r9`

Then, to complete an exit(0) system call, we’d write the following x86-64 code:

movq $60, %rax  # use the exit syscall
movq $0, %rdi   # error code 0
syscall         # make syscall

The `execve` System Call

To really understand what execve is doing, we must understand processes which we won’t get to until later. So, we’ll provide a working definition that will be enough to do the current assignment.

execve() executes the program pointed to by filename. This causes the program that is currently being run by the calling process to be replaced with a new program, with newly initialized stack, heap, and (initialized and uninitialized) data segments. The execve system call has the following signature:

int execve(const char *filename, char *const argv[], char *const envp[]);

filename is the full path to the executable to run.
argv is a NULL terminated array of strings indicating what arguments to run filename with. (Note that execve will expect the first argument to be a repeat of the path to the binary to execute.)
envp is a NULL terminated array of strings indicating the “environment” to run the program in. For our purposes, we will always pass in {NULL} (i.e., an empty environment).
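
To see these three arguments in action outside of assembly, here is a small sketch using Python's os.execve, which wraps the same system call. The helper name run_program is hypothetical. The fork is only there so that the demonstration script survives: execve replaces the calling process, so we call it in a child and wait for the result. The sketch assumes a Linux machine with /bin/sh available.

```python
import os

def run_program(filename, argv):
    pid = os.fork()
    if pid == 0:
        # Child: replace this process with the new program.
        # argv[0] repeats the path; {} is an empty environment.
        try:
            os.execve(filename, argv, {})
        finally:
            # Only reached if execve itself failed.
            os._exit(127)
    # Parent: wait for the child and report its exit code.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

if __name__ == "__main__":
    print(run_program("/bin/sh", ["/bin/sh", "-c", "exit 7"]))
```

Python builds the NULL terminated arrays for you here; in shellcode, you are responsible for laying them out in memory yourself.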

Notably, since we will be making a syscall to execve in assembly, we will have to manually set up all arguments (and their contents and pointers)! Getting those right will be the hardest part of this exploit!

Arrays on the Stack

To get the execve call right, you will need to set up the arguments which are all arrays of strings. The only place that you can put things in your exploit is on the stack. Where on the stack might be a convenient place to put the strings for this stage?

Shells and Shell Scripts

When you type into a terminal, there is a program (called a “shell”) which interprets what you write and executes the commands you write. (Incidentally, it does this using the execve syscall from above…) Specifically, the shell program most commonly used is called sh (or bash) and can be found at /bin/sh on most machines. This is the program that you will be execveing using your h4x0r skillz! In particular, since you are trying to put a token with particular contents and password, the shell script you will want to run will be:

/bin/sh -c 'echo -n "USERNAME" | sha384sum > /hackme/tiny/tokens/PASSWORD'

Note that /bin/sh is the name of the program and -c and the rest of the line are its two arguments. Do not include the quotes in the second argument in your exploit.

That is, your arguments array to execve should look like {"/bin/sh", "-c", "echo -n USERNAME | sha384sum > /hackme/tiny/tokens/PASSWORD", NULL}.

As you have likely gathered at this point, you could send whatever command you wanted to do whatever nasty things you wanted on a machine if you can execve /bin/sh!

Putting it all together

Task 3. Write shellcode that execve’s /bin/sh with the provided arguments (“shell script”). Then, force tiny to execute your shellcode by returning to it like before. Your shellcode should put a token in the /hackme/tiny/tokens folder with the contents computed by sha384(YOUR_USERNAME) and a name of PASSWORD.

You must upload your code on GitLab by the due date. In particular, we will redownload your repository’s stage1.py, stage2.py, and stage3.py and run them by themselves to verify your work for credit.

Your script must connect to adventure.com.puter.systems and a port between 10000 and 10025.
Your script should be able to run without any issues by itself.
- The script must be able to be run without any arguments.
- No third-party imports or importing from other scripts.
If you choose to do the extra credit, it should be in a separate file and not stage3.py.

Extra Credit: Proof of Arbitrary Code Execution (+5 points)

Given that the third task gives you a shell, you can now do arbitrary code execution. To prove that you understand what you’ve done, add a new link to the index.html page as well as a corresponding directory and index.html page for your link. Make sure to sign the page in some way using your access username.

cs24-23fa Midterm: Adventure

Introduction to Computing Systems (Fall 2023)
