Introduction

Over the past few weeks, we have discussed processes, what a kernel is, caches, and virtual memory. In this project, you will put all of these ideas together into an exploit that effects most processors designed between 1995 and 2018.

Please be advised: At the end of this project, an arbitrary subset of students will be asked to complete an oral interview explaining their code as part of the assignment. Students could be selected for any reason, including to ensure understanding or because course staff thinks the code is really cool.

These interviews are expected to last about 15 minutes, and your code will be available to you throughout. If you are called to interview, course staff will contact you to schedule a time, and your grade will be held until the interview is completed.

Setup

Like usual, you should register for the assignment by going to https://grinch.caltech.edu/register which will create you a repository on gitlab with the starter code.

The Scenario

In Adam’s office, there are two machines named meltdown1 and meltdown2. The kernel on these machines has been customized slightly in the following ways:

We have installed a kernel module that stores a five character password of uppercase letters in kernel memory. Your job in this project will be to recover this password from user space without root access to these machines! That is, you will break the isolation between kernel space and user space. To assist you in this goal a little bit, the kernel module also provides a special file /sys/kernel/kernel_data/address which contains a string representation of the address of the password.
We have given a parameter to the kernel to turn off a mitigation for the meltdown attack (called Page Table Isolation).

Normally, when you push to GitLab, a server called oobleck runs your code in a sandbox. Your repository for this project has been set up so that, when you push to gitlab, the machine that runs your code will be either meltdown1 or meltdown2 (these two machines should be considered interchangable). Specifically, when you push, we call make and then run ./bin/exploit directly on the meltdown machine. After your code runs, the machine verifies that you printed out the correct password and terminates with success or failure as a result. That is, the gitlab tests will pass if and only if your code outputs exactly the correct password and nothing else. Please note that the password chosen in the kernel will depend on your username; so, you will likely not have the same password as your friends do.

Game Plan

Implementing the meltdown attack can be very finicky; so, we will guide you through it step by step. Please make sure each stage is working as you go, because, otherwise, the final result will be nearly impossible to debug. We will go through the following stages:

First, you will explore the difference in timing between a cache hit and a cache miss.
Then, you will determine which page in an array of pages has already been accessed.
Next, you will write a side-channel attack to “recover” a string from user memory as a proof-of-concept.
Finally, you will put the pieces together into an attack on kernel memory.

Stage 1: Cache Timing

Ultimately, we will need to be able to differentiate between a miss and a hit. The best way to do this is to determine a window where we’re fairly confident that the time taken is less than a miss AND at least as high as a hit. Specifically, we should choose a threshold that might have false negatives (missing a hit that’s an outlier), but rarely if ever false positives (deciding that a miss is a hit).

To determine a reasonable threshold for later phases, you should write a program in cache_timing.c that determines the average miss time and the average hit time across approximately 100000 trials. Each trial should:

Allocate a page_t on the heap
Ensure that the page is not in cache (this is called “flushing the cache”)
Time two reads of that page. If everything has gone right, then the first read will be a miss, and the second read will be a hit. If the kernel context switched during your timing, it’s possible that these values will be off. You should discard any trial where the proported “hit time” is more than the proported “miss time” to compensate.
Update the running miss/max hit totals if necessary, and repeat.

We have written two functions for you which you will find useful:

void flush_cache_line(void *address)

This function ensures that the L1, L2, and L3 cache lines containing address are evicted.
uint64_t time_read(void *address)

This function measures the number of clock cycles it takes the machine to read address.

Task 1. Finish the implementation of this algorithm in cache_timing.c. Run ./bin/cache_timing several times to find a value between the two extremes which you will use in future tasks.

Stage 2: A Guessing Game

Consider the following situation:

We create an array of page_t’s.
We access exactly one page of the array.

Can we use a side channel attack to recover which page was accessed? Figuring out how to solve this toy problem is a large step on the way to implementing the meltdown attack. The key idea is if NONE of the pages are in the cache before we access one page, then there will be a difference in timing between accessing the one page we accessed and the rest. In particular, the accessed page will have an access time of at most our threshold from Stage 1!

The beginning of this scenario is implemented in index_guesser.c. The provided function do_access() accesses exactly one page of the provided array. Your job in this stage is to fill in the two interesting functions:

flush_all_pages(), which should loop through all the pages in the array and flush all of them from the cache.
guess_accessed_page(), which should check each page in the array to see if its first and second access times are both below your threshold. If both access times are below the threshold, it should return that index. If no such index is found, it should return 0. To make this function work later, you MUST not store the results of the time_read’s in variables. Put them directly in an if statement.

Once these are implemented correctly, your program should consistently output the same number.

Task 2. Finish the implementation of this algorithm in index_guesser.c. Run ./bin/index_guesser several times to ensure that it always outputs the same value.

Stage 3: Unprotected User Memory String

In this stage, we will “recover” a string in user space through a side-channel attack.

From this stage onwards, we will assume that the “secrets” that we’re trying to recover are (1) five characters long and (2) made up entirely of capital letters.

You should copy flush_all_pages() and guess_accessed_page() from the previous stage into recover_local_secret.c. Then make the minor changes to satisfy new requirement that we’re only searching for capital letters (see above). Since the processor sometimes loads “boundary” pages, make sure to flush all the pages from 'A' - 1 to 'Z' + 1.

Now implement do_access() so that it reads a character from the secret string and accesses the page at the index corresponding to the character’s value. Use access_secret(i) to get the ith character of the secret string. Then use this character as an index into the page_t array. For example, if the character is 'A' (which is equivalent to 65), index 65 of the page_t array should be accessed. Make sure to use force_read() to ensure the address is accessed.

In main(), your program should determine each character in the secret string, one character at a time, by doing the following:

Flush all the pages of the page_t array from the cache.
Call do_access() to access the page indexed by the next character in the secret string.
As in the previous stage, determine which page was accessed using guess_accessed_page(). There should be one page in the array that is a hit; find it and print out the index as a character using printf(). Note that stdout is buffered until '\n' is written, so if you want to see the characters as they are computed, you should call fflush(stdout); after printf(). Note that the password should be printed on a single line!

Task 3. Finish the implementation of this algorithm in recover_local_secret.c. Run ./bin/recover_local_secret to “recover” the password for this stage.

Stage 4: Protected User Memory String

The indirection in the previous part seems unnecessary: why bother using the page_t array at all when we can just access the secret characters directly? Well, what happens if the secret string is actually secret? In the next stage, the secret string will be stored in kernel memory, to which we do not (and should not) have access! In this stage, you will extend your attack from the previous stage to handle a segmentation fault that occurs when the secret string is accessed. (For now, we will cause a segmentation fault manually, but in the next stage, it will happen automatically when accessing kernel memory.) You will still be able to recover the secret using your hacking scillz!

Transfer your code from the previous stage into recover_protected_local_secret.c and make the following changes:

Add a segfault handler which jumps to label and define label at the location to jump to
The program must repeatedly try to determine each character until it succeeds. It’s possible that the processor will (correctly) cause a segfault before speculatively accessing the page_t array, in which case none of the pages will show a cache hit. It’s also possible that 'A' - 1 or 'Z' + 1 will be chosen. In all cases where the result is not a letter, ignore the trial and go again.
You must use cache_secret() to ensure that the secret is cached before reading it in do_access(). Otherwise, it will take too long to access it, so access_secret() will cause a segfault before it can speculatively access the page_t array.

Task 4. Finish the implementation of this algorithm in recover_protected_local_secret.c. Run ./bin/recover_protected_local_secret to “recover” the password for this stage.

Stage 5: Meltdown! (80 points)

Write the final stage in exploit.c, copying most of your code from the previous stage. The only change is that do_access() needs to call get_kernel_data_address() to get the address of the secret string (in kernel memory). Calling get_kernel_data_address() will also take care of pulling the secret string into the cache. If your attack is not working, try making all your helper functions (except the signal handler), static inline functions to speed them up. If your attack is still not working, try a “threshold” of between 180 and 200 clocks.

Task 5. Finish the implementation of your meltdown attack in exploit.c. Commit and push your code to gitlab to make it run on one of the meltdown machines. If it succeeds, you were able to recover the password from kernel space! Remember that the password should be printed on a single line!

cs24-23fa Project 04b: `meltdown`

Introduction to Computing Systems (Fall 2023)