cs24-23fa Project 05: passwd

Introduction to Computing Systems (Fall 2023)

Introduction

Concurrent programming is hard to get right. In this project, you will write a concurrent data structure and recover passwords from their hashes in a parallel program.

Setup

Like usual, you should register for the assignment by going to https://grinch.caltech.edu/register which will create you a repository on gitlab with the starter code.

The Scenario

The sysadmins lost all the password to the CMS clusters, and they need your help to recover them. Each student has been given a subset of these “CMS hashes” (which can be found at https://grinch.caltech.edu/etcshadow) to recover. Luckily, the passwords are all of the same form: a word with exactly one number inserted somewhere. Unfortunately, there are so many dictionary words, that it won’t be possible to just write a single-threaded password cracker. Instead, you’ll have to use your concurrency skills to recover these passwords! (More on this later.)

Warning 1

Warning 2

Warning 3

Google Compute Engine

Because this project will involve using many processors all at the same time, we can’t just have everyone use labradoodle simultaneously. So, we’ve secured google compute credits for you to use which will allow you to have your own server that is uncontested. You can redeem the credits using this link:

https://gcp.secure.force.com/GCPEDU?cid=lKJ%2BS05CXbYsmZYJ%2BczWf7IMze6pYD0A2d8H1OSGbirBFwIRd8ifT5Owl5%2BNNNzL/

The link will ask you to enter your name and e-mail. Then, it will send you an e-mail verification. Finally, it will allow you to redeem the coupon.

Creating Your VM

Once you’ve redeemed the coupon, go to

https://console.cloud.google.com/projectselector/compute/instances

Then, choose “Create a Project” (which is all the way at the bottom). It will take a minute for GCE to create your project, but once it’s done, create a VM instance by clicking “Create”.

You will need to change two settings. First change the “Machine type” to e2-standard-16 which will give you 16 cores to work with. Second, click on “Management, security, disks, networking, sole tenancy” and choose the “Security” Tab. You will need to enter your SSH key which we generated for you when you first connected to labradoodle. On labradoodle, run the command cat ~/.ssh/id_rsa.pub and copy the output into the big box labelled “SSH”. Take note of the username that appears on the left–that will be the user you use to connect to your remote machine.

SSHing to Your VM

Once the VM is created, you should see an IP address on the right side of the screen under “External IP”. This is the IP address you will use to connect to your machine. In a terminal on labradoodle, run the command ssh USER@IP where USER is the username from the previous step and IP is the external IP listed on the GCE website. If all goes well, you should be connected to your remote machine. Once you’ve connected, run sudo apt install -y git clang make to install the necessary software for the project.

Working with Two Machines

We recommend you WRITE your code on labradoodle as usual, and only TEST on your GCE machine. You will probably find it easiest to just use scp -r <code directory> USER@IP: (Note the trailing colon–this is important!) to get your code to the remote machine (from labradoodle). This will avoid you having to generate yet another key to use with the new machine.

Part 0: Writing a Queue (50 points)

Ultimately, for our threadpool to work, it will need an underlying (thread-safe) queue which we can use to queue up tasks to be completed by the threads in the threadpool. In this part, you will write a linked-list-based FIFO queue. A queue implementation that doesn’t use a linked list will not receive any credit!

Part 1: Writing a Thread Pool (15 points)

Creating threads is relatively expensive, and only so many of them can run at once (i.e., one per processor core). So, to avoid wasting the time instantiating one thread per task, we can use a thread pool which allocates tasks to threads that are part of the “pool”. Then, these threads get “reused” when they finish their tasks.

You should create your pthreads in thread_pool_init. Then, these threads (called “workers”) should wait until a queue that is shared across all of them has work to be done. Once there is work, one worker should wake up, complete that work, and wait for more work.

Part 2: Recovering Passwords (30 points)

Passwords are not stored as plaintext by your computer. Instead, when you log into a machine, it hashes the password and compares the hash to a stored hash in a file called /etc/shadow. If an attacker can get access to the /etc/shadow file, they can brute force the password by figuring out what input hashes to the required result. That is exactly what you’ll do in this step. You will write a client for your threadpool that brute forces passwords from their corresponding hashes. Each student has a different set of passwords to reverse which can be found here:

https://grinch.caltech.edu/etcshadow

All passwords are a word in the DICTIONARY augmented with a single digit (0-9) somewhere in the word (which could be before or after it).

For testing purposes, here is a hash of the password q6uests:

$6$sI0nXQzT/Pg.zJae$MCYdZmuv89Q6Oij.bXWArQOXcsDvOrN.PdiB8Za6lXtaajwdb6GkgXGy3i/fTYogJT6kiMWb9BHBfnB3HTme90

We have provided some starter code for this section in src/password_cracker.c. Specifically, you will find hashes_match to be useful to compare a word with a hash, and we have written the annoying parsing code for the password hashes. Each “task” that your threadpool executes should consist of trying a single dictionary word (and its variants) on all possible hashes. You should print any recovered passwords to standard out.

We have provided you with a simple “prime printing” client in tests/prime_printer.c which you might consider referencing when writing your password cracker.