Introduction

The main goal of this project is to orient you to what CS 24 will be like. It will also cover representation, use number bases, and review pointers.

The open-ended questions in this document are part of your grade on this project. Make sure to answer them and submit using the button at the bottom!!! You may submit as many times as you want. Your responses should be automatically saved locally if you click out of the box.

The lint test is also part of your grade. If you are not passing the lint test, you will lose at least 10 points on the project.

Setup

First, you should register for the assignment by going to https://grinch.caltech.edu/register which will create you a repository on gitlab with the starter code. Then, you should follow the

ENVIRONMENT SETUP INSTRUCTIONS

to get ready for the course. These shouldn’t take that long, but they’re extremely important.

The Scenario

Adam accidentally deleted the contents of their USB drive (oh noes!). This USB drive was particularly important, because Adam painstakingly created a unique welcome PDF for every student. He even went to the trouble of password protecting them; so, that nobody could read someone else’s message. Unfortunately, he lost the passwords too, because they were stored on the USB drive as well.

Game Plan

You will recover your CS 24 “welcome PDF” and the password to open it from an image of Adam’s USB drive step-by-step. There might also be other hidden goodies that were on the drive that are worth exploring…

Part 1: Constructing and Printing A Directory Tree Data Structure

In this course, we will often find ourselves working with recursive tree structures with multiple “types” of nodes. One example of such a tree is the directory tree on your file system. A node is either a file node or a directory node. File nodes contain a name, a file size, and a byte array of contents. Directory nodes contain a name and an array of children. Notably, the name is common to both types of nodes; so, it should appear in the “super class”. Unfortunately, C doesn’t have objects; so, we have to simulate them using a “tag” to keep track of what type the node is. These tags are usually implement as enums which are just a way of specifying that a variable must be one of a list of choices:

Node Type

typedef enum {
    FILE_TYPE,
    DIRECTORY_TYPE
} node_type_t;

Thus, our node_t struct will consist of a type and a name:

Node struct

typedef struct {
    node_type_t type;
    char *name;
} node_t;

The interesting part is how the “subclasses” are implemented. We “embed” the root struct as the first element of the subclasses. This way, they will both start with the same members. So, if we are given a node_t, we can “extend” it to the right type by casting it. See the init functions in directory_tree.c or free_directory_tree for examples of what that looks like.

File Node struct

typedef struct {
    node_t base;
    size_t size;
    char *contents;
} file_node_t;

Directory Node struct

typedef struct {
    node_t base;
    size_t num_children;
    node_t **children;
} directory_node_t;

We have provided the following functions to help initialize and free tree nodes:

node_t *init_file_node(char *name, size_t size, char *contents)

Allocates and initializes a file node.
node_t *init_directory_node(char *name)

Allocates and initializes an empty directory node. Copies the name; so, the caller is responsible for freeing it.
void free_directory_tree(node_t *node)

Frees all memory associated with the entire tree rooted at node.

This function is very similar in structure to the print_directory_tree and create_directory_tree functions below that you will be writing.

Your first task is to implement the functions to manipulate, print, and create a directory tree:

void add_child_directory_tree(directory_node_t *dnode, node_t *child)

Adds child to dnode’s children array. This function should use realloc to get more space. It should also ensure that all nodes in dnode’s children array are in alphabetical order.
void print_directory_tree(node_t *node)
Pretty prints the directory tree rooted at node subject to the following:
- The root directory should be printed as “ROOT” (if you pass NULL to init_directory_node, the starter code does this)
- Nodes at level \(n\) should be preceded with \(4n\) spaces (not tabs)
- Nodes at the same level should be indented the same amount
- Nodes at the same level should appear in alphabetical order
For example, a directory tree with the following nodes
a/b/c/c

a/b/c/a

a/b/c/e
a/b/c/b

a/b/c/d

a/b/d/d
a/b/d/a

a/b/d/b

a/b/d/c
would be displayed as
```
ROOT
    a
        b
            c
                a
                b
                c
                d
                e
            d
                a
                b
                c
                d
```
void create_directory_tree(node_t *node)

Creates the directory structure corresponding to node on the file system (including the contents of the files). You will want to look into mkdir (use 0777 for permissions), fopen, fwrite, and fclose. You should include the ROOT directory at the top in the files you create.

Task 1. Implement the three functions above as specified. Then, make the test target, and our tests for task 1 will run.

Part 2: Recovering a File System

In this part, you will be working with an image of Adam’s USB drive.

Checking the Disk Image

First, download usb.dmg from the link above. Before you start trying to recover files, you should check that the files are actually deleted. To do this, you’ll have to mount the image as a disk. On OS X, you can just double-click it. On Linux, you should use mount. On Windows, you can see it using 7zip or you can download this program to actually mount it as a drive. Regardless of machine, you should notice that there aren’t any useful files on the drive. So, we’ll have to recover them the hard way!

The FAT16 File System

Your computer organizes the files on your machine in a particular format called a “file system”. There are many formats (ntfs on windows and ext3 on linux are some of the most common), but one of the older formats which is used for cross-compatibility is called FAT (which stands for “File Allocation Table”). Adam’s USB drive used FAT16 because of the relatively small size. In this section, we will describe the basics of the FAT16 file system to give you enough information to recover Adam’s files.

When a file is deleted on a FAT16 file system, the contents are actually kept in place! All that changes is the first character of the filename which is set to 0xe5. This means we can recover the files even though we can’t see them in finder/explorer!

The image begins with some special sections as pictured below: fat-layout

Since we’re not trying to boot from the disk image, we can ignore the first section. Since the files are deleted, we don’t care what the system thinks is allocated. So, the only two sections we care about in this “preamble” are: (1) the BPB and (2) the “root directory entries”. The BPB (represented by as a bios_parameter_block_t in fat16.h) stores a number of important parameters about the data that follows. For now, all you have to do is fread the bytes into a struct, and we will use the information contained inside later. The master boot record has size const size_t MASTER_BOOT_RECORD_SIZE = 0x20B as defined in recover.c.

To get around the file, you will need to use the fseek function which takes an origin as the third argument SEEK_SET indicates the offset is relative to the beginning of the file and SEEK_CUR indicates the offset is relative to where the file currently points; you may want to use both of these during this assignment.

Task 2. In recover.c, the starter code opens up a FILE * called disk. Internally, FILEs store the current offset in the file (which you can get with ftell) and fread, fseek, and fwrite change this pointer. Your job is to skip past the master boot record (by fseeking 0x20B bytes from the start), and fread the bios parameter block into the variable bpb. Note that bpb is NOT a pointer. Make sure the types match up in your call to fread. Then, skip past the padding and the file allocation tables directly to the beginning of the root directory entries block.

Immediately after the File Allocation Tables, is an “array” of 512 directory_entry_t’s which is where you should start your analysis. These directory_entry_t’s all correspond to some kind of file or directory, but we are only concerned with ones that aren’t hidden. We have provided a utility function bool is_hidden(directory_entry_t d) which determines if a proposed directory_entry_t is hidden or not.

Notably, all directory_entry_t’s have a “cluster number” which is basically a pointer to a section on disk to get either the file contents or the files in the directory. You can get an offset (in bytes) from the beginning of the file by calling our utility function get_offset_from_cluster(d.first_cluster, bpb) which does a short calculation to determine where to go. Think of this as the ->next in a linked list.

The next several steps are to write the follow function, which recursively follows all non-hidden directories and constructs node_t’s out of them.

void follow(FILE *disk, directory_node_t *node, bios_parameter_block_t bpb)

Pre-Condition

The FILE * disk points exactly to the directory_entry_t intended to be processed next.

For each file, if the file read is not hidden…

…and is a file: read file_size bytes from the next location into a char *, then make a file_node_t * and attach the parent node to it.
…and is a directory: make a directory_node_t *, attach the parent node to it, and recursively call follow on the next location.

The loop should read and process directory_entry_t’s from disk until it finds one with a filename that starts with \0.

You will find the is_directory function useful in determining if an entry is a file or directory. In both cases, when we say “next location”, we mean the one pointed to by get_offset_from_cluster. In both cases, don’t forget to reset the location of the file to the next directory after you are done reading from the next location. Additionally, when you initialize the file/directory name, make sure to call the get_file_name function we’ve provided, because FAT16 stores file names weirdly.

Task 3. Implement follow.

Task 4. make recovery and run your code. The contents of the disk will be spilled into the ROOT folder. Find the directory containing your PDF and PASSWORD file. Open the PDF using the password you recovered, and follow the instructions inside to pass the gitlab tests. If you added the course late, there may not be a specific PDF labelled for you. If that is the case for you, contact Adam via Campuswire or E-mail.

cs24-20fa Project 00: `disk`

Introduction to Computing Systems (Fall 2020)