Introduction
The main goal of this pre-test is to orient you to what CS 24 will be like. It will also cover representation, use number bases, and review pointers. Despite being called “pre-test”, it is not easy, and you should expect to spend a reasonable amount of time on it.
In Fall 2020, the median amount of time spent on this project by students was 8 hours.
The lint
test is part of your grade. If you are not passing the lint test, you will lose at least 10
points on the project.
Please be advised: At the end of this pretest, an arbitrary subset of students will be asked to complete an oral interview explaining their code as part of the assignment. Students could be selected for any reason, including to ensure understanding or because course staff thinks the code is really cool.
These interviews are expected to last about 15 minutes, and your code will be available to you throughout. If you are called to interview, course staff will contact you to schedule a time, and your grade will be held until the interview is completed.
Setup
First, you should register for the assignment by going to https://grinch.caltech.edu/register which will create you a repository on gitlab with the starter code. Then, you should follow the
ENVIRONMENT SETUP INSTRUCTIONS
to get ready for the course. These shouldn’t take that long, but they’re extremely important.
The Scenario
Adam accidentally deleted the contents of their USB drive (oh noes!). This USB drive was particularly important, because Adam painstakingly created a unique welcome PDF for every student. They even went to the trouble of password protecting them so that nobody could read someone else’s message. Unfortunately, they lost the passwords too, because they were stored on the USB drive as well.
Game Plan
You will recover your CS 24 “welcome PDF” and the password to open it from an image of Adam’s USB drive step-by-step. There might also be other hidden goodies that were on the drive that are worth exploring…
Part 1: Constructing and Printing A Directory Tree Data Structure
In this course, we will often find ourselves working with recursive tree structures with multiple “types” of nodes. One example of such a tree is the directory tree on your file system.
A node is either a file node or a directory node. File nodes contain a name, a file size, and a byte array of contents. Directory nodes contain a name and an array of children. Notably,
the name is common to both types of nodes; so, it should appear in the “super class”. Unfortunately, C doesn’t have objects; so, we have to simulate them using a “tag” to keep track of
what type the node is. These tags are usually implemented as enum
s which are just a way of specifying that a variable must be one of a list of choices:
Node Type
1
2
3
4
typedef enum {
FILE_TYPE,
DIRECTORY_TYPE
} node_type_t;
Thus, our node_t
struct will consist of a type and a name:
Node
struct
1
2
3
4
typedef struct {
node_type_t type;
char *name;
} node_t;
The interesting part is how the “subclasses” are implemented. We “embed” the root struct as the first element of the subclasses. This way, they will both start with the same members. So, if we
are given a node_t
, we can “extend” it to the right type by casting it. See the init
functions in directory_tree.c
or free_directory_tree
for examples of what that looks like.
File Node
struct
1
2
3
4
5
typedef struct {
node_t base;
size_t size;
char *contents;
} file_node_t;
Directory Node
struct
1
2
3
4
5
typedef struct {
node_t base;
size_t num_children;
node_t **children;
} directory_node_t;
We have provided the following functions to help initialize and free tree nodes:
-
node_t *init_file_node(char *name, size_t size, char *contents)
Allocates and initializes a file node. The resulting node “owns”
name
and will free it whenfree_directory_tree
is called on the node. Ifname
isNULL
, “ROOT” will be used. -
node_t *init_directory_node(char *name)
Allocates and initializes an empty directory node. The resulting node “owns”
name
and will free it whenfree_directory_tree
is called on the node. Ifname
isNULL
, “ROOT” will be used. -
void free_directory_tree(node_t *node)
Frees all memory associated with the entire tree rooted at
node
.
This function is very similar in structure to the
print_directory_tree
andcreate_directory_tree
functions below that you will be writing.
Your first task is to implement the functions to manipulate, print, and create a directory tree:
-
void add_child_directory_tree(directory_node_t *dnode, node_t *child)
Adds
child
todnode
’s children array. This function should userealloc
to get more space. It should also ensure that all nodes indnode
’s children array are in alphabetical order. -
void print_directory_tree(node_t *node)
Pretty prints the directory tree rooted at
node
subject to the following:- The root directory should be printed as “ROOT” (if you pass
NULL
toinit_directory_node
, the starter code does this) - Nodes at level \(n\) should be preceded with \(4n\) spaces (not tabs)
- Nodes at the same level should be indented the same amount
- Nodes at the same level should appear in alphabetical order
For example, a directory tree with the following nodes
- a/b/c/c
- a/b/c/a
- a/b/c/e
- a/b/c/b
- a/b/c/d
- a/b/d/d
- a/b/d/a
- a/b/d/b
- a/b/d/c
would be displayed as
1 2 3 4 5 6 7 8 9 10 11 12 13 14
ROOT a b c a b c d e d a b c d
- The root directory should be printed as “ROOT” (if you pass
-
void create_directory_tree(node_t *node)
Creates the directory structure corresponding to
node
on the file system (including the contents of the files). You will want to look intomkdir
(use0777
for permissions),fopen
,fwrite
, andfclose
. You should include the ROOT directory at the top in the files you create. You may not use thechdir
function in the standard library in this function or anywhere else in this assignment.
Task 1.
Implement the three functions above as specified. Then, make the test
target, and our tests for task 1 will run.
If you want to use print debugging, read how to get prints to show up in CS 24.
Part 2: Recovering a File System
In this part, you will be working with an image of Adam’s USB drive.
Checking the Disk Image
First, download usb.dmg
from the link above. Before you start trying to recover files, you should check that the files are actually deleted. To do this,
you’ll have to mount the image as a disk. On OS X, you can just double-click it. On Linux, you should use mount
. On Windows, you can see it using 7zip
or you can download
this program to actually mount it as a drive. Regardless of machine, you should notice that there aren’t any useful
files on the drive. So, we’ll have to recover them the hard way!
The FAT16 File System
Your computer organizes the files on your machine in a particular format called a “file system”. There are many formats (ntfs
on windows and ext3
on linux are some of the most common),
but one of the older formats which is used for cross-compatibility is called FAT (which stands for “File Allocation Table”). Adam’s USB drive used FAT16 because of the relatively small size.
In this section, we will describe the basics of the FAT16 file system to give you enough information to recover Adam’s files.
When a file is deleted on a FAT16 file system, the contents are actually kept in place! All that changes is the first character of the filename which is set to 0xe5
. This means we can
recover the files even though we can’t see them in finder/explorer!
The image begins with some special sections as pictured below:
Since we’re not trying to boot from the disk image, we can ignore the first section. Since the files are deleted, we don’t care what the system thinks is allocated. So, the only two sections
we care about in this “preamble” are: (1) the BPB and (2) the “root directory entries”. The BPB (represented by as a bios_parameter_block_t
in fat16.h
) stores a number of important parameters
about the data that follows. For now, all you have to do is fread
the bytes into a struct, and we will use the information contained inside later. The master
boot record has size const size_t MASTER_BOOT_RECORD_SIZE = 0x20B
as defined in recover.c
.
To get around the file, you will need to use
the fseek
function which takes an origin
as the third argument: SEEK_SET
indicates the offset is relative to the beginning of the file and SEEK_CUR
indicates the offset is relative to
where the file currently points; you may want to use both of these during this assignment.
Task 2.
In recover.c
, the starter code opens up a FILE *
called disk
. Internally, FILE
s store the current offset in the file (which you can get with ftell
) and fread
,
fseek
, and fwrite
change this pointer.
Your job is to skip past the master boot record (by fseek
ing 0x20B
bytes from the start), and fread
the
bios parameter block into the variable bpb
. Note that bpb
is NOT a pointer. Make sure the types match up in your call to fread
. Then, skip past the padding and the file allocation tables directly to the beginning of the root directory entries block. We have provided a helper function get_root_directory_location
which will get you this location, relative to the start of the file, without
any manual calculations.
Immediately after the File Allocation Tables, is an “array” of 512 directory_entry_t
’s which is where you should start your analysis. These directory_entry_t
’s all correspond to some
kind of file or directory, but we are only concerned with ones that aren’t hidden. We have provided a utility function bool is_hidden(directory_entry_t d)
which determines if a
proposed directory_entry_t
is hidden or not.
Notably, all directory_entry_t
’s have a “cluster number” which is basically a
pointer to a section on disk where the contents of that entry are located.
You can get an offset (in bytes) from the beginning of the file for this section
by calling our utility function get_offset_from_cluster(d.first_cluster, bpb)
,
which does a short calculation to determine where to go.
For file entries, the offset calculated will lead to the contents of that file.
For directory entries, the offset calculated will lead to a block of directory_entry_t
’s.
which are the entries contained in that directory.
The next several steps are to write the follow
function, which recursively follows all non-hidden directories and constructs node_t
’s out of them.
void follow(FILE *disk, directory_node_t *node, bios_parameter_block_t bpb)
Pre-Condition
The FILE *
disk points to a contiguous block of one or several directory_entry_t
’s,
which represent the files or directories in the directory associated with node
.
For each entry, if the entry read is not hidden…
- …and is a file: read
file_size
bytes from the offset location into achar *
, then make afile_node_t *
and attach the parent node to it. - …and is a directory: make a
directory_node_t *
, attach the parent node to it, and recursively callfollow
using the offset location.
You should use a loop to process directory_entry_t
’s from disk
until encountering an entry with a name that starts with \0
.
You should use fread
to read the next directory_entry_t
from disk
, which will move the
disk
pointer to the next directory_entry_t
in the block.
You will find the is_directory
function useful in determining if an entry is a file or directory.
In both cases, don’t forget to reset the location of the file to the next directory_entry_t
after you are done reading from the offset location.
Additionally, when you initialize the file/directory name, make sure to call the get_file_name
function we’ve provided, because FAT16 stores file names weirdly.
Task 3.
Implement follow
.
Task 4.
make recovery
and run your code. The contents of the disk will be spilled into the ROOT
folder. Find the directory containing your PDF and PASSWORD
file.
Open the PDF using the password you recovered, and follow the instructions inside to pass the gitlab tests. If you added the course late, there may not
be a specific PDF labelled for you. If that is the case for you, contact Adam via E-mail.