Introduction
In this project you will be writing a general purpose dynamic
storage allocator for C programs; that is, your own version of the
malloc
, free
, realloc
, and calloc
functions.
You are encouraged to explore the design space creatively and implement an allocator that is correct, efficient, and fast.
In order to get the most out of this project, we strongly encourage you to start early. The total time you spend designing and debugging can easily eclipse the time you spend coding.
Bugs can be especially pernicious and difficult to track down in an allocator, and you will probably spend a significant amount of time debugging your code. Buggy code will not get any credit.
Please be advised: At the end of this project, an arbitrary subset of students will be asked to complete an oral interview explaining their code as part of the assignment. Students could be selected for any reason, including to ensure understanding or because course staff thinks the code is really cool.
These interviews are expected to last about 15 minutes, and your code will be available to you throughout. If you are called to interview, course staff will contact you to schedule a time, and your grade will be held until the interview is completed.
Setup
Like usual, you should register for the assignment by going to https://grinch.caltech.edu/register
which will create you a repository on gitlab with the starter code.
We recommend doing all testing on labradoodle.caltech.edu
, because the driver is tuned to the reference solution subject to the machine characteristics.
Roadmap
In previous terms, we have noticed that students who jump directly to trying to write a performant allocator usually spend a disproportionate amount of time on the project (in debugging-land where nobody wants to be). So, this term, we’ve provided a roadmap to keep you from creating a mess of code that is impossible to debug. Specifically, we’ve laid out smaller chunks at the beginning (“implement parts of an implicit list”) followed by a larger chunk at the end (“convert your implicit list to an explicit list”). Our hope is that this gets you to the point where you understand everything that’s going on by the time you try to write your explicit list. As a result, we’ve attached points to the first few tasks which also increases the possibility of partial credit.
The open-ended questions in this document are part of your grade on this project. Make sure to submit your answers on Gradescope!!! You may submit as many times as you want.
Provided Support Functions
The memlib.c
package simulates the memory system for your
dynamic memory allocator. You can invoke the following functions
in memlib.c
:
-
void *mem_sbrk(ssize_t incr)
Expands the heap by
incr
bytes, whereincr
is a non-negative integer, and returns a generic pointer to the first byte of the newly allocated heap area. The semantics are identical to the Unixsbrk
function, except thatmem_sbrk
will fail for negative arguments. -
void *mem_heap_lo(void)
Returns a generic pointer to the first byte in the heap.
-
void *mem_heap_hi(void)
Returns a generic pointer to the last byte in the heap.
-
size_t mem_heapsize(void)
Returns the current size of the heap in bytes.
You are also allowed to use the following libc
library
functions: memcpy
, memset
, printf
, fprintf
,
and sprintf
. Other than these functions and the support functions, your mm-implicit.c
and mm-explicit.c
code may not call any externally-defined function.
Required Functions
Your dynamic storage allocators will consist of the standard functions defined in an allocator library, which are declared in mm.h
and defined in both mm-implicit.c
and mm-explicit.c
.
Because we are running on a 64-bit machine, your allocators must be coded accordingly. Notably, your allocators must work for heaps of up to \(2^{64}\) bytes. We will check this requirement manually and deduct a significant number of points if you ignore it.
-
bool mm_init(void)
Performs any necessary initializations, such as allocating the initial heap area. The return value should be
false
if there was a problem in performing the initialization,true
otherwise. You must reinitialize all of your data structures in this function, because the drivers call yourmm_init
function every time they begin a new trace to reset to an empty heap. -
void *mm_malloc(size_t size)
The
mm_malloc
function returns a pointer to an allocated block payload of at leastsize
bytes. The entire allocated block should lie within the heap region and should not overlap with any other allocated block. Yourmm_malloc
implementation must always return 16-byte aligned pointers. -
void mm_free(void *ptr)
The
mm_free
function frees the block pointed to byptr
. It returns nothing. This function is only guaranteed to work when the passed pointer was returned by an earlier call tomm_malloc
,mm_calloc
, ormm_realloc
and has not yet been freed.mm_free(NULL)
has no effect. -
void *mm_realloc(void *oldptr, size_t size)
The
mm_realloc
function returns a pointer to an allocated region of at leastsize
bytes with the following constraints:- If
ptr
isNULL
, the call is equivalent tomm_malloc(size)
- If
size
is equal to zero, the call is equivalent tomm_free(ptr)
and should returnNULL
; -
If
ptr
is notNULL
, it must have been returned by an earlier call tomm_malloc
ormm_realloc
and not yet have been freed. The call tomm_realloc
takes an existing block of memory, pointed to byptr
— the old block.Upon return, the contents of the new block should be the same as those of the old block, up to the minimum of the old and new sizes. Everything else is uninitialized. Achieving this involves either copying the old bytes to a newly allocated region or reusing the existing region.
For example, if the old block is 16 bytes and the new block is 64 bytes, then the first 16 bytes of the new block are identical to the first 16 bytes of the old block. Similarly, if the old block is 32 bytes and the new block is 16 bytes, then the contents of the new block are identical to the first 16 bytes of the old block.
The function returns a pointer to the resulting region. The return value might be the same as the old block—perhaps there is free space after the old block, or
size
is smaller than the old block size—or it might be different.If the call to
mm_realloc
does not fail and the returned address is different than the address passed in, the old block has been freed and should not be used, freed, or passed tomm_realloc
again.
- If
-
void *mm_calloc(size_t nmemb, size_t size)
Allocates memory for an array of
nmemb
elements ofsize
bytes each and returns a pointer to the allocated memory. The memory is set to zero before returning. Yourmm_calloc
will not be graded on throughput or performance. A correct, simple implementation will suffice. -
void mm_checkheap()
This function is optional, but, we strongly recommend you write it early.
The
mm_checkheap
function (the heap consistency checker, or simply heap checker) scans the heap and checks it for possible errors (e.g., by making sure the headers and footers of each block are identical). Your heap checker should run silently until it detects some error in the heap. Then, and only then, should it print a message and exit the program. It is very important that your heap checker run silently; otherwise, it will produce too much output to be useful on the large traces.A quality heap checker is essential for debugging your
malloc
implementation. Manymalloc
bugs are too subtle to debug using conventionalgdb
techniques. The only effective technique for some of these bugs is to use a heap consistency checker. When you encounter a bug, you can isolate it with repeated calls to the consistency checker until you find the operation that corrupted your heap.
The semantics for mm_malloc
, mm_realloc
, mm_calloc
, and mm_free
match those of the corresponding libc
functions.
Programming Rules
- You are writing a general purpose allocator. You may not solve specifically for any of the traces—we will be checking for this. Any allocator that attempts to explicitly determine which trace is running (e.g., by executing a sequence of tests at the beginning of the trace) and change its behavior in a way that is appropriate only for that specific trace will receive a penalty of 20 points.
- You should not change any of the interfaces in
mm.h
, and your program must compile with the provided Makefile. However, we strongly encourage you to use helper functions inmm-implicit.c
andmm-explicit.c
to break up your code into small, easy-to-understand segments. -
You are not allowed to define any large global data structures such as large arrays, trees, or lists in your
mm.c
program. However, you are allowed to declare small global arrays, structs and scalar variables such as integers, floats, and pointers inmm.c
.In total, your global data should sum to at most 128 bytes. Global values defined with the
const
qualifier are not counted in the 128 bytes.The reason for this restriction is that the driver cannot account for such global variables in its memory utilization measure. If you need space for large data structures, you can allocate space for them within the heap.
The 128-byte limit will not be tested by automated means. The TAs will check whether your code is within this limit by inspecting your code. Provide documentation in your comments on your use of global variables. Ideally, they should be declared in one single part of your program, and you should provide comments giving a tally of the number of bytes used.
-
It is okay to look at any high-level descriptions of algorithms found in the textbook or elsewhere, but it is not acceptable to copy or look at any code of
malloc
implementations found online or in other sources, in K&R, or in CS:APP. You may, however, use any of starter code as you see fit. The allocators in K&R and CS:APP are, well, really bad. They are very difficult to read, and, as such, you should probably not even bother trying to understand them. Instead, look at the example implicit allocator from lecture. -
For consistency with the
libc
malloc
package, which returns blocks aligned on 16-byte boundaries, your allocator must always return pointers that are aligned to 16-byte boundaries. The drivers will check this requirement. -
Your code must compile without warnings. Warnings often point to subtle errors in your code; whenever you get a warning, you should double-check the corresponding line to see if the code is really doing what you intended. If it is, then you should eliminate the warning by tweaking the code (for instance, one common type of warning can be eliminated by adding a type-cast where a value is being converted from one type of pointer to another). We have added flags in the Makefile to force your code to be error-free.
-
You may not use function macros. Every definition and usage of a function macro in this project will receive a penalty of 20 points. We are not kidding; please do not test us.
- If you do not write helper functions, we will refuse to help you debug your code.
The Trace-Driven Driver Programs
The driver programs generated when you run make
test your
mm-implicit.c
and mm-explicit.c
code for correctness, space utilization, and
throughput. This driver programs are controlled by a set of trace files
that are included in the traces
subdirectory of your repository. Each trace
file contains a sequence of commands that instruct the driver to
call your malloc
, realloc
, and free
functions in some
sequence. The drivers and the trace files are the same ones we will
use when we grade your assignment.
When the driver programs are run, they will run each trace file multiple times: once to make sure your implementation is correct, once to determine the space utilization, and between 3 and 20 times to determine the throughput.
The tester for the implicit list is called bin/mdriver-implicit
, and the tester for the
explicit list is called bin/mdriver-explicit
.
You can use the -h
flag to get a bunch of options on both drivers. Specifically:
-
-D
will runmm_checkheap
before and after every operation -
-c traces/<file>.rep
will run the tracetraces/<file>.rep
once and only check correctness
Implicit List (30 points)
For the first section of this project, the only file you will need to edit is mm-implicit.c
in which you will write an
implicit list allocator. For this part, we will guide you through doing this in several steps that we believe make up
the most efficient approach. In this part of the project, we ONLY care about space utilization. We will add in throughput
in the next section.
Task 1.
The mm-implicit.c
file we have given you implements the very simple allocator we discussed in class. Unfortunately, it’s not quite complete.
In particular, the realloc
and calloc
implementations are missing which makes the tester fail due to “correctness”. Write realloc
and calloc
to make the tester pass the correctness tests.
Task 2.
Now that your mm-implicit.c
is passing the correctness tests, you should add block splitting as discussed in class. A working implementation of block
splitting should give a score around 23/30.
Task 3. Finally, improve the implicit list further by adding delayed coalescing (e.g., coalescing in malloc instead of free) as discussed in class. You will likely want to loop through the blocks in the heap before looking for a fit. A correct implementation of coalescing combined with splitting from the previous step should give a score of 30/30.
Great job! You’ve now completed a pretty reasonable implicit list implementation. Next up is to convert it to an explicit list.
Explicit List (60 points)
Assuming your explicit list passes all correctness tests, and the graders do not detect any errors in your source code, two metrics are used to evaluate performance:
-
Space utilization: The peak ratio between the total amount of memory requested by the driver (i.e., allocated via
malloc
but not yet freed viafree
) and the size of the heap used by your allocator. The optimal (but unachievable) ratio is 100%. You should find good policies to minimize fragmentation in order to make this ratio as close as possible to the optimal. -
Throughput: The average number of operations completed per second, expressed in kilo-operations per second or KOPS. A trace that takes \(T\) seconds to perform \(n\) operations will have a throughput of \(n/(1000\cdot T)\) KOPS. Throughput measurements vary according to the type of CPU running the program.
The driver will report a preliminary score after running all the tests.
The score you see is not a guaranteed grade. We will re-run the driver on your code during grading.
We recommend running the driver multiple times to ensure that performance is consistent.
When grading, we will use the labradoodle.caltech.edu
machine. So, we recommend you test there.
The gitlab
tests do not test performance; so, passing those is not a gurantee of any grade.
Task 4.
Copy your implicit list over to mm-explicit.c
. Then, modify it to be an explicit list. You may make any design decisions you want
subject to the requirements in this specification. This task will be the bulk of the work on this project.
Did you remember to submit your answers to the OpenEnded questions on Gradescope???
Extra Credit (+15 points)
Learn about balanced binary search trees (either red-black trees or AVL trees) and implement a malloc that uses one of these data structures as the free list. For extra-extra credit, develop traces that demonstrate that your tree is better than either list.
Performance Index
It is not necessary to read this section, but if you are curious how the performance scores are calculated, the details are here. Observing that both memory and CPU cycles are expensive system resources, we combine these two measures into a single performance index \(P\), with \(0 \leq P \leq 100\), computed as a weighted sum of the space utilization and throughput:
where \(U\) is the space utilization (averaged across the traces),
and \(T\) is the throughput (averaged across the traces, but weighted by
the number of operations in each trace). \(U_{Y max}\) and
\(T_{Y max}\) are the estimated space utilization and throughput of a well-optimized
malloc
package, and \(U_{Y min}\) are \(T_{Y min}\) are minimum space utilization
and throughput values, below which you will receive 0 points. The
weight \(w\) defines the relative weighting of utilization versus
performance in the score. We have chosen \(w = 0.60\).
The values for \(U_{Y min}\), \(U_{Y max}\), \(T_{Y min}\), and \(T_{Y max}\) are constants in the driver (0.61, 0.85, 0 Kops/s, and 40,000 Kops/s) that we chose when we configured the program. This means that once you beat 85\% utilization and 40,000 Kops/s, your performance index is perfect. We computed these values relative to the performance of reference solutions.
Acknowledgements
This project is adapted from an assignment by R. Bryant and D. O’Hallaron at CMU.