# Introduction

In this project you will be writing a general purpose dynamic storage allocator for C programs; that is, your own version of the malloc, free, realloc, and calloc functions.

You are encouraged to explore the design space creatively and implement an allocator that is correct, efficient, and fast.

In order to get the most out of this project, we strongly encourage you to start early. The total time you spend designing and debugging can easily eclipse the time you spend coding.

Bugs can be especially pernicious and difficult to track down in an allocator, and you will probably spend a significant amount of time debugging your code. Buggy code will not get any credit.

# Setup

Like usual, you should register for the assignment by going to https://grinch.caltech.edu/register which will create you a repository on gitlab with the starter code. We recommend doing all testing on labradoodle.caltech.edu, because the driver is tuned to the reference solution subject to the machine characteristics.

In previous terms, we have noticed that students who jump directly to trying to write a performant allocator usually spend a disproportionate amount of time on the project (in debugging-land where nobody wants to be). So, this term, we’ve provided a roadmap to keep you from creating a mess of code that is impossible to debug. Specifically, we’ve laid out smaller chunks at the beginning (“implement parts of an implicit list”) followed by a larger chunk at the end (“convert your implicit list to an explicit list”). Our hope is that this gets you to the point where you understand everything that’s going on by the time you try to write your explicit list. As a result, we’ve attached points to the first few tasks which also increases the possibility of partial credit.

# Provided Support Functions

The memlib.c package simulates the memory system for your dynamic memory allocator. You can invoke the following functions in memlib.c:

• void *mem_sbrk(ssize_t incr)

Expands the heap by incr bytes, where incr is a non-negative integer, and returns a generic pointer to the first byte of the newly allocated heap area. The semantics are identical to the Unix sbrk function, except that mem_sbrk will fail for negative arguments.

• void *mem_heap_lo(void)

Returns a generic pointer to the first byte in the heap.

• void *mem_heap_hi(void)

Returns a generic pointer to the last byte in the heap.

• size_t mem_heapsize(void)

Returns the current size of the heap in bytes.

You are also allowed to use the following libc library functions: memcpy, memset, printf, fprintf, and sprintf. Other than these functions and the support functions, your mm-implicit.c and mm-explicit.c code may not call any externally-defined function.

# Required Functions

Your dynamic storage allocators will consist of the standard functions defined in an allocator library, which are declared in mm.h and defined in both mm-implicit.c and mm-explicit.c.

Because we are running on a 64-bit machine, your allocators must be coded accordingly. Notably, your allocators must work for heaps of up to $$2^{64}$$ bytes. We will check this requirement manually and deduct a significant number of points if you ignore it.

• bool mm_init(void)

Performs any necessary initializations, such as allocating the initial heap area. The return value should be false if there was a problem in performing the initialization, true otherwise. You must reinitialize all of your data structures in this function, because the drivers call your mm_init function every time they begin a new trace to reset to an empty heap.

• void *mm_malloc(size_t size)

The mm_malloc function returns a pointer to an allocated block payload of at least size bytes. The entire allocated block should lie within the heap region and should not overlap with any other allocated block. Your mm_malloc implementation must always return 16-byte aligned pointers.

• void mm_free(void *ptr)

The mm_free function frees the block pointed to by ptr. It returns nothing. This function is only guaranteed to work when the passed pointer was returned by an earlier call to mm_malloc, mm_calloc, or mm_realloc and has not yet been freed. mm_free(NULL) has no effect.

• void *mm_realloc(void *oldptr, size_t size)

The mm_realloc function returns a pointer to an allocated region of at least size bytes with the following constraints:

• If ptr is NULL, the call is equivalent to mm_malloc(size)
• If size is equal to zero, the call is equivalent to mm_free(ptr) and should return NULL;
• If ptr is not NULL, it must have been returned by an earlier call to mm_malloc or mm_realloc and not yet have been freed. The call to mm_realloc takes an existing block of memory, pointed to by ptr — the old block.

Upon return, the contents of the new block should be the same as those of the old block, up to the minimum of the old and new sizes. Everything else is uninitialized. Achieving this involves either copying the old bytes to a newly allocated region or reusing the existing region.

For example, if the old block is 16 bytes and the new block is 64 bytes, then the first 16 bytes of the new block are identical to the first 16 bytes of the old block. Similarly, if the old block is 32 bytes and the new block is 16 bytes, then the contents of the new block are identical to the first 16 bytes of the old block.

The function returns a pointer to the resulting region. The return value might be the same as the old block—perhaps there is free space after the old block, or size is smaller than the old block size—or it might be different.

If the call to mm_realloc does not fail and the returned address is different than the address passed in, the old block has been freed and should not be used, freed, or passed to mm_realloc again.

• void *mm_calloc(size_t nmemb, size_t size)

Allocates memory for an array of nmemb elements of size bytes each and returns a pointer to the allocated memory. The memory is set to zero before returning. Your mm_calloc will not be graded on throughput or performance. A correct, simple implementation will suffice.

• void mm_checkheap()

This function is optional, but, we strongly recommend you write it early.

The mm_checkheap function (the heap consistency checker, or simply heap checker) scans the heap and checks it for possible errors (e.g., by making sure the headers and footers of each block are identical). Your heap checker should run silently until it detects some error in the heap. Then, and only then, should it print a message and exit the program. It is very important that your heap checker run silently; otherwise, it will produce too much output to be useful on the large traces.

A quality heap checker is essential for debugging your malloc implementation. Many malloc bugs are too subtle to debug using conventional gdb techniques. The only effective technique for some of these bugs is to use a heap consistency checker. When you encounter a bug, you can isolate it with repeated calls to the consistency checker until you find the operation that corrupted your heap.

The semantics for mm_malloc, mm_realloc, mm_calloc, and mm_free match those of the corresponding libc functions.

# Programming Rules

• You are writing a general purpose allocator. You may not solve specifically for any of the traces—we will be checking for this. Any allocator that attempts to explicitly determine which trace is running (e.g., by executing a sequence of tests at the beginning of the trace) and change its behavior in a way that is appropriate only for that specific trace will receive a penalty of 20 points.
• You should not change any of the interfaces in mm.h, and your program must compile with the provided Makefile. However, we strongly encourage you to use helper functions in mm-implicit.c and mm-explicit.c to break up your code into small, easy-to-understand segments.
• You are not allowed to define any large global data structures such as large arrays, trees, or lists in your mm.c program. However, you are allowed to declare small global arrays, structs and scalar variables such as integers, floats, and pointers in mm.c.

In total, your global data should sum to at most 128 bytes. Global values defined with the const qualifier are not counted in the 128 bytes.

The reason for this restriction is that the driver cannot account for such global variables in its memory utilization measure. If you need space for large data structures, you can allocate space for them within the heap.

The 128-byte limit will not be tested by automated means. The TAs will check whether your code is within this limit by inspecting your code. Provide documentation in your comments on your use of global variables. Ideally, they should be declared in one single part of your program, and you should provide comments giving a tally of the number of bytes used.

• It is okay to look at any high-level descriptions of algorithms found in the textbook or elsewhere, but it is not acceptable to copy or look at any code of malloc implementations found online or in other sources, in K&R, or in CS:APP. You may, however, use any of starter code as you see fit. The allocators in K&R and CS:APP are, well, really bad. They are very difficult to read, and, as such, you should probably not even bother trying to understand them. Instead, look at the example implicit allocator from lecture.

• For consistency with the libc malloc package, which returns blocks aligned on 16-byte boundaries, your allocator must always return pointers that are aligned to 16-byte boundaries. The drivers will check this requirement.

• Your code must compile without warnings. Warnings often point to subtle errors in your code; whenever you get a warning, you should double-check the corresponding line to see if the code is really doing what you intended. If it is, then you should eliminate the warning by tweaking the code (for instance, one common type of warning can be eliminated by adding a type-cast where a value is being converted from one type of pointer to another). We have added flags in the Makefile to force your code to be error-free.

• You may not use function macros. Every definition and usage of a function macro in this project will receive a penalty of 20 points. We are not kidding; please do not test us.

• If you do not write helper functions, we will refuse to help you debug your code.

# The Trace-Driven Driver Programs

The driver programs generated when you run make test your mm-implicit.c and mm-explicit.c code for correctness, space utilization, and throughput. This driver programs are controlled by a set of trace files that are included in the traces subdirectory of your repository. Each trace file contains a sequence of commands that instruct the driver to call your malloc, realloc, and free functions in some sequence. The drivers and the trace files are the same ones we will use when we grade your assignment.

When the driver programs are run, they will run each trace file multiple times: once to make sure your implementation is correct, once to determine the space utilization, and between 3 and 20 times to determine the throughput.

The tester for the implicit list is called bin/mdriver-implicit, and the tester for the explicit list is called bin/mdriver-explicit.

You can use the -h flag to get a bunch of options on both drivers. Specifically:

• -D will run mm_checkheap before and after every operation
• -c traces/<file>.rep will run the trace traces/<file>.rep once and only check correctness

# Implicit List (30 points)

For the first section of this project, the only file you will need to edit is mm-implicit.c in which you will write an implicit list allocator. For this part, we will guide you through doing this in several steps that we believe make up the most efficient approach. In this part of the project, we ONLY care about space utilization. We will add in throughput in the next section.

Great job! You’ve now completed a pretty reasonable implicit list implementation. Next up is to convert it to an explicit list.

# Explicit List (60 points)

Assuming your explicit list passes all correctness tests, and the graders do not detect any errors in your source code, two metrics are used to evaluate performance:

• Space utilization: The peak ratio between the total amount of memory requested by the driver (i.e., allocated via malloc but not yet freed via free) and the size of the heap used by your allocator. The optimal (but unachievable) ratio is 100%. You should find good policies to minimize fragmentation in order to make this ratio as close as possible to the optimal.

• Throughput: The average number of operations completed per second, expressed in kilo-operations per second or KOPS. A trace that takes $$T$$ seconds to perform $$n$$ operations will have a throughput of $$n/(1000\cdot T)$$ KOPS. Throughput measurements vary according to the type of CPU running the program.

The driver will report a preliminary score after running all the tests.

The score you see is not a guaranteed grade. We will re-run the driver on your code during grading. We recommend running the driver multiple times to ensure that performance is consistent. When grading, we will use the labradoodle.caltech.edu machine. So, we recommend you test there.

The gitlab tests do not test performance; so, passing those is not a gurantee of any grade.

# Extra Credit (+15 points)

Learn about balanced binary search trees (either red-black trees or AVL trees) and implement a malloc that uses one of these data structures as the free list. For extra-extra credit, develop traces that demonstrate that your tree is better than either list.

# Performance Index

It is not necessary to read this section, but if you are curious how the performance scores are calculated, the details are here. Observing that both memory and CPU cycles are expensive system resources, we combine these two measures into a single performance index $$P$$, with $$0 \leq P \leq 100$$, computed as a weighted sum of the space utilization and throughput:

$P\left(U,T\right) = 100\,\left(w \cdot {\min}\left(1, \frac{U - U_{Y min}}{U_{Y max} - U_{Y min}}\right) + (1-w)\cdot {\min}\left(1, \frac{T - T_{Y min}}{T_{Y max} - T_{Y min}}\right)\right)$

where $$U$$ is the space utilization (averaged across the traces), and $$T$$ is the throughput (averaged across the traces, but weighted by the number of operations in each trace). $$U_{Y max}$$ and $$T_{Y max}$$ are the estimated space utilization and throughput of a well-optimized malloc package, and $$U_{Y min}$$ are $$T_{Y min}$$ are minimum space utilization and throughput values, below which you will receive 0 points. The weight $$w$$ defines the relative weighting of utilization versus performance in the score. We have chosen $$w = 0.60$$.

The values for $$U_{Y min}$$, $$U_{Y max}$$, $$T_{Y min}$$, and $$T_{Y max}$$ are constants in the driver (0.61, 0.85, 0 Kops/s, and 40,000 Kops/s) that we chose when we configured the program. This means that once you beat 85\% utilization and 40,000 Kops/s, your performance index is perfect. We computed these values relative to the performance of reference solutions.

# Acknowledgements

This project is adapted from an assignment by R. Bryant and D. O’Hallaron at CMU.