Why Bother?
In 2020, learning assembly can seem kind of silly, because it really only shows up in very low-level systems. We argue that it’s still worth learning, for the following reasons:
- Almost all assembly is written by compilers or in the operating system, but who writes those? [N.B. in a week or so, you will build your own small compiler]
- The high-level language model occasionally breaks down and you have to read the assembly to understand the machine’s behavior! [N.B. in a few weeks, you will write some exploits that rely on understanding assembly]
- It’s important to understand the types of optimizations a compiler is capable of making, and those it isn’t! [N.B. when you write the compiler, you will also write some optimizations]
- Software is generally distributed in binary form; if you want to reverse engineer or security-audit software, what you will be reading is assembly!
From C to Binary
To give you a brief preview of where we’re headed, we will go through the process of taking C code and lowering it all the way to machine code. The toy program we will work with is identity.c:
int identity(int x) {
    return x;
}
The .s extension is the standard way of indicating that a file contains assembly code. To get clang to output an assembly file, you give the -S switch as shown here:
The next step is taking the assembly file and turning it into an object file (or .o file), which is a partial executable. You can use object files to compile modules separately and then link them together later. To get an object file from an assembly file, you use the as command:
And Back…
When reverse-engineering a binary, it is super useful to be able to go back to the assembly from a binary or object file. To do this, you can use the objdump tool:
Additionally, we’ve written a wrapper for objdump called cs24-dasm, which can take a binary and objdump a single function. You can invoke it as follows:
For this course, you’ll find cs24-dasm to be more useful than directly using objdump.