Consider the example stream of 0’s and 1’s:
111000101001100010000100
What exactly do these 0’s and 1’s mean?
Raw Bits and Bytes
A bit is a “raw” 0 or 1. We call a series of eight bits in a stream a byte. Bytes are one of the smallest useful chunks of bits we work with.
Splitting our target bitstream into bytes looks like this:
11100010 10011000 10000100
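The split above can be sketched in a few lines of Python. This is only an illustration; the variable names `bits` and `byte_strings` are chosen here and are not part of the course material.

```python
# The example bitstream, held as a string of '0' and '1' characters.
bits = "111000101001100010000100"

# Walk the string in steps of eight characters; each slice is one byte.
byte_strings = [bits[i:i + 8] for i in range(0, len(bits), 8)]

print(" ".join(byte_strings))  # 11100010 10011000 10000100
```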
A Shorthand for Bitstreams
Because bits are so primitive, we need many of them together to do anything useful; as a result, it quickly becomes annoying to use binary. We often write things (numbers, addresses, instructions) in base 16 instead.
Base 16 is called hexadecimal, and it uses the symbols 0 through 9 and a through f, where a through f represent 10 through 15. Notably, we can convert each hex digit into exactly four bits (and vice versa). For example, the conversion 0b1010 = 0xa works in both directions.
In computer systems contexts, we usually use prefixes (as above) instead of the mathematical notation. 0b means binary, 0x means hex, etc.
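Python happens to accept the same prefixes in integer literals, so a short sketch can check that the two notations name the same values. The specific values below are chosen for illustration.

```python
# 0b and 0x literals are just different spellings of the same integers.
same_value = (0b1010 == 0xa == 10)
print(same_value)        # True

# bin() and hex() convert an integer back to prefixed text.
as_binary = bin(0xe2)
as_hex = hex(0b10011000)
print(as_binary)         # 0b11100010
print(as_hex)            # 0x98
```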
Converting our bytes to hex gives us:
0xe2 0x98 0x84
Since these bytes are contiguous, we can collapse them together as follows:
0xe29884
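As a rough sketch, the conversion and the collapsing step can be reproduced in Python. The names `hex_bytes`, `collapsed`, and `nibbles` are made up for this example.

```python
byte_strings = ["11100010", "10011000", "10000100"]

# int(s, 2) parses a binary string; format(..., "02x") renders two hex digits.
hex_bytes = [format(int(s, 2), "02x") for s in byte_strings]
print(" ".join("0x" + h for h in hex_bytes))  # 0xe2 0x98 0x84

# Contiguous bytes collapse into one hex string.
collapsed = "0x" + "".join(hex_bytes)
print(collapsed)  # 0xe29884

# Going the other way, each hex digit expands to exactly four bits.
nibbles = [format(int(d, 16), "04b") for d in "e29884"]
print(" ".join(nibbles))  # 1110 0010 1001 1000 1000 0100
```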
Bitstream Interpretations
At this point, we have a concise representation of the bit stream (0xe29884), but it still doesn’t mean anything. We need to assign an interpretation to the bit stream.
Unsigned Numbers
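One possible interpretation of the hex digits is as an unsigned 24-bit number (six hex digits at four bits each). In this case, we read the number as follows, weighting each digit by a power of 16:
$$\texttt{0xe29884} = 14 \cdot 16^5 + 2 \cdot 16^4 + 9 \cdot 16^3 + 8 \cdot 16^2 + 8 \cdot 16^1 + 4 \cdot 16^0 = 14850180$$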
Similarly, if we want to convert a binary number, e.g., 0b1101, to a decimal number, we’d apply the same process using powers of 2 instead of 16:
$$\texttt{0b1101} = 1 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = 13$$
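The positional expansion can be written as a small Python function. This is a sketch only: `to_unsigned` is a hypothetical helper name, and the final line cross-checks it against the standard `int.from_bytes`, assuming the first byte is the most significant.

```python
def to_unsigned(digits, base):
    """Sum digit * base**position, with position 0 at the rightmost digit."""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += int(digit, base) * base ** position
    return total

print(to_unsigned("e29884", 16))  # 14850180
print(to_unsigned("1101", 2))     # 13

# Cross-check: read the three bytes as one big-endian unsigned integer.
print(int.from_bytes(b"\xe2\x98\x84", "big"))  # 14850180
```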
Colors
Another interpretation of our hex digits is as a color. We write colors as three bytes: the first represents “how much red”, the second represents “how much green”, and the third represents “how much blue”. In web development, we write this prefixed with a #:
#e29884
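To make the three channels concrete, here is a minimal Python sketch that pulls the red, green, and blue bytes out of the color string. The variable names are illustrative.

```python
color = "#e29884"

# Drop the leading '#', then read two hex digits (one byte) per channel.
digits = color.lstrip("#")
red, green, blue = (int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(red, green, blue)  # 226 152 132
```

Each channel is one byte, so every value falls between 0 and 255.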
Strings
Another common interpretation of bytes is as letters and symbols. The simplest encoding is called ASCII, but, nowadays, to support all the fancy emojis, we use another encoding called UTF-8, which is an encoding of Unicode. In Python 3, we can get a raw byte string by prefixing the string with a b. To write a literal byte, we write \x followed by the two hex digits of that byte. Finally, to tell Python 3 to interpret the stream in a particular encoding, we use the decode method. For our particular string, it creates a “comet” (☄, U+2604) in UTF-8 but isn’t a valid ASCII sequence:
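>>> b'\xe2\x98\x84'.decode('utf-8')
'☄'
>>> b'\xe2\x98\x84'.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)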
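To see why the bytes decode to a comet, it can help to undo the UTF-8 packing by hand. The sketch below assumes the standard three-byte UTF-8 layout (1110xxxx 10yyyyyy 10zzzzzz) and compares the result with Python's own decoder. The variable names are illustrative.

```python
import unicodedata

raw = b"\xe2\x98\x84"
b0, b1, b2 = raw  # iterating over bytes yields integers

# Strip the marker bits and concatenate the payload: 4 + 6 + 6 = 16 bits.
code_point = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)

print(hex(code_point))                    # 0x2604
print(unicodedata.name(chr(code_point)))  # COMET
print(chr(code_point) == raw.decode("utf-8"))  # True

# ASCII only defines values below 128, so none of these bytes qualify.
ascii_ok = all(b < 128 for b in raw)
print(ascii_ok)  # False
```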
Assembly
Finally, one last way to view our bytes is as actual instructions that the machine can run (“machine code”). In this course, we will be working with x86-64, and the translation to this encoding looks like the following:
0: e2 98 loop 0xffffffffffffff9a
2: 84 .byte 0x84
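The target address in the listing can be reproduced by hand. The sketch below assumes the instruction sits at address 0, as in the listing, and that opcode 0xe2 is the x86 `loop` instruction followed by a one-byte signed displacement. The leftover 0x84 is not decoded here, which matches the listing showing it as a bare `.byte`.

```python
code = b"\xe2\x98\x84"
opcode, displacement = code[0], code[1]

# Interpret the displacement byte as a signed 8-bit value (two's complement).
rel8 = displacement - 0x100 if displacement >= 0x80 else displacement
print(rel8)  # -104

# The jump is relative to the address of the next instruction (0 + 2 bytes).
next_address = 0 + 2
target = (next_address + rel8) % 2**64  # wrap to a 64-bit address

print(hex(target))  # 0xffffffffffffff9a
```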
Conclusion
We’ve discovered the important fact that the same bitstream can mean many different things! The context is just as important as the stream itself! Keep this in mind throughout the course as you see bitstreams!