cs24-20fa: Representation

Introduction to Computing Systems (Fall 2020)

To communicate with a computer, we must work with its representation of data. Here, we explore several interpretations of one particular stream of 0’s and 1’s as a way to see what data representation generally looks like on a computer. Later, we’ll go into more detail on most of these interpretations.

Consider the example stream of 0’s and 1’s:

111000101001100010000100

What exactly do these 0’s and 1’s mean?

Raw Bits and Bytes

A bit is a “raw” 0 or 1. We call a series of eight bits in a stream a byte. Bytes are one of the smallest useful chunks of bits we work with.

Splitting our target bitstream into bytes looks like this:

11100010 10011000 10000100
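As a rough sketch, this split can be done in Python by slicing every eight characters. The helper name split_into_bytes is hypothetical, and the bitstream is assumed to be a string of '0' and '1' characters whose length is a multiple of 8:

```python
from typing import List


def split_into_bytes(bits: str) -> List[str]:
    """Split a string of '0'/'1' characters into 8-bit chunks (hypothetical helper)."""
    if len(bits) % 8 != 0:
        raise ValueError("bitstream length must be a multiple of 8")
    # Take characters [0:8], [8:16], [16:24], ...
    return [bits[i:i + 8] for i in range(0, len(bits), 8)]


bitstream = "111000101001100010000100"
print(split_into_bytes(bitstream))  # ['11100010', '10011000', '10000100']
```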

A Shorthand for Bitstreams

Because bits are so primitive, we need many of them together to do anything useful; as a result, it quickly becomes annoying to use binary. We often write things (numbers, addresses, instructions) in base 16 instead.

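Base 16 is called hexadecimal, and it uses the symbols $$\{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F\}$$, where $$A$$ through $$F$$ represent 10 through 15. Notably, because $$16 = 2^4$$, each hex digit converts to exactly four bits, and vice versa. For example, the eight bits $$11100010_2$$ split into $$1110_2 = 14 = E_{16}$$ and $$0010_2 = 2_{16}$$, so they convert to $$E2_{16}$$; expanding each of those hex digits back into four bits recovers the original eight.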

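In computer systems contexts, we usually write a prefix instead of mathematical subscript notation such as $$1110_2$$: 0b means binary and 0x means hex. For example, the first byte of our stream is written 0b11100010 in binary or 0xe2 in hex.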
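The four-bits-per-hex-digit correspondence can be sketched in Python with the built-ins int and format. The helper names hex_to_bits and bits_to_hex are hypothetical, and the inputs are assumed to be digit strings without the 0x or 0b prefix:

```python
def hex_to_bits(hex_digits: str) -> str:
    """Expand each hex digit into exactly four bits (hypothetical helper)."""
    # format(n, "04b") writes n in binary, zero-padded to 4 characters.
    return "".join(format(int(d, 16), "04b") for d in hex_digits)


def bits_to_hex(bits: str) -> str:
    """Collapse each group of four bits into one hex digit (hypothetical helper)."""
    if len(bits) % 4 != 0:
        raise ValueError("bit count must be a multiple of 4")
    return "".join(format(int(bits[i:i + 4], 2), "x") for i in range(0, len(bits), 4))


print(hex_to_bits("e2"))        # 11100010
print(bits_to_hex("11100010"))  # e2
```

Because the mapping works digit by digit, neither direction needs to look at more than four bits at a time.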

Converting our bytes to hex gives us:

0xe2 0x98 0x84

Since these bytes are contiguous, we can collapse them together as follows:

0xe29884
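Collapsing contiguous bytes amounts to shifting what we have so far left by 8 bits and appending the next byte. A minimal Python sketch (collapse_bytes is a hypothetical helper) assumes the first byte is the most significant, matching how we read the stream left to right, and checks the result against the built-in int.from_bytes:

```python
def collapse_bytes(data: bytes) -> int:
    """Concatenate bytes into one integer, first byte most significant (hypothetical helper)."""
    value = 0
    for b in data:  # iterating over bytes yields ints in range 0-255
        # Shift the existing value left by 8 bits to make room, then append b.
        value = (value << 8) | b
    return value


data = bytes([0xe2, 0x98, 0x84])
value = collapse_bytes(data)
print(hex(value))  # 0xe29884
assert value == int.from_bytes(data, "big")
```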

Bitstream Interpretations

At this point, we have a concise representation of the bit stream (0xe29884), but it still doesn’t mean anything. We need to assign an interpretation to the bit stream.

Unsigned Numbers

One possible interpretation of the hex digits is as an unsigned 24-bit number (six hex digits of four bits each). In this case, we read the number as follows:

\begin{align}
\texttt{0xe29884} &= \texttt{e} \times 16^5 + 2 \times 16^4 + 9 \times 16^3 + 8 \times 16^2 + 8 \times 16^1 + 4 \times 16^0\\
&= 14 \times 16^5 + 2 \times 16^4 + 9 \times 16^3 + 8 \times 16^2 + 8 \times 16^1 + 4 \times 16^0\\
&= 14680064 + 131072 + 36864 + 2048 + 128 + 4\\
&= 14850180
\end{align}

Similarly, to convert a binary number such as 0b1101 to decimal, we apply the same process using powers of 2 instead of 16:

\begin{align}
\texttt{0b1101} &= 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0\\
&= 2^3 + 2^2 + 0 + 2^0\\
&= 8 + 4 + 0 + 1\\
&= 13
\end{align}
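Both conversions follow the same positional rule, so one function can handle any base. The sketch below (positional_value is a hypothetical name; digits are passed without the 0x or 0b prefix) sums each digit times the base raised to its position and compares the result against Python's built-in int parser:

```python
def positional_value(digits: str, base: int) -> int:
    """Sum digit * base**position, where position 0 is the rightmost digit (hypothetical helper)."""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += int(digit, base) * base ** position
    return total


print(positional_value("e29884", 16))  # 14850180
print(positional_value("1101", 2))     # 13

# Python's int() performs the same conversion directly.
assert positional_value("e29884", 16) == int("e29884", 16)
assert positional_value("1101", 2) == int("0b1101", 2)
```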

Colors

Another interpretation of our hex digits is as a color. A common way to write a color is as three bytes: the first represents “how much red”, the second “how much green”, and the third “how much blue”. In web development, we write this prefixed with a #:

#e29884
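Extracting each color channel means isolating one byte of the 24-bit value with shifts and masks. A minimal sketch, assuming the #rrggbb form shown above (rgb_components is a hypothetical helper):

```python
from typing import Tuple


def rgb_components(color: str) -> Tuple[int, int, int]:
    """Split a '#rrggbb' color string into (red, green, blue) byte values (hypothetical helper)."""
    value = int(color.lstrip("#"), 16)
    red = (value >> 16) & 0xFF   # top byte
    green = (value >> 8) & 0xFF  # middle byte
    blue = value & 0xFF          # bottom byte
    return red, green, blue


print(rgb_components("#e29884"))  # (226, 152, 132)
```

Each channel ranges from 0 (none of that color) to 255 (as much as possible).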

Strings

Another common interpretation of bytes is as letters and symbols. The simplest encoding is ASCII, which only assigns characters to byte values 0 through 127; nowadays, to support all the fancy emojis, we usually use UTF-8, an encoding of the Unicode character set. In Python 3, prefixing a string literal with b creates a raw byte string, and \x followed by two hex digits writes a literal byte. To interpret the bytes in a particular encoding, we call the decode method. Our bytes decode to a “comet” in UTF-8, but they are not a valid ASCII sequence because 0xe2 is larger than 127. Decoding them as Latin-1, where every byte value is a character, gives yet another string:

Python 3.7.3 Interpreter

>>> comet = b'\xe2\x98\x84'
>>> print(comet)
b'\xe2\x98\x84'
>>> comet.decode('utf-8')
'☄'
>>> comet.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
>>> comet.decode('latin-1')
'â\x98\x84'
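To see where the comet comes from, here is a hand-rolled sketch of decoding a single 3-byte UTF-8 sequence. It assumes the standard UTF-8 layout for three-byte characters (a 1110xxxx lead byte followed by two 10xxxxxx continuation bytes); decode_utf8_3byte is a hypothetical helper that skips the overlong and surrogate checks a real decoder performs:

```python
def decode_utf8_3byte(data: bytes) -> int:
    """Return the code point encoded by a 3-byte UTF-8 sequence (hypothetical, simplified)."""
    b0, b1, b2 = data
    # Lead byte must look like 1110xxxx; continuation bytes like 10xxxxxx.
    if (b0 & 0xF0) != 0xE0 or (b1 & 0xC0) != 0x80 or (b2 & 0xC0) != 0x80:
        raise ValueError("not a 3-byte UTF-8 sequence")
    # Keep 4 payload bits from the lead byte and 6 from each continuation byte.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


comet = b'\xe2\x98\x84'
code_point = decode_utf8_3byte(comet)
print(hex(code_point))                            # 0x2604
print(chr(code_point) == comet.decode('utf-8'))  # True
```

The result, 0x2604, is the Unicode code point for the comet symbol, matching what decode('utf-8') produced.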

Assembly

Finally, we can view our bytes as instructions that the machine can run (“machine code”). In this course, we will be working with x86-64. Disassembling our bytes as x86-64 gives the following, where e2 98 is a loop instruction and the trailing 0x84 does not form a complete instruction on its own:

0:  e2 98                   loop   0xffffffffffffff9a
2:  84                      .byte 0x84
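The strange-looking loop target can be checked by hand. A sketch, assuming (as the listing does) that the code starts at address 0: the second byte, 0x98, is a signed 8-bit displacement measured from the end of the 2-byte instruction, so the target is 2 + (-104) = -102, which prints as 0xffffffffffffff9a when written as a 64-bit address. Here we let int.from_bytes with signed=True do the signed conversion:

```python
# The loop instruction occupies bytes 0-1: opcode 0xe2 followed by an 8-bit displacement.
instruction = b'\xe2\x98'
next_instruction = len(instruction)  # offset 2, where the next instruction starts

# The displacement byte is a signed (two's complement) 8-bit value.
displacement = int.from_bytes(instruction[1:], "little", signed=True)
print(displacement)  # -104

# Target = address of next instruction + displacement, shown as a 64-bit address.
target = (next_instruction + displacement) & 0xFFFFFFFFFFFFFFFF
print(hex(target))  # 0xffffffffffffff9a
```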


Conclusion

We’ve discovered an important fact: the same bitstream can mean many different things, whether an unsigned number, a color, a string, or machine code. The context is just as important as the stream itself! Keep this in mind throughout the course whenever you see a bitstream.