Consider the following `hashcode` function. (Note that this is a horrible implementation for a variety of reasons. We'll fix it in a bit. (Haha bit.))

hashcode.c

```
int8_t hashcode(char *data) {
    int8_t h = 0;
    for (int i = 0; i < strlen(data); i++) {
        h += pow(31, i) * data[i];
    }
    return h;
}
```

When you run this function on some inputs, the outputs are somewhat peculiar:

First, why is the second one smaller than the first? Second, why is the third one negative??

The answers to these questions lie in the specification of the `int8_t` type, but we would observe similar behavior for a more “normal” type like `int` or `long`.

# Integer Types

The C specification provides quite…strange definitions of the “standard” integer types. All integer types are “fixed-width,” meaning they are represented by a fixed number of bits and cannot be expanded beyond that range. Specifically, an $n$-bit fixed-width integer can represent $2^n$ distinct numbers–which numbers those are depends on the type!

## What is an `int`?

An `int` is defined by the C specification to be at least 16 bits in width. Given that we work mostly with 64-bit machines, this is not a particularly useful definition for us.
In fact, you’ll find that on most (but not all!) modern machines `int`s are 32 bits and `long`s are 64 bits. Instead of using these loosely defined types, we will use types
with explicit sizes:

- If we want to work with an **unsigned** (non-negative) integer type, we will use `uintx_t` (found in `inttypes.h`) where `x` is the width we want (8, 16, 32, or 64).
- If we want to work with a **signed** (capable of representing negative numbers) integer type, we will use `intx_t` (found in `inttypes.h`) where `x` is the width we want (8, 16, 32, or 64).
- For indices into memory or sizes, we will use the `size_t` type (found in `stddef.h`), which has a range including all valid sizes for the corresponding machine.

### Why not just use `int` and `long`?

In general, we want to use the special types to write portable code. Additionally, in particular circumstances (e.g., the `jvm` project), certain fields are guaranteed to be exactly
a particular width; if we want to represent these values correctly, we need to ensure that we are capable of representing exactly the right range.

## Unexpected Results?

Now, we get back to the unexpected results from before. Unfortunately, a reality with fixed-width integer types is that we must define arithmetic to “work” even if the result is not storable in the number of bits we have. (More formally, we want $(\mathbb{Z}/2^n\mathbb{Z}, +, \times)$ to form a ring to get the “normal” properties we expect.)

For unsigned integers, this works out to modular arithmetic. For signed integers, we use a representation called “two’s complement” which we explain below.

## Unsigned Representation

Unsigned integers use modular arithmetic. This means if we **overflow** or **underflow**, we just normalize and take a modulus to get back a valid answer.
Here are three examples each of which demonstrates a facet of how fixed-width **unsigned** integers work:

unsigned-overflow.c

```
uint8_t x = 100;
uint8_t y = 200;
uint8_t z = x + y;
printf("%d %d %d\n", x,y,z);
```

100 200 `44`

because $(100 + 200) \bmod 2^8 = 300 \bmod 256 = 44$.

unsigned-widening.c

```
uint8_t x = 100;
uint8_t y = 200;
uint16_t z = x + y;
printf("%d %d %d\n", x,y,z);
```

100 200 `300`

The operands get “upcast” to a wider type before being stored, so the true sum 300 is representable.

unsigned-narrowing.c

```
uint16_t x = 1000;
uint16_t y = 2000;
uint8_t z = x + y;
printf("%d %d %d\n", x,y,z);
```

1000 2000 `184`

because $3000 \bmod 2^8 = 3000 \bmod 256 = 184$.

Since the smallest number we can represent is $0$, and there are $2^n$ total distinct representable numbers, it follows that the largest number we can represent is $2^n - 1$. A binary number $b_{n-1} b_{n-2} \cdots b_1 b_0$ can be converted to decimal using the formula $\sum_{i=0}^{n-1} b_i 2^i$, as described in the notes on representation.
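As a quick worked instance of the conversion formula for 8 bits (this pattern is the wrapped result from the overflow example above):

```latex
00101100_2 = 2^5 + 2^3 + 2^2 = 32 + 8 + 4 = 44
```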

## Signed Representation (“Two’s Complement”)

The formula for the value of an $n$-bit signed number $b_{n-1} b_{n-2} \cdots b_1 b_0$ is slightly different:

$$-b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i$$

That is, the **most significant bit** has a *negative* weight (instead of a positive one like all the other bits). This definition seems a little strange; so, we’ll go through
some theory on why it makes sense along with some examples of its use.
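For instance, under two’s complement the 8-bit pattern `11000111` works out to:

```latex
11000111_2 \;\to\; -2^7 + 2^6 + 2^2 + 2^1 + 2^0 = -128 + 64 + 4 + 2 + 1 = -57
```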

There are several reasons why this definition is convenient:

- We can use the same exact operations (i.e., circuits) for addition and multiplication for two’s complement as for unsigned numbers. This has implications for what assembly instructions and types are necessary; this representation makes the processor agnostic to whether the number is signed or unsigned!
- There is only one representation for each number in the range. In particular, other representations often end up with “positive zero” and “negative zero”.
- We can immediately determine if a number is negative or non-negative by checking the highest bit (1 means negative; 0 means non-negative).

There are also some drawbacks to this definition:

- There are not the same number of positive numbers as negative numbers! There is one more representable negative number than positive number. (This can be seen quickly by noting that there are equally many bit patterns starting with 0 as starting with 1, and zero’s pattern starts with a 0, so it takes up one of the “non-negative” slots.)
- Inverting all the bits (written `~x`) does not give us the negation of the number. Instead, we have to add 1; that is, `-x = ~x + 1`.

Processor designers have almost universally decided that the convenience is more compelling than the drawbacks; so, we’re stuck with it! Might as well learn how it works. :)

signed-overflow.c

```
int8_t x = 100;
int8_t y = 99;
int8_t z = x + y;
printf("%d %d %d\n", x,y,z);
```

100 99 `-57`

We’ll explain later.