Encoding Data

What is Data anyway?

In this lecture, we will focus on data, the stuff we want to manipulate in our programs.

When you start working on any problem, you probably have some idea as to what this data stuff is. Suppose you are teaching a class, and have to assign grades to a bunch of projects, then calculate the final grade for each student and assign a letter grade depending on all of that. (Sound familiar? Welcome to the work of teachers!)

In this case, the raw data is all of those grades, which are just numbers. What kind of numbers? Isn’t a number just a number?

Not really. Some numbers represent whole things, like students. I doubt I will ever see only half a student in my class (unless they are sitting there half awake). I count students using only “whole” numbers. No fractions needed, or allowed.

Grades might be the same: I never give half a point, so my grades can be expressed as whole numbers as well. But when I calculate the average of those grades, I might be interested in fractional parts. Those are a completely different kind of number.

Boy, this can get messy. How is our poor computer supposed to deal with all of this? Let’s explore that.

Data is what you need to work with

We need to start off by making one thing clear: data is anything you want to manipulate, such as:

words

numbers (maybe different kinds of numbers)

colors

phases of the moon

However, we must get those things into the computer, so ultimately, data is a pile of bits in the machine. Often we say data is just zeros and ones. How does that work?

Encoding things

Suppose we want to manipulate colors for a game program. How are we going to set things up so our program can do this?

Simple: we encode things from our human world into zeros and ones so the computer can work on them! This encoding is just a mapping between worlds: something in one world is equivalent to something in the other.

As humans, we try to figure out how our world works. That is how we become scientists or engineers. (That is how I did it, anyway.)

Long ago, scientists figured out that colors are just different frequencies of something called electromagnetic energy (the same thing as radio waves, by the way, only at a much higher frequency). We do not care about all that science here, but it occurred to someone that we could record a number somewhere (maybe the frequency) and equate that number to a specific color. When we line up a list of numbers next to a list of colors, we are encoding colors as numbers. We can go back and forth between the two lists as we want. This is the basic idea of representing colors in a computer. Once we know the encoding, we can take a number and figure out the color, or take a color and figure out the number. We just look things up in the right table!
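Here is a minimal sketch of that two-way lookup in Python. The color names and numbers are made up for illustration. Any agreed-upon pairing would work just as well, as long as the program uses it consistently.

```python
# A hypothetical encoding table: each color name is paired with a number.
# The specific numbers are arbitrary; what matters is that we agree on them.
COLOR_TO_NUMBER = {
    "red": 0,
    "green": 1,
    "blue": 2,
    "yellow": 3,
}

# Build the reverse table so we can go back from a number to a color.
NUMBER_TO_COLOR = {number: color for color, number in COLOR_TO_NUMBER.items()}

def encode_color(name):
    """Look up the number that stands for a color name."""
    return COLOR_TO_NUMBER[name]

def decode_color(number):
    """Look up the color name that a number stands for."""
    return NUMBER_TO_COLOR[number]

print(encode_color("blue"))   # 2
print(decode_color(3))        # yellow
```

Keeping both tables lets us travel in either direction without searching.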

Where will we put this data?

The computer has memory where we will record things. That memory is a series of bytes, eight bits per byte. We use as many bytes as we need to store data. Typical data sizes are 8, 16, 32 or 64 bits. (Notice the pattern here: multiples of 8 bits. Also, powers of 2!) If our data needs more than one byte, we set aside more than one byte to store it (usually one of those common sizes we just listed!)
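As a rough sketch, here is how a program might pick the smallest of those common sizes that can hold a given non-negative whole number. The helper name `smallest_common_size` is made up for this example. Python's built-in `int.bit_length()` reports how many bits the number actually needs.

```python
COMMON_SIZES = [8, 16, 32, 64]  # typical data sizes, in bits

def smallest_common_size(value):
    """Return the smallest common size (in bits) that can hold a non-negative integer."""
    needed = value.bit_length()  # bits actually required to write the number in binary
    for size in COMMON_SIZES:
        if needed <= size:
            return size
    raise ValueError("value is too big for 64 bits")

for value in [5, 255, 256, 70000]:
    bits = smallest_common_size(value)
    print(value, "fits in", bits, "bits =", bits // 8, "byte(s)")
```

Notice that 255 still fits in one byte, but 256 is the first value that needs a second one.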

Pick a place in memory and store our encoded data there. It will end up as zeros and ones.

How do we find the data?

Each memory location has an address, like a post office box number. These addresses start at 0 and count up to some big number, depending on how much money you had when you bought your computer! My laptop computer has 16GB of memory. That is roughly 16 billion bytes of storage (yikes). We store a piece of data at some address in memory, and we must be careful not to step on other data when we do this. That takes extra care when our data needs multiple bytes, because it then occupies several neighboring addresses.
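To make the post office box picture concrete, here is a toy model of memory as a small Python `bytearray`, where the index into the array plays the role of the address. This is only an illustration: real memory is managed by the hardware and operating system, not by a Python object. The helpers `store_16bit` and `load_16bit` are hypothetical. The example shows a two-byte value occupying two neighboring addresses, and what happens when a second value is carelessly stored on top of it.

```python
memory = bytearray(8)  # a toy memory: 8 bytes, addresses 0 through 7, all zero

def store_16bit(address, value):
    """Store a 16-bit value in two consecutive bytes, high byte first."""
    memory[address:address + 2] = value.to_bytes(2, "big")

def load_16bit(address):
    """Read a 16-bit value back from two consecutive bytes."""
    return int.from_bytes(memory[address:address + 2], "big")

store_16bit(2, 1000)       # occupies addresses 2 and 3
print(load_16bit(2))       # 1000
store_16bit(3, 7)          # oops: this overlaps address 3
print(load_16bit(2))       # 768: half of our first value was stepped on
```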

How many bytes do we need?

That depends on the data you want to use! How many characters are there in the alphabet? How many phases of the moon do you want to track?

We can create only so many different “patterns” of zeros and ones in any given number of bits.

Suppose we have only one bit. It is either a zero or a one. Two possible patterns. If we have two bits, we can have four patterns:

00

01

10

11

Hmmm. Two raised to the first power = 2. Two raised to the second power = 4. What about 3 bits? Two raised to the third power = 8, and that is how many different patterns we can come up with! Neat: for 8 bits, the number of different patterns is 256 (two raised to the eighth power). How about 16 bits? (65536)
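We can check that pattern by brute force: list every possible pattern of n bits and count them. Python's `itertools.product` generates all the combinations, and the count always matches 2 raised to the nth power.

```python
from itertools import product

def all_patterns(n):
    """Return every possible pattern of n bits as a string of 0s and 1s."""
    return ["".join(bits) for bits in product("01", repeat=n)]

print(all_patterns(2))  # ['00', '01', '10', '11']

for n in [1, 2, 3, 8, 16]:
    count = len(all_patterns(n))
    print(n, "bits:", count, "patterns; 2 **", n, "=", 2 ** n)
```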

Most computers cannot manipulate anything in memory that is smaller than one byte, so we pick the minimum number of bytes needed to hold our data bits. If we have a few bits left over, we ignore them. (Our program ignores them, by the way. The computer could not care less!)
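Here is the arithmetic for "the minimum number of bytes," sketched in Python. Dividing the number of bits by 8 and rounding up gives the number of bytes. Whatever space remains in the last byte goes unused. The helper name `bytes_needed` is just for illustration.

```python
def bytes_needed(bits):
    """Smallest number of whole bytes that can hold the given number of bits."""
    return (bits + 7) // 8  # integer division that rounds up

for bits in [1, 8, 12, 24]:
    nbytes = bytes_needed(bits)
    leftover = nbytes * 8 - bits  # bits the program simply ignores
    print(bits, "bits ->", nbytes, "byte(s),", leftover, "bit(s) left over")
```

For example, 12 bits of data need 2 bytes, leaving 4 bits that the program ignores.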

So, we can choose some number of bits in memory (or some number of bytes), and place a number in those bits and call it a color. The number of bits controls how many different colors we can deal with in our program. Eight bits gives us 256 colors, 16 bits will give us 65536 colors and so on.

Big problem!

How does the machine know what the bits represent?

It does not know that! That sounds silly, but it is true! Once data is in memory, it is the program that knows what the bits represent. Your program has to use the bits correctly. You have to make sure that when you manipulate those bits in your program, the result makes sense. Do wrong things and you get silly answers!
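Here is a small Python demonstration. The same two bytes are handed to two different pieces of code: one treats them as a whole number, the other as text. Neither reading is "wrong." The bits carry no label saying which one was intended.

```python
data = bytes([72, 105])  # two bytes: 01001000 01101001

as_number = int.from_bytes(data, "big")  # read the 16 bits as one whole number
as_text = data.decode("ascii")           # read each byte as an ASCII character

print(as_number)  # 18537
print(as_text)    # Hi
```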

Rules for manipulating numbers

In the computer, everything is basically just a number, so we need to set up the rules that control what happens. You already know all about one set of rules; you learned them in elementary school and use them every day. We even gave those rules an ugly name: math (ugh!).

Our math rules

These rules are simple:

take one or more numbers and perform some operation on them

you get some other number as a result.

We can add two numbers

we certainly hope the result is the sum of those two numbers

We can subtract two numbers

And so on.

The rules tell us what we should get for any of those math operations.

Computers know math rules

Fortunately, the designers of the computer taught the machine our math, so those rules work as expected.

But suppose we add two numbers that we are using to encode colors. What should the answer be?

Good question! The answer depends on what we humans think it should be, and (guess what) adding two colors may not make any sense! But we might think of adding as mixing two colors together (like cans of paint), and then we have an idea what the result should be. Hmmm! Maybe we can teach our computer that when we tell it to “add” two colors, the result should be the number that represents the color we humans would get when we mix them together at the paint store! That might be fun!

Warning

The physics of color mixing are pretty complex, so I am not thinking that would be a good lab project. I may revisit that idea later, though!

In most machines, colors are actually encoded in a different way. There are three electronic devices, called light-emitting diodes (LEDs), that together generate the colors you see on a monitor.

One can only generate shades of red, another shades of green, and the last shades of blue. We put these three tiny diodes really close together, and when we light all of them up in different shades, your eyeball(s) (sorry, I only have one) blend them and your brain sees a specific color. Typically, we use 8 bits for each color, 24 bits in total, for every “pixel” on your computer screen. My hotrod iMac has 5000 dots across and around 3000 down. That is one honking huge pile of pixels, and every one of those takes three bytes to record the color to display. If I am blowing up space aliens, I have to change all of those bytes really fast! Good thing our computers are really fast!
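Here is a sketch of that 8-bits-per-color scheme in Python. Three values from 0 to 255 are packed side by side into one 24-bit number: red in the top 8 bits, then green, then blue. That red-green-blue ordering is a common convention, but it is an assumption here, and real hardware and file formats vary. The helper names `pack_rgb` and `unpack_rgb` are made up. The last lines redo the screen-size arithmetic for a 5000 by 3000 display.

```python
def pack_rgb(red, green, blue):
    """Pack three 8-bit color values (0-255) into one 24-bit number."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each color value must fit in 8 bits")
    return (red << 16) | (green << 8) | blue

def unpack_rgb(color):
    """Split a 24-bit color number back into its red, green, and blue parts."""
    return (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF

orange = pack_rgb(255, 165, 0)
print(hex(orange))         # 0xffa500
print(unpack_rgb(orange))  # (255, 165, 0)

# Screen arithmetic: 3 bytes per pixel, 5000 x 3000 pixels
pixels = 5000 * 3000
print(pixels * 3, "bytes") # 45000000 bytes
```

That is about 45 million bytes just to describe one screenful of colors.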

Recording our data - er - numbers

Once we have figured out our encoding scheme and defined our rules, all we need to do is save the number into memory. We need to remember where we put the number! It will be convenient to give the place a name. (More on that later.)

We might need millions of these memory containers to represent the colors of the dots on the monitor screen. (Boy, is that a lot of bits!) We apply the rules we set up for these colors, and change them as we want using our computer program. That is where we actually make the rules work!

Manipulating the bits

Programs are all about manipulating a pile of bits! What makes things interesting is how we manipulate them.

BOOM! Another space alien bytes the dust (bad pun, there!)

When we actually record a number representing a color in memory, we can manipulate that number to turn it into a different color.
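As a sketch, assume colors are stored as 24-bit numbers with red in the top 8 bits, green in the middle 8, and blue in the bottom 8. Then we can turn one color into another purely by fiddling with bits. Here a bit mask clears the red part of a color, and another operation turns blue all the way up. The names `remove_red` and `max_blue` are made up for this example.

```python
RED_MASK = 0xFF0000   # the 8 bits that hold red
BLUE_MASK = 0x0000FF  # the 8 bits that hold blue

def remove_red(color):
    """Clear the red bits, leaving green and blue unchanged."""
    return color & ~RED_MASK & 0xFFFFFF  # final mask keeps the result a 24-bit value

def max_blue(color):
    """Set all eight blue bits to 1 (full brightness blue)."""
    return color | BLUE_MASK

yellow = 0xFFFF00                # full red + full green
green = remove_red(yellow)
print(hex(green))                # 0xff00
print(hex(max_blue(green)))      # 0xffff (cyan: green + blue)
```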

It can really cause you to stop and think when you realize that almost everything we ask computers to do is just fooling around with a pile of bits!

Another example

We will not go too deep into this one, yet.

Consider how we can get a computer to manipulate text. We need to break text down into fundamental chunks, then think how we will encode those chunks as bits in the computer. Fortunately, this is pretty easy (at least for us English-speaking folks).

Encoding text

We have an alphabet of 26 letters. We use these to form words, sentences, and paragraphs. So, we can come up with an encoding for each letter. Then we can teach a computer to play with text. Cool.
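Here is a minimal sketch of such an encoding in Python. It assumes lowercase letters only and numbers each letter by its position in the alphabet, starting at 0. We need 26 different patterns, and 2 raised to the fifth power is 32, so five bits are enough. This home-made scheme is only for illustration. Real computers use standard encodings such as ASCII or Unicode.

```python
import string

ALPHABET = string.ascii_lowercase  # 'abcdefghijklmnopqrstuvwxyz'

def encode_word(word):
    """Turn a lowercase word into a list of 5-bit patterns, one per letter."""
    return [format(ALPHABET.index(letter), "05b") for letter in word]

def decode_word(patterns):
    """Turn a list of 5-bit patterns back into letters."""
    return "".join(ALPHABET[int(bits, 2)] for bits in patterns)

codes = encode_word("cab")
print(codes)               # ['00010', '00000', '00001']
print(decode_word(codes))  # cab
```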

Not good enough!

But, thinking about this a bit more, we realize a problem. We have other things to consider as well. For example, we usually distinguish between upper and lower case letters, and we have punctuation symbols. Oh yeah, we also have those digit things we use to write down numbers!
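The widely used ASCII standard handles this by giving every uppercase letter, lowercase letter, digit, and punctuation mark its own number. Python's built-in `ord` shows the number assigned to a character, and `chr` goes the other way. (Strictly speaking, `ord` returns the Unicode code point, which matches the ASCII code for every character shown here.)

```python
for ch in ["A", "a", "Z", "!", "7"]:
    code = ord(ch)  # the character's code number
    print(repr(ch), "->", code, "->", format(code, "08b"))

print(chr(65))  # A
```

Note that "A" (65) and "a" (97) get different numbers, and so does the digit "7" (55).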

Yikes! Is a number on a piece of paper the same thing as a number in the computer? NO!

The encoding of a number (meaning a quantity of things we can apply math rules to) is simple: we express the number as a bunch of 0’s and 1’s and record the result. But when we want to display the number on a piece of paper, we need to convert the internal number into something that causes an output device to print out the right symbols.
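Here is that difference in Python. The quantity 42 is stored as the binary number 101010, but the text "42" is two characters, each with its own code. Converting between them with `str` and `int` is exactly the translation step described above.

```python
quantity = 42
text = str(quantity)               # the symbols we would print: '4' then '2'

print(format(quantity, "b"))       # 101010  (the number itself, in binary)
print(list(text.encode("ascii")))  # [52, 50]  (codes for the characters '4' and '2')
print(int(text) + 1)               # 43  (convert text back to a number before doing math)
```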

Make sure you see the difference between these two ideas before you move on! Things in the computer have been encoded so the computer can manipulate them. Those encodings are not the same thing as the original thing from our human world. Only the program knows what to do with the encodings.

Fundamental problem with computers

Suppose you want to build a program that manipulates both colors and text. How do you know what a place in memory holds? Is it a color, or a piece of text? The answer is that you cannot tell just by looking at the bits, and neither can the machine. The memory just holds a simple sequence of zeros and ones. They all look the same. It is up to our program to remember which is which, and apply the right rules for each kind of data.
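Python's standard `struct` module makes the point sharply. Below, four bytes are produced by storing the fractional number 1.0, and then the very same bytes are read back as a whole number. The `<` in the format strings selects little-endian byte order, a choice made just for this example. The memory has no idea which reading is right. Only the format code the program chooses decides.

```python
import struct

raw = struct.pack("<f", 1.0)             # 1.0 as a 32-bit floating-point number
print(raw.hex())                         # 0000803f

as_float = struct.unpack("<f", raw)[0]   # read the bytes as a float
as_int = struct.unpack("<I", raw)[0]     # read the same bytes as an unsigned int
print(as_float)                          # 1.0
print(as_int)                            # 1065353216
```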

The formal term for identifying which set of rules to apply is the data type. We will explore this next.
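As a quick Python preview of why the type matters, the same `+` symbol follows math rules for numbers and joining rules for text. Python's built-in `type` reports which kind of data a value is.

```python
number_sum = 2 + 3      # number rules: addition
text_sum = "2" + "3"    # text rules: joining the characters together

print(type(number_sum).__name__, number_sum)  # int 5
print(type(text_sum).__name__, text_sum)      # str 23
```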

A final example

As a final reminder, we still need rules for dealing with text.

What does “A” x “<” mean?

Absolutely nothing.

So using math on letters or symbols is a meaningless activity. We need our programs to be smarter!
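Python agrees that multiplying two pieces of text has no meaning. Rather than inventing an answer, it raises a `TypeError`.

```python
try:
    result = "A" * "<"
except TypeError as error:
    result = None
    print("TypeError:", error)  # Python refuses: there is no rule for this
```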

Python has an answer to multiplying a single letter by some number. It sort of makes sense, and can be useful at times: 5 * 'A' gives 'AAAAA'!

Neat. If I need a line of 80 '*' characters, I can create one using 80 * '*'.
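Here is that repetition rule in action, drawing a simple banner. The banner text is just an example.

```python
line = 80 * "*"          # 80 copies of '*'
print(5 * "A")           # AAAAA
print(line)
print("*" + " Encoding Data ".center(78) + "*")  # 1 + 78 + 1 = 80 characters wide
print(line)
```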