.. _cosc1337-data-intro:

What is Data anyway?
####################

.. include:: /references.inc

Programs are all about manipulating data of all kinds. As we start learning C++, we need to examine how it handles data. C++ is not like the Python world most of you came from, so pay attention!

What is data?
**************

We need to start off by making one thing clear: :term:`data` is anything you want to manipulate: words, numbers, colors, phases of the moon, or anything at all. However, we must get those things into the computer, so ultimately, :term:`data` is just zeros and ones. How does that work?

Encoding things
===============

Suppose we want to manipulate colors for a game program. How are we going to set things up so our program can do this?

As humans, we tend to try to figure out how our world works. That is how we become scientists, or engineers (that is how I did it anyway). Long ago, scientists figured out that colors are just different frequencies of something called ``electromagnetic energy`` (the same thing as radio, by the way, only at a much higher frequency).

Now, we do not care about all that science here, but it occurred to someone that we could record a number somewhere (maybe the frequency) and equate that number to a specific color. When we line up a list of numbers next to a list of colors, we are :term:`encoding` colors as numbers. We can go back and forth between the two lists as we want.

This is the basic idea of representing colors in a computer. Once we know the :term:`encoding`, we can take a number and figure out the color, or take a color and figure out the number.

How will we set up the numbers?
===============================

Based on our previous discussions about how memory in our computer is set up, we know that we are going to record the number in memory, and we can choose any number of bits to store that number.
The computer likes to group bits in eight-bit chunks (bytes), so we usually choose a multiple of eight bits when we decide how big a piece of memory we want to use to record the number.

So, we can choose some number of bits in memory (or some number of bytes), place a binary number in those bits, and call it a number. The number of bits controls how many different colors we can deal with in our program. Eight bits gives us 256 colors, 16 bits gives us 65,536 colors, and so on.

Rules for manipulating numbers
------------------------------

Here comes an important concept! When we manipulate our numbers, we need to set up the rules that control what happens.

You already know all about one set of rules - you learned them in elementary school and use them every day. We even gave those rules an ugly name - ``math`` (ugh!)

The rules are simple: if you take one or more numbers and perform some operation on them, you get some other number as a result. When we ``add`` two numbers, we certainly hope the result is the sum of those two numbers, since that is what the rules say we should get. Fortunately, the designers of the computer set the machine up so those rules work as expected.

But suppose we add two numbers that we are using to encode colors. What should the answer be? Good question! The answer depends on what we humans think it should be, and (you know what?) adding two colors may not make any sense! But we might think of adding as mixing two colors together (like cans of paint), and then we have an idea of what the result should be. Hmmm!

Fortunately, the scientists who studied colors figured out how combining colors works at a deeper level, and we finally came up with an encoding scheme that makes more sense, and that even lets us build machines that can display those colors. (We call those machines ``monitors``.)
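To make the encoding idea and the bit counts above a little more concrete, here is a minimal C++ sketch. Everything in it is made up for illustration: the three-color table, the names ``Color``, ``encode``, ``decode``, and ``values_for_bits`` are assumptions, not part of any real color scheme or library. It only shows that an encoding is an agreed pairing of things with numbers, and that the number of available codes doubles with every extra bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding table: each color is paired with a small number.
// The particular numbers are arbitrary; what matters is that we agree on them.
enum class Color : std::uint8_t { Red = 0, Green = 1, Blue = 2 };

// Color -> number: look up the code we agreed on for this color.
constexpr std::uint8_t encode(Color c) {
    return static_cast<std::uint8_t>(c);
}

// Number -> color: only meaningful for numbers that appear in the table.
constexpr Color decode(std::uint8_t n) {
    return static_cast<Color>(n);
}

// How many different values fit in `bits` bits: 2 raised to the power `bits`.
std::uint64_t values_for_bits(unsigned bits) {
    assert(bits < 64);  // shifting by 64 or more would not fit in the result
    return std::uint64_t{1} << bits;
}
```

With this sketch, ``values_for_bits(8)`` is 256 and ``values_for_bits(16)`` is 65536, matching the counts above. Notice also that adding the codes for Green and Blue gives 3, which is not in the table at all: the machine will happily do the arithmetic, but whether the answer means anything depends on the rules we choose to go with the encoding.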
Recording our data - er - numbers
=================================

Once we have figured out our :term:`encoding` scheme and have defined our rules, all we need to do is save the number we want into some place in memory. We will probably want more than one color. We might need millions of these memory locations to represent the color of each dot on the ``monitor`` screen. (Boy, is that a lot of bits!)

We apply the rules we set up to the colors to change them as we want using our ``computer program``. That is where we actually make the rules work!

Manipulating the bits
=====================

Programs are all about manipulating a pile of bits! What makes things interesting is what we see when we watch the program go! BOOM! Another space alien bytes the dust (bad pun, there!)

When we actually record a number representing a color in memory, we can ``manipulate`` that number to turn it into a different color.

.. note::

    It can really cause you to stop and think when you realize that most everything we ask computers to do is just fooling around with a pile of bits!

Another example
----------------

We will not go too deep into this one, but let's consider how we can get a computer to manipulate text. Once again, we need to break text down into fundamental chunks and think about how we will encode those chunks as bits in the computer.

Fortunately, this is pretty easy (at least for us English-speaking folks). We have an alphabet of 26 letters we use to form words, sentences, paragraphs, and so forth. We all know that. So, if we come up with an :term:`encoding` for each letter, we can teach a computer to play with text. Cool.

But, thinking about this a bit more, we realize that we have other things to consider as well. For example, we distinguish between upper and lower case letters, we have punctuation symbols, and (oh yeah) those digit things we use to write down numbers!

.. note::

    Yikes! Is a number on a piece of paper the same thing as a number in the computer? NO!


The encoding of a number (meaning a quantity of things we can apply math rules to) is simple: we express the number in binary and record the result. But the ``display`` of a number on a piece of paper is just the series of symbols we want to print on paper. Each digit has its own encoding, and the number on paper is just a sequence of those digit encodings, as many as it takes to display the number the way we want to see it. Clear?

.. warning::

    Make sure you see the difference between these two ideas before you move on!

A Fundamental problem with computers
====================================

Suppose you want to build a program that manipulates both colors and text. How do you know whether a place in memory holds a color or a piece of text?

The answer is you cannot - they are both sitting in memory as a sequence of zeros and ones. They look the same. It is up to our program to remember which is which and apply the right rules for each kind of data.

.. note::

    The formal term we will use in identifying what set of rules to apply is ``data type``. We will explore this next.

As a final reminder, we still need rules for dealing with text. What does "A" x "<" mean? Absolutely nothing, so using math on letters or symbols is a meaningless activity. We need our programs to be smarter!
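Here is a small C++ sketch of the two ideas from this page: the same bits can be read under different rules, and a number as a quantity is not the same thing as the digit symbols used to display it. The function names ``as_number``, ``as_letter``, and ``display_form`` are invented for this illustration, and the sketch assumes an ASCII-compatible character set (where the letter ``A`` is stored as 65), which is true on most machines you will meet but is still an assumption.

```cpp
#include <cassert>
#include <string>

// The same eight bits, read under two different sets of rules.
// Assumes an ASCII-compatible character set, where 'A' is stored as 65.
int as_number(unsigned char bits) {
    return bits;  // rule 1: the bits are a quantity
}

char as_letter(unsigned char bits) {
    return static_cast<char>(bits);  // rule 2: the bits are a text symbol
}

// Turn a quantity into the sequence of digit symbols we would print on paper.
std::string display_form(int quantity) {
    assert(quantity >= 0);  // keep the sketch simple: no minus sign
    return std::to_string(quantity);
}
```

Under these assumptions, ``as_number(65)`` is the quantity 65 while ``as_letter(65)`` is the letter ``A``: nothing in the bits themselves says which reading is right. Likewise, ``display_form(42)`` produces the two symbols ``4`` and ``2``, which ASCII stores as 52 and 50, and neither of those is the quantity 42.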