Storing Data in a Computer
##########################

.. seealso:: Text: Chapter 1: 1.3-1.4

.. include:: /header.inc

Computers store data in fixed-size containers located in the memory of the
machine. Some folks like to say this::

    It is all zeros and ones inside the machine!

In fact, this is almost right! There really are no zeros or ones anywhere in
the machine. Instead there are `voltages` we interpret as zeros or ones.
Thank the electronics engineers for dealing with that problem!

Those containers are not infinitely big, so there are some numbers in our
human world we just cannot represent accurately in a computer. Programmers
need to learn how to deal with that fact. For example, how many digits do you
have to write down to represent 1/3 exactly? (Don't try it; there is not
enough paper!) The computer gets as close as it can, then just lops off any
extra digits.

Anything we want the computer to process: numbers, colors, text, has to be
"encoded" somehow, and we need to teach the computer how to manipulate our
encodings in a way that works the way we expect things to work. Math on
numbers should work!

We encode data in simple ways. We assign binary codes to each kind of data we
want to manipulate. Almost all computer programming languages support a
simple set of standard kinds of data:

* Unsigned (positive) integers (with no decimal point)
* Signed (positive or negative) integers (with no decimal point)
* Floating point (fractional data - with a decimal point)
* Characters (simple letters from the keyboard)
* Boolean (true/false only)

Data Types
**********

We call these kinds of data `data types`. It is important to realize that the
encoding for each kind of data is different. If the computer looks at an
encoding, it really cannot tell what kind of data it is looking at. It is up
to the `program` to make sure you are manipulating those data properly.

Think about it. What does it mean to multiply the letter `A` by the letter
`Q`? To the computer, those letters are stored in the machine as codes which
look exactly like the codes for numbers. The computer knows how to multiply
numbers! But to us, multiplying those two letters is probably meaningless, so
we cannot interpret the resulting number in any meaningful way! Good
programming languages protect us against such silliness!

We can build up more complex things by storing multiple copies of these
standard `data types` in bigger containers. We call these more complex data
containers `data structures`. For now, let's keep things simple.

Variables
=========

We create containers to hold our data. In most languages, you have to
`declare` these containers before you can use them. Part of the `declaration`
process involves telling the computer how big the container should be, and
what `operations` are legal for that kind of data. The language processor
will make sure you do not do silly things. (Unless you really want to see
"A" * "Q"!)

We call these `containers` `variables` because the actual value we store in
a container can change as processing takes place. `Variables` are always
named by the programmer. The name chosen must follow some simple rules, and
it really should help the reader of the program understand what the variable
is all about. `max_value` is a much better name than `x24c`. (Unless you are
actually working on the NASA X24C, which was a research vehicle that looked
like the Space Shuttle, long before they built that vehicle!)
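We will meet a real programming language later in the course, but as a
preview, here is a minimal sketch in Python showing containers for each of
the standard data types. The variable names are invented just for this
illustration:

.. code-block:: python

    # A minimal sketch (in Python) of the standard data types.
    # The names here are made up for illustration only.

    student_count = 42        # unsigned-style integer (no decimal point)
    temperature = -12         # signed integer (negative values allowed)
    pi_approx = 3.1415926     # floating point (has a decimal point)
    grade_letter = "A"        # character (text from the keyboard)
    is_enrolled = True        # Boolean (true/false only)

    # Good languages stop us from doing silly things:
    # "A" * "Q" is an error in Python, just as described above.
    print(type(student_count), type(pi_approx), type(grade_letter))

Notice how the descriptive names tell the reader what each container holds,
which is exactly the naming advice above.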
Constants
=========

Some values should not be allowed to change while processing takes place.
These containers will be marked as `constants`, which tells the computer not
to mess with the value stored there. Things like the standard math value
`pi` (3.1415926...) would logically be treated as a constant. You would never
want to change it!

Note that it is important that you tell the computer to treat certain
containers as constants. The computer really has no idea what `pi` is, or why
it should not change. Only you know that!

How Big Can Numbers Be?
***********************

There is one huge issue with working with numbers in a computer. The computer
can only hold numbers of some fixed size. As an example, let's say we use 8
bits (binary digits) to store a number. How big can that number be? Well,
according to our rules for forming binary numbers, the biggest value (all
eight bits set to one) is this:

.. code-block:: text

      1 * 2^0 =   1
    + 1 * 2^1 =   2
    + 1 * 2^2 =   4
    + 1 * 2^3 =   8
    + 1 * 2^4 =  16
    + 1 * 2^5 =  32
    + 1 * 2^6 =  64
    + 1 * 2^7 = 128
    ---------------
                255

An easier way to figure this out is this:

.. code-block:: text

    N = 2^(number of bits used) - 1
      = 2^8 - 1
      = 256 - 1
      = 255

Not a very big number, is it? If we use 32 bits, we can store a number around
4 billion. Much better.

Manuals for language processing tools will tell you how big the numbers can
be for the different data types. It is up to the programmer to decide how big
the numbers should be, and how big a container is needed. That is something
we will look at when we study a real programming language later in the
course.

BTW, I suspect storing the number of students in a class only needs a few
bits. I doubt that I will ever see a class with 4 billion students! However,
those Massive Open Online Courses do get to tens of thousands of students!
Yikes! Who is going to grade all that work? Guess who! Computers!

Negative numbers
****************

So far, we have only discussed positive whole numbers. We call those numbers
unsigned ``integers``. We know that some numbers are negative, so we need a
way to tell if a number is negative. The scheme for dealing with negative
numbers is a bit complex, and we will not get into it in this course.
Basically, we end up using the left-most binary bit to tell us whether the
rest of the digits show a positive number or a negative number. That will be
enough for now.

Floating Point Numbers
**********************

Scientists and engineers are always working with huge numbers and tiny
numbers. They want lots of digits, and maybe tiny fractions. So, how do we
represent a number that suits these strange folks? We only have so many bits.

Let's try an experiment. Suppose we only have eight bits to work with.
According to our binary number study, that means we can only have 256
different codes stuffed into those eight bits. We could use those 256 codes
to represent whole numbers from 0 to 255, or from -128 to +127. (Why the
lopsided range? Remember that zero needs a code too!) What if we need
fractions?

What does this number mean (human number, not computer number):

* 25.625

That dot in the middle is the "decimal point" (we are in the decimal system).
The columns on both sides of that point still refer to powers of ten, but the
digits to the right of the decimal point use negative powers of ten, meaning
one over the base raised to that power. So the first column to the right is
one over ten, the tenths column; the next one is one over 100, the hundredths
column; and so on.
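Here is a quick check of that place-value idea, written as a small Python
sketch; it is nothing but the arithmetic from the paragraph above:

.. code-block:: python

    # Rebuilding 25.625 from its decimal place values.
    # Digits left of the point use positive powers of ten;
    # digits right of the point use negative powers of ten.
    value = (2 * 10**1 +   # tens column
             5 * 10**0 +   # ones column
             6 * 10**-1 +  # tenths column
             2 * 10**-2 +  # hundredths column
             5 * 10**-3)   # thousandths column
    print(value)  # 25.625 (give or take a tiny encoding error,
                  # for reasons this very lecture explains!)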
When we switch to binary, we do the same thing. However, we are still dealing
with 0's and 1's, and the base is two! So, here is a binary fractional number
we could pack into eight bits:

* 1101.0101

In this encoding, we do not need to record the decimal point; we assume it is
in the middle. We can represent decimal numbers between 0 and 15 on the left
of the "binary point" (fewer if we need to handle negative numbers). We can
handle fractions from 0 to whatever those right bits give us. Hmmm, what do
they give us? Let's see:

* 0.0101 = 1 * 2^(-2) + 1 * 2^(-4)

Which is:

* 1 * 1/4 + 1 * 1/16
* 0.25 + 0.0625 = 0.3125

Let's tweak that binary fractional number a bit and see what we get.

* 1101.0110 = 13.375

What if we need to represent a number between 13.3125 and 13.375? Well, we
can't do that. If we really need to handle numbers between those two values,
we will need some other encoding.

By the way, our eight-bit fractional scheme would allow for fractions from 0
to 1/2 + 1/4 + 1/8 + 1/16, which works out to 0.5 + 0.25 + 0.125 + 0.0625 =
0.9375. Notice that we still cannot represent all the possible numbers in
that range, only those numbers we can encode properly. For some problems,
that may well be enough, but not in general.

Another Scheme
==============

Engineers came up with another scheme, years ago, to deal with this. Suppose
we have a number like 123.4567. This can be written as:

* 0.1234567 * 10^3

Using this idea, we can take any fractional number and move the decimal point
left or right so that it sits to the left of the most significant digit (the
one on the left). We do not need to store the decimal point, or that leading
zero. We save the other digits, as many as we can, then add something that
says how to move the decimal point. That part will be either a positive
amount or a negative amount. The "10^3" part tells us how to adjust the
decimal point to get back to the real number.

Let's break a number up into several parts:

* The sign of the number (positive or negative)
* The digits we can save for the number itself (unsigned here)
* The number needed to adjust the decimal point

This is called ``scientific notation`` and is often written as
+0.1234567E+03, which is the same as 123.4567.

In some programming languages, there are really big containers available for
floating point numbers, and the number of "significant" digits can be bigger.
That gives us more accuracy in the numbers we work with. Engineers like that!

One final note. In engineering, we may get close to a number we want in a
computer, but not be able to store that number exactly. For example, we know
1/3 goes on forever, but we need to stop writing those 3's at some point. We
stop when we cannot encode the extra digits in the container we have.
Chopping off the rest introduces an error in the number, and as you do math
with these numbers the error can grow. There is a whole branch of
mathematics, called ``numerical analysis``, that studies the effects of this
kind of issue; computer scientists study it to learn how to deal with these
errors.

Storing Text
************

We love writing text (like this lecture). We do not store a bunch of numbers
in the computer's memory when we store text there. Well, that is not quite
true. What is in the memory is still a bunch of binary numbers, but when we
store text, every eight bits represents one letter in our English alphabet.
Groups of computer folks get together and set up the codes for text, so
everyone can process files containing text.
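You can peek at those codes yourself. Here is a small sketch using Python's
built-in ``ord`` and ``chr`` functions, which convert between a character and
the number the machine actually stores:

.. code-block:: python

    # Every character is really stored as a number.
    # ord() gives the code for a character; chr() goes the other way.
    print(ord("A"))   # 65 -- the code stored in memory for "A"
    print(ord("Q"))   # 81
    print(chr(66))    # "B" -- the character with code 66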
The encoding those folks came up with for our English characters is called
the ``American Standard Code for Information Interchange``, or ``ASCII`` for
short. Here is the encoding. (Don't worry about this, it will not appear on a
test!)

.. image:: images/ascii-chart.png
    :align: center

There is one problem with only using 8 bits for each symbol. Our English
alphabet only has a tad over 100 symbols (look at your keyboard). In some
languages, Chinese for example, there are thousands of distinct symbols. So
in the last few years there has been a push to move to an encoding called
``Unicode``, which allows 65536 symbols in 16 bits (and modern versions allow
even more). We will not study it in this class. That should do for any
language on this planet. I have no idea how many symbols those Klingons have.

Encoding Other Things
*********************

Basically, anything we want to process in a computer can be handled if we
come up with a suitable encoding scheme for that data. We then need to teach
the computer how to manipulate those encodings to do the processing we want.
While it does not make sense to multiply two letters, it does make sense to
provide a way to capitalize letters, or convert them to lower case. Computer
folks are always working on schemes like these. It makes programming both
challenging, and a lot of fun!
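As one last example of that idea, here is a sketch in Python of how
capitalization falls out of the ASCII encoding: each lower-case letter sits
exactly 32 codes above its upper-case partner, so "capitalize" is just
subtraction. (The function name ``capitalize_ascii`` is invented for this
sketch; real programs would use a built-in routine such as Python's
``upper``, which also handles non-English letters.)

.. code-block:: python

    # In ASCII, "a" is code 97 and "A" is code 65: a difference of 32.
    # So capitalizing a letter is just arithmetic on its code.
    def capitalize_ascii(letter):
        """Capitalize one lower-case ASCII letter; a teaching sketch only."""
        if "a" <= letter <= "z":
            return chr(ord(letter) - 32)
        return letter  # leave everything else alone

    print(capitalize_ascii("q"))  # Q
    print(capitalize_ascii("A"))  # A (already capital, unchanged)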