Zeros and Ones
##############

.. include:: /header.inc

Surely, you have heard this::

    It is all zeros and ones inside of the machine!

Really? Let's see!

C++ Integers
************

Here is some code, not too far past "Hello, World!":

.. literalinclude:: code/ex01/main.cpp
    :linenos:

This simple code creates an array of integers and displays them. Without knowing anything else, we know that at runtime this creates a chunk of space in the system's memory where the array is stored. Those integer values sit in those locations until we fetch them for output.

Let's run this code:

.. command-output:: g++ -o test main.cpp
    :cwd: code/ex01

.. command-output:: ./test
    :cwd: code/ex01

Looks good (if somewhat boring), but we have a start.

Locating Data
=============

What if you really want to know where this data lives in your machine's memory? Can we figure that out? Sure, we are programmers; we can figure things out. C++ provides a neat operator, ``&``, which will tell you the address of something. Try this:

.. literalinclude:: code/ex02/main.cpp
    :linenos:

.. command-output:: g++ -o test main.cpp
    :cwd: code/ex02

.. command-output:: ./test
    :cwd: code/ex02

.. note::

    You should look up the ``iomanip`` library; it provides a lot of handy tools for making your output look nicer.

You see the address of each data item here. The notation is called ``hexadecimal``, something we will get to soon. That big pile of funny characters is the actual address in your machine where the data item was stored. (Well, actually, that is a tiny bit of a lie, but we will assume it is the real address for now!) The actual numbers on your machine might be different. Addresses in a modern Pentium are actually 64-bit things, and those hexadecimal strings are not quite the whole story, but they give us an idea where the data is located. If only we could figure out this hexadecimal stuff!

Show Data in Binary
===================

Let's get rid of that weird notation and see the real zeros and ones! To do this in C++, we need some help from another tool available in the C++ ``Standard Template Library``.

.. warning::

    What you will see here involves something called a ``template`` in C++. That is an advanced concept, but you do need to see it and learn to use it. For now, just follow the example as best you can. It is not too hard:

.. literalinclude:: code/ex03/main.cpp
    :linenos:

Phew! That funny line (8) is doing nothing but converting our data item (``data[i]``) into a 16-bit binary thing that we can send to the ``cout`` output stream. The conversion creates a new data item named ``bin`` with interesting properties we will look at later. For now, this lets us see exactly what the data container is storing, this time looking at all the zeros and ones!

Here is the output:

.. command-output:: g++ -o test main.cpp
    :cwd: code/ex03

.. note::

    I am going to omit the compile step in future examples; it is exactly the same every time.

.. command-output:: ./test
    :cwd: code/ex03

Now we see the ``binary`` representation of our ``short`` C++ data values: all those zeros and ones! Neat!
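Before we move on, it is worth noticing that decimal, hexadecimal, and binary are just three ways of printing the same stored bits. Here is a minimal stand-alone sketch (not one of this chapter's ``ex`` listings; the value 42 is arbitrary) you can try yourself:

.. code-block:: cpp

    #include <bitset>
    #include <iostream>

    int main() {
        short value = 42;   // any small value will do

        // the same 16 bits, shown three different ways
        std::cout << "decimal: " << std::dec << value << std::endl;
        std::cout << "hex:     " << std::hex << value << std::endl;
        std::cout << "binary:  " << std::bitset<16>(value) << std::endl;
        return 0;
    }

Nothing about the stored value changes between those three lines; only the way we ask ``cout`` to draw it on the screen changes.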
Once we can see the binary representation of a number, we can learn a bit more about this weird binary world.

Counting in Binary
******************

You know how to count, right? All you do is add one to each number.

.. note::

    How do you count in a floating-point world? That does not make sense. Counting has to do with whole objects, not parts of whole objects. Integers are about whole things!

So add one to count. (Wait, what about subtracting? We will deal with that in due time!)

.. literalinclude:: code/ex04/main.cpp
    :linenos:

.. command-output:: ./test
    :cwd: code/ex04

Patterns
========

Counting in binary is pretty easy. The pattern you see here is worth a closer look.

Binary is all about numbers where the only "digits" we have are zero and one. In our human "decimal" world, you know about the number "columns":

* The ones column
* The tens column
* The hundreds column

When we write a number like ``123`` down, what we really mean is this:

* 3 ones
* 2 tens
* 1 hundred

All of those column values are powers of 10.

.. math::

    10^{0} = 1

    10^{1} = 10

    10^{2} = 100

Base of a Number System
-----------------------

We have ten symbols, which we call ``digits``, available in our `decimal` system. The number of symbols available in a system is called the ``base`` of that system. In `binary` we restrict the number of symbols to two, and we choose to use zero and one, not for any special reason other than that we are most comfortable with those. In fact, the symbols themselves are not important. It is how we use them that is important. If I give you a symbol and ask what comes before it, or after it, you know the answer. Well, maybe you need to know a bit more, but you get the idea.

Running Out of Bits
===================

What happens when we run out of bits to store something in? Well, we are stuck. A fixed-size container can only hold a fixed number of bits. Keep counting in a 16-bit container, and you run out at 65535. Why? Every column represents a power of two, and sixteen bits means that the last column is the :math:`2^{15}` column. (Why not :math:`2^{16}`? We started counting columns at zero!) Add up all sixteen of those powers of two and you get 65535. We can represent a total of :math:`2^{16}` distinct whole numbers in 16 bits (0-65535). If you need a bigger number, you need more bits!

Binary Counting Pattern
=======================

Look at the series of binary numbers above. The "ones column", which in binary is two raised to the zero power, alternates every count. We run out of symbols quickly in binary, and "carry" a one into the next column. The "twos column" (two raised to the first power) alternates every two counts. The "fours column" (two raised to the second power) alternates every four counts. And so on. Each "column" is another power of our base, two in this case. Once you have that pattern down, counting in binary is easy!

Hexadecimal
===========

Now, writing down a 64-bit number is going to take a lot of zeros and ones (duh!). Being lazy, programmers came up with a shorter way to represent them. Suppose we come up with a symbol system that has 16 symbols (yikes!). We could use one symbol for every four bits, and shorten things up a bunch. Here is the coding scheme:

.. csv-table::
    :header: Decimal, Binary, Hex

    0, 0000, 0
    1, 0001, 1
    2, 0010, 2
    3, 0011, 3
    4, 0100, 4
    5, 0101, 5
    6, 0110, 6
    7, 0111, 7
    8, 1000, 8
    9, 1001, 9
    10, 1010, A
    11, 1011, B
    12, 1100, C
    13, 1101, D
    14, 1110, E
    15, 1111, F

Now that 64-bit number is down to 16 characters.

Negativity!
***********

What happens if we count backward, subtracting one each time? That is not so bad, except for one thing: how do we "borrow" from the next column? As long as there is a one there, we can do that. Actually, you borrow the same way you did when you learned subtraction in your early days! But go back and look at the binary representation of minus five. That one looked very strange!
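If you do not want to scroll back through the counting output, here is a minimal sketch (the variable names are made up for illustration) that prints plus five and minus five as 16-bit patterns:

.. code-block:: cpp

    #include <bitset>
    #include <iostream>

    int main() {
        short pos = 5;
        short neg = -5;

        // bitset wants an unsigned value, so cast to unsigned short first;
        // the cast keeps the bit pattern, which is exactly what we want to see
        std::cout << " 5 = " << std::bitset<16>(static_cast<unsigned short>(pos)) << std::endl;
        std::cout << "-5 = " << std::bitset<16>(static_cast<unsigned short>(neg)) << std::endl;
        return 0;
    }

That ``1111111111111011`` pattern for minus five is the one we need to explain.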
Twos Complement
===============

We humans are kind of picky. If we add minus five to positive five, we want the result to be zero! No exceptions!

How are we going to represent a negative number using only bits? One idea is to steal one of the bits and make it represent the sign of the number:

* if 0000000000000101 is five
* then 1000000000000101 is minus five

There is a problem with this! Try adding them together.

.. warning::

    Remember, this is binary, so when you get a two, you write down a zero and "carry" a one into the next column. If you get a three, write down a one and carry a one into the next column. You know how this works!

.. code-block:: text

    0000000000000101
    1000000000000101
    ================
    1000000000001010

Boy, that does not look anything like zero. (In our sign-bit scheme it would read as minus ten!)

Math folks pondered this for a while, and came up with this idea. To convert any positive number into the negative of that number, do this:

* Complement the number
* Add one

No, complement does not mean to say "you are a fine number". Instead, it says flip every zero into a one, and every one into a zero. Weird, but watch what happens:

* +5 = 0000000000000101
* Complement = 1111111111111010
* Add one = 1111111111111011

This looks a bit odd. But let's try adding them together:

.. code-block:: text

     0000000000000101
     1111111111111011
     ================
    10000000000000000

Look closely: there is a 17th bit showing there, and it is a one! But since the result must be squashed into a 16-bit container, we just keep the lower 16 bits and toss that extra bit. Hey, we end up with zero!

As odd as this seems, this is exactly how we store negative whole numbers in the computer: in ``Twos Complement`` form! Obviously, with more bits, the machine can store bigger numbers.

Floating-Point Numbers
**********************

Now that we have a handle on integer numbers, let's look at how we store a number with a decimal point. The idea starts off with a simple thought. What happens when we use that decimal point thingy? Each digit we place to the right of the decimal point represents the *base* raised to a negative power:

* 1.25

  * one times :math:`10^{0}`
  * plus two times :math:`10^{-1}`
  * plus five times :math:`10^{-2}`

If we switch to binary, we get this:

* 1.01

  * one times :math:`2^{0}`
  * zero times :math:`2^{-1}`
  * one times :math:`2^{-2}`

Hey, those numbers are the same!

* (1 * 1) + (0 * 0.5) + (1 * 0.25) = 1.25!

We could stay with this idea and do fairly well with simple floating-point numbers, but we run into problems quickly. Some folks, like engineers and physicists, want to use numbers with an insane number of digits before or after that decimal point. Like 0.00000000000123!

Real numbers, the kind humans write on paper, are a struggle to represent because there is an infinity of them: unlike our simple integers, there are infinitely many values between any two other values. That gets messy very fast. If we cannot even write a number down completely, how can we possibly store it in our computer exactly? The answer is: we can't, but we can get close! We need a way to represent these numbers well enough to be able to use them. Those pure math types will just have to learn to deal with that if they want to use our computers!

Folks came up with a different way to express a floating-point number to deal with this:

* 123450000000000.0 = 1.2345 * :math:`10^{14}`

Let's write it this way: ``1.2345E14``. The ``E`` means "exponent". The number after the ``E`` is the exponent, which can be negative. That number tells you how to slide the decimal point into the right position. ``E14`` says slide it 14 columns to the right; ``E-14`` would say slide it 14 columns to the left!

* 1.2345E-14 = 0.000000000000012345

This is called "Scientific Notation".
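C++ understands this notation directly, and ``iostream`` can print numbers in it. Here is a quick sketch (the values are just the ones from above) in case you want to experiment:

.. code-block:: cpp

    #include <iomanip>
    #include <iostream>

    int main() {
        double big  = 1.2345e14;    // C++ source code accepts the E notation
        double tiny = 1.2345e-14;

        // ask cout to display values in scientific notation
        std::cout << std::scientific << big << " " << tiny << std::endl;

        // or slide the decimal point back and show the long form
        std::cout << std::fixed << std::setprecision(1) << big << std::endl;
        return 0;
    }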
Truncating Numbers
==================

When we deal with floating-point, we accept the fact that the right-most digits are not so important compared to the left-most digits. If we have a big number and we lose that right-most digit, we still have a pretty big number, close to the original, but off by some tiny amount. In engineering and science, we accept the fact that some calculations are not going to be exact. Still, we want them to be as close to exact as we can make them.

A team of folks, all of whom belong to the IEEE, got together and came up with an idea for representing floating-point numbers in computers. Their work is now known as IEEE Standard 754.

Basic Representation
====================

Take a fixed-size container, say 32 bits big. Cut it into two basic parts, one 24 bits big and the other 8 bits big. We will encode the "significant digits", the ones we are keeping in our "Scientific Notation" after truncating, in the 24-bit part. Since the leftmost of those bits should always be a one (otherwise it would not be significant), we do not bother to store it. Instead, we assume there is one more bit in the number, and that it is a one. Hey, every little bit helps!

The entire number is either positive or negative. Because of this, they stole one bit from the big part and called it the "sign bit". The rest of the big chunk is called the "mantissa", and the small part is called the "exponent".

Naming the parts
----------------

The "exponent" is a whole number, so they simply encoded it in those eight bits. (The standard actually stores it with a fixed offset added, called a *bias*, rather than as an 8-bit twos complement number, but the effect is the same.) That gives us the ability to move the decimal point about 128 columns left or right. Should be fine for calculating the interest on your bank balance.

We Need More Bits!
------------------

But what if 32 bits is not enough? The answer is simple: use more bits! We will not go there; this is enough for an introduction.

Here is a diagram showing the IEEE layout for a 32-bit "single precision" floating-point number:

.. image:: images/IEEE754.png
    :align: center

Encoding Characters
*******************

We only have a finite number of "symbols" in our English character set. It should be pretty easy to come up with a way to assign a code to each character. That was exactly the mission of another group. They came up with the "American Standard Code for Information Interchange", ``ASCII`` for short. Here is their encoding:

.. image:: images/ascii-chart.png
    :align: center

Look at this encoding; it has nice properties. All of the digits are in order: whatever the code is for a five, add one to that code and you have the code for a six. The code for a "B" is one more than the code for an "A". Add 32 to the code for "A", and you get the code for "a". We can use that to capitalize something, or to sort things according to the order of characters in our alphabet.

Actually, the codes below the code for a space character are called "control codes". They are not "printable". They were used to control printing devices in the "old days". Today, the only ones we tend to deal with are "CR" (carriage return) and "LF" (line feed). Most of you do not even know where those terms came from!

How Big are ASCII Codes?
========================

ASCII is a set of 128 symbols. That takes seven bits to store. Since seven is an UGLY number, we use 8! That means each character takes one byte.
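Since a C++ ``char`` is stored as nothing more than its one-byte ASCII code, those nice properties are easy to check yourself. Here is a minimal sketch (the particular characters are arbitrary, but the codes are standard ASCII):

.. code-block:: cpp

    #include <iostream>

    int main() {
        char letter = 'A';

        // converting a char to an int reveals the code hiding inside it
        std::cout << "'A' is code " << int(letter) << std::endl;          // 65
        std::cout << "'B' is code " << letter + 1 << std::endl;           // 66, one more
        std::cout << "'a' is code " << letter + 32 << std::endl;          // 97, add 32
        std::cout << "'5' + 1 is '" << char('5' + 1) << "'" << std::endl; // the digit '6'
        return 0;
    }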
ASCII only needs seven of those eight bits, so every byte has a spare bit. We could do something with that extra bit. In fact, IBM did just that, creating a bunch of extra symbols we could use to build forms on the screen. (Remember, they wanted to sell machines to accountants!) They managed to fill up all 256 slots you can address with an 8-bit binary number.

Unicode
=======

We will not explore this here, but you should be aware that there is a push to stop using ASCII in our programs and switch to Unicode. Why? How many characters are there in the Chinese "alphabet"? I put that word in quotes because in Chinese a symbol may represent an entire sentence in English. A book I bought in Shanghai told me I had to learn 4000 symbols to read a Chinese newspaper. (I never finished that book.) 4000 is too big to fit in 8 bits, but it will fit in 16 bits. Unicode was invented to deal with all human languages, even Chinese! There are codes (and graphical symbols) available for most of the languages we are aware of, even Klingon! I will let you explore that on your own.

Encoding other Things
*********************

We humans are good at encoding things. We have come up with ways to encode sounds, colors, even the phases of the moon. Just about anything you might need to process in a computer can be encoded somehow, then stored as zeros and ones inside of the machine. Your real challenge is to come up with code that makes those encoded gadgets work the way you want!

.. vim:ft=rst spell: