..  _processing-programs:

How programs are processed
##########################

..  wordcount::
..  vim:ft=rst spell:
..  include::   /references.inc

..  seealso::

    Text: Chapter 1 (section 1.4)

In our discussions up to now, we have been talking about programs in a general
way. Ultimately, your program is a bunch of text that has to be converted,
somehow, into something the processor can use. That usually means your code
will end up as a bunch of bits (0's and 1's) stored in the computer's memory
along with a bunch of data (more 0's and 1's). That drum we talked about starts
up and the dance begins!  Well, that is a nice picture, and it is not really
that bad a picture. But it will help if we talk about how programs get into
that final form in a bit more detail. This will help you understand what is
going on!

Writing programs
****************

As programmers, we do a lot of writing. However, this writing is in a different
language. Specifically, it is in a programming language which has its own set
of rules. You need to learn the rules. This is no different than learning to
write in English, only the language is more precise, and not full of fuzziness
like English!

Programming languages
=====================

Programming languages have their own form. Instead of sentences, we talk about
statements. Punctuation becomes special characters needed at specific points.
Words become identifiers, which come in two forms: those you invent, and those
required by the language. We can create other things like numbers and strings
as needed. Formally, we call all these rules ``syntax``!

Syntax
------

The formal rules describing how you form a correct program define what is legal
and what is not. Some of these rules might seem strange, but they are there for
a reason. Instead of asking a human to understand the language, we are asking a
computer to understand the language. Actually, we are asking a computer program
also written in some language (not necessarily the same one we are using) to
process your program text and make sure you followed all the ``syntax`` rules.
If so, we say your program is ``syntactically`` correct. But what does it mean?
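Before we answer that, here is a small sketch of one program checking the
syntax of another. It leans on Python's built-in ``compile`` function, which
reads program text and complains if the rules are broken. The helper name
``is_syntactically_correct`` is made up for this example.

..  code-block:: python

    def is_syntactically_correct(source):
        """Return True if the text follows Python's syntax rules."""
        try:
            # compile() reads and checks the text without running it
            compile(source, "<example>", "exec")
        except SyntaxError:
            return False
        return True


    print(is_syntactically_correct("total = 1 + 2"))   # a legal statement
    print(is_syntactically_correct("total = 1 +"))     # something is missing

The first line of text passes the check and the second does not. Notice that
the check says nothing about whether the program does anything useful.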

Semantics
---------

A ``syntactically`` correct program written in a specific programming language
has meaning. We call this meaning the ``semantics`` of the program. Exactly
what that meaning is is the tough part. You have an idea what you want the
computer to do, and you hope the ``semantics`` of your program causes the
computer to do what you want. Unfortunately, that is not always what happens.
If your understanding of what each ``statement`` causes the machine to do is
not quite up to par, you end up having the machine do something that does not
seem right.  To the computer it IS right, but to you it is totally wrong. Guess
who is right!

There is an old saying among programmers:

    The computer never does what I want it to do, only what I tell it to do!
    Phooey!

Our challenge is to improve your understanding of these language things and the
computer so your meanings match the program's meanings!
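Here is a small illustration of the gap between what you mean and what you
say. Both functions below are syntactically correct, and both are supposed to
average two numbers. The function names are made up for this sketch.

..  code-block:: python

    def average_wrong(a, b):
        # Legal syntax, but division happens before addition,
        # so this really computes a + (b / 2)
        return a + b / 2


    def average_right(a, b):
        # Parentheses force the addition to happen first
        return (a + b) / 2


    print(average_wrong(4, 6))   # 7.0
    print(average_right(4, 6))   # 5.0

The computer happily runs both. Only one of them matches what the programmer
had in mind.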

Types of languages
******************

Languages vary in their level of complexity. Some are very close to the
machine, some very close to the programmer.

Machine language
================

Computers only really know how to process a very specific language taught to
them by their parents (OK, by their manufacturer!). That language is called
``machine language`` and it is that bunch of 0's and 1's we could put into
memory and get the computer to dance to. Only the computer likes that
language; we humans hate all those 0's and 1's. So, humans created a bunch of
other languages that help them write programs. Languages come in many different
forms, some very simple, and some very complex.

Assembly language
=================

Since ``machine language`` is pretty hard to deal with, early programmers
invented simple codes for each simple operation the computer could do. These
codes form a language called ``assembly language``. Humans could remember the
codes, but not the 0's and 1's. In order to convert the ``assembly language``
codes into those 0's and 1's they created a tool that could read ``assembly
language`` programs and convert them into the correct set of 0's and 1's. They
called these tools ``assemblers``, and they were in common use for quite a
time. Today, programming in ``assembly language`` is far less common, but it is
the closest language to the machine, and gives you the most power you will ever
have as a programmer to control the machine. Unfortunately, each machine has a
different ``assembly language`` which makes becoming an expert in this kind of
programming a bit hard to do. I can write ``assembly language`` programs for
about a dozen different machines, and teach this form of programming in COSC
2425.

High-level languages
====================

After some time, programmers discovered that they were building common
sequences of ``assembly language`` statements, and started constructing
statements that made more sense to them. After several languages were written,
they discovered that they needed only three basic kinds of programming
constructs:

* A ``sequence``, which is just a series of statements processed one after the other
* A ``decision`` statement, which asks a question about what is going on
* A ``loop`` which does the same thing over and over (usually on different data)

The languages developed in this era were called ``high-level`` languages, and
they made programmers much more productive. Examples of languages developed in
this era include C/C++, Basic, Pascal, Ada, and our soon to be favorite, Python!
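To make the three constructs concrete, here is a minimal Python sketch that
uses all of them at once. The variable names and the data are invented for the
example.

..  code-block:: python

    # Sequence: these statements run one after the other
    numbers = [3, 8, 5]
    total = 0

    # Loop: do the same thing for each piece of data
    for n in numbers:
        # Decision: ask a question about what is going on
        if n > 4:
            total = total + n

    print(total)  # 13

Only the numbers bigger than 4 get added, so the program prints 13.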

Even higher levels
==================

Programmers are never satisfied, it seems. Today, there are attempts to move up
to even more powerful languages. Some of these research efforts are even trying
to move into graphical languages where our wishes can be constructed by
dragging constructs together in a graphical environment. These languages have
not made it into the mainstream of programming, so we will not discuss them
further. Instead, let's focus on today's most common high-level languages.

Converting programs to machine language
***************************************

We will be creating ``files`` containing simple text - characters and
punctuation, put together according to the ``syntax`` rules of the language we
are using. We need a tool to check our ``syntax`` to make sure it follows the
rules. The tool we use does ``syntax analysis`` on our program by reading it
character by character, checking all the rules as it goes. The tool might form
a different version of our program for internal use later. There are two common
types of tools used in processing your program: compilers and interpreters.

Compilers
=========

A ``compiler`` wants to convert your high-level program into a set of 0's and
1's for the specific machine you are working on. The conversion process is
important, because we want your meaning (``semantics``) to be the same once the
program has been converted. Writing compilers is fun, but hard to do well. It
is interesting to note that some compilers translate programs into ``assembly
language`` then use ``assemblers`` to convert that to ``machine language``.
Compilers generate pretty fast final programs. Some ``assembly language``
programmers can beat the compilers, but compilers are pretty smart these days,
so it is hard to beat a good modern compiler.

Interpreters
============

An ``interpreter`` must also check the ``syntax`` of your program, but it does not
convert your program into ``machine language``. Instead, it reads one statement
at a time, checks the ``syntax`` and then invokes its own program to cause the
machine to do the thing needed according to the ``semantics`` of that
statement. This approach has advantages and disadvantages. Your program does
not need to be translated first, so interpreted programs can be run quickly to
see if they work.
However, the same line of code will be checked each time it is processed, so
interpreted programs can be pretty slow. For some purposes, this is not really
a problem.
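The sketch below shows the idea with a toy interpreter. The three-command
language it understands (``set``, ``add``, and ``show``) is invented purely for
this illustration, as is the function name ``run``. Real interpreters are far
more involved, but the rhythm is the same: read a statement, check it, do it.

..  code-block:: python

    def run(program):
        """Interpret a made-up three-command language, one statement at a time."""
        variables = {}
        output = []
        for line in program.splitlines():
            words = line.split()
            if not words:
                continue  # skip blank lines
            command = words[0]
            # Check the syntax, then carry out the meaning of the statement
            if command == "set" and len(words) == 3:
                variables[words[1]] = int(words[2])
            elif command == "add" and len(words) == 3:
                variables[words[1]] += int(words[2])
            elif command == "show" and len(words) == 2:
                output.append(variables[words[1]])
            else:
                raise SyntaxError("cannot understand: " + line)
        return output


    print(run("set x 5\nadd x 3\nshow x"))  # [8]

If a statement sat inside a loop, this interpreter would split and check the
same text every time around, which is where the slowness comes from.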

Python
======

Python (and other languages like Java) take a middle approach. They are a form
of interpreter, but in this case, the actual program code is checked once, then
converted into a set of instructions called ``byte codes`` that are similar to
``machine language``, but for another program, not a real machine. This program
can run the ``byte codes`` very quickly, so Python programs can be pretty
quick. Advanced Python programmers can further process the ``byte codes`` into
real ``machine language`` or even write parts of their programs in another
language to speed up critical parts of their programs.
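If you are curious, Python will show you its ``byte codes``. The sketch below
uses the standard ``dis`` module to list the instructions Python generated for
a tiny function. The function ``add_one`` is made up for the example, and the
exact instruction names you see depend on your version of Python.

..  code-block:: python

    import dis


    def add_one(n):
        return n + 1


    # Each instruction is one byte code that Python will run for add_one
    for instruction in dis.get_instructions(add_one):
        print(instruction.opname, instruction.argrepr)

You do not need to understand the listing. The point is that your text has
already been turned into something much closer to what a machine would want.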

A simple example
****************

Early in my personal computer days, I bought a computer as a kit and built it
myself. The machine was an Imsai 8080 which cost about $400 in 1974. That was a
bunch of money back then!

The Imsai 8080 had a bunch of simple switches on the front panel and we entered
programs into the memory by flipping switches to input 0's and 1's directly.
Boy was that fun!

Here is my very first program:

* 110000110000000000000000

This program was loaded into memory at an address of 0. (Actually, we specified
addresses as 16 bits, so the real address was 0000000000000000!)

What does this program do?
==========================

Obviously, this is impossible to figure out from the 0's and 1's directly. So,
let's move up a level to ``assembly language``.

Here is the same program written in assembly language:

..  code-block:: text

    0000:   jmp     0000

Hmmm, not as many 0's and 1's this time, but we still have a bunch of 0's. What
is that ``jmp`` thing?

Actually, the codes invented by early programmers seldom were complete words. I
have my own theory for why - programmers hate to type! (So, they leave out
letters!)

The ``jmp`` actually should be ``jump``. Where to jump is the puzzle. The stuff
after that code says where. The four 0's tell the machine to go back to address
0000 for the next instruction. If you think about it, that was where the ``jmp``
code was found. So the computer does this one statement over and over -
forever! This is called an ``infinite loop``, something we tend to avoid in the
PC, but do all the time in small computers embedded in hardware!

In the assembly language world, we can let the assembler or the operating
system decide where our program will live in memory. So, it is common to name
addresses rather than spell them out as a number. In this case, our program
looks like this:

..  code-block:: text

    start:  jmp start

Still pretty simple.
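To see what an assembler has to do with that line, here is a very rough sketch
in Python. It only understands a single ``label: jmp target`` line, and the
names ``JMP_OPCODE`` and ``assemble`` are invented for this example. It relies
on two facts about the 8080: the code for ``jmp`` is 11000011, and a 16-bit
address is stored low byte first.

..  code-block:: python

    JMP_OPCODE = 0b11000011  # the 8080 code for "jmp" (C3 in hexadecimal)


    def assemble(line, origin=0):
        """Assemble one 'label: jmp target' line into a string of bits."""
        label, instruction = line.split(":")
        labels = {label.strip(): origin}  # the label names this line's address
        mnemonic, target = instruction.split()
        if mnemonic != "jmp":
            raise ValueError("this sketch only knows jmp")
        address = labels[target]
        # Low byte of the address first, then the high byte
        machine_code = [JMP_OPCODE, address & 0xFF, (address >> 8) & 0xFF]
        return "".join(format(byte, "08b") for byte in machine_code)


    print(assemble("start:  jmp start"))  # 110000110000000000000000

The output is the same string of 0's and 1's I flipped in by hand on the front
panel.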

High-level code
===============

Just so you can see this same program in Python, here is what it would look like:

..  code-block:: python

    while True:
        pass

..  note::

    Do not worry about all this now, we will get to Python soon!

Now that you have a bit of understanding about how programs are processed,
let's look at the data part of the puzzle, then we will get to code!