.. _processing-programs:

How programs are processed
##########################

.. wordcount::

.. vim:ft=rst spell:

.. include:: /references.inc

.. seealso::

    Text: Chapter 1 (section 1.4)

In our discussions up to now, we have been talking about programs in a general way. Ultimately, your program is a bunch of text that has to be converted, somehow, into something the processor can use. That usually means your code will end up as a bunch of bits (0's and 1's) stored in the computer's memory along with a bunch of data (more 0's and 1's). That drum we talked about starts up and the dance begins!

Well, that is a nice picture, and it is not really a bad one. But it will help if we talk in a bit more detail about how programs get into that final form. This will help you understand what is going on!

Writing programs
****************

As programmers, we do a lot of writing. However, this writing is in a different language: a programming language, which has its own set of rules. You need to learn those rules. This is no different from learning to write in English, except that a programming language is more precise and not full of fuzziness like English!

Programming languages
=====================

Programming languages have their own form. Instead of sentences, we talk about statements, and punctuation becomes special characters needed at specific points. Words become identifiers, which come in two forms: those you invent, and those required by the language (often called keywords). We can also create other things, like numbers and strings, as needed. Formally, we call all these rules ``syntax``!

Syntax
------

The formal rules describing how you form a correct program define what is legal and what is not. Some of these rules might seem strange, but they are there for a reason. Instead of asking a human to understand the language, we are asking a computer to understand the language.
Actually, we are asking a computer program, also written in some language (not necessarily the same one we are using), to process your program text and make sure you followed all the ``syntax`` rules. If so, we say your program is ``syntactically`` correct. But what does it mean?

Semantics
---------

A ``syntactically`` correct program written in a specific programming language has meaning. We call this meaning the ``semantics`` of the program. Figuring out exactly what that meaning is can be the tough part. You have an idea what you want the computer to do, and you hope the ``semantics`` of your program causes the computer to do what you want.

Unfortunately, that is not always what happens. If your understanding of what each ``statement`` causes the machine to do is not quite up to par, you end up having the machine do something that does not seem right. To the computer it IS right, but to you it is totally wrong. Guess who is right!

There is an old saying among programmers: "The computer never does what I want it to do, only what I tell it to do!" Phooey! Our challenge is to improve your understanding of both the language and the computer, so your meanings match the program's meanings!

Types of Languages
******************

Languages vary in their level of complexity: some are very close to the machine, and some are very close to the programmer.

Machine language
================

Computers only really know how to process a very specific language taught to them by their parents (OK, by their manufacturer!). That language is called ``machine language``, and it is that bunch of 0's and 1's we could put into memory and get the computer to dance to. Only the computer likes that language; we humans hate all those 0's and 1's. So, humans created a bunch of other languages to help them write programs. Languages come in many different forms, some very simple, and some very complex.
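
Humans rarely work with raw bits directly. When they must, a common trick is to split the bits into 8-bit groups (bytes) and write each byte in hexadecimal. Here is a minimal Python sketch of that idea. It uses the 24-bit program from the Imsai 8080 example later on this page. The variable names are only for illustration.

.. code-block:: python

    # The 24-bit program from the Imsai 8080 example, as one string of 0's and 1's
    bits = "110000110000000000000000"

    # Split the string into 8-bit groups (bytes)
    groups = [bits[i:i + 8] for i in range(0, len(bits), 8)]

    # Convert each group to a number, then show it as two hexadecimal digits
    hex_bytes = [format(int(group, 2), "02X") for group in groups]

    print(groups)     # ['11000011', '00000000', '00000000']
    print(hex_bytes)  # ['C3', '00', '00']

On the Intel 8080, the byte C3 is the jump instruction, and the two bytes after it hold the address to jump to. So C3 00 00 means "jump to address 0000". Even in hexadecimal this is not much fun to read, which is why humans invented other languages.
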

Assembly language
=================

Since ``machine language`` is pretty hard to deal with, early programmers invented simple codes for each simple operation the computer could do. These codes form a language called ``assembly language``. Humans could remember the codes, but not the 0's and 1's. To convert the ``assembly language`` codes into those 0's and 1's, they created a tool that could read ``assembly language`` programs and convert them into the correct set of 0's and 1's. They called these tools ``assemblers``, and they were in common use for quite some time.

Today, programming in ``assembly language`` is far less common, but it is the closest language to the machine, and it gives you the most power you will ever have as a programmer to control the machine. Unfortunately, each kind of machine has a different ``assembly language``, which makes becoming an expert in this kind of programming a bit hard to do. I can write ``assembly language`` programs for about a dozen different machines, and I teach this form of programming in COSC 2425.

High-level languages
====================

After some time, programmers discovered that they kept building the same common sequences of ``assembly language`` statements, so they started constructing statements that made more sense to them. After several languages were written, they discovered that they needed only three basic kinds of programming constructs:

* A ``sequence``, which is just a series of statements processed one after the other
* A ``decision`` statement, which asks a question about what is going on and chooses what to do next based on the answer
* A ``loop``, which does the same thing over and over (usually on different data)

The languages developed in this era were called ``high-level`` languages, and they made programmers much more productive. Examples of high-level languages include C/C++, Basic, Pascal, Ada, and our soon-to-be favorite, Python!
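
Here is a minimal Python sketch that uses all three constructs. The list of readings and the limit are made-up values, chosen only for illustration.

.. code-block:: python

    # Sequence: these statements run one after the other
    readings = [3, 8, 12, 5]
    limit = 7
    count = 0

    # Loop: do the same steps for each value in the list
    for value in readings:
        # Decision: ask a question, and act only if the answer is yes
        if value > limit:
            count = count + 1

    print(count, "readings were above", limit)  # prints: 2 readings were above 7

Bigger programs are built from these same three pieces, combined and nested in different ways.
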

Even higher levels
==================

Programmers are never satisfied, it seems. Today, there are attempts to move up to even more powerful languages. Some of these research efforts are even moving into graphical languages, where programs are built by dragging constructs together in a graphical environment. These languages have not made it into the mainstream of programming, so we will not discuss them further. Instead, let's focus on today's most common high-level languages.

Converting programs to machine language
***************************************

We will be creating ``files`` containing simple text: characters and punctuation, put together according to the ``syntax`` rules of the language we are using. We need a tool to check our ``syntax`` to make sure it follows the rules. The tool we use does ``syntax analysis`` on our program by reading it character by character, checking all the rules as it goes. The tool might also build a different version of our program for internal use later.

There are two common types of tools used in processing your program: compilers and interpreters.

Compilers
=========

A ``compiler`` converts your high-level program into a set of 0's and 1's for the specific machine you are working on. The conversion process is important, because we want your meaning (``semantics``) to stay the same once the program has been converted. Writing compilers is fun, but hard to do well. Interestingly, some compilers translate programs into ``assembly language``, then use ``assemblers`` to convert that into ``machine language``.

Compilers generate pretty fast final programs. Some ``assembly language`` programmers can beat the compilers, but compilers are pretty smart these days, so it is hard to beat a good modern compiler.

Interpreters
============

An ``interpreter`` must also check the ``syntax`` of your program, but it does not convert your program into ``machine language``.
Instead, it reads one statement at a time, checks its ``syntax``, and then runs its own code to make the machine do whatever the ``semantics`` of that statement call for. This approach has advantages and disadvantages. Because the program does not need to be translated first, you can run it right away to see if it works. However, the same line of code is checked each time it is processed (for example, every time through a loop), so interpreted programs can be pretty slow. For some purposes, this is not really a problem.

Python
======

Python (like some other languages, such as Java) takes a middle approach. It is a form of interpreter, but in this case the program code is checked once, then converted into a set of instructions called ``byte codes``. Byte codes are similar to ``machine language``, but they are meant for another program (often called a virtual machine) rather than a real machine. That program can run the ``byte codes`` much faster than it could re-check the original text every time, so Python programs can be pretty quick. Advanced Python programmers can go further, converting the ``byte codes`` into real ``machine language``, or writing the critical parts of their programs in another language to speed them up.

A simple example
****************

Early in my personal computer days, I bought a computer as a kit and built it myself. The machine was an Imsai 8080, which cost about $400 in 1974. That was a bunch of money back then! The Imsai 8080 had a bunch of simple switches on the front panel, and we entered programs into memory by flipping switches to input 0's and 1's directly. Boy, was that fun! Here is my very first program:

* 110000110000000000000000

This program was loaded into memory at an address of 0. (Actually, we specified addresses as 16 bits, so the real address was 0000000000000000!)

What does this program do?
==========================

It is pretty hard to figure that out from the 0's and 1's directly. So, let's move up a level to ``assembly language``. Here is the same program written in assembly language:

.. code-block:: text

    0000: jmp 0000

Hmmm, not as many 0's and 1's this time, but we still have a bunch of 0's. What is that ``jmp`` thing? The codes invented by early programmers were seldom complete words. I have my own theory for why: programmers hate to type! (So, they leave out letters!) The ``jmp`` code stands for ``jump``. Where to jump is the puzzle, and the stuff after that code says where. The four 0's tell the machine to go back to address 0000 for the next instruction. If you think about it, that is where the ``jmp`` code itself was found. So the computer does this one statement over and over, forever! This is called an ``infinite loop``, something we tend to avoid on a PC, but do all the time in small computers embedded in hardware!

In the assembly language world, we can let the assembler or the operating system decide where our program will live in memory. So, it is common to name addresses rather than spell them out as numbers. In this case, our program looks like this:

.. code-block:: text

    start: jmp start

Still pretty simple.

High-level code
===============

Just so you can see this same program in Python, here is what it would look like:

.. code-block:: python

    # Loop forever, doing nothing, just like "jmp start"
    while True:
        pass

.. note::

    Do not worry about all this now; we will get to Python soon!

Now that you have a bit of understanding about how programs are processed, let's look at the data part of the puzzle, then we will get to code!
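
For the curious, the standard library lets you peek at the ``byte codes`` described in the Python section above. This is a minimal sketch, assuming a recent version of CPython (the standard Python).

* The built-in ``compile()`` function checks the ``syntax`` of some source text once and produces a code object holding its byte codes.
* The ``dis`` module prints those byte codes in a readable, assembly-like form.

The exact instructions differ between Python versions, so no sample listing is shown here.

.. code-block:: python

    import dis

    # A tiny program, held as plain text, just like a file you would write
    source = "total = 0\nfor n in range(3):\n    total = total + n\n"

    # compile() checks the syntax once and converts the text into byte codes
    code = compile(source, "<example>", "exec")

    # The byte codes themselves are just a sequence of bytes
    print(type(code.co_code), len(code.co_code), "bytes")

    # dis lists each instruction with a readable name, a bit like assembly language
    dis.dis(code)

    # Text that breaks the syntax rules is rejected before anything runs
    try:
        compile("while True print('hi')", "<bad>", "exec")
        syntax_ok = True
    except SyntaxError as err:
        syntax_ok = False
        print("Syntax error:", err.msg)

    # Running the byte codes carries out the program's semantics: 0 + 1 + 2
    namespace = {}
    exec(code, namespace)
    print(namespace["total"])  # prints 3

Notice that the broken text never gets as far as running. This is the "checked once" step at work.
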