C Program Tear-Down

Read time: 79 minutes (19781 words)

../../_images/Edsger_Wybe_Dijkstra.jpg
"The programmer must understand the computer, because the computer will not
understand the programmer" -- E. W. Dijkstra

You should know this man’s name:

Back when you first started learning about programming, you should have been taught that there are three fundamental structured forms you had to learn. They were the Sequence, the Decision, and the Loop. Blame Edsger for that. In 1968, Edsger wrote this now famous letter to the editor of the Association of Computing Machinery Journal:

In this letter, Dijkstra proposed eliminating the goto statement, which was an essential part of programming up until then. He had another idea. Dijkstra coined the phrase Structured Programming as a replacement for this “harmful” statement, and his paper caused quite a storm in the world of software development. He also is considered one of the guiding forces who moved Software Engineering into the mainstream of professions.

With apologies to his memory, I am going to violate the cardinal rule of modern programming, and restore the goto statement - but only for one specific reason.

Warning

Never admit that you have used this statement in any gathering of programming professionals. They may lock you in a closet and throw away the key!

Simple C Program

Let’s put together a simple program that incorporates all three basic structures. I am going to use C (not C++) since we do not need the baggage of modern object-oriented programming.

What should the program do? I know! Let’s add up a bunch of numbers. Just to add a twist, I am only going to add up every other number in a list of numbers.

Here is the code I came up with:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
#include <stdio.h>

unsigned int data[] = {
    5, 3, 7, 10, 42, 6, 22, 15, 32
};

int cnt;
int sum;
unsigned int odd = 0;

int main( void ) {
    cnt = 0;
    while( cnt < 9 ) {
        if( odd )        
            sum += data[ cnt ];
        odd = !odd;
        cnt++;
        printf( "%d %d\n", cnt, sum );
    }
}

Now, this is not the finest example of a C program, but it does have all three basic structures. I chose to use a While Loop since that is probably the first loop you were introduced to. It also uses Global Variables, something we are taught to avoid. We are using those to for a simple reason. All the code can access all those variables, something closer to what goes on in the machine. We will learn about that later!

Other than that, this is pretty simple stuff. You should have no problem getting this running:

$ gcc -o sum1 sum1.c
$ ./sum1
1 0
2 3
3 3
4 13
5 13
6 19
7 19
8 34
9 34

Note

Every version of this program that we create in this tear-down should produce exactly this same output. I will not repeat what you see above, but you should run the code and prove that for yourself!

Our Goal

What I want to do in this lecture is exactly what a compiler does to your program code. I want to convert it into a form the modern Pentium processor can understand. To accomplish this, I am going to use a real standard C compiler, and I will not alter the simple fact that we will be looking at is a fully standard compliant C program. The program is going to morph into something that does not look the same, but it is really is still a C program.

My goal is to convert exactly this program to one that looks so much like Pentium assembly language that the final conversion to a real Pentium assembly language file is trivial!

Note

See what happens when you wake up a 3am with an idea in your head? The first draft of this idea was done by 4am! I really need more sleep!

Objective-C

Back in the late 1980s, a computer scientist named Brad Cox came up with an idea on how to add object orientation to the C programming language. He used the C preprocessor to add new features to the language, that would be transformed into conventional C code by that preprocessor. This was a pretty cool idea, and I visited Brad at his company home in Sandy Hook Connecticut shortly after he founded the company to release his product, now known as Objective-C. That language, like one I was involved in designing: Ada, had the misfortune of coming out at the same time as C++. You know the result, which one are we using today? Still, Brad’s ideas stuck with me, and led to this lecture!

The Processor Does Not Understand Structures

To do our “morphing” work, we need to view the program from the processor’s point of view. Can the processor make sense of this pile of code?

The first problem the processor has with this code is that there is a lot of stuff going on that it has no idea how to handle. Take the while line for example. There is an expression inside some parentheses that needs to be evaluated before we can even decide if we want to loop or not. Then there is the issue of exactly where is the loop body. You see curly braces, the processor has no clue what they mean. You know they surround that loop body! We need to tell the processor where the loop starts and ends.

Based on our training, we know what will happen. We will evaluate that logical expression inside the parentheses, see if it produces a true or false and then either branch around the loop body, or drop into the loop. When we get to the end of the loop body, we will branch back up and evaluate that logical expression all over.

So, in designing the chip, the Pentium architects had to concoct instructions that would cause the processor to do what you think it should do when processing this code. (It really is an illusion - there is no while in the chip!)

Branching

Branching is absolutely vital to being able to create our structures. Well, maybe not the Sequence structure. That one is too simple!

We need two kinds of branching statements, one conditional and one absolute. We will use a conditional branch at the top of the loop, and an absolute branch at the bottom. The conditional one will let us skip over the loop body, if needed. The unconditional one will take us from the bottom of the loop body back to the conditional test at the top.

Sorry, Edsger!

I am going to be forced to use a GOTO statement now. Yes, even though you are NEVER supposed to use it, it is still there. There are rules on exactly where you can use this statement, but basically they make sense. We will add a Label to some line in our code, and direct the processor to GOTO that label. If we need a conditional branch, we will use an if statement to control the goto statement.

Note

I am not going to teach you exactly how to use this statement. They would drum me out of teaching if I did that. Instead, I will show you what works, and leave it at that!

Let’s modify the program so it does this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11

int main( void ) {
        cnt = 0;
label1: if( cnt >= 9 ) goto label2;
        if( odd )        
            sum += data[ cnt ];
        odd = !odd;
        cnt++;
        printf( "%d %d\n", cnt, sum );
        goto label1;
label2: printf( "%d %d\n", cnt, sum );

Do you see how this is set up. We evaluate the logical expression and use that to decide if we branch around the loop body. We have two goto statements here, the absolute goto at the bottom of the loop body takes us back to the place where we evaluate that expression again.

We managed to get rid of those ugly curly braces, and the while statement and it still works as before.

Unfortunately, we got rid of one structure, but we used another structure in its place. Now, we have two if statements. Hey, we are making progress here!

Simplifying the Logical Expression

Once again, we see a statement with some inner work to do. The first if statement has a complex (!) expression to evaluate before it can decide what to do. Let’s pull that expression out of the statement and evaluate it first. (This will cause us to introduce a new variable to store the result. I know, let’s call it flag:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
int cnt;
int sum;
int flag;
unsigned int odd = 0;

int main( void ) {
        cnt = 0;
label1: flag = ( cnt == 9 );
        if( flag ) goto label2;
        if( odd )        
            sum += data[ cnt ];
        odd = !odd;
        cnt++;
        printf( "%d %d\n", cnt, sum );
        goto label1;
label2: printf( "%d %d\n", cnt, sum );
}

Now, our conditional jump is much simpler. Let’s do the same thing with the second if statement:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
int main( void ) {
        cnt = 0;
label1: flag = ( cnt == 9 );
        if( flag ) goto label2;
        if( !odd ) goto label3;
        sum += data[ cnt ];
label3: odd = !odd;
        cnt++;
        printf( "%d %d\n", cnt, sum );
        goto label1;
label2: printf( "%d %d\n", cnt, sum );
}

Nothing Up My Sleeve!

Now for some trickery! We will use a feature of the C preprocessor called Macros to change this code and make it look more like assembly language. One step at a time.

The #define Macro

A Macro is a text substitution trick. You may have seen something like this in C code:

#define MAX 100
...
if(count < MAX) ...

It is important that you realize that MAX is not a “variable”, here. It is a name associated with a pile of characters. The macro associates the name MAX with all of the text beginning with the first non-blank character after the name and continuing all the way to the end of the line.

Warning

Putting extra spaces after the text you want can lead to problems later, watch that. You canot see them in your code, but they might be there!

Here the Preprocessor, which actually reads your code first, will convert any occurrences of MAX to 100 (by substituting that text following the name). When the compiler finally gets the output from the Preprocesser* the name MAX is gone from your code! We will create two macros to hide part of our current code, but when the smoke clears, the compiler will still see exactly what you see now!

Converting to Pentium Instructions

Here are our first macros, placed in a new file we will include named macros.h:

#define CMP( d1, d2 )       flag = ( d1 == d2 )
#define JZ( label )         if( flag ) goto label
#define JMP( lab )          goto lab

Note

Notice that the macro requires the use of parentheses to mark where parameters can be used when you use the macro name. In real assembly language, we will not see parentheses. The actual Pentium line will just read “CMP d1, d2”. That is as close as I can get in this morphing experiment.

Now, we can get rid of those two logical expressions and replace them with something that looks like a Pentium CMP instruction. The CMP is just a code for “compare”. We can also replace the conditional branches with something that looks like a JZ instruction, and the absolute branch with something that looks like a JMP instruction. Again JMP is a code for “absolute jump”, and JZ is a code for “conditional jump” (jump if the result of that last comparison produced a zero result):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
int main( void ) {
        cnt = 0;
label1: CMP( cnt, 9 );
        JZ( label2 );
        CMP( odd, 0 );
        JZ( label3 );;
        sum += data[ cnt ];
label3: odd = !odd;
        cnt++;
        printf( "%d %d\n", cnt, sum );
        JMP( label1 );
label2: printf( "%d %d\n", cnt, sum );
}

Now that is nice! We are finally starting to see something more like assembly code in this file!

Note

If you really want to prove that the compiler still sees the same code, you can get gcc to show you what the preprocessor produced by doing this:

$ gcc -E sum.c > sum.preprocessed

Warning

If this is the first time you have ever looked at the output of the Preprocessor be warned, it is UGLY!

Here is a seriously abbreviated chunk of the output (after around 450 lines of junk!):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
unsigned int data[] = {
    5, 3, 7, 10, 42, 6, 22, 15, 32
};

int cnt;
int sum;
int flag;
unsigned int odd = 0;

int main( void ) {
        cnt = 0;
label1: flag = ( cnt == 9 );
        if( flag ) goto label2;
        flag = ( odd == 0 );
        if( flag ) goto label3;;
        sum += data[ cnt ];
label3: odd = !odd;
        cnt++;
        printf( "%d %d\n", cnt, sum );
        goto label1;
label2: printf( "%d %d\n", cnt, sum );
}

Incrementing and Absolute Branching

Let’s get rid of the assignment that initializes cnt. We can also deal with the simple not operation, and the increment line:

#define MOV( a,b )          a = b;
#define NOT( val )          val = !val
#define INC( val )          val++

With these, our code becomes this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
int main( void ) {
        MOV( cnt,0 );
label1: CMP( cnt, 9 );
        JZ( label2 );
        CMP( odd, 0 );
        JZ( label3 );;
        sum += data[ cnt ];
label3: NOT( odd );
        INC( cnt );
        printf( "%d %d\n", cnt, sum );
        JMP( label1 );
label2: printf( "%d %d\n", cnt, sum );
}

Wow!, only a few more lines to go!

Declaring variables

In this next step, let’s change the data setup area. We define a few uninitialized variables there, and they need to change:

..warning:

In the rest of this note, I will show names for things that come from the
Pentium instruction set, something we will look at later in the course. For
now, the names simply do what the original code (shown in the macro
definition) said to do.

Here are our new macros:

#define DD( n, v)           int n = v
#define DB(n, v)            char *n = v
#define RESD( n )           int n

Here, DD stands for “define double”, a 32-bit container. DB stands for “define byte”, an 8-bit container. Both of these provide an initial value for the container. RESD reserves a 32-bit uninitialized container

And here is the new code:

1
2
3
4
        RESD( cnt );
        RESD( sum );
        RESD( flag );
        DD( odd, 0 );

Eliminating the outer “main” code

You should know that the main function is required in every C/C++ program. That function marks the entry point for any C/C++ program. Since it is a Function it must be called by something to start it up. That something is actually the operating system (on your behalf, after you cause the program to be run somehow). At the end of that function, we need to return control to the operating system. We can do this by marking the entry point with a proper name, and ending the function with a RET instruction. (Guess where that name came from! Lazy programmers!)

Here is our new code:

#define MAIN                int main(void) {
#define RET                 }

And now, the modified program code:

#include <stdio.h>
#include "macros.h"

unsigned int data[] = {
    5, 3, 7, 10, 42, 6, 22, 15, 32
};

        RESD( cnt );
        RESD( sum );
        RESD( flag );
        DD( odd, 0 );

MAIN
        MOV( cnt,0 );
label1: CMP( cnt, 9 );
        JZ( label2 );
        CMP( odd, 0 );
        JZ( label3 );;
        sum += data[ cnt ];
label3: NOT( odd );
        INC( cnt );
        printf( "%d %d\n", cnt, sum );
        JMP( label1 );
label2: printf( "%d %d\n", cnt, sum );
        RET

For now, a function is just a block of code with a name attached to it. We use a label to mark the first statement in the block, and give it the name. We end the function with a RET, which is what we now see in our code!

Arrays

We next need to consider dealing with that array! You should know that, basically, an array is just a sequence of data blocks, all the same size, stored in sequential memory locations. In C (and (C++) we access an array using an index variable of some sort. We are cheating and using our loop counter(cnt) for this access. How will the processor deal with that?

Well, the processor knows nothing about arrays, but it can use a container holding an address (does the term pointer pop into your mind) and use that to access some place in memory. We modify that index and we access another element in the array. We run into a fundamental problem here. You might want your indexes to increment by one in your code, but the actual addresses must increment by four! Why?

Each integer in C takes four bytes to store, and addresses are associated with each byte of storage!) When you code in C and use pointers, the compiler knows how big the item is that you are pointing to with a pointer. So, adding one to the pointer is internally converted to adding four (the size of an int).

Warning

This may make our head hurt!

Here is a change to our code that uses pointers to access the elements in the array:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
#include <stdio.h>
#include "macros.h"

unsigned int data[] = {
    5, 3, 7, 10, 42, 6, 22, 15, 32
};

unsigned int *dptr = &data[0];

        RESD( cnt );
        RESD( sum );
        RESD( flag );
        DD( odd, 0 );
MAIN
        MOV( cnt,0 );
label1: CMP( cnt, 9 );
        JZ( label2 );
        CMP( odd, 0 );
        JZ( label3 );;
        sum += *(dptr + cnt);
label3: NOT( odd );
        INC( cnt );
        printf( "%d %d\n", cnt, sum );
        JMP( label1 );
label2: printf( "%d %d\n", cnt, sum );
        RET

If you are weak on pointers, do a little research before working this one out. It still produces the same output. (Phew!)

Eliminating the Pointer Notation

All of that funky pointer notation needs to go away. Pointers are just address holding variables, and the processor knows what to do with addresses. We will see exactly what that code looks like later, but for now, let’s create two more macros to get rid of the array declaration, and the associated pointer declaration:

#define DD_(lab, ...)       unsigned int lab [] = { __VA_ARGS__ }
#define DDp_( lab, val)     unsigned int *lab = &val[0]

Warning

There is some tricky macro work going on here. Look into it if you like, we will not explore all that now.

And here is the new code using these:

        DD_(data, 5, 3, 7, 10, 42, 6, 22, 15, 32 );
        DDp_( dptr,data );

Now, the Hard Part

We are down to three lines to convert Two are identical function calls, and the last an ugly assignment statement. Both of these forms need to become more than one line of assembly code, so we need to rearrange those lines a bit. To simplify the output (in the assembly version), I am going to pull them out of the main code and build a function that does the same thing.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
#include <stdio.h>
#include "macros.h"

        DD_(data, 5, 3, 7, 10, 42, 6, 22, 15, 32 );
        DDp_( dptr,data );

        RESD( cnt );
        RESD( sum );
        RESD( flag );
        DD( odd, 0 );

void display() {
        printf("%d", cnt);
        printf(" ");
        printf("%d", sum);
        printf("\n");
}

MAIN
        MOV( cnt,0 );
label1: CMP( cnt, 9 );
        JZ( label2 );
        CMP( odd, 0 );
        JZ( label3 );;
        sum += *(dptr + cnt);
label3: NOT( odd );
        INC( cnt );
        display();
        JMP( label1 );
label2: display();
        RET

Now, we can add another macro to replace those calls to display. We can also introduce a couple of simple fake macros to mimic telling the assembler what section of code we are working through:

#define DDps( lab)          char *lab
#define CALL( lab )         lab();
#define CALLp0( a )         a( fmt )
#define CALLp1( a )         a( fmt, pdata)
#define DISPLAY             void display() {
#define ADD( a, b)          a += *b
#define SEG( a )            // segment a

And our new code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
#include <stdio.h>
#include "macros.h"

        SEG(.data)

        DD_(data, 5, 3, 7, 10, 42, 6, 22, 15, 32 );
        DDp_( dptr,data );

        RESD( cnt );
        RESD( sum );
        RESD( flag );
        DD( odd, 0 );

        DB( msg1,"%d" );
        DB( msg2," " );
        DB( msg3,"\n" );
        DDps( fmt );
        DD( pdata,0 );

        SEG(.text)

DISPLAY
        MOV( fmt,msg1 );
        MOV( pdata,cnt )
        CALLp1( printf );
        MOV( fmt,msg2 );
        CALLp0( printf );
        MOV( fmt,msg1 );
        MOV( pdata, sum );
        CALLp1( printf );
        MOV( fmt,msg3 );
        CALLp0( printf );
        RET

MAIN
        MOV( cnt,0 );
label1: CMP( cnt, 9 );
        JZ( label2 );
        CMP( odd, 0 );
        JZ( label3 );;
        ADD( sum, (dptr + cnt) );
label3: NOT( odd );
        INC( cnt );
        CALL( display );
        JMP( label1 );
label2: CALL( display );
        RET

Warning

There will be warning about the format string not being a string literal. You can ignore that here.

Working through the DISPLAY routine is fairly straightforward, but a bit tedious. Basically, when we code in assembly language, each step we take is pretty small. This code is not ideal, but it should work. The three different versions of CALL handle different kinds of parameters. This is not ideal, but it works for now.

That funny line replacing the assignment to sum is close to a form the Pentium supports. We will study that later.

Wrapping Up the Translation

Here is the final macro file:

macros.h

#define CMP( d1, d2 )       flag = ( d1 == d2 )
#define JZ( label )         if( flag ) goto label
#define JMP( lab )          goto lab

#define MOV( a,b )          a = b;
#define NOT( val )          val = !val
#define INC( val )          val++

#define DD( n, v)           int n = v
#define DB(n, v)            char *n = v
#define RESD( n )           int n

#define MAIN                int main(void) {
#define RET                 }

#define DD_(lab, ...)       unsigned int lab [] = { __VA_ARGS__ }
#define DDp_( lab, val)     unsigned int *lab = &val[0]

#define DDps( lab)          char *lab
#define CALL( lab )         lab();
#define CALLp0( a )         a( fmt )
#define CALLp1( a )         a( fmt, pdata)
#define DISPLAY             void display() {
#define ADD( a, b)          a += *b
#define SEG( a )            // segment a

And the final C file.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
#include <stdio.h>
#include "macros.h"

        SEG         ( .data )

        DD_         (data,   5, 3, 7, 10, 42, 6, 22, 15, 32 );
        DDp_        ( dptr,data );
        DD          ( odd, 0 );

        DB          ( msg1,"%d" );
        DB          ( msg2," " );
        DB          ( msg3,"\n" );
        DDps        ( fmt );
        DD          ( pdata,0 );

        SEG         ( .bss )

        RESD        ( cnt );
        RESD        ( sum );
        RESD        ( flag );
        SEG         ( .text )

DISPLAY
        MOV         ( fmt,msg1 );
        MOV         ( pdata,cnt )
        CALLp1      ( printf );
        MOV         ( fmt,msg2 );
        CALLp0      ( printf );
        MOV         ( fmt,msg1 );
        MOV         ( pdata, sum );
        CALLp1      ( printf );
        MOV         ( fmt,msg3 );
        CALLp0      ( printf );
        RET

MAIN
        MOV         ( cnt,0 );
label1: CMP         ( cnt, 9 );
        JZ          ( label2 );
        CMP         ( odd, 0 );
        JZ          ( label3 );;
        ADD         ( sum, (dptr + cnt) );
label3: NOT         ( odd );
        INC         ( cnt );
        CALL        ( display );
        JMP         ( label1 );
label2: CALL        ( display );
        RET

It is nice that C is pretty loose about pasting white space in the code. I moved all of the “operands” over to where they are supposed to be in assembly language! Assembly language “style” is very different from proper C style!

This sure looks like assembly language to me! Not a trace of C code to be seen. (Well, discounting those header lines at the top

What the compiler saw

Stripped of the preprocessor clutter and cleaned up a bit, this is the final code the compiler saw

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
unsigned int data [] = { 5,3,7,10,42,6,22, 15, 32 };
unsigned int *dptr = &data[0];
int odd = 0;

char *msg1 = "%d";
char *msg2 = " ";
char *msg3 = "\n";
char *fmt;
int pdata = 0;

int cnt;
int sum;
int flag;

void display() {
    fmt = msg1;;
    pdata = cnt;
    printf( fmt, pdata);
    fmt = msg2;;
    printf( fmt );
    fmt = msg1;;
    pdata = sum;;
    printf( fmt, pdata);
    fmt = msg3;;
    printf( fmt );
}

int main(void) {
    cnt = 0;;
label1: flag = ( cnt == 9 );
    if( flag ) goto label2;
        flag = ( odd == 0 );
    if( flag ) goto label3;;
        sum += *(dptr + cnt);
label3: odd = !odd;
    cnt++;
    display();;
    goto label1;
label2: display();;
}

Pentium Assembly Code

Just for reference, here is a proper Pentium assembly language file. IT is written for an assembler (A translator that reads assembly language, and produces machine code for the Pentium processor in this case. That tool is called NASM.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
; simple summer
;    compile with:
;	nasm -f elf64 -o sum.o sum.asm
;	gcc -o sum sum.o
;	./sum
;
; initialized data section --------------------------------
	section .data
data	dd	5,3,7,10,42,6,22,15,32
dptr	dq	data		; pointer to arrat
odd	dd	0
sum	dd	0
msg1:	db	"%d", 0		; integer format string
msg2:	db	" ", 0		; space
msg3:	db	0ah, 0		; newline
fmt	dd	0		; pointer to format string
pdata	dq	0		; pointer 

; uninitialized data section ------------------------------
	section	.bss
cnt	resd	1
flag	resd	1

; code section --------------------------------------------
	section .text
	global	main
	extern	printf
display:
	mov	rdi, msg1	; rdi has format string address
	mov	rsi, [cnt]	; rsi has value to print
	mov	rax, 0		; required
	call	printf		; print cnt
	mov	rdi, msg2
	mov	rax, 0
	call 	printf		; then a spaceDD
	mov	rdi, msg1
	mov	rsi, [sum]	
	mov	rax, 0
	call 	printf		; print sum
	mov	rdi, msg3
	mov 	rax, 0
	call 	printf		; newline
	ret
main:
	mov	eax, 0		; get a 32 bit zero
	mov	[cnt], eax	; initialize cnt
	;
l1:	mov	eax, [cnt]	; get current count
	cmp	eax, 9		; hit the end yet?
	jz	l3		; end if so
	mov	eax, [odd]	; fetch current value of odd
	cmp	eax, 0		; is it zero
	jz	l2;
	;
	mov	rax, 0		; make 64 bit zero
	mov	eax, [cnt]	; load in current count
	mov	ecx, 4		; multiply by 4 bytes
	mul	ecx
	mov	rbx, [dptr]	; fetch pointer to array
	add	rbx, rax	; add offset to number
	mov	eax, dword [rbx]	; fetch item from array 
	add	[sum], eax	; add this one in
	;
l2:
	not	dword [odd]		; flip the boolean
	inc	dword [cnt]		; bump the counter
	call	display
	jmp	l1
l3:	mov	rax, 0
	ret

The final result of torturing your simple C program should very close to what you see here. Just for fun, let’s see what this program produces:

$ sudo apt-get install -y nasm
$ nasm sum1.asm

Wow!

AVR Assembly Code

Your compiler takes a program written in the designated programming language, like “C”, and translates it into another language. That second language is the language of the machine where you intend to run your program. For everything we normally do as developers, that machine is the same on you do your development work on.

There is no reason why that second machine has to be the one you intend to run your code on. Suppose you have a tiny machine that is not powerful enough to run a compiler. All you need is a program that will run on your development machine, but this one produces something that can run on your tiny machine. Of course, we will need to find a way to get the output from this compiler onto that tiny machine, but that is often quite simple, as we will see. We call this kind of compiler a cross compiler, and there is a nice one for the real chip we are trying to simulate in our class project.

In your class VM do this:

$ sudo apt-get install -y gcc-avr

When this command completes, you will have a second version of the exact same compiler you have been running up to now, only the target language of this machine is for the Atmel AVR processor family.

When we get to the point where we want to try to run real programs on our simulator, having this tool will be extremely handy.

Nasm Version

Just for reference, here is a proper Pentium assembly language file, for the Nasm assembler, that will do exactly what our new file says to do.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
; simple summer
;    compile with:
;	nasm -f elf64 -o sum.o sum.asm
;	gcc -o sum sum.o
;	./sum
;
; initialized data section --------------------------------
	section .data
data	dd	5,3,7,10,42,6,22,15,32
dptr	dq	data		; pointer to arrat
odd	dd	0
sum	dd	0
msg1:	db	"%d", 0		; integer format string
msg2:	db	" ", 0		; space
msg3:	db	0ah, 0		; newline
fmt	dd	0		; pointer to format string
pdata	dq	0		; pointer 

; uninitialized data section ------------------------------
	section	.bss
cnt	resd	1
flag	resd	1

; code section --------------------------------------------
	section .text
	global	main
	extern	printf
display:
	mov	rdi, msg1	; rdi has format string address
	mov	rsi, [cnt]	; rsi has value to print
	mov	rax, 0		; required
	call	printf		; print cnt
	mov	rdi, msg2
	mov	rax, 0
	call 	printf		; then a spaceDD
	mov	rdi, msg1
	mov	rsi, [sum]	
	mov	rax, 0
	call 	printf		; print sum
	mov	rdi, msg3
	mov 	rax, 0
	call 	printf		; newline
	ret
main:
	mov	eax, 0		; get a 32 bit zero
	mov	[cnt], eax	; initialize cnt
	;
l1:	mov	eax, [cnt]	; get current count
	cmp	eax, 9		; hit the end yet?
	jz	l3		; end if so
	mov	eax, [odd]	; fetch current value of odd
	cmp	eax, 0		; is it zero
	jz	l2;
	;
	mov	rax, 0		; make 64 bit zero
	mov	eax, [cnt]	; load in current count
	mov	ecx, 4		; multiply by 4 bytes
	mul	ecx
	mov	rbx, [dptr]	; fetch pointer to array
	add	rbx, rax	; add offset to number
	mov	eax, dword [rbx]	; fetch item from array 
	add	[sum], eax	; add this one in
	;
l2:
	not	dword [odd]		; flip the boolean
	inc	dword [cnt]		; bump the counter
	call	display
	jmp	l1
l3:	mov	rax, 0
	ret

And this is exactly what your tortured C file does as well.

Wow!

Just for fun, let’s see what this code looks like when we let avr-gcc compile it:

$ avr-gcc -mmcu=attiny85 -S sum.c -o sum-avr.asm

This command tells the compiler to just compile, and dump the assembly language it produced so humans can look it over. This is extremely handy when learning how to write assembly language for some chip! (Or for building a simulator, It tells us what instructions we really need to simulate!)

The result of this, stripped of junk we do not need to worry about here, looks like this;

sum-avr.asm
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
; set up global data area =============================================
	.data

	; set up "data" array
data:
	.word	5
	.word	3
	.word	7
	.word	10
	.word	42
	.word	6
	.word	22
	.word	15
	.word	32

	; set up uninitialized "cnt" and "sum" variables
	.comm	cnt,2,1
	.comm	sum,2,1

	; set up initialized variable "odd"
	
	.section .bss
odd:
	.zero	2	; initialize 16-bits with zero


	; program code starts here ====================================
	.text
main:
	rjmp .L2
.L5:
	lds r24,odd		; load "odd" into r24,r25
	lds r25,odd+1
	or r24,r25		; OR the two bytes (why?)
	breq .L3		; branch if equal to zero

	lds r24,cnt		; load "cnt" into r24,r25
	lds r25,cnt+1

	lsl r24			; left shift, high bit into carry flag
	rol r25			; rotate left, carry enters low bit (16-bit *2)

	subi r24,lo8(-(data))	; 16-bit subtract r24,r25 from (-data[i]) HUH?
	sbci r25,hi8(-(data))

	movw r30,r24		; save result in "z" (r30,r21)
	ld r24,Z		; 16-bit move (data[Z] -> r24,r25)
	ldd r25,Z+1

	lds r18,sum		; get current sum into r18,r19
	lds r19,sum+1

	add r24,r18		;add lo(sum) + lo(data[i])
	adc r25,r19		; add with carry (hi(sum) + hi(data[i])

	sts sum+1,r25		; put result back in r24,r24
	sts sum,r24
.L3:
	lds r24,odd		; load 16_bit odd into r24,r24
	lds r25,odd+1

	ldi r18,lo8(1)		; set up 16_bit "1"
	or r24,r25		; see if this is zero
	breq .L4		; if so, branch

	ldi r18,0		; set r18 to zero
.L4:
	mov r24,r18		; save r18 inro r24
	ldi r25,0		; set r25 to zero

	sts odd+1,r25		; save result in "odd"
	sts odd,r24

	lds r24,cnt		; load "cnt" into r24,r25
	lds r25,cnt+1		
	adiw r24,1		; add 1 to 16-bit value in r24,e25
	sts cnt+1,r25		; save result back in "cnt"
	sts cnt,r24
.L2:
	lds r24,cnt		; load 16-bit z'cntz' into r24, r25
	lds r25,cnt+1
	sbiw r24,9		; 16-bit subtract "9" from r24,r26
	brlt .L5		; branch if less
; end of program, but where do we go? =================================
	ret

Final Comments

This was an interesting exercise (especially at 3am one morning). Fortunately, I have a lot of experience using the C preprocessor to modify code. This is not a form of programming you should do much of. Obviously, the end result will hide a lot of what is really going on. But, in this case, we are trying to see what the code looks like inside of the machine, and that code is definitely not C!

If you look closely at these last two files, you will see one fundamental difference that I am not going to fix. (You might try that as an exercise). There are several places where I am referencing variables in memory directly. In fact, the assembly language moves some data into internal storage places called registers, and I did not set up any of those in this example. The Pentium processor cannot reference two memory items in the same line. There are instances in this code that clearly need to be fixed. That should be easy enough, but I have accomplished my basic goal in what we see here.

What Instructions Did We Need

It is worthwhile examining exactly what assembly language constructs we needed from all those available in the Pentium to get this code running. Here is a summary:

  • ADD - add two data items as integers

  • CALL - branch to a function

  • CMP - see how two data items relate

  • INC - increment a data item (very common action)

  • JMP - absolute branch

  • JZ - conditional branch

  • MOV - copy data from one place to another

  • MUL - multiply two data items as integers

  • NOT - logical compliment of a binary value

  • RET - return from a function

There are different forms of a few of these, but that is not many for a moderate C program.

See! Assembly language is not so scary!

The key concept to take away from all of this warping of code is this:

The processor does very simple things. Those things look nothing like the code you originally wrote. Some tool, normally called a compiler, is responsible for translating what you wrote into something the processor can digest. Our final form was not quite that. We introduced a “humanized” form of what the processor can really digest (0s and 1s). We call that form assembly language.

This is done because us humans do not work well with piles of 0s and 1s!. We add in one more tool, called an assembler, to translate assembly language into the processor’s real language.

In fact, the gcc compiler does exactly that:

  • It reads your program using the preprocessor to get rid of those lines starting with a “#”

  • It then translates the resulting file into assembly language

  • Next, it invokes the assembler to translate that code into 0s and 1s.

  • Finally, it saves that stuff in an executable file.

Running Your Code

Remember that the executable file produced by your build process is really a data file used by the operating system’s loader program to set up the basic memory areas needed by your program. The loader requests a block of memory for initialized data items, and writes into that memory all of your initial values. It them asks for a block of uninitialized data, and leaves that block alone, with whatever junk was there before still there. Finally, it requests a block of memory for your code (instructions) and copies your code into place. It must then patch all references to addresses in the data memory areas so things work right. Once all of this setup is done, the loader hands this new program back to the operating system and the OS will run your program when it finds the time to do so!