Decoding AVR Instructions

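The AVR instruction summary details how individual instructions are encoded. From that information and the ATtiny85 datasheet, I put together a file containing all of the instructions available in this processor, along with the encoding of each. A bit of Python analysis turned out a summary of the instruction set, ordered by the bit patterns we will see in the final hex file: each unindented line gives the top four bits of a 16-bit instruction word, and the indented lines below it give the remaining twelve bits. Letters mark operand fields: d and r are register bits, K is a constant, A an I/O address, b a bit number, s a status flag number, k an address or offset, and q a displacement.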

0000                
  11rd dddd rrrr ADD        
  10rd dddd rrrr SBC        
  01rd dddd rrrr CPC
  11dd dddd dddd LSL        
  0001 dddd rrrr MOVW        
  0000 0000 0000 NOP        
0001                
  11rd dddd rrrr ADC        
  10rd dddd rrrr SUB        
  00rd dddd rrrr CPSE        
  01rd dddd rrrr CP        
  11dd dddd dddd ROL        
0010                
  00rd dddd rrrr AND        
  10rd dddd rrrr OR        
  01rd dddd rrrr EOR        
  00dd dddd dddd TST        
  01dd dddd dddd CLR        
  11rd dddd rrrr MOV        
0011                
  KKKK dddd KKKK CPI        
0100                
  KKKK dddd KKKK SBCI        
0101                
  KKKK dddd KKKK SUBI        
0110                
  KKKK dddd KKKK ORI        
  KKKK dddd KKKK SBR        
0111                
  KKKK dddd KKKK ANDI        
  KKKK dddd KKKK CBR        
1001                
  0110 KKdd KKKK ADIW        
  0111 KKdd KKKK SBIW        
  010d dddd 0000 COM        
  010d dddd 0001 NEG        
  010d dddd 0011 INC        
  010d dddd 1010 DEC        
  0100 0000 1001 IJMP        
  0101 0000 1001 ICALL        
  0101 0000 1000 RET        
  0101 0001 1000 RETI        
  1001 AAAA Abbb SBIC        
  1011 AAAA Abbb SBIS        
  1010 AAAA Abbb SBI        
  1000 AAAA Abbb CBI        
  010d dddd 0110 LSR        
  010d dddd 0111 ROR        
  010d dddd 0101 ASR        
  010d dddd 0010 SWAP        
  0100 0110 1000 SET        
  0100 1sss 1000 BCLR
  0100 0000 1000 SEC        
  0100 1000 1000 CLC        
  0100 0010 1000 SEN        
  0100 1010 1000 CLN        
  0100 0001 1000 SEZ        
  0100 1001 1000 CLZ        
  0100 0111 1000 SEI        
  0100 1111 1000 CLI        
  0100 0100 1000 SES
  0100 1100 1000 CLS
  0100 0011 1000 SEV        
  0100 1011 1000 CLV        
  0100 1110 1000 CLT        
  0100 0101 1000 SEH        
  0100 1101 1000 CLH        
  000d dddd 0010 LD        
  000d dddd 0000 kkkk kkkk kkkk kkkk LDS
  001r rrrr 0010 ST        
  001r rrrr 0000 kkkk kkkk kkkk kkkk STS
  000d dddd 0101 LPM        
  0101 1110 1000 SPM        
  001r rrrr 1111 PUSH        
  000d dddd 1111 POP        
  0101 1000 1000 SLEEP        
  0101 1010 1000 WDR        
  0101 1001 1000 BREAK        
1011                
  0AAd dddd AAAA IN        
  1AAr rrrr AAAA OUT        
10q0                
  qq0d dddd 0qqq LDD        
  qq1r rrrr 0qqq STD        
1100                
  kkkk kkkk kkkk RJMP        
1101                
  kkkk kkkk kkkk RCALL        
1110                
  1111 dddd 1111 SER        
  KKKK dddd KKKK LDI        
1111                
  110r rrrr 0bbb SBRC        
  111r rrrr 0bbb SBRS        
  00kk kkkk ksss BRBS        
  01kk kkkk ksss BRBC        
  00kk kkkk k001 BREQ        
  01kk kkkk k001 BRNE        
  00kk kkkk k000 BRCS        
  01kk kkkk k000 BRCC        
  01kk kkkk k000 BRSH        
  00kk kkkk k000 BRLO        
  00kk kkkk k010 BRMI        
  01kk kkkk k010 BRPL        
  01kk kkkk k100 BRGE        
  00kk kkkk k100 BRLT        
  00kk kkkk k101 BRHS        
  01kk kkkk k101 BRHC        
  00kk kkkk k110 BRTS        
  01kk kkkk k110 BRTC        
  00kk kkkk k011 BRVS        
  01kk kkkk k011 BRVC        
  00kk kkkk k111 BRIE        
  01kk kkkk k111 BRID        
  101d dddd 0bbb BST        
  100d dddd 0bbb BLD        

Warning

The data file used to produce this table is still being proof-read. You will be able to play with it later in this lecture.

Some things may strike you as odd. For instance, in the first part of the table, the ADD and LSL instructions have the same basic encoding. When you add a register to itself, you shift all of its bits one place to the left, and for LSL the two register fields, where we normally decode Rd and Rr for a two-operand instruction, hold the same register number. That means LSL Rd is really the same instruction as ADD Rd,Rd; the chip designers simply gave you a more familiar mnemonic for use in assembly coding. The same trick explains ROL (ADC Rd,Rd), TST (AND Rd,Rd), and CLR (EOR Rd,Rd).

Decode Output Signals

From the table above, we can figure out what signals the decode unit needs to produce:

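Signal  Width    Legal values
Rd      5 bits   0 <= d <= 31
Rr      5 bits   0 <= r <= 31
K       8 bits   0 <= K <= 255 (CPI, SBCI, SUBI, ORI, ANDI, LDI)
K       6 bits   0 <= K <= 63 (ADIW, SBIW)
A       6 bits   0 <= A <= 63 (IN, OUT; SBI, CBI, SBIC, and SBIS use a 5-bit A, 0 <= A <= 31)
s       3 bits   0 <= s <= 7
k       16 bits  0 <= k <= 65535 (LDS, STS)
k       12 bits  -2048 <= k <= 2047 (RJMP, RCALL)
k       7 bits   -64 <= k <= 63 (conditional branches)
q       6 bits   0 <= q <= 63
b       3 bits   0 <= b <= 7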

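The legal values allowed on each signal are shown, deduced from the AVR instruction documentation. Some instructions use a shorter register field that the decoder must map to a full register number: the 4-bit dddd in LDI, CPI, and the other immediate instructions selects R16 to R31, the 2-bit dd in ADIW and SBIW selects R24, R26, R28, or R30, and the 4-bit fields in MOVW select even-numbered registers.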

Basically, the job of decoding instructions amounts to a tedious exercise in checking patterns. A common way to do this in C++ is a switch statement on the top four bits, followed by tests on the remaining bits. We can use the std::bitset type to play around with individual bits.

decode.cpp
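// Test Decoding Logic
#include <bitset>
#include <cstdint>
#include <iostream>
#include <string>

// Two-register instructions are laid out as xxxx xxrd dddd rrrr.
// b1 is the second nibble (xxrd). std::bitset numbers bits from the right,
// so the d bit (instruction bit 8) is b1[0] and the r bit (bit 9) is b1[1].
// b2 is the nibble holding dddd.
int getRd(std::bitset<4> b1, std::bitset<4> b2) {
    std::bitset<5> Rd;
    Rd[4] = b1[0];
    Rd[3] = b2[3];
    Rd[2] = b2[2];
    Rd[1] = b2[1];
    Rd[0] = b2[0];
    return static_cast<int>(Rd.to_ulong());
}

// b1 is the second nibble (xxrd), b2 is the nibble holding rrrr.
int getRr(std::bitset<4> b1, std::bitset<4> b2) {
    std::bitset<5> Rr;
    Rr[4] = b1[1];
    Rr[3] = b2[3];
    Rr[2] = b2[2];
    Rr[1] = b2[1];
    Rr[0] = b2[0];
    return static_cast<int>(Rr.to_ulong());
}

int main() {
    // Each test instruction is written as four nibbles, most significant first.
    std::string code[6][4] = {
        {"0000", "1110", "1010", "1101"},   // ADD
        {"0001", "1011", "1011", "1011"},   // SUB
        {"0010", "0000", "1101", "1100"},   // AND
        {"0010", "1001", "1100", "1110"},   // OR
        {"1001", "0101", "1100", "0000"},   // COM
        {"0010", "0101", "1010", "0111"}    // EOR
    };

    for (int i = 0; i < 6; i++) {
        // The string constructor treats the first character as the high bit.
        std::bitset<4> b1(code[i][0]);
        std::bitset<4> b2(code[i][1]);
        std::bitset<4> b3(code[i][2]);
        std::bitset<4> b4(code[i][3]);

        // The top nibble selects a group of instructions.
        switch (b1.to_ulong()) {
            case 0b0000:
                // 0000 11rd dddd rrrr -> ADD
                if (b2.test(3) && b2.test(2)) {
                    std::cout << "ADD "
                        << "R" << getRd(b2, b3)
                        << ","
                        << "R" << getRr(b2, b4)
                        << std::endl;
                }
                break;
            case 0b0001:
                // 0001 10rd dddd rrrr -> SUB
                if (b2.test(3) && !b2.test(2)) {
                    std::cout << "SUB "
                        << "R" << getRd(b2, b3)
                        << ","
                        << "R" << getRr(b2, b4)
                        << std::endl;
                }
                break;
            case 0b0010:
                // AND, EOR, OR, MOV: not decoded yet, just echo the nibble
                std::cout << code[i][0] << std::endl;
                break;
            case 0b1001:
                // COM and the rest of this group: not decoded yet
                std::cout << code[i][0] << std::endl;
                break;
        }
    }
}

This code (which is incomplete) shows the basic idea of extracting the bits that make up a register number. The test instructions are written as strings of binary digits, one string per nibble, using encodings from the table above. Compile it like this:

$ g++ -o demo decode.cpp

Note

The code shown here converts a string of binary digits to a std::bitset (with the bitset constructor), and from a std::bitset to an unsigned long (with to_ulong()). You should already know how to convert from an integer to a bitset.

Let’s see this code in action:

$ ./demo
ADD R10,R29
SUB R27,R27
0010
0010
1001
0010

The logic is figuring out the register numbers. For the ADD test word 0000 1110 1010 1101, the second nibble 1110 has r = 1 and d = 0, so Rd = 0 1010 = R10 and Rr = 1 1101 = R29. Watch the bit order here: std::bitset numbers its bits from the right, so in the nibble 11rd the r bit is bit 1 and the d bit is bit 0, and mixing them up swaps the high bits of the two register numbers. I will leave it up to you to verify the SUB line.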

Endians

Not Indians, “Endians”!

Computer systems are funny beasts. We want to store data types of any size in our memory, but often that memory is just a bunch of bytes. How do we pack 16 bits into 8-bit containers?

Simple: we use two of those bytes to hold the 16 bits. But as soon as we decide to do that, we have a question to answer. Which byte comes first?

Little Endian

The most common scheme is “little endian”. In this scheme, the low byte is placed at the lower address, and the high byte is placed one byte above it. This scheme also handles bigger data types: just keep placing each successively more significant byte one address higher until you are done.

The Pentium (and the AVR) are “little endian” systems.

Big Endian

As you might suspect, “big endian” uses the opposite scheme. High bytes in the data type end up at low addresses. The only system I am familiar with that uses this scheme is built by Sun Microsystems, now part of Oracle. I have not seen one of their systems in years.

What does this mean for our simulator?

I put together a short chunk of code and assembled it to see what the compiler produced:

2c: 82 0f   add r24, r18
2e:

This shows that instruction memory is really byte addressed in this machine: each 16-bit instruction occupies two bytes, so the next instruction starts at address 2e. That is something we do not need to worry about, as long as we get the right bits to decode.

Converting this to binary, and splitting the bits up into nibbles, we see this:

2c: 1000 0010 0000 1111 add r24, r18

Now, according to the instruction encoding table for this instruction, we should see this:

add -> 0000 11rd dddd rrrr

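It looks like our “little endian” machine has swapped the bytes around: the low byte of the instruction word (82) is stored first, at the lower address, and the high byte (0f) follows it.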

1000 0010 0000 1111
dddd rrrr 0000 11rd

Which gives us these registers:

Rd -> 11000 -> R24
Rr -> 10010 -> R18

Which is just what we want to find! Looks like the encoding table matches the bits in the code!

AVR Instruction Memory

The AVR instruction memory is actually a big array of bytes, but that memory is designed to deliver 16 bits in one operation. We will model this memory as a 16-bit data array, just to make things simpler.

When we fill that array from the bytes of a program, we need to check the byte order, to make sure each instruction word ends up with its bits in the right order.

That means we need careful testing to make sure that when we load a program from the bytes in a hex file, we get the right results when we decode things in our simulator.

Similar decoding code will handle the instructions we are going to include in our system.

Using Python to Explore Code

I used Python to take apart the instruction set, and to extract real code from the listing file generated by the avr-gcc compiler.

Here is part of the code I used:

The JSON data file is something I extracted from the AVR documentation files I showed earlier. I am actually setting up this system to generate an example of every instruction in the chip in an assembly language file. When that file is processed, I will be able to check the encoding tables for every instruction. See? Python is durned handy!