Decoding AVR Instructions

The AVR instruction set summary details how individual instructions are encoded. From that information and the ATtiny85 datasheet, I put together a file containing all of the instructions available in this processor and the encoding of each. A bit of Python analysis turned out a summary of the instruction set, ordered by the bit patterns we will see in the final hex file.

Encoding                                  Instruction
0000 11rd dddd rrrr                       ADD
0000 10rd dddd rrrr                       SBC
0000 01rd dddd rrrr                       CPC
0000 11dd dddd dddd                       LSL
0000 0001 dddd rrrr                       MOVW
0000 0000 0000 0000                       NOP
0001 11rd dddd rrrr                       ADC
0001 10rd dddd rrrr                       SUB
0001 00rd dddd rrrr                       CPSE
0001 01rd dddd rrrr                       CP
0001 11dd dddd dddd                       ROL
0010 00rd dddd rrrr                       AND
0010 10rd dddd rrrr                       OR
0010 01rd dddd rrrr                       EOR
0010 00dd dddd dddd                       TST
0010 01dd dddd dddd                       CLR
0010 11rd dddd rrrr                       MOV
0011 KKKK dddd KKKK                       CPI
0100 KKKK dddd KKKK                       SBCI
0101 KKKK dddd KKKK                       SUBI
0110 KKKK dddd KKKK                       ORI
0110 KKKK dddd KKKK                       SBR
0111 KKKK dddd KKKK                       ANDI
0111 KKKK dddd KKKK                       CBR
1001 0110 KKdd KKKK                       ADIW
1001 0111 KKdd KKKK                       SBIW
1001 010d dddd 0000                       COM
1001 010d dddd 0001                       NEG
1001 010d dddd 0011                       INC
1001 010d dddd 1010                       DEC
1001 0100 0000 1001                       IJMP
1001 0101 0000 1001                       ICALL
1001 0101 0000 1000                       RET
1001 0101 0001 1000                       RETI
1001 1001 AAAA Abbb                       SBIC
1001 1011 AAAA Abbb                       SBIS
1001 1010 AAAA Abbb                       SBI
1001 1000 AAAA Abbb                       CBI
1001 010d dddd 0110                       LSR
1001 010d dddd 0111                       ROR
1001 010d dddd 0101                       ASR
1001 010d dddd 0010                       SWAP
1001 0100 0110 1000                       SET
1001 0100 1sss 1000                       BCLR
1001 0100 0000 1000                       SEC
1001 0100 1000 1000                       CLC
1001 0100 0010 1000                       SEN
1001 0100 1010 1000                       CLN
1001 0100 0001 1000                       SEZ
1001 0100 1001 1000                       CLZ
1001 0100 0111 1000                       SEI
1001 0100 1111 1000                       CLI
1001 0100 0100 1000                       SES
1001 0100 1100 1000                       CLS
1001 0100 0011 1000                       SEV
1001 0100 1011 1000                       CLV
1001 0100 1110 1000                       CLT
1001 0100 0101 1000                       SEH
1001 0100 1101 1000                       CLH
1001 000d dddd 0010                       LD
1001 000d dddd 0000 kkkk kkkk kkkk kkkk   LDS
1001 001r rrrr 0010                       ST
1001 001r rrrr 0000 kkkk kkkk kkkk kkkk   STS
1001 000d dddd 0101                       LPM
1001 0101 1110 1000                       SPM
1001 001r rrrr 1111                       PUSH
1001 000d dddd 1111                       POP
1001 0101 1000 1000                       SLEEP
1001 0101 1010 1000                       WDR
1001 0101 1001 1000                       BREAK
1011 0AAd dddd AAAA                       IN
1011 1AAr rrrr AAAA                       OUT
10q0 qq0d dddd 0qqq                       LDD
10q0 qq1r rrrr 0qqq                       STD
1100 kkkk kkkk kkkk                       RJMP
1101 kkkk kkkk kkkk                       RCALL
1110 1111 dddd 1111                       SER
1110 KKKK dddd KKKK                       LDI
1111 110r rrrr 0bbb                       SBRC
1111 111r rrrr 0bbb                       SBRS
1111 00kk kkkk ksss                       BRBS
1111 01kk kkkk ksss                       BRBC
1111 00kk kkkk k001                       BREQ
1111 01kk kkkk k001                       BRNE
1111 00kk kkkk k000                       BRCS
1111 01kk kkkk k000                       BRCC
1111 01kk kkkk k000                       BRSH
1111 00kk kkkk k000                       BRLO
1111 00kk kkkk k010                       BRMI
1111 01kk kkkk k010                       BRPL
1111 01kk kkkk k100                       BRGE
1111 00kk kkkk k100                       BRLT
1111 00kk kkkk k101                       BRHS
1111 01kk kkkk k101                       BRHC
1111 00kk kkkk k110                       BRTS
1111 01kk kkkk k110                       BRTC
1111 00kk kkkk k011                       BRVS
1111 01kk kkkk k011                       BRVC
1111 00kk kkkk k111                       BRIE
1111 01kk kkkk k111                       BRID
1111 101d dddd 0bbb                       BST
1111 100d dddd 0bbb                       BLD

Warning

The data file used to produce this table is still being proof-read. You will be able to play with it later in this lecture.

Some things may strike you as odd. For instance, in the first part of the table, the ADD and LSL instructions have the same basic encoding. Adding a register to itself shifts all of its bits one place to the left, and in the LSL encoding the two register fields, where we normally decode Rd and Rr for a two-operand instruction, hold the same register number. That means these two are the same instruction; the chip designers simply gave you a more familiar mnemonic to use in assembly code.
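The ADD/LSL overlap can be checked with a few lines of C++. This is only a sketch: encodeAdd, encodeLsl, addToSelf and shiftLeft are names made up for this illustration, and the bit layout assumed is the 0000 11rd dddd rrrr pattern from the table.

```cpp
#include <cstdint>

// Encode ADD Rd,Rr using the pattern 0000 11rd dddd rrrr.
// (Function names here are invented for this sketch.)
std::uint16_t encodeAdd(unsigned rd, unsigned rr) {
    unsigned word = 0x0C00u;        // 0000 11.. .... ....
    word |= (rr & 0x10u) << 5;      // high bit of Rr goes to bit 9
    word |= (rd & 0x10u) << 4;      // high bit of Rd goes to bit 8
    word |= (rd & 0x0Fu) << 4;      // low four bits of Rd go to bits 7..4
    word |= (rr & 0x0Fu);           // low four bits of Rr go to bits 3..0
    return static_cast<std::uint16_t>(word);
}

// LSL Rd is encoded as ADD Rd,Rd: both register fields hold the same number.
std::uint16_t encodeLsl(unsigned rd) {
    return encodeAdd(rd, rd);
}

// On an 8-bit register, adding a value to itself and shifting it
// left by one place give the same result.
std::uint8_t addToSelf(std::uint8_t v) {
    return static_cast<std::uint8_t>(v + v);
}

std::uint8_t shiftLeft(std::uint8_t v) {
    return static_cast<std::uint8_t>(v << 1);
}
```

With these definitions, encodeAdd(24, 18) gives 0x0F82 and encodeLsl(5) gives 0x0C55, which fits the 0000 11dd dddd dddd pattern listed for LSL.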

Decode Output Signals

From the table above, we can figure out what signals the decode unit needs to produce:

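Signal   Width     Range
Rd       5 bits    0 <= d <= 31
Rr       5 bits    0 <= r <= 31
K        6 bits    0 <= K <= 63
A        6 bits    0 <= A <= 63
s        3 bits    0 <= s <= 7
k        16 bits   0 <= k <= 65535
k        7 bits    -64 <= k <= 63
q        6 bits    0 <= q <= 63
b        3 bits    0 <= b <= 7

The legal values allowed on each signal are shown, deduced from the AVR instruction documentation. The 16-bit k is the address in LDS and STS, and the 7-bit k is the offset in the conditional branches. Field widths vary by instruction: K is 6 bits in ADIW and SBIW but 8 bits (0 to 255) in immediate instructions such as LDI and CPI, and the offset k in RJMP and RCALL is 12 bits.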
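To make the ranges concrete, here is a rough sketch of how two of these signals could be pulled out of a 16-bit instruction word. BranchFields, decodeBranch and decodeIoAddress are hypothetical names, and the layouts assumed are 1111 0xkk kkkk ksss for the conditional branches and 1011 xAAd dddd AAAA for IN and OUT. The 7-bit k is signed, so it has to be sign-extended.

```cpp
#include <cstdint>

// Fields of a conditional branch (BRBS/BRBC): 1111 0xkk kkkk ksss.
// (Hypothetical helper type for this sketch.)
struct BranchFields {
    int s;  // status flag number, 0..7
    int k;  // signed offset, -64..63
};

BranchFields decodeBranch(std::uint16_t word) {
    BranchFields f;
    f.s = word & 0x7;                      // bits 2..0
    int raw = (word >> 3) & 0x7F;          // bits 9..3 as an unsigned value
    // Sign-extend: when bit 6 is set, the 7-bit value is negative.
    f.k = (raw & 0x40) ? raw - 128 : raw;
    return f;
}

// I/O address A from IN/OUT: two high bits at 10..9, four low bits at 3..0.
int decodeIoAddress(std::uint16_t word) {
    return ((word >> 5) & 0x30) | (word & 0x0F);
}
```

For example, the word 0xF3F1 (a BREQ with all offset bits but the lowest set) decodes to s = 1 and k = -2, and 0xB78F (an IN from address 0x3F) gives A = 63.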

Basically, the job of decoding instructions amounts to a tedious exercise in checking patterns. The most common way to do this involves the C++ switch statement. We can use the bitset data type to play around with individual bits.

decode.cpp
// Test Decoding Logic
#include <bitset>
#include <iostream>
#include <string>

// Rd for a two-operand instruction (.... ..rd dddd rrrr).
// b1 is the second nibble ("..rd"): its bit 0 is the high bit of Rd.
// b2 is the third nibble ("dddd"): the low four bits of Rd.
int getRd(std::bitset<4> b1, std::bitset<4> b2) {
    std::bitset<5> Rd;
    Rd[4] = b1[0];
    Rd[3] = b2[3];
    Rd[2] = b2[2];
    Rd[1] = b2[1];
    Rd[0] = b2[0];
    return static_cast<int>(Rd.to_ulong());
}

// Rr for a two-operand instruction (.... ..rd dddd rrrr).
// b1 is the second nibble ("..rd"): its bit 1 is the high bit of Rr.
// b2 is the fourth nibble ("rrrr"): the low four bits of Rr.
int getRr(std::bitset<4> b1, std::bitset<4> b2) {
    std::bitset<5> Rr;
    Rr[4] = b1[1];
    Rr[3] = b2[3];
    Rr[2] = b2[2];
    Rr[1] = b2[1];
    Rr[0] = b2[0];
    return static_cast<int>(Rr.to_ulong());
}

int main() {
    // Each row is one 16-bit instruction, split into four nibbles.
    std::string code[6][4] = {
        {"0000", "1110", "1010", "1101"},   // ADD
        {"0001", "1011", "1011", "1011"},   // SUB
        {"0010", "0000", "1101", "1100"},   // AND
        {"0010", "1001", "1100", "1110"},   // OR
        {"1001", "0101", "1100", "0000"},   // COM
        {"0010", "0101", "1010", "0111"}    // EOR
    };

    for (int i = 0; i < 6; i++) {
        std::bitset<4> b1(code[i][0]);
        std::bitset<4> b2(code[i][1]);
        std::bitset<4> b3(code[i][2]);
        std::bitset<4> b4(code[i][3]);

        // The first nibble selects the instruction group.
        switch (b1.to_ulong()) {
            case 0b0000:
                // 0000 11rd dddd rrrr
                if (b2.test(3) && b2.test(2)) {
                    std::cout << "ADD "
                        << "R"
                        << getRd(b2, b3)
                        << ","
                        << "R"
                        << getRr(b2, b4)
                        << std::endl;
                }
                break;
            case 0b0001:
                // 0001 10rd dddd rrrr
                if (b2.test(3) && !b2.test(2)) {
                    std::cout << "SUB "
                        << "R"
                        << getRd(b2, b3)
                        << ","
                        << "R"
                        << getRr(b2, b4)
                        << std::endl;
                }
                break;
            case 0b0010:
                // Not decoded yet: just show the group nibble.
                std::cout << code[i][0] << std::endl;
                break;
            case 0b1001:
                // Not decoded yet: just show the group nibble.
                std::cout << code[i][0] << std::endl;
                break;
        }
    }
}

This code (which is incomplete) shows the basic way to extract the bits that make up a register number. First, compile it:

$ g++ -o demo decode.cpp

Note

The code shown here converts a string of binary digits to a bitset, and a bitset to an unsigned long. You should already know how to convert from an integer to a bitset.

Let’s see this code in action:

$ ./demo
ADD R10,R29
SUB R27,R27
0010
0010
1001
0010

The logic is figuring out the register numbers, but I will leave it up to you to verify that things are correct.
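As an alternative to working nibble by nibble, the same pattern checks can be written as masks applied to a whole 16-bit word. The sketch below is only an illustration under that assumption: Pattern, kPatterns and decode are names invented here, and the table covers just the six instructions used in the test data.

```cpp
#include <cstdint>
#include <string>

// One row of a mask/match table. (Hypothetical helper type for this sketch.)
struct Pattern {
    std::uint16_t mask;    // bits that are fixed in the encoding
    std::uint16_t match;   // value those fixed bits must have
    const char* name;      // mnemonic
    bool twoOperand;       // true when the word also carries an Rr field
};

// Two-register instructions use the layout oooo oord dddd rrrr,
// so the top six bits select the operation.
const Pattern kPatterns[] = {
    {0xFC00, 0x0C00, "ADD", true},
    {0xFC00, 0x1800, "SUB", true},
    {0xFC00, 0x2000, "AND", true},
    {0xFC00, 0x2400, "EOR", true},
    {0xFC00, 0x2800, "OR",  true},
    {0xFE0F, 0x9400, "COM", false},   // 1001 010d dddd 0000
};

// Return the disassembled text for a word, or "???" if no pattern fits.
std::string decode(std::uint16_t word) {
    for (const Pattern& p : kPatterns) {
        if ((word & p.mask) != p.match) continue;
        int rd = (word >> 4) & 0x1F;                    // bits 8..4
        std::string text = std::string(p.name) + " R" + std::to_string(rd);
        if (p.twoOperand) {
            int rr = ((word >> 5) & 0x10) | (word & 0x0F);  // bit 9, bits 3..0
            text += ",R" + std::to_string(rr);
        }
        return text;
    }
    return "???";
}
```

For instance, decode(0x0F82) returns "ADD R24,R18" and decode(0x95C0) returns "COM R28". Adding an instruction then means adding a table row rather than another case in a switch, at the cost of a linear search.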

Endians

Not Indians, “Endians”!

Computer systems are funny beasts. We want to store data types of any size in our memory, but often that memory is just a bunch of bytes. How do we pack 16 bits into 8-bit containers?

Simple: we use two of those bytes to hold the 16 bits. But as soon as we decide to do that, we have a question to answer. Which byte comes first?

Little Endian

The most common scheme is “little endian”. In this scheme, the low byte is placed at the lower address, and the upper byte is placed one byte above that one. This scheme also handles bigger data types: just keep placing successive bytes on top of the lower bytes until you are done.

The Pentium (and the AVR) are “little endian” systems.

Big Endian

As you might suspect, “big endian” uses the opposite scheme. High bytes in the data type end up at low addresses. The only system I am familiar with that uses this scheme is built by Sun Microsystems, now part of Oracle. I have not seen one of their systems in years.
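To compare the two schemes side by side, here is a small sketch that lays out a 16-bit value as two bytes in each order. The function names toLittleEndian and toBigEndian are made up for this illustration; element 0 of the returned array stands for the lower address.

```cpp
#include <array>
#include <cstdint>

// Little endian: low byte at the lower address (element 0).
std::array<std::uint8_t, 2> toLittleEndian(std::uint16_t value) {
    return {{ static_cast<std::uint8_t>(value & 0xFF),
              static_cast<std::uint8_t>(value >> 8) }};
}

// Big endian: high byte at the lower address (element 0).
std::array<std::uint8_t, 2> toBigEndian(std::uint16_t value) {
    return {{ static_cast<std::uint8_t>(value >> 8),
              static_cast<std::uint8_t>(value & 0xFF) }};
}
```

For the value 0x0F82, toLittleEndian gives the bytes 0x82, 0x0F and toBigEndian gives 0x0F, 0x82.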

What does this mean for our simulator?

I put together a short chunk of code and assembled it to see what the compiler produced:

2c: 82 0f   add r24, r18
2e:

This shows that instruction memory is really byte addressed in this machine, something we do not need to worry about as long as we get the right bits to decode.

Converting this to binary, and splitting the bits up into nibbles, we see this:

2c: 1000 0010 0000 1111 add r24, r18

Now, according to the instruction encoding table for this instruction, we should see this:

add -> 0000 11rd dddd rrrr

It looks like our “little endian” machine has swapped the bytes around.

1000 0010 0000 1111
dddd rrrr 0000 11rd

Which gives us these registers:

Rd -> 11000 -> R24
Rr -> 10010 -> R18

Which is just what we want to find! Looks like the encoding table matches the bits in the code!
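The same check can be done in code. This sketch assumes the two bytes arrive in the order shown in the listing (low byte first); wordFromBytes, isAdd, destReg and srcReg are illustrative names, not part of any existing simulator.

```cpp
#include <cstdint>

// Combine two bytes stored in little-endian order into one 16-bit word.
std::uint16_t wordFromBytes(std::uint8_t low, std::uint8_t high) {
    return static_cast<std::uint16_t>((high << 8) | low);
}

// ADD has the fixed top six bits 0000 11.
bool isAdd(std::uint16_t word) {
    return (word & 0xFC00) == 0x0C00;
}

// Rd sits in bits 8..4 of 0000 11rd dddd rrrr.
int destReg(std::uint16_t word) {
    return (word >> 4) & 0x1F;
}

// Rr has its high bit in bit 9 and its low four bits in bits 3..0.
int srcReg(std::uint16_t word) {
    return ((word >> 5) & 0x10) | (word & 0x0F);
}
```

Feeding in the listing bytes, wordFromBytes(0x82, 0x0F) yields 0x0F82, isAdd is true for that word, destReg gives 24 and srcReg gives 18.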

AVR Instruction Memory

The AVR instruction memory is actually a big array of bytes, but that memory is designed to deliver 16 bits in one operation. We will model this memory as a 16-bit data array, just to make things simpler.

When we fetch data from that memory, we need to check the order of the bytes, to make sure our instruction codes are in the right order.

That means we need careful testing to make sure that when we load a program from the bytes in a hex file, we get the right results when we decode things in our simulator.
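One possible shape for that loading step is sketched below. It assumes the program bytes have already been pulled out of the hex file into a vector, in address order; loadWords is a hypothetical name, and a trailing odd byte is simply ignored here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack a little-endian byte stream into 16-bit instruction words.
// Each pair of bytes is (low, high); a trailing odd byte is dropped.
std::vector<std::uint16_t> loadWords(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint16_t> words;
    words.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        words.push_back(
            static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return words;
}
```

A test along these lines would load the bytes 0x82, 0x0F, 0x08, 0x95 and expect the words 0x0F82 (the ADD from the listing) and 0x9508 (the RET pattern from the table).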

Similar decoding code will handle the instructions we are going to include in our system.

Using Python to Explore Code

I used Python to take apart the instruction set, and to extract real code from the listing file generated by the avr-gcc compiler.

Here is part of the code I used:

The JSON data file is something I extracted from the AVR documentation files I showed earlier. I am actually setting up this system to generate an example of every instruction in the chip in an assembly language file. When that file is processed, I will be able to check the encoding tables for every instruction. See, Python is durned handy!