Decoding AVR Instructions¶

The AVR Instruction Summary details how individual instructions are encoded. From that information and the ATtiny85 datasheet, I put together a file containing every instruction available in this processor, along with the encoding of each. A bit of Python analysis turned out a summary of the instruction set, ordered by the bit patterns we will see in the final hex file. Each group header is the first four bits of the 16-bit instruction word, and each row shows the remaining bits. LDS and STS carry a second 16-bit word holding the address.

0000
	11rd	dddd	rrrr	ADD
	10rd	dddd	rrrr	SBC
	01rd	dddd	rrrr	CPC
	11dd	dddd	dddd	LSL
	0001	dddd	rrrr	MOVW
	0000	0000	0000	NOP
0001
	11rd	dddd	rrrr	ADC
	10rd	dddd	rrrr	SUB
	00rd	dddd	rrrr	CPSE
	01rd	dddd	rrrr	CP
	11dd	dddd	dddd	ROL
0010
	00rd	dddd	rrrr	AND
	10rd	dddd	rrrr	OR
	01rd	dddd	rrrr	EOR
	00dd	dddd	dddd	TST
	01dd	dddd	dddd	CLR
	11rd	dddd	rrrr	MOV
0011
	KKKK	dddd	KKKK	CPI
0100
	KKKK	dddd	KKKK	SBCI
0101
	KKKK	dddd	KKKK	SUBI
0110
	KKKK	dddd	KKKK	ORI
	KKKK	dddd	KKKK	SBR
0111
	KKKK	dddd	KKKK	ANDI
	KKKK	dddd	KKKK	CBR
1001
	0110	KKdd	KKKK	ADIW
	0111	KKdd	KKKK	SBIW
	010d	dddd	0000	COM
	010d	dddd	0001	NEG
	010d	dddd	0011	INC
	010d	dddd	1010	DEC
	0100	0000	1001	IJMP
	0101	0000	1001	ICALL
	0101	0000	1000	RET
	0101	0001	1000	RETI
	1001	AAAA	Abbb	SBIC
	1011	AAAA	Abbb	SBIS
	1010	AAAA	Abbb	SBI
	1000	AAAA	Abbb	CBI
	010d	dddd	0110	LSR
	010d	dddd	0111	ROR
	010d	dddd	0101	ASR
	010d	dddd	0010	SWAP
	0100	0110	1000	SET
	0100	1sss	1000	BCLR
	0100	0000	1000	SEC
	0100	1000	1000	CLC
	0100	0010	1000	SEN
	0100	1010	1000	CLN
	0100	0001	1000	SEZ
	0100	1001	1000	CLZ
	0100	0111	1000	SEI
	0100	1111	1000	CLI
	0100	0100	1000	SES
	0100	1100	1000	CLS
	0100	0011	1000	SEV
	0100	1011	1000	CLV
	0100	1110	1000	CLT
	0100	0101	1000	SEH
	0100	1101	1000	CLH
	000d	dddd	0010	LD
	000d	dddd	0000	kkkk	kkkk	kkkk	kkkk	LDS
	001r	rrrr	0010	ST
	001r	rrrr	0000	kkkk	kkkk	kkkk	kkkk	STS
	000d	dddd	0101	LPM
	0101	1110	1000	SPM
	001r	rrrr	1111	PUSH
	000d	dddd	1111	POP
	0101	1000	1000	SLEEP
	0101	1010	1000	WDR
	0101	1001	1000	BREAK
1011
	0AAd	dddd	AAAA	IN
	1AAr	rrrr	AAAA	OUT
10q0
	qq0d	dddd	0qqq	LDD
	qq1r	rrrr	0qqq	STD
1100
	kkkk	kkkk	kkkk	RJMP
1101
	kkkk	kkkk	kkkk	RCALL
1110
	1111	dddd	1111	SER
	KKKK	dddd	KKKK	LDI
1111
	110r	rrrr	0bbb	SBRC
	111r	rrrr	0bbb	SBRS
	00kk	kkkk	ksss	BRBS
	01kk	kkkk	ksss	BRBC
	00kk	kkkk	k001	BREQ
	01kk	kkkk	k001	BRNE
	00kk	kkkk	k000	BRCS
	01kk	kkkk	k000	BRCC
	01kk	kkkk	k000	BRSH
	00kk	kkkk	k000	BRLO
	00kk	kkkk	k010	BRMI
	01kk	kkkk	k010	BRPL
	01kk	kkkk	k100	BRGE
	00kk	kkkk	k100	BRLT
	00kk	kkkk	k101	BRHS
	01kk	kkkk	k101	BRHC
	00kk	kkkk	k110	BRTS
	01kk	kkkk	k110	BRTC
	00kk	kkkk	k011	BRVS
	01kk	kkkk	k011	BRVC
	00kk	kkkk	k111	BRIE
	01kk	kkkk	k111	BRID
	101d	dddd	0bbb	BST
	100d	dddd	0bbb	BLD

Warning

The data file used to produce this table is still being proofread. You will be able to play with it later in this lecture.

Some things may strike you as odd. For instance, in the first part of the table, the ADD and LSL instructions have the same basic encoding. Adding a register to itself doubles it, which shifts all of its bits one place to the left. So LSL is simply ADD with the Rd and Rr fields holding the same register number. The two are the same machine instruction. The chip designers just gave you a more descriptive name to use in assembly code.
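The claim that LSL is just ADD with Rd equal to Rr is easy to check in code. Here is a minimal sketch that builds an ADD word from the 0000 11rd dddd rrrr pattern. The names encodeAdd and encodeLsl are hypothetical helpers written for this illustration. They are not part of any AVR toolchain.

```cpp
#include <cstdint>

// Hypothetical helper: build the 16-bit word for ADD Rd,Rr from the
// pattern 0000 11rd dddd rrrr (d and r are register numbers 0..31).
std::uint16_t encodeAdd(unsigned d, unsigned r) {
    std::uint16_t word = 0x0C00;          // fixed bits: 0000 1100 0000 0000
    word |= ((r >> 4) & 0x1) << 9;        // r4 goes to bit 9
    word |= ((d >> 4) & 0x1) << 8;        // d4 goes to bit 8
    word |= (d & 0xF) << 4;               // d3..d0 go to bits 7..4
    word |= (r & 0xF);                    // r3..r0 go to bits 3..0
    return word;
}

// Hypothetical helper: LSL Rd has no encoding of its own, it is ADD Rd,Rd.
std::uint16_t encodeLsl(unsigned d) {
    return encodeAdd(d, d);
}
```

For example, encodeAdd(24, 18) returns 0x0F82. encodeLsl(24) returns 0x0F88, which is the same word that encodeAdd(24, 24) produces.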

Decode Output Signals¶

From the table above, we can figure out what signals the decode unit needs to produce:

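Signal	Width	Range	Used by
Rd	5-bits	0<=d<=31	destination register
Rr	5-bits	0<=r<=31	source register
K	8-bits	0<=K<=255	LDI, CPI, SUBI, SBCI, ANDI, ORI
K	6-bits	0<=K<=63	ADIW, SBIW
A	6-bits	0<=A<=63	IN, OUT (5 bits for SBI, CBI, SBIC, SBIS)
s	3-bits	0<=s<=7	BSET, BCLR, BRBS, BRBC
k	16-bits	0<=k<=65535	LDS, STS
k	12-bits	-2048<=k<=2047	RJMP, RCALL
k	7-bits	-64<=k<=63	conditional branches
q	6-bits	0<=q<=63	LDD, STD
b	3-bits	0<=b<=7	SBRC, SBRS, BST, BLD, SBI, CBI, SBIC, SBIS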

The legal values allowed on each signal are shown, deduced from the AVR instruction documentation.
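The 7-bit branch offset in this table is a two's complement value. So is the 12-bit offset in RJMP and RCALL (kkkk kkkk kkkk in the encoding table). The decoder therefore has to sign-extend these offsets after pulling the bits out of the word. Here is a minimal sketch, assuming the instruction is already held in a 16-bit integer. The function names are hypothetical.

```cpp
#include <cstdint>

// Sign-extend the low 'bits' bits of 'value' (two's complement).
int signExtend(unsigned value, unsigned bits) {
    unsigned mask = 1u << (bits - 1);     // sign bit of the field
    value &= (1u << bits) - 1;            // keep only the field itself
    // Flipping the sign bit and subtracting its weight yields the signed value.
    return static_cast<int>(value ^ mask) - static_cast<int>(mask);
}

// Conditional branches: 1111 0xkk kkkk ksss, so k occupies bits 9..3.
int branchOffset(std::uint16_t word) {
    return signExtend((word >> 3) & 0x7F, 7);
}

// RJMP / RCALL: 110x kkkk kkkk kkkk, so k occupies bits 11..0.
int relativeJumpOffset(std::uint16_t word) {
    return signExtend(word & 0x0FFF, 12);
}
```

Both offsets count 16-bit words and are applied relative to the next instruction (PC + k + 1). This is one more reason to model instruction memory as an array of words.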

Basically, the job of decoding instructions amounts to a tedious exercise in checking patterns. A natural way to do this in C++ is with a switch statement on the leading bits of the instruction. We can use the bitset data type to play around with individual bits.

decode.cpp¶

// Test Decoding Logic
#include <bitset>
#include <iostream>
#include <string>

// Two-register instructions use the pattern xxxx xxrd dddd rrrr.
// 'ctrl' is the second nibble (bit 1 holds r4, bit 0 holds d4);
// 'low' is the nibble holding the other four bits of the register number.
int getRd(std::bitset<4> ctrl, std::bitset<4> low) {
    std::bitset<5> Rd;
    Rd[4] = ctrl[0];    // d4 is the rightmost bit of "xxrd"
    Rd[3] = low[3];
    Rd[2] = low[2];
    Rd[1] = low[1];
    Rd[0] = low[0];
    return static_cast<int>(Rd.to_ulong());
}

int getRr(std::bitset<4> ctrl, std::bitset<4> low) {
    std::bitset<5> Rr;
    Rr[4] = ctrl[1];    // r4 is the second bit from the right of "xxrd"
    Rr[3] = low[3];
    Rr[2] = low[2];
    Rr[1] = low[1];
    Rr[0] = low[0];
    return static_cast<int>(Rr.to_ulong());
}

int main() {
    // Hand-built test words, one nibble per string, most significant first
    const std::string code[6][4] = {
        {"0000", "1110", "1010", "1101"},   // ADD
        {"0001", "1011", "1011", "1011"},   // SUB
        {"0010", "0000", "1101", "1100"},   // AND
        {"0010", "1001", "1100", "1110"},   // OR
        {"1001", "0101", "1100", "0000"},   // COM
        {"0010", "0101", "1010", "0111"}    // EOR
    };

    for (int i = 0; i < 6; i++) {
        // bitset's string constructor puts the first character in the highest bit
        std::bitset<4> b1(code[i][0]);
        std::bitset<4> b2(code[i][1]);
        std::bitset<4> b3(code[i][2]);
        std::bitset<4> b4(code[i][3]);

        switch (b1.to_ulong()) {
            case 0b0000:
                if (b2.test(3) && b2.test(2)) {     // 0000 11rd: ADD
                    std::cout << "ADD R" << getRd(b2, b3)
                              << ",R" << getRr(b2, b4) << std::endl;
                }
                break;
            case 0b0001:
                if (b2.test(3) && !b2.test(2)) {    // 0001 10rd: SUB
                    std::cout << "SUB R" << getRd(b2, b3)
                              << ",R" << getRr(b2, b4) << std::endl;
                }
                break;
            case 0b0010:                            // AND/EOR/OR/MOV group: not decoded yet
                std::cout << code[i][0] << std::endl;
                break;
            case 0b1001:                            // 1001 group: not decoded yet
                std::cout << code[i][0] << std::endl;
                break;
        }
    }
    return 0;
}

This code is incomplete, but it shows how to extract the bits that make up a register number. The test encodings in the code array follow the patterns in the table above. Here is how to build it:

$ g++ -o demo decode.cpp

Note

The code shown here converts a string of binary digits to a bitset, and a bitset to an unsigned long. You should already know how to convert from an integer to a bitset.

Let’s see this code in action:

$ ./demo
ADD R10,R29
SUB R27,R27
0010
0010
1001
0010

The logic figures out the register numbers, but I will leave it to you to verify that the results are correct.
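Writing one switch case per instruction gets tedious. One alternative, sketched here under stated assumptions, is to compile each table entry into two values: a mask of its fixed bits, and the value those bits must hold. The sketch assumes each entry is available as a 16-bit pattern string, like a row of the encoding table with its group nibble prepended. Pattern, compilePattern, and matches are hypothetical names.

```cpp
#include <cstdint>
#include <string>

// A compiled table entry: a word matches when (word & mask) == value.
// Field letters (d, r, K, A, ...) become "don't care" bits.
struct Pattern {
    std::uint16_t mask;
    std::uint16_t value;
};

// Build a Pattern from text such as "0000 11rd dddd rrrr".
// Spaces are skipped; '0' and '1' are fixed bits; any other character is a
// field bit. Assumes exactly 16 bits, most significant bit first.
Pattern compilePattern(const std::string& text) {
    Pattern p{0, 0};
    for (char c : text) {
        if (c == ' ') continue;
        p.mask = static_cast<std::uint16_t>(p.mask << 1);
        p.value = static_cast<std::uint16_t>(p.value << 1);
        if (c == '0' || c == '1') {
            p.mask |= 1;                  // this bit is fixed by the encoding
            if (c == '1') p.value |= 1;   // ...and must be a 1
        }
    }
    return p;
}

bool matches(const Pattern& p, std::uint16_t word) {
    return (word & p.mask) == p.value;
}
```

Aliases such as ADD and LSL compile to the same mask and value, because a mask cannot express the condition that d equals r. A decoder built this way therefore reports whichever entry it tries first. That is consistent with the hardware, which does not distinguish the two either.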

Endians¶

Not Indians, “Endians”!

Computer systems are funny beasts. We want to store data types of any size in our memory, but often that memory is just a bunch of bytes. How do we pack 16 bits into 8-bit containers?

Simple: we use two of those bytes to hold the 16 bits. But as soon as we decide to do that, we have a question to answer: which byte comes first?

Little Endian¶

The most common scheme is “little endian”. In this scheme, the low byte is placed at the lower address, and the high byte is placed at the next address up. The scheme extends to bigger data types: just keep placing successive, more significant bytes at higher addresses until you are done.

The Pentium (and the AVR) are “little endian” systems.

Big Endian¶

As you might suspect, “big endian” uses the opposite scheme. High bytes in the data type end up at low addresses. The only system I am familiar with that uses this scheme is built by Sun Microsystems, now part of Oracle. I have not seen one of their systems in years.
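In code, the two schemes differ only in which of the two bytes becomes the high half of the word. Here is a minimal sketch with hypothetical helper names, using the byte pair 0x82, 0x0f as an example.

```cpp
#include <cstdint>

// 'first' is the byte at the lower address, 'second' the byte after it.

// Little endian: the byte at the lower address is the low byte.
std::uint16_t fromLittleEndian(std::uint8_t first, std::uint8_t second) {
    return static_cast<std::uint16_t>(first | (second << 8));
}

// Big endian: the byte at the lower address is the high byte.
std::uint16_t fromBigEndian(std::uint8_t first, std::uint8_t second) {
    return static_cast<std::uint16_t>((first << 8) | second);
}
```

Reading 0x82, 0x0f as little endian gives 0x0F82, whose top nibble 0000 places it in the first group of the encoding table. Reading the same bytes as big endian gives 0x820F, which would land in the 10q0 (LDD/STD) group instead. That would be a very different instruction.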

What does this mean for our simulator?

I put together a short chunk of code and assembled it to see what the compiler produced:

2c: 82 0f   add r24, r18
2e:

This shows that instruction memory is really byte addressed in this machine. Each 16-bit instruction occupies two addresses, so the listing advances from 2c to 2e. That is something we do not need to worry about as long as we get the right bits to decode.

Converting this to binary, and splitting the bits up into nibbles, we see this:

2c: 1000 0010 0000 1111 add r24, r18

Now, according to the instruction encoding table for this instruction, we should see this:

add -> 0000 11rd dddd rrrr

It looks like our “little endian” machine has swapped the bytes around. The low byte of the instruction word (0x82) is stored first, followed by the high byte (0x0f).

1000 0010 0000 1111
dddd rrrr 0000 11rd

Which gives us these registers:

Rd -> 11000 -> R24
Rr -> 10010 -> R18

Which is just what we want to find! Looks like the encoding table matches the bits in the code!
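The same field extraction can be done directly on the 16-bit word with shifts and masks, instead of going through bitset. Here is a minimal sketch, assuming the word has already been assembled with its low byte taken from the lower address. decodeRd and decodeRr are hypothetical names for the two-register format xxxx xxrd dddd rrrr.

```cpp
#include <cstdint>

// Two-register format xxxx xxrd dddd rrrr.
// Rd is bit 8 followed by bits 7..4, a contiguous 5-bit field.
unsigned decodeRd(std::uint16_t word) {
    return (word >> 4) & 0x1F;
}

// Rr is split: bit 9 supplies r4 and bits 3..0 supply r3..r0.
unsigned decodeRr(std::uint16_t word) {
    return ((word >> 5) & 0x10) | (word & 0x0F);   // move bit 9 down to bit 4
}
```

With the word 0x0F82 from the listing, decodeRd returns 24 and decodeRr returns 18, matching add r24, r18. Rd needs only one shift and mask. Rr needs its high bit moved into place separately.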

AVR Instruction Memory¶

The AVR instruction memory is actually a big array of bytes, but that memory is designed to deliver 16 bits in one operation. We will model this memory as an array of 16-bit words, just to make things simpler.

When we fetch data from that memory, we need to check the order of the bytes, to make sure our instruction codes come out in the right order.

That means we need careful testing to make sure that when we load a program from the bytes in a hex file, we get the right results when we decode things in our simulator.
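Here is a minimal sketch of that loading step. It assumes the program bytes have already been pulled out of the hex file into a vector in address order; parsing the Intel HEX records themselves is not shown. loadProgram is a hypothetical name. Padding an odd trailing byte with 0xFF, the value of erased flash, is also an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack a byte image (in address order) into 16-bit instruction words.
// Each word takes its low byte from the even address and its high byte
// from the following odd address (little endian). An odd trailing byte
// is padded with 0xFF (an assumption: erased flash reads as 0xFF).
std::vector<std::uint16_t> loadProgram(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint16_t> words;
    words.reserve((bytes.size() + 1) / 2);
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        std::uint8_t low = bytes[i];
        std::uint8_t high = (i + 1 < bytes.size()) ? bytes[i + 1] : 0xFF;
        words.push_back(static_cast<std::uint16_t>(low | (high << 8)));
    }
    return words;
}
```

Feeding it the bytes 82 0f 08 95 should produce the words 0x0F82 (add r24, r18) and 0x9508, which the encoding table lists as RET. A check like this catches a byte-order mistake before it turns into a confusing decode bug.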

Similar decoding code will handle the instructions we are going to include in our system.

Using Python to Explore Code¶

I used Python to take apart the instruction set, and to extract real code from the listing file generated by the avr-gcc compiler.

Here is part of the code I used:

decoder.py

ATTiny85.json

The JSON data file is something I extracted from the AVR documentation files I showed earlier. I am actually setting up this system to generate an example of every instruction in the chip in an assembly language file. When that file is processed, I will be able to check the encoding tables for every instruction. See? Python is durned handy!
