AVR Instructions¶
The AVR processor family supports around 130 unique instructions. We could
simply start implementing each of them, one at a time, as we build our
simulator, but that would take more effort than we need in this class. Since we
have a nice example C program to use as a starting point, let’s ask the avr-gcc
toolchain to help us select the instructions to implement in our simulator.
To generate AVR assembly language from a high-level C program, we need a
few more components for our Modular Make setup. For now, keep this
Makefile setup separate from the one we are using for the C++ project (I
will get the two merged together soon!).
Here are the new Makefile components we need:
# Makefile for AVR projects
MK := mk
AVRPROJ := sum
TARGET := avr-sum
MCU := attiny85
FREQ := 16000000L
PGMR := arduino
include mk/os_detect.mk
include mk/avr-tools.mk
include mk/avr-files.mk
# check these settings after plugging in board
ifeq ($(PLATFORM), Mac)
PORT := /dev/cu.usbmodem1411
else
ifeq ($(PLATFORM), Linux)
PORT := /dev/ttyACM0
else
PORT := COM6
endif
endif
# do not modify anything below this line
.SUFFIXES:
-include mk/avr-build.mk
-include mk/avr-utils.mk
-include mk/help.mk
-include mk/debug.mk
-include mk/version.mk

# source files
CSRCS := $(shell python $(MK)/pyfind.py avr/$(AVRPROJ) .c)
CXXSRCS := $(shell python $(MK)/pyfind.py avr/$(AVRPROJ) .cpp)
SSRCS := $(shell python $(MK)/pyfind.py avr/$(AVRPROJ) .S)
# required object files
COBJS := $(CSRCS:.c=.o)
CXXOBJS := $(CXXSRCS:.cpp=.o)
SOBJS := $(SSRCS:.S=.o)
OBJS := $(COBJS) $(CXXOBJS) $(SOBJS)
LST := $(TARGET).lst

# tools - these should be able to run on command line
GCC := avr-gcc
GXX := avr-g++
OBJDUMP := avr-objdump
OBJCOPY := avr-objcopy
DUDE := avrdude

# utility targets
.PHONY: load
load: $(TARGET).hex ## Load hex file using avrdude
	$(DUDE) $(DUDECONF) $(UFLAGS) -Uflash:w:$(TARGET).hex:i

.PHONY: clean
clean: ## remove build artifacts
	$(RM) *.hex *.lst *.elf $(OBJS)

# loader flags
UFLAGS := -v -D -p$(MCU) -c$(PGMR)
UFLAGS += -P$(PORT)
UFLAGS += -b115200

# c compiler flags
CFLAGS := -Iinclude
CFLAGS += -c -Os -mmcu=$(MCU)
CFLAGS += -DF_CPU=$(FREQ)

# link flags
LFLAGS := -mmcu=$(MCU)
LFLAGS += -nostartfiles

# build targets
.PHONY: all
all: $(TARGET).hex $(LST)

# implicit build rules (recipe lines must start with a tab)
%.hex: %.elf
	$(OBJCOPY) -O ihex -R .eeprom $< $@

%.elf: $(OBJS)
	$(GCC) $(LFLAGS) -o $@ $^

%.o: %.cpp
	$(GXX) -c $(CFLAGS) -o $@ $<

%.o: %.c
	$(GCC) -c $(CFLAGS) -o $@ $<

%.o: %.S
	$(GCC) -c $(CFLAGS) -o $@ $<

%.lst: %.elf
	$(OBJDUMP) -C -d $< > $@
You should add the debug.mk, help.mk and os_detect.mk files, with
their Python helper files as well. Test your setup by running this command:
$ make help
load: Load hex file using avrdude
clean: remove build artifacts
help: display help messages
debug: display local make variables defined
debug-all: display all make variables defined
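The descriptions printed by make help come from the ## comments that follow targets such as load: and clean: in the Makefile fragments above. The helper that extracts them is not shown here, so the following is only a minimal Python sketch of the idea. The function name extract_help and the regular expression are assumptions for illustration, not the actual help.mk helper.

```python
import re

# Matches lines such as "load: $(TARGET).hex ## Load hex file using avrdude".
# Hypothetical pattern: a target name, a colon, anything, then "## text".
HELP_LINE = re.compile(r'^([A-Za-z0-9_.-]+):.*?##\s*(.+)$')


def extract_help(makefile_text):
    """Return (target, description) pairs for every documented target."""
    entries = []
    for line in makefile_text.splitlines():
        match = HELP_LINE.match(line)
        if match:
            entries.append((match.group(1), match.group(2).strip()))
    return entries


sample = """\
.PHONY: load
load: $(TARGET).hex ## Load hex file using avrdude
	$(DUDE) $(DUDECONF) $(UFLAGS) -Uflash:w:$(TARGET).hex:i
.PHONY: clean
clean: ## remove build artifacts
"""

for target, text in extract_help(sample):
    print(f"{target}: {text}")
```

Lines without a ## comment, such as the .PHONY declarations and the recipe lines, are skipped, which is why only documented targets show up in the help output.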
For now, we will skip discussing exactly what is happening in this build system.
Example AVR Assembly Language¶
To demonstrate how this system works, here is a streamlined example of AVR
assembly language distilled from compiling the C code we looked at earlier:
Note
It is not important that you understand this code for now; we will get to that. For now, all we are focusing on is how we will build programs written in this strange new language!
; set up global data area =============================================
.data
; set up "data" array
data:
.word 5
.word 3
.word 7
.word 10
.word 42
.word 6
.word 22
.word 15
.word 32
; set up uninitialized "cnt" and "sum" variables
.comm cnt,2,1
.comm sum,2,1
; set up initialized variable "odd"
.section .bss
odd:
.zero 2 ; initialize 16-bits with zero
; program code starts here ====================================
.text
main:
rjmp .L2
.L5:
lds r24,odd ; load "odd" into r24,r25
lds r25,odd+1
or r24,r25 ; OR the two bytes (why?)
breq .L3 ; branch if equal to zero
lds r24,cnt ; load "cnt" into r24,r25
lds r25,cnt+1
lsl r24 ; left shift, high bit into carry flag
rol r25 ; rotate left, carry enters low bit (16-bit *2)
subi r24,lo8(-(data)) ; 16-bit subtract r24,r25 from (-data[i]) HUH?
sbci r25,hi8(-(data))
movw r30,r24 ; save result in "z" (r30,r31)
ld r24,Z ; 16-bit move (data[Z] -> r24,r25)
ldd r25,Z+1
lds r18,sum ; get current sum into r18,r19
lds r19,sum+1
add r24,r18 ; add lo(sum) + lo(data[i])
adc r25,r19 ; add with carry (hi(sum) + hi(data[i]))
sts sum+1,r25 ; put result (r24,r25) back in "sum"
sts sum,r24
.L3:
lds r24,odd ; load 16-bit "odd" into r24,r25
lds r25,odd+1
ldi r18,lo8(1) ; set up 16_bit "1"
or r24,r25 ; see if this is zero
breq .L4 ; if so, branch
ldi r18,0 ; set r18 to zero
.L4:
mov r24,r18 ; save r18 into r24
ldi r25,0 ; set r25 to zero
sts odd+1,r25 ; save result in "odd"
sts odd,r24
lds r24,cnt ; load "cnt" into r24,r25
lds r25,cnt+1
adiw r24,1 ; add 1 to 16-bit value in r24,r25
sts cnt+1,r25 ; save result back in "cnt"
sts cnt,r24
.L2:
lds r24,cnt ; load 16-bit "cnt" into r24,r25
lds r25,cnt+1
sbiw r24,9 ; 16-bit subtract "9" from r24,r25
brlt .L5 ; branch if less
; end of program, but where do we go? =================================
ret
Note
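Notice that we name our assembly language file with a .S extension. This is
important because avr-gcc runs files ending in an upper case .S through the C
preprocessor before assembling them, which is what we want for code a human
wrote. Compiler generated assembly language files end with a lower case .s and
skip that step.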
Let’s run the build system and get a look at these files:
$ make clean
rm -f *.hex *.lst *.elf avr/sum/avr-sum.o
$ make
avr-gcc -c -Iinclude -c -Os -mmcu=attiny85 -DF_CPU=16000000L -o avr/sum/avr-sum.o avr/sum/avr-sum.S
avr-gcc -mmcu=attiny85 -nostartfiles -o avr-sum.elf avr/sum/avr-sum.o
avr-objcopy -O ihex -R .eeprom avr-sum.elf avr-sum.hex
avr-objdump -C -d avr-sum.elf > avr-sum.lst
rm avr/sum/avr-sum.o avr-sum.elf
After these commands run, you will find two important files that have been constructed by the build system:
avr-sum.hex - a file ready to load on a real AVR board
avr-sum.lst - a listing file showing the assembly language produced from your code
Intel Hex File¶
Rather than produce some form of “executable” file for this processor, the compiler and linker produce a data file containing exactly the binary code to be loaded into the memory of the processor. We will eventually use a loader program to put our code onto a real board. For now, we will use this data file to load up our simulator’s memory.
Here is the “hex” data file produced:
:1000000032C08091720090917300892BA9F0809189
:10001000760090917700880F991F805A9F4FFC01BE
:10002000808191812091740030917500820F931F1F
:10003000909375008093740080917200909173008A
:1000400021E0892B09F020E0822F90E0909373004B
:100050008093720080917600909177000196909342
:1000600077008093760080917600909177000997D1
:0400700044F20895B9
:100074000500030007000A002A00060016000F000E
:0200840020005A
:00000001FF
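The format of this data file is pretty simple. Each line in the file contains up to 16 bytes to be loaded into the processor’s program memory area (the last few lines above carry fewer). Here are the basic parts of each line, which Intel calls a “record”:
Record mark - a single ':' character
Byte count - two hex digits giving the number of data bytes in the record
Load offset - four hex digits giving the address of the first data byte
Record type - two hex digits (00 for data, 01 for end of file)
Data - two hex digits for each data byte
Checksum - two hex digits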
Here is a document detailing this data file format. It has been in use since the 1970s!
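Each record ends with a checksum byte chosen so that all of the bytes in the record (byte count, offset, record type, data and the checksum itself) add up to zero modulo 256. Here is a minimal Python sketch of that check, tried on two records from the file above. The function names record_bytes, checksum_ok and split_record are made up for this illustration.

```python
def record_bytes(line):
    """Convert one Intel HEX record (the text after the ':') into a list of ints."""
    if not line.startswith(':'):
        raise ValueError("record must start with ':'")
    return list(bytes.fromhex(line[1:].strip()))


def checksum_ok(line):
    """True when every byte of the record sums to zero modulo 256."""
    return sum(record_bytes(line)) % 256 == 0


def split_record(line):
    """Return (byte_count, offset, record_type, data) for one record."""
    raw = record_bytes(line)
    count = raw[0]
    offset = (raw[1] << 8) | raw[2]      # offset is stored high byte first
    rtype = raw[3]
    data = raw[4:4 + count]
    return count, offset, rtype, data


# the short data record and the end-of-file record from the file above
print(checksum_ok(":0400700044F20895B9"))
print(split_record(":0400700044F20895B9"))
print(checksum_ok(":00000001FF"))
```

A loader that runs this check before using a record will catch most single-character corruption in the file.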
Here is a start on a C++ class that can read this data file. You can use this code to build your simulator’s memory load routine:
// Copyright 2019 Roie R. Black
#pragma once
#include <fstream>
#include <string>

class Loader {
 public:
    explicit Loader(std::string fn);
    void parse(void);
 private:
    void _parse_line(std::string line);
    std::string fname;
};
And here is the implementation:
// Copyright 2019 Roie R. Black
#include <iostream>
#include <fstream>
#include <string>
#include "Loader.h"

void Loader::_parse_line(std::string n) {
    int len = n.size();
    // check record mark; the shortest legal record has 11 characters
    // (mark, byte count, offset, record type, checksum)
    if (len < 11 || n[0] != ':') {
        std::cout << "bad data line" << std::endl;
        return;
    }
    // check record length
    std::string bytes = n.substr(1, 2);
    std::cout << "Byte count: " << bytes << std::endl;
    // get load offset
    std::string offset = n.substr(3, 4);
    std::cout << "Offset: " << offset << std::endl;
    // check record type
    std::string record = n.substr(7, 2);
    std::cout << "Record Type: " << record << std::endl;
    // get data bytes
    std::string data = n.substr(9, len-11);
    std::cout << "Record data: " << data << std::endl;
    // get checksum (not checked)
    std::string check = n.substr(len-2, 2);
    std::cout << "Checksum: " << check << std::endl;
}

Loader::Loader(std::string fn) {
    fname = fn;
}

void Loader::parse(void) {
    std::ifstream fin;
    std::string line;
    fin.open(fname, std::ios::in);
    if (fin.is_open()) {
        // read one record at a time; testing the stream itself also
        // handles a file that has no newline after the last record
        while (fin >> line) {
            std::cout << line << std::endl;
            _parse_line(line);
        }
    } else {
        std::cout << "error reading file" << std::endl;
    }
}
Warning
This code is not complete. You still need to convert the hex data into data you can actually load into your simulator’s memory.
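To show what that conversion involves, here is one possible approach, sketched in Python so it can be tried quickly before porting the idea to the C++ Loader class. The names load_hex and word_at are hypothetical, and the 8192 byte image size assumes the 8 KB program memory of the ATtiny85. Note that the AVR stores each 16-bit instruction word low byte first, so the bytes 32 C0 at the start of the file form the word 0xC032.

```python
FLASH_SIZE = 8192  # assumed image size: ATtiny85 has 8 KB of program memory


def load_hex(lines, size=FLASH_SIZE):
    """Copy the data records from Intel HEX text lines into a byte image."""
    memory = bytearray(size)
    for line in lines:
        line = line.strip()
        if not line.startswith(':'):
            continue                          # skip anything that is not a record
        raw = bytes.fromhex(line[1:])
        count, rtype = raw[0], raw[3]
        offset = (raw[1] << 8) | raw[2]
        if rtype == 0x01:                     # end-of-file record
            break
        if rtype == 0x00:                     # data record
            memory[offset:offset + count] = raw[4:4 + count]
    return memory


def word_at(memory, address):
    """Fetch a 16-bit instruction word; AVR stores the low byte first."""
    return memory[address] | (memory[address + 1] << 8)


records = [
    ":1000000032C08091720090917300892BA9F0809189",
    ":0400700044F20895B9",
    ":00000001FF",
]
image = load_hex(records)
print(hex(word_at(image, 0x00)))   # first word of the program
print(hex(word_at(image, 0x72)))   # last word of the program
```

This sketch does not verify checksums or handle record types other than 00 and 01, which is enough for the file shown here.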
The Listing File¶
In this build system, the compiler does not generate the assembly listing we
might like to see. Instead another tool, avr-objdump, generates a listing file,
and that is what we will peek at next:
avr-sum.elf: file format elf32-avr
Disassembly of section .text:
00000000 <__ctors_end>:
0: 32 c0 rjmp .+100 ; 0x66 <__ctors_end+0x66>
2: 80 91 72 00 lds r24, 0x0072 ; 0x800072 <_edata>
6: 90 91 73 00 lds r25, 0x0073 ; 0x800073 <_edata+0x1>
a: 89 2b or r24, r25
c: a9 f0 breq .+42 ; 0x38 <__ctors_end+0x38>
e: 80 91 76 00 lds r24, 0x0076 ; 0x800076 <cnt>
12: 90 91 77 00 lds r25, 0x0077 ; 0x800077 <cnt+0x1>
16: 88 0f add r24, r24
18: 99 1f adc r25, r25
1a: 80 5a subi r24, 0xA0 ; 160
1c: 9f 4f sbci r25, 0xFF ; 255
1e: fc 01 movw r30, r24
20: 80 81 ld r24, Z
22: 91 81 ldd r25, Z+1 ; 0x01
24: 20 91 74 00 lds r18, 0x0074 ; 0x800074 <sum>
28: 30 91 75 00 lds r19, 0x0075 ; 0x800075 <sum+0x1>
2c: 82 0f add r24, r18
2e: 93 1f adc r25, r19
30: 90 93 75 00 sts 0x0075, r25 ; 0x800075 <sum+0x1>
34: 80 93 74 00 sts 0x0074, r24 ; 0x800074 <sum>
38: 80 91 72 00 lds r24, 0x0072 ; 0x800072 <_edata>
3c: 90 91 73 00 lds r25, 0x0073 ; 0x800073 <_edata+0x1>
40: 21 e0 ldi r18, 0x01 ; 1
42: 89 2b or r24, r25
44: 09 f0 breq .+2 ; 0x48 <__ctors_end+0x48>
46: 20 e0 ldi r18, 0x00 ; 0
48: 82 2f mov r24, r18
4a: 90 e0 ldi r25, 0x00 ; 0
4c: 90 93 73 00 sts 0x0073, r25 ; 0x800073 <_edata+0x1>
50: 80 93 72 00 sts 0x0072, r24 ; 0x800072 <_edata>
54: 80 91 76 00 lds r24, 0x0076 ; 0x800076 <cnt>
58: 90 91 77 00 lds r25, 0x0077 ; 0x800077 <cnt+0x1>
5c: 01 96 adiw r24, 0x01 ; 1
5e: 90 93 77 00 sts 0x0077, r25 ; 0x800077 <cnt+0x1>
62: 80 93 76 00 sts 0x0076, r24 ; 0x800076 <cnt>
66: 80 91 76 00 lds r24, 0x0076 ; 0x800076 <cnt>
6a: 90 91 77 00 lds r25, 0x0077 ; 0x800077 <cnt+0x1>
6e: 09 97 sbiw r24, 0x09 ; 9
70: 44 f2 brlt .-112 ; 0x2 <__ctors_end+0x2>
72: 08 95 ret
Notice an important detail here. Each line of assembly code is preceded by the address where that instruction will be located in the instruction memory area, and the exact binary bits recorded there (in hex, of course).
Looking through this listing shows us that most instructions are 16 bits long, shown as four hex characters, and a few instructions are 32 bits long, shown as eight hex characters.
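The listing also helps with the "HUH?" comment in the source. The lines subi r24,lo8(-(data)) and sbci r25,hi8(-(data)) came out as subi r24, 0xA0 and sbci r25, 0xFF. The AVR has no 8-bit add-immediate instruction (ADIW only handles constants up to 63), so the assembler subtracts the 16-bit constant 0xFFA0 instead, which has the same effect as adding 0x0060. That value implies the data array sits at address 0x0060. The Python sketch below models the two 8-bit steps to confirm the arithmetic; the function name subi_sbci is made up for this illustration.

```python
def subi_sbci(value, constant):
    """Model SUBI on the low byte followed by SBCI on the high byte."""
    low = (value & 0xFF) - (constant & 0xFF)
    borrow = 1 if low < 0 else 0              # SUBI sets carry on borrow
    high = ((value >> 8) & 0xFF) - ((constant >> 8) & 0xFF) - borrow
    return ((high & 0xFF) << 8) | (low & 0xFF)


DATA_BASE = 0x0060                 # implied by the 0xA0 / 0xFF operands
NEG_BASE = (-DATA_BASE) & 0xFFFF   # 0xFFA0, the constant in the listing

for index in range(3):
    byte_offset = index * 2        # the lsl/rol pair doubles cnt
    address = subi_sbci(byte_offset, NEG_BASE)
    print(index, hex(address))
```

Each array element is two bytes wide, so the addresses step by two starting from the base of the array.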
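For our present work, we simply need to look at the instructions used in implementing this program. Note that the lsl and rol in our source show up in the listing as add r24, r24 and adc r25, r25; on the AVR, LSL Rd and ROL Rd are alternate names for ADD Rd,Rd and ADC Rd,Rd. Here is an alphabetical listing of those instructions: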
MNEM    Op1     Op2
ADD     Rd      Rr
ADC     Rd      Rr
ADIW    Rd      K
AND     Rd      Rr
BREQ    k
BRLT    label
CALL    K
LDI     Rd      K
LDS     Rd      K
LSL     Rd
LSR     Rd
MOV     Rd      Rr
MOVW    Rd      Rr
NOT     Rd
OR      Rd      Rr
RET
RJMP    K
ROL     Rd
SBCI    Rd      K
SBIW    Rd      K
STS     K       Rr
SUBI    Rd      k
Obviously, we need to set our Fetch Unit up to load each instruction and
pass those data bytes to the Decode Unit. As mentioned earlier, we will ask
the Fetch Unit to grab two chunks from the instruction memory and let the
decoder logic figure out if the second chunk is needed.
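In our listing, the only four-byte instructions are lds and sts. Here is a minimal Python sketch of how decoder logic might decide whether the second chunk is needed by looking only at the first 16-bit word. The bit patterns are the ones given in the AVR instruction set documentation (LDS is 1001 000d dddd 0000, STS is 1001 001d dddd 0000, and JMP and CALL, which larger AVR parts support, are 1001 010k kkkk 11xk). The function names are hypothetical.

```python
def needs_second_word(opcode):
    """Return True when the 16-bit opcode is the first half of a 32-bit instruction."""
    if (opcode & 0xFC0F) == 0x9000:   # LDS Rd,k or STS k,Rr
        return True
    if (opcode & 0xFE0C) == 0x940C:   # JMP k or CALL k (larger AVR parts)
        return True
    return False


def instruction_length(memory, address):
    """Length in bytes of the instruction starting at a byte address."""
    opcode = memory[address] | (memory[address + 1] << 8)   # low byte first
    return 4 if needs_second_word(opcode) else 2


# first bytes of the listing: rjmp (2 bytes) followed by lds (4 bytes)
code = bytes.fromhex("32c080917200")
print(instruction_length(code, 0), instruction_length(code, 2))
```

With a check like this, the fetch logic can always grab two chunks and simply advance the program counter by the length the decoder reports.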
That is not that many instructions to cope with! But, before we can get very far in this adventure, we need to see how each has been encoded by the manufacturer of this chip.
AVR Documentation¶
We have the pieces needed to build our Fetch Unit for the simulator.
Fortunately, “fetching” is independent of the actual instructions we will be
processing. However, before we can go much further, we really need to look at
the actual chip documentation. Here are the files you will need:
ATtiny85 Data Sheet¶
This is the master document detailing everything inside this tiny chip. For now, all we are interested in is the Instruction Set for this chip.
AVR Instruction Summary¶
This document is a summary of all instructions supported by the AVR family of processors. This is more detailed data on those instructions, but remember, our tiny chip does not support all of the instructions listed in this reference.
AVR Instruction Encoding¶
This last document provides information that can be produced with a bit of coding and the tools provided above. Basically, we want to discover how each instruction is encoded by the manufacturer. This information is essential in building our decoder, which is coming up next!
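As a small taste of what the encoding information lets us do, here is a Python sketch that decodes the branch targets of three instructions from our listing. The field layouts (RJMP is 1100 kkkk kkkk kkkk, and the conditional branches are 1111 0xkk kkkk ksss) come from the AVR instruction set documentation. Both offsets are signed word counts relative to the instruction that follows the branch. The function names are made up for this illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def rjmp_target(address, opcode):
    """Byte address reached by RJMP (1100 kkkk kkkk kkkk) located at address."""
    k = sign_extend(opcode & 0x0FFF, 12)       # offset in 16-bit words
    return address + 2 + 2 * k


def branch_target(address, opcode):
    """Byte address reached by a taken conditional branch (1111 0xkk kkkk ksss)."""
    k = sign_extend((opcode >> 3) & 0x7F, 7)   # offset in 16-bit words
    return address + 2 + 2 * k


print(hex(rjmp_target(0x00, 0xC032)))     # rjmp .+100 in the listing
print(hex(branch_target(0x70, 0xF244)))   # brlt .-112 in the listing
print(hex(branch_target(0x0C, 0xF0A9)))   # breq .+42 in the listing
```

The results can be compared with the target addresses that avr-objdump prints in the comment column of the listing.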
Python Listing File Processor¶
Just for fun (!), I put together a short Python program that reads a listing file produced by our build system and shows the binary encoding for each instruction it finds. Try this code out and see what it produces:
# Extract code from .lst file
import sys


def bin_format(hex_str, bits):
    """Convert a hex string into a zero-padded binary string."""
    ival = int(hex_str, 16)
    return f'{ival:0>{bits}b}'


fname = sys.argv[1]
with open(fname) as fin:
    lines = fin.readlines()

for line in lines:
    # instruction lines have the colon after the address in column 4
    if len(line) < 5 or line[4] != ":":
        continue
    parts = line.rstrip('\n').split('\t')
    if len(parts) < 3:
        continue
    address = parts[0].strip()[:-1]
    hex_bytes = parts[1].split()
    mnemonic = parts[2]
    operands = parts[3] if len(parts) > 3 else ""
    out_str = bin_format(address, 16) + ": "
    code_len = len(hex_bytes)
    for b in range(4):
        if b < code_len:
            out_str += bin_format(hex_bytes[b], 8) + ' '
        else:
            out_str += " " * 9   # pad so the mnemonics line up
    out_str += " " + mnemonic + " " + operands
    print(out_str)
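The same listing can also answer the question this section started with: which instructions does our simulator need to support? Here is a minimal Python sketch that collects the distinct mnemonics. It assumes the tab-separated avr-objdump layout that the script above relies on. The function name used_mnemonics is hypothetical, and the sample lines are a few entries copied from the listing.

```python
def used_mnemonics(lines):
    """Return the sorted set of mnemonics found in avr-objdump listing lines."""
    found = set()
    for line in lines:
        parts = line.rstrip('\n').split('\t')
        # instruction lines look like "  1a:<TAB>80 5a<TAB>subi<TAB>r24, 0xA0"
        if len(parts) < 3 or not parts[0].strip().endswith(':'):
            continue
        found.add(parts[2].strip().upper())
    return sorted(found)


sample = [
    "00000000 <__ctors_end>:\n",
    "   0:\t32 c0       \trjmp\t.+100    \t; 0x66 <__ctors_end+0x66>\n",
    "   2:\t80 91 72 00 \tlds\tr24, 0x0072\t; 0x800072 <_edata>\n",
    "   a:\t89 2b       \tor\tr24, r25\n",
    "  6e:\t09 97       \tsbiw\tr24, 0x09\t; 9\n",
    "  72:\t08 95       \tret\n",
]
print(used_mnemonics(sample))
```

Running this over the full listing file gives the list of instructions the listing actually uses, which can be checked against the table earlier in this section.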