# AVR Instructions


The AVR processor family supports around 130 unique instructions. We could simply start implementing each of them, one at a time, as we build our simulator. But that would take more effort than we need in this class. Since we have a nice example C program to use as a starting point, let’s ask the avr-gcc assembler to help us select instructions to implement in our simulator.

To generate AVR assembly language from a high-level C program, we need a few more components for our Modular Make setup. For now, keep this Makefile setup separate from the one we are using for the C++ project. (I will get the two merged together soon!)

Here are the new Makefile components we need:

Makefile
```makefile
# Makefile for AVR projects
TARGET  := $(shell basename $(PWD))
MCU     := attiny85
FREQ    := 16000000L
PGMR    := arduino

include mk/os_detect.mk
include mk/avr-tools.mk
include mk/avr-files.mk

# check these settings after plugging in board
ifeq ($(PLATFORM), Mac)
    PORT := /dev/cu.usbmodem1411
else
    ifeq ($(PLATFORM), Linux)
        PORT := /dev/ttyACM0
    else
        PORT := COM6
    endif
endif

# do not modify anything below this line
.SUFFIXES:
-include mk/avr-build.mk
-include mk/avr-utils.mk
-include mk/help.mk
-include mk/debug.mk
-include mk/version.mk
```
mk/avr-files.mk
```makefile
# source files
CSRCS   := $(wildcard src/*.c)
CXXSRCS := $(wildcard src/*.cpp)
SSRCS   := $(wildcard src/*.S)

# required object files
COBJS   := $(CSRCS:.c=.o)
CXXOBJS := $(CXXSRCS:.cpp=.o)
SOBJS   := $(SSRCS:.S=.o)
OBJS    := $(COBJS) $(CXXOBJS) $(SOBJS)
LST     := $(TARGET).lst
```
mk/avr-tools.mk
```makefile
# tools - these should be able to run on command line
GCC     := avr-gcc
GXX     := avr-g++
OBJDUMP := avr-objdump
OBJCOPY := avr-objcopy
DUDE    := avrdude
```
mk/avr-utils.mk
```makefile
# utility targets

.PHONY: load
load: $(TARGET).hex ## Load hex file using avrdude
	$(DUDE) $(DUDECONF) $(UFLAGS) -Uflash:w:$(TARGET).hex:i

.PHONY: clean
clean: ## remove build artifacts
	$(RM) *.hex *.lst *.elf $(OBJS)
```

mk/avr-build.mk
```makefile
# loader flags
UFLAGS  := -v -D -p$(MCU) -c$(PGMR)
UFLAGS  += -P$(PORT)
UFLAGS  += -b115200

# c compiler flags
CFLAGS  := -Iinclude
CFLAGS  += -c -Os -mmcu=$(MCU)
CFLAGS  += -DF_CPU=$(FREQ)

# link flags
LFLAGS  := -mmcu=$(MCU)
LFLAGS  += -nostartfiles

# build targets
.PHONY: all
all: $(TARGET).hex $(LST)

# implicit build rules
%.hex: %.elf
	$(OBJCOPY) -O ihex -R .eeprom $< $@

%.elf: $(OBJS)
	$(GCC) $(LFLAGS) -o $@ $^

%.o: %.cpp
	$(GXX) -c $(CFLAGS) -o $@ $<

%.o: %.c
	$(GCC) -c $(CFLAGS) -o $@ $<

%.o: %.S
	$(GCC) -c $(CFLAGS) -o $@ $<

%.lst: %.elf
	$(OBJDUMP) -C -d $< > $@
```

You should add the debug.mk, help.mk and os_detect.mk files, with their Python helper files as well. Test your setup by running this command:

```
$ make help
load: Load hex file using avrdude
clean: remove build artifacts
help: display help messages
debug: display local make variables defined
debug-all: display all make variables defined
```

For now, we will skip discussing exactly what is happening in this build system.

## Example AVR Assembly Language

To demonstrate how this system works, here is a streamlined example of AVR assembly language distilled from compiling the C code we looked at earlier:

Note

It is not important that you understand this code for now; we will get to that. For now, all we are focusing on is how we will build programs written in this strange new language!

src/avr-sum.S
```asm
; set up global data area =============================================
        .data

; set up "data" array
data:
        .word   5
        .word   3
        .word   7
        .word   10
        .word   42
        .word   6
        .word   22
        .word   15
        .word   32

; set up uninitialized "cnt" and "sum" variables
        .comm   cnt,2,1
        .comm   sum,2,1

; set up initialized variable "odd"
        .section .bss
odd:
        .zero   2                ; initialize 16-bits with zero

; program code starts here ====================================
        .text
main:
        rjmp    .L2
.L5:
        lds     r24,odd          ; load "odd" into r24,r25
        lds     r25,odd+1
        or      r24,r25          ; OR the two bytes (why?)
        breq    .L3              ; branch if equal to zero
        lds     r24,cnt          ; load "cnt" into r24,r25
        lds     r25,cnt+1
        lsl     r24              ; left shift, high bit into carry flag
        rol     r25              ; rotate left, carry enters low bit (16-bit *2)
        subi    r24,lo8(-(data)) ; 16-bit subtract of (-data) from r24,r25 (HUH?)
        sbci    r25,hi8(-(data))
        movw    r30,r24          ; save result in "Z" (r30,r31)
        ld      r24,Z            ; 16-bit move (data[Z] -> r24,r25)
        ldd     r25,Z+1
        lds     r18,sum          ; get current sum into r18,r19
        lds     r19,sum+1
        add     r24,r18          ; add lo(sum) + lo(data[i])
        adc     r25,r19          ; add with carry (hi(sum) + hi(data[i]))
        sts     sum+1,r25        ; put result back in "sum"
        sts     sum,r24
.L3:
        lds     r24,odd          ; load 16-bit "odd" into r24,r25
        lds     r25,odd+1
        ldi     r18,lo8(1)       ; set up 16-bit "1"
        or      r24,r25          ; see if this is zero
        breq    .L4              ; if so, branch
        ldi     r18,0            ; set r18 to zero
.L4:
        mov     r24,r18          ; save r18 into r24
        ldi     r25,0            ; set r25 to zero
        sts     odd+1,r25        ; save result in "odd"
        sts     odd,r24
        lds     r24,cnt          ; load "cnt" into r24,r25
        lds     r25,cnt+1
        adiw    r24,1            ; add 1 to 16-bit value in r24,r25
        sts     cnt+1,r25        ; save result back in "cnt"
        sts     cnt,r24
.L2:
        lds     r24,cnt          ; load 16-bit "cnt" into r24,r25
        lds     r25,cnt+1
        sbiw    r24,9            ; 16-bit subtract "9" from r24,r25
        brlt    .L5              ; branch if less

; end of program, but where do we go? =================================
        ret
```

Note

Notice that we name our assembly language file with a .S extension. By convention, the upper-case extension marks hand-written assembly, and avr-gcc runs such files through the C preprocessor before assembling them. Compiler-generated assembly language files end with a lower-case .s.

Let’s run the build system and get a look at these files:

```
$ make clean
rm -f *.hex *.lst  *.elf   src/avr-sum.o

$ make
avr-gcc -c -Iinclude -c -Os -mmcu=attiny85 -DF_CPU=16000000L -o src/avr-sum.o src/avr-sum.S
avr-gcc -mmcu=attiny85 -nostartfiles -o cosc2325.elf src/avr-sum.o
avr-objcopy -O ihex -R .eeprom cosc2325.elf cosc2325.hex
avr-objdump -C -d cosc2325.elf > cosc2325.lst
rm src/avr-sum.o cosc2325.elf
```


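After these commands run, you will find two important files that have been constructed by the build system. The Makefile names its outputs after the project directory (the TARGET variable), so the run above actually wrote cosc2325.hex and cosc2325.lst. In these notes they are shown under the name of the source file: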

• avr-sum.hex - a file ready to load on a real AVR board
• avr-sum.lst - a listing file showing the assembly language produced from your code

## Intel Hex File

Rather than produce some form of “executable” file for this processor, the compiler and linker produce a data file containing exactly the binary code to be loaded into the memory of the processor. We will eventually use a loader program to put our code onto a real board. For now, we will use this data file to load up our simulator’s memory.

Here is the “hex” data file produced:

avr-sum.hex
```
:1000000032C08091720090917300892BA9F0809189
:10001000760090917700880F991F805A9F4FFC01BE
:10002000808191812091740030917500820F931F1F
:10003000909375008093740080917200909173008A
:1000400021E0892B09F020E0822F90E0909373004B
:100050008093720080917600909177000196909342
:1000600077008093760080917600909177000997D1
:0400700044F20895B9
:100074000500030007000A002A00060016000F000E
:0200840020005A
:00000001FF
```

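The format of this data file is pretty simple. In this file, each line carries at most 16 bytes to be loaded into the processor’s program memory area (two of the lines above are shorter, carrying 4 and 2 bytes). Here are the basic parts of each line, which Intel calls a “record”:

• Record mark - a single colon (:)
• Byte count - two hex digits giving the number of data bytes in the record
• Load offset - four hex digits giving the address where the first data byte goes
• Record type - two hex digits (00 for a data record, 01 for the end-of-file record)
• Data - two hex digits for each data byte
• Checksum - two hex digits, chosen so that all of the bytes in the record add up to zero modulo 256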

Here is a document detailing this data file format. It has been in use since the 1970s!
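To make the record layout concrete, here is a minimal Python sketch that splits one record into byte count, load offset, record type, data, and checksum, and verifies the checksum. The function name `parse_record` is made up for this illustration, and the sketch only knows about the two record types that appear in our file.

```python
def parse_record(line):
    """Split one Intel HEX record into its fields and verify the checksum."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("missing record mark ':'")
    raw = bytes.fromhex(line[1:])            # every field is pairs of hex digits
    # A record is count + offset(2) + type + data + checksum = count + 5 bytes.
    if len(raw) < 5 or len(raw) != raw[0] + 5:
        raise ValueError("byte count does not match record length")
    count = raw[0]                           # number of data bytes
    offset = int.from_bytes(raw[1:3], "big") # load offset, high byte first
    rectype = raw[3]                         # 0x00 = data, 0x01 = end of file
    data = raw[4:4 + count]
    # All bytes of the record, checksum included, sum to zero modulo 256.
    if sum(raw) & 0xFF != 0:
        raise ValueError("bad checksum")
    return count, offset, rectype, data


count, offset, rectype, data = parse_record(":0400700044F20895B9")
print(count, hex(offset), rectype, data.hex())
```

Running this on the short record from the file above reports 4 data bytes at offset 0x70, which are the last two instructions of the program.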

Here is a start on a C++ class that can read this data file. You can use this code to build your simulator’s memory load routine:

```cpp
// Copyright 2019 Roie R. Black
#pragma once
#include <string>

class Loader {
 public:
    explicit Loader(std::string fn);
    void parse(void);

 private:
    void _parse_line(std::string line);
    std::string fname;
};
```

And here is the implementation:

```cpp
// Copyright 2019 Roie R. Black
#include <fstream>
#include <iostream>
#include <string>
#include "Loader.h"

void Loader::_parse_line(std::string n) {
    int len = n.size();

    // check record mark (the shortest legal record is 11 characters)
    if (len < 11 || n[0] != ':') {
        std::cout << "bad data line" << std::endl;
        return;
    }
    // check record length
    std::string bytes = n.substr(1, 2);
    std::cout << "Byte count: " << bytes << std::endl;

    // get load offset
    std::string offset = n.substr(3, 4);
    std::cout << "Offset: " << offset << std::endl;

    // check record type
    std::string record = n.substr(7, 2);
    std::cout << "Record Type: " << record << std::endl;

    // get data bytes
    std::string data = n.substr(9, len-11);
    std::cout << "Record data: " << data << std::endl;

    // get checksum (not checked)
    std::string check = n.substr(len-2, 2);
    std::cout << "Checksum: " << check << std::endl;
}

Loader::Loader(std::string fn) {
    fname = fn;
}

void Loader::parse(void) {
    std::ifstream fin;
    std::string line;

    fin.open(fname, std::ios::in);
    if (fin.is_open()) {
        // read one record per iteration; stops cleanly at end of file,
        // even when the last line has no trailing newline
        while (fin >> line) {
            std::cout << line << std::endl;
            _parse_line(line);
        }
    } else {
        std::cout << "error reading file" << std::endl;
    }
}
```

Warning

This code is not complete. You still need to convert the hex data into data you can actually load into your simulator’s memory.
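As a rough picture of that conversion step, here is a Python sketch (not the C++ you will write for the simulator) that copies the data records into a flat byte image and then reads 16-bit instruction words back out. The names `load_flash` and `word_at` are invented for this example; the 8192-byte default is the ATtiny85 flash size. Checksums are not verified here, to keep the sketch short.

```python
def load_flash(hex_lines, size=8192):
    """Copy the data records of an Intel HEX file into a flat byte image."""
    flash = bytearray(size)                   # 8 KiB of flash on the ATtiny85
    for line in hex_lines:
        raw = bytes.fromhex(line.strip()[1:]) # drop the ':' record mark
        count = raw[0]
        offset = int.from_bytes(raw[1:3], "big")
        rectype = raw[3]
        if rectype == 0x01:                   # end-of-file record
            break
        if rectype == 0x00:                   # data record
            flash[offset:offset + count] = raw[4:4 + count]
    return flash


def word_at(flash, byte_addr):
    """Return the 16-bit word stored low byte first at byte_addr."""
    return flash[byte_addr] | (flash[byte_addr + 1] << 8)


flash = load_flash([":0400700044F20895B9", ":00000001FF"])
print(hex(word_at(flash, 0x70)), hex(word_at(flash, 0x72)))
```

The point to notice is the byte order: the file stores each instruction low byte first, so the bytes 44 F2 become the instruction word 0xF244.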

## The Listing File

In this build system, the compiler does not generate the assembly listing we might like to see. Instead, another tool, avr-objdump, generates a listing file, and that is what we will peek at next:

avr-sum.lst
```

cosc2325.elf:     file format elf32-avr


Disassembly of section .text:

00000000 <__ctors_end>:
   0:	32 c0       	rjmp	.+100    	; 0x66 <__ctors_end+0x66>
   2:	80 91 72 00 	lds	r24, 0x0072	; 0x800072 <_edata>
   6:	90 91 73 00 	lds	r25, 0x0073	; 0x800073 <_edata+0x1>
   a:	89 2b       	or	r24, r25
   c:	a9 f0       	breq	.+42     	; 0x38 <__ctors_end+0x38>
   e:	80 91 76 00 	lds	r24, 0x0076	; 0x800076
  12:	90 91 77 00 	lds	r25, 0x0077	; 0x800077
  16:	88 0f       	add	r24, r24
  18:	99 1f       	adc	r25, r25
  1a:	80 5a       	subi	r24, 0xA0	; 160
  1c:	9f 4f       	sbci	r25, 0xFF	; 255
  1e:	fc 01       	movw	r30, r24
  20:	80 81       	ld	r24, Z
  22:	91 81       	ldd	r25, Z+1	; 0x01
  24:	20 91 74 00 	lds	r18, 0x0074	; 0x800074
  28:	30 91 75 00 	lds	r19, 0x0075	; 0x800075
  2c:	82 0f       	add	r24, r18
  2e:	93 1f       	adc	r25, r19
  30:	90 93 75 00 	sts	0x0075, r25	; 0x800075
  34:	80 93 74 00 	sts	0x0074, r24	; 0x800074
  38:	80 91 72 00 	lds	r24, 0x0072	; 0x800072 <_edata>
  3c:	90 91 73 00 	lds	r25, 0x0073	; 0x800073 <_edata+0x1>
  40:	21 e0       	ldi	r18, 0x01	; 1
  42:	89 2b       	or	r24, r25
  44:	09 f0       	breq	.+2      	; 0x48 <__ctors_end+0x48>
  46:	20 e0       	ldi	r18, 0x00	; 0
  48:	82 2f       	mov	r24, r18
  4a:	90 e0       	ldi	r25, 0x00	; 0
  4c:	90 93 73 00 	sts	0x0073, r25	; 0x800073 <_edata+0x1>
  50:	80 93 72 00 	sts	0x0072, r24	; 0x800072 <_edata>
  54:	80 91 76 00 	lds	r24, 0x0076	; 0x800076
  58:	90 91 77 00 	lds	r25, 0x0077	; 0x800077
  5c:	01 96       	adiw	r24, 0x01	; 1
  5e:	90 93 77 00 	sts	0x0077, r25	; 0x800077
  62:	80 93 76 00 	sts	0x0076, r24	; 0x800076
  66:	80 91 76 00 	lds	r24, 0x0076	; 0x800076
  6a:	90 91 77 00 	lds	r25, 0x0077	; 0x800077
  6e:	09 97       	sbiw	r24, 0x09	; 9
  70:	44 f2       	brlt	.-112    	; 0x2 <__ctors_end+0x2>
  72:	08 95       	ret
```

Notice an important detail here. Each line of assembly code is preceded by the address where that instruction will be located in the instruction memory area, and by the exact binary bits recorded there (in hex, of course).

Looking through this listing shows us that most instructions are 16 bits long, shown as four hex characters, and a few instructions are 32 bits long, shown as eight hex characters.

For our present work, we simply need to look at the instructions used in implementing this program. Here is an alphabetical listing of those instructions:

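| MNEM | Op1   | Op2 |
|------|-------|-----|
| AND  | Rd    | Rr  |
| BREQ | k     |     |
| BRLT | label |     |
| CALL | K     |     |
| LDI  | Rd    | K   |
| LDS  | Rd    | K   |
| LSL  | Rd    |     |
| LSR  | Rd    |     |
| MOV  | Rd    | Rr  |
| MOVW | Rd    | Rr  |
| NOT  | Rd    |     |
| OR   | Rd    | Rr  |
| RET  |       |     |
| RJMP | K     |     |
| ROL  | Rd    |     |
| SBCI | Rd    | K   |
| SBIW | Rd    | K   |
| STS  | K     | Rr  |
| SUBI | Rd    | k   |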
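One way to cross-check a table like this against a listing file is to let a few lines of Python collect the mnemonics for us. This is only a sketch: the function name `mnemonics` is invented here, and it assumes the line layout that avr-objdump produced above (an address, a colon, the opcode bytes, then the mnemonic).

```python
import re


def mnemonics(listing_lines):
    """Return the sorted set of mnemonics found in a disassembly listing."""
    found = set()
    for line in listing_lines:
        address, sep, rest = line.partition(":")
        # Instruction lines start with a bare hex address and a colon.
        if not sep or not re.fullmatch(r"[0-9a-f]+", address.strip()):
            continue
        tokens = rest.split()
        i = 0
        # Skip the opcode bytes, which are always two hex digits each.
        while i < len(tokens) and re.fullmatch(r"[0-9a-f]{2}", tokens[i]):
            i += 1
        if 0 < i < len(tokens) and tokens[i].isalpha():
            found.add(tokens[i].upper())
    return sorted(found)


sample = [
    "00000000 <__ctors_end>:",
    "   0:\t32 c0       \trjmp\t.+100    \t; 0x66 <__ctors_end+0x66>",
    "   2:\t80 91 72 00 \tlds\tr24, 0x0072\t; 0x800072 <_edata>",
    "  16:\t88 0f       \tadd\tr24, r24",
]
print(mnemonics(sample))
```

Keep in mind that the disassembler does not always print the mnemonic that was in the source file. In the listing above, the `lsl r24` and `rol r25` lines from avr-sum.S show up as `add r24, r24` and `adc r25, r25`, because they share the same bit patterns.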

Obviously, we need to set our Fetch Unit up to load each instruction and pass those data bytes to the Decode Unit. As mentioned earlier, we will ask the Fetch Unit to grab two chunks from the instruction memory and let the decoder logic figure out if the second chunk is needed.
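Here is a small Python sketch of that fetch idea, under a few assumptions of my own: program memory is modeled as a list of 16-bit words, the program counter counts words, and the names `is_two_word` and `fetch` are hypothetical. The bit patterns tested are the ones the AVR instruction set reference gives for the four 32-bit instructions (LDS, STS, JMP, CALL); our program only uses the first two.

```python
def is_two_word(opcode):
    """True if this 16-bit opcode is followed by a second 16-bit word."""
    if opcode & 0xFC0F == 0x9000:      # LDS / STS:  1001 00xd dddd 0000
        return True
    if opcode & 0xFE0C == 0x940C:      # JMP / CALL: 1001 010k kkkk 11xk
        return True
    return False


def fetch(words, pc):
    """Grab two words at pc and return (opcode, operand_or_None, next_pc)."""
    first = words[pc]
    second = words[pc + 1] if pc + 1 < len(words) else 0
    if is_two_word(first):
        return first, second, pc + 2   # the second chunk was needed
    return first, None, pc + 1         # the second chunk is ignored


# The first four instructions of the listing, as little-endian words.
words = [0xC032, 0x9180, 0x0072, 0x9190, 0x0073, 0x2B89]
pc = 0
while pc < len(words):
    opcode, operand, pc = fetch(words, pc)
    print(f"{opcode:04x}", "-" if operand is None else f"{operand:04x}")
```

Always grabbing two words keeps the fetch logic independent of the instruction set; only the small `is_two_word` test knows anything about encodings.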

That is not that many instructions to cope with! But, before we can get very far in this adventure, we need to see how each has been encoded by the manufacturer of this chip.

## AVR Documentation

We have the pieces needed to build our Fetch Unit for the simulator. Fortunately, “fetching” is independent of the actual instructions we will be processing. However, before we can go much further, we really need to look at the actual chip documentation. Here are the files you will need:

### ATtiny85 Data Sheet

This is the master document detailing everything inside this tiny chip. For now, all we are interested in is the Instruction Set for this chip.

### AVR Instruction Summary

This document is a summary of all instructions supported by the AVR family of processors. It gives more detail on each instruction, but remember that our tiny chip does not support all of the instructions listed in this reference.

### AVR Instruction Encoding

This last document provides information that can be produced with a bit of coding and the tools provided above. Basically, we want to discover how each instruction is encoded by the manufacturer. This information is essential in building our decoder, which is coming up next!
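To give a feel for what “encoding” means, here is a hedged Python sketch that pulls the operand fields out of two instruction formats, RJMP and LDI. The bit layouts in the comments come from the AVR instruction set reference; the function names `sign_extend` and `decode` are made up for this example, and a real decoder would need many more cases.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ mask) - mask


def decode(opcode, pc):
    """Decode two sample formats; pc is the word address of the opcode."""
    if opcode & 0xF000 == 0xC000:            # RJMP: 1100 kkkk kkkk kkkk
        k = sign_extend(opcode & 0x0FFF, 12)
        return "rjmp", pc + 1 + k            # jump target as a word address
    if opcode & 0xF000 == 0xE000:            # LDI:  1110 KKKK dddd KKKK
        rd = 16 + ((opcode >> 4) & 0x0F)     # LDI can only reach r16..r31
        imm = ((opcode >> 4) & 0xF0) | (opcode & 0x0F)
        return "ldi", rd, imm
    return "unknown", opcode


print(decode(0xC032, 0))      # bytes "32 c0" at address 0 in the listing
print(decode(0xE021, 0x20))   # bytes "21 e0" at byte address 0x40
```

You can check both results against the listing: the first instruction jumps to word 51, which is byte address 0x66, and the instruction at 0x40 loads the value 1 into r18.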

## Python Listing File Processor

Just for fun (!), I put together a short Python program that reads a listing file produced by our build system and shows the binary encoding for each instruction it finds. Try this code out and see what it produces:

code-extractor.py
```python
# Extract code from .lst file
import sys


def bin_format(hex_str, bits):
    """Return hex_str as a zero-padded binary string `bits` wide."""
    ival = int(hex_str, 16)
    return f'{ival:0>{bits}b}'


fname = sys.argv[1]
with open(fname) as fin:
    lines = fin.readlines()

for line in lines:
    # instruction lines have the ':' after a right-justified address
    if len(line) < 5 or line[4] != ":":
        continue
    parts = line.rstrip('\n').split('\t')
    address = parts[0].strip()[:-1]        # drop the trailing ':'
    hex_bytes = parts[1].split()
    mnemonic = parts[2]
    operands = parts[3] if len(parts) > 3 else ""
    out_str = bin_format(address, 16) + ": "
    for b in range(4):
        if b < len(hex_bytes):
            out_str += bin_format(hex_bytes[b], 8) + ' '
        else:
            out_str += ' ' * 9             # pad so the mnemonics line up
    out_str += " " + mnemonic + " " + operands
    print(out_str)
```