Execute Stage


We have reached the point where we need to process some data. That all happens in the execute stage, and it can get complex. We have a bunch of instructions, each trying to manipulate some bits, and we have some components that can help do all of that. Specifically, we have a nice ALU. The problem is that we need to route the various parts of each instruction, all those items we decoded previously, to that ALU, or past it, depending on what needs to happen.

The only way to figure all this out is to study the instructions we want to execute.

Decoder Data

There are only a few different data items we need to decode in this set:


The range of numbers we decode is set by the number of bits in the instruction. The only puzzle is figuring out which of those numbers is a two’s complement number, meaning a signed number.

Value ranges

  • k16: 0 <= k16 <= 65535 (16 bits) unsigned
  • k8: 0 <= k8 <= 255 (8 bits) unsigned
  • k12: -2048 <= k12 <= 2047 (12 bits) signed
  • k7: -64 <= k7 <= 63 (7 bits) signed
  • A8: I/O port address, 0 <= A8 <= 63 (held in 8 bits) unsigned

The instructions that deal with signed data are all involved in branching. Those signed “k” constants will end up being added to the PC value.

The I/O instructions reference a port number, limited to a value between 0 and 63 (unsigned). We could keep that number separate from the other data items. However, in examining the instruction set, there is never a time when an instruction references both A and k, so we could deal with that port address as just another constant.

We also know that every instruction will alter the value in the PC by one, maybe more if we are referencing something in data memory. The PC will also be modified by branching instructions, and we have some math to do if that is the case.

All arithmetic operations involving either the PC register value or the SP register value are 16-bit operations. Perhaps we just need to let our ALU do 16-bit math when needed. We could also set up two ALUs, one for 16-bit math and another for 8-bit math.

Our decoder needs to tell us which instruction we have, and provide four other pieces of data: Rs, Rd, and two constants: “A” and “k”. Whoops, there are four variants of “k” in this set! Two are unsigned (k16 and k8) and two are signed (k12 and k7). For the signed ones the size is not really important, since we will end up treating them as 16-bit signed numbers when we do our address calculations.


The number of bits decoded for each constant limits the range of possible values. What we do need to do is note the sign of that constant. In our simulator, we will need to pay attention to the sign when we process the constant.
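One way to pay attention to the sign is to sign-extend each signed field to 16 bits before it is used in address math. The sketch below is only an illustration of that idea in Python; the helper names `to_signed` and `sign_extend16` are made up for this example and are not part of the simulator.

```python
def to_signed(field, bits):
    """Interpret the low `bits` bits of `field` as a two's complement number."""
    field &= (1 << bits) - 1           # keep only the decoded bits
    if field & (1 << (bits - 1)):      # top bit set means negative
        field -= (1 << bits)
    return field


def sign_extend16(field, bits):
    """Return the same signed value as a 16-bit pattern (hypothetical helper)."""
    return to_signed(field, bits) & 0xFFFF


# A 7-bit field of all ones is -1, which becomes 0xFFFF in 16 bits.
print(to_signed(0x7F, 7), hex(sign_extend16(0x7F, 7)))
```

Unsigned fields such as k8 and k16 need no treatment; they can be used as decoded.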

Instruction Set Architecture

Up to this point in our processing, the action in each stage has been pretty generic, and would be very similar in any processor you wish to study. However, things can change as we work through the rest of the machine. The exact parts we select for the next stage and how we organize them is driven by the instruction set chosen, not the other way around. That is why this is called “instruction set architecture”.

Designers of chips study the applications the chip is intended to support. In doing that, they introduce instructions they feel will assist in those applications. Companies like Atmel, who designed the AVR family, came up with a set of instructions, then proceeded to manufacture a range of chips all supporting a subset of all available instructions. The idea was to offer chips best suited to a range of intended applications, not all of them. Customers could select the chip that best suited their needs from that set.

This whole idea is under attack now, since any company that can build a chip is all set up to build custom chips based directly on a customer's needs. There is some overhead cost involved with setting up the new chip design, but that is a software issue, not a hardware one. Switching software is easy enough that you can approach a chip manufacturer with a Verilog design you have put together and get anything from one prototype to 10,000 chips in short order!

That is scary to companies like Intel, which has spent years developing a complex chip designed to handle “all” processing needs. (You should know that attempts to use Pentium chips in consumer gadgets have failed to gain much attention. Today, ARM rules that market!)

Routing our Data

For each instruction we have selected, we need to see where the bits need to go to get through the Execute Stage. Let’s work on them one by one!


The BRNE instruction has a constant that will be added to the PC coming out of the decoder. If the decoder has done its job, the PC will already have been incremented, so all we need to do is add the constant k to the PC, but only if the Z flag tells us to do that. So, we need to route PC to one side of our ALU, and k to the other. (Whoops, this is a 16-bit operation! We cannot use an 8-bit ALU without resorting to some trickery!) The final PC value leaving this stage will be routed back to the input of the fetch stage later.
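The conditional PC update described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the simulator's code: the function name `brne_next_pc` is hypothetical, the PC is assumed to be 16 bits wide and already incremented by the decoder, and `k7_field` is the raw 7-bit field from the instruction.

```python
def brne_next_pc(pc, k7_field, z_flag):
    """Return the PC leaving the execute stage for a BRNE-style branch."""
    k = k7_field & 0x7F
    if k & 0x40:                 # sign bit of the 7-bit field
        k -= 0x80                # two's complement: now in -64..63
    if z_flag == 0:              # branch taken only when Z is clear
        return (pc + k) & 0xFFFF # 16-bit add, wraps like hardware would
    return pc & 0xFFFF           # branch not taken: PC passes through


# Branch back two words from 0x0010 when Z is clear.
print(hex(brne_next_pc(0x0010, 0x7E, 0)))
```

Masking with 0xFFFF stands in for the fixed width of a 16-bit adder.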


In DEC, we subtract a constant (one) from a register value. All we need to do is route that register item to one side of the ALU, and come up with a constant (one) to route to the other side. The result will end up routed back to the register memory in the next step.


We could avoid bothering the ALU if we just add a “decrement” unit to the execute stage. Those are simple to implement. We probably could use an “increment” unit as well.


EOR is a classic two-operand ALU instruction. One register will be fed to one side of the ALU, the other to the opposite side. We need to send a code to the ALU telling it what operation we want (which we really needed to do in our previous instructions as well), then the final result simply leaves this stage to find its way back to the register memory later.
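A minimal sketch of an 8-bit ALU driven by an operation code might look like the following. The operation names ("EOR", "ADD", "SUB") and the function name `alu8` are assumptions chosen for this example, and only the Z flag is modeled; a real AVR updates several more status flags.

```python
def alu8(op, a, b):
    """Apply `op` to two 8-bit operands; return (result, z_flag)."""
    if op == "EOR":
        result = a ^ b
    elif op == "ADD":
        result = a + b
    elif op == "SUB":
        result = a - b
    else:
        raise ValueError("unknown ALU operation: " + op)
    result &= 0xFF                   # keep the result 8 bits wide
    z_flag = 1 if result == 0 else 0
    return result, z_flag


print(alu8("EOR", 0xAA, 0xFF))   # two-register operation
print(alu8("SUB", 0x01, 1))      # a decrement expressed as "subtract one"
```

Expressing a decrement as a subtract with a constant of one on the second input lets the same unit serve both cases.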


IN and OUT both reference a constant port number. These I/O ports are just normal 8-bit memory units, except that when we work with these memory cells the actual data is either leaving the chip (OUT) or arriving at the chip (IN) from the outside world. These memory cells will be what we need to connect to our graphical widget set to see interesting things happen.

Both of these instructions reference a register, but they do no processing. That means nothing here needs to reach the ALU. The data will either be written back into a register, or into the I/O memory cell, in the next stage.


LDI has a constant that will be passed back into a register untouched by the ALU.


RCALL and RET manipulate the stack. They both alter the program counter as well. The RCALL instruction does this by adding the constant to that value. RET simply accesses the stack and puts that value back in the PC register. Just as we saw for BRNE, RCALL needs the ALU to do the math, so the PC and that constant need to be routed to the ALU. The stack pointer math is another increment or decrement operation, but this one needs to increment or decrement by two, since the stack lives at the top of the 8-bit data memory and a 16-bit PC value takes two cells. Perhaps we need a special unit that can add or subtract one or two depending on our needs.
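The "special unit" suggested here could be as simple as the sketch below. The name `step_unit` and its parameters are hypothetical; the only assumption is that the unit adds or subtracts one or two and wraps at the register width.

```python
def step_unit(value, amount, subtract, bits=16):
    """Add or subtract 1 or 2, wrapping at `bits` bits."""
    if amount not in (1, 2):
        raise ValueError("this unit only steps by one or two")
    delta = -amount if subtract else amount
    return (value + delta) & ((1 << bits) - 1)


sp = 0x08FF
sp = step_unit(sp, 2, subtract=True)    # RCALL: room for a 16-bit PC
print(hex(sp))
sp = step_unit(sp, 2, subtract=False)   # RET: give the two cells back
print(hex(sp))
```

The same unit with `bits=8` and `amount=1` would cover a register decrement without involving the ALU.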


RJMP works exactly like RCALL, except it does not need to deal with the stack.


STS simply passes k along to the store unit, along with the register data. That data item will be written into the data memory at the address specified by k. No math involved.


Let’s study the RTL definitions that tell us what needs to happen in this stage. We will ignore the basic need to add one to the PC for almost all of these, and let that happen in the decoder. That means that when the execute stage sees the PC register value, it has already been incremented by one.

Z == 0 ? PC <- PC + k7 x  
Rd <- Rd - 1   x
Rd <- Rd ^ Rs   x
Rd <- A8   x
Rd <- k8    
A8 <- Rs    
[SP] <- PC, SP <- SP - 2, PC <- PC + k12 x  
PC <- [SP], SP <- SP + 2    
PC <- PC + k12 x  
[k16] <- Rs, PC <- PC + 1 x  

The only concern we really have to start this design is figuring out exactly what data items we will need to pass to the ALU:

  • PC + k
  • Rd op Rs
  • Rd add/sub 1
  • SP add/sub 2

Stack Management

Formally, the stack is just a piece of the data memory in our machine, and managing that stack is handled by setting up a stack pointer (the SP register, which we saw is implemented as two 8-bit registers: SPH and SPL). We normally do not manage these registers directly; instructions do that. For our simulator, we can invent a stack module and not worry about that register. The normal PUSH and POP stack operations can be set up, and we will not “execute” anything to update the stack pointer. We will leave that updating to the control unit.
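As a rough illustration of such a stack module, here is a small Python sketch. Everything about it is an assumption made for the example: the class name `Stack`, the use of a `bytearray` for data memory, a stack that grows downward from the top of memory, and the byte order used for 16-bit values.

```python
class Stack:
    """A stack that lives in data memory and keeps its own pointer."""

    def __init__(self, memory, top):
        self.memory = memory     # shared data memory (a bytearray)
        self.sp = top            # index of the next free cell

    def push(self, byte):
        self.memory[self.sp] = byte & 0xFF
        self.sp -= 1             # stack grows toward lower addresses

    def pop(self):
        self.sp += 1
        return self.memory[self.sp]

    def push_word(self, word):
        self.push(word & 0xFF)          # low byte first (assumed order)
        self.push((word >> 8) & 0xFF)

    def pop_word(self):
        high = self.pop()
        low = self.pop()
        return (high << 8) | low


memory = bytearray(256)
stack = Stack(memory, top=255)
stack.push_word(0x1234)          # e.g. a return address
print(stack.sp, hex(stack.pop_word()))
```

Because the module hides the pointer, the execute stage never has to do stack pointer math itself.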

ALU Operations

For the other ALU operations, we see that we need to feed the ALU from multiple sources. That means we need to select from the data provided by the decoder and route the proper items to the ALU inputs. Here is a typical setup:

Tracing Instructions

The best way to make sure all instructions can make it through the Execute Stage is to trace the flow of the data we need to move through that stage.

Here is a working diagram, showing a bunch of parts we need to hook together to make this stage work.


This is a work in progress. I will update this as the design stabilizes. Also note that this diagram is not broken up into distinct stages. We will add that after we get the basic data flow figured out.


Let’s focus in on just the part of this diagram concerned with the Execute Stage:



BRNE alters the value of PC, using the ALU (16-bit). Here is the flow we need:


Making sure the multiplexors are set correctly is a job for the controller. The decoder will have identified this instruction, providing the information needed to make this all happen when the execute stage is activated.
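To make the controller's role concrete, here is a small sketch of two multiplexors feeding an ALU, with the selector settings looked up per instruction. The table `CONTROL`, the signal names ("PC", "RD", "RS", "K"), and the function names are all hypothetical. The constant "K" is assumed to arrive already sign-extended, and the Z test for BRNE is left out to keep the routing visible.

```python
def mux(select, inputs):
    """Return the input chosen by the selector value."""
    return inputs[select]


# Hypothetical selector settings: (ALU side A, ALU side B, operation, width)
CONTROL = {
    "BRNE": ("PC", "K", "ADD", 16),
    "RJMP": ("PC", "K", "ADD", 16),
    "DEC":  ("RD", "K", "SUB", 8),    # K carries the constant one
    "EOR":  ("RD", "RS", "EOR", 8),
}


def execute(instruction, signals):
    """Route the selected signals through the ALU and return the result."""
    select_a, select_b, alu_op, width = CONTROL[instruction]
    a = mux(select_a, signals)
    b = mux(select_b, signals)
    if alu_op == "ADD":
        result = a + b
    elif alu_op == "SUB":
        result = a - b
    else:
        result = a ^ b
    return result & ((1 << width) - 1)


signals = {"PC": 0x0010, "RD": 0xAA, "RS": 0xFF, "K": 0xFFFE}
print(hex(execute("BRNE", signals)), hex(execute("EOR", signals)))
```

Adding an instruction then means adding one row of selector settings rather than new wiring.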


Here is the data flow for DEC:


I have seen a lot of diagrams where the multiplexor feeding the ALU has a constant value on one side. We can just teach our decoder to generate the needed constant value when it decodes this instruction.


This is one of many simple two-register ALU operations. All generate an 8-bit result:



IN and OUT simply move data between registers and ports. We will consider them during the final stage of processing.


LDI passes a constant on to the store stage, so there is not much to see here.


In RJMP, we need to modify the program counter by adding the constant to the current (updated) counter value. This operation is identical to that shown for BRNE, so we do not need to see anything new here.


Again, RCALL is going to update the program counter. However, the current value in that (updated) program counter points to the next instruction. We will be saving that value, unmodified, on the stack. After that, we again update the current value of the program counter by adding the constant. This is another operation already covered by the BRNE setup.


RET will update the PC using data popped off of the stack. No processing will occur here.


In this last instruction, STS, we will be taking a data item located in some register and storing it in a location defined by the 16-bit constant. There is no processing going on, so all we need to do is route these items through the execute stage to store, where the final actions will occur.

Not Too Bad

We have managed to put together a simple Execute Stage using only our ALU and two multiplexors. We have verified that we can route the signals where they need to go. All we need to do to complete this action is teach the control unit how to set the selectors that route the signals, then work through all the “tick” calls to make things happen.