Inventing the Cache

After studying the access patterns and looking at the timing data, it occurred to designers that this idea could be extended into a way of managing memory that is dramatically different from the programmer's traditional view.

The cool thing about this idea is that the programmer will never see it, and we gain enormous flexibility in how memory is actually managed.

Suppose we just inject a translation component between the address the programmer asks for and the location of that item in real memory.

We need new terminology:

  • logical address - what the programmer wants

  • physical address - where that address really is

In the traditional view of memory, those two addresses are identical, but they do not need to be.

In fact, with this translation unit, the memory store could be any place we choose, even somewhere on another system we can reach over a network. Of course, performance might suffer, but it would work.

Let’s create a translation unit:

Manager.h
#pragma once

#include <cstdint>

class Manager {
    public:
        Manager();

        uint64_t read( uint64_t addr );
        void debug( uint64_t addr );
    private:
        void init( void );

        // size the memory area
        static const uint16_t tag_bits = 8; 
        static const uint16_t offset_bits = 9;

        static const uint16_t block_size = 1 << offset_bits;
        static const uint16_t num_blocks = 1 << tag_bits;
        static const uint64_t max_address = block_size * num_blocks;

        // create the memory area
        uint8_t memory[num_blocks][block_size];

        // for debugging
        uint64_t current_addr;
        uint32_t tag;
        uint16_t offset;
};

This is pretty basic. We define a constructor that will manufacture a storage area out of a two-dimensional unsigned byte array. The memory area is modeled as a list of blocks, each of a fixed size, and each block has a number we can use as an address. We take the user's address, split it into two fields, and use the high field as the block number; the remaining low bits form the offset within that block to the byte (or bytes) the user really wants. At this point, our memory is not much different from a simple array, but, as we shall see, this structure gives us a lot of flexibility in how we manage the store.
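
To make the split concrete, consider the address we will feed the test program below. With offset_bits = 9 (so block_size = 512), the address 1042 divides like this:

addr   = 1042
tag    = 1042 >> 9         = 2     (block number)
offset = 1042 & (512 - 1)  = 18    (byte within that block)

So address 1042 names byte 18 of block 2, which matches the debug output we will see shortly.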

The implementation of this class is also pretty simple.

Manager.cpp
#include <iostream>
#include "Manager.h"

// constructor
Manager::Manager() {
    init();
}

// load all blocks with test data
void Manager::init( void ) {
    int k = 0;
    for( int i=0; i<num_blocks; i++ )
        for( int j=0; j<block_size; j++ )
            memory[i][j] = k++;
}

// accessor
uint64_t Manager::read( uint64_t addr ) {
    current_addr = addr;
    tag = (addr >> offset_bits) & (num_blocks - 1);  // block number, kept in range
    offset = addr & (block_size - 1);                // byte within that block
    return memory[tag][offset];
}

// debug routines
void Manager::debug( uint64_t addr ) {
    read(addr);
    std::cout << "address: " << addr << std::endl;
    std::cout << "\ttag: " << tag << std::endl;
    std::cout << "\toffset: " << offset << std::endl;
}


Just to check that things are working, here is a start on a test application that exercises the addressing mechanism.

main.cpp
#include <iostream>
#include "Manager.h"

int main( void ) {
    std::cout << "Memory Manager (v1)" << std::endl << std::endl;
    Manager m;

    uint64_t addr = 1042;
    m.debug(addr);    // debug() performs the read itself
}

By now you should know that building a Makefile is an essential part of your project.

Makefile
SRCS 	:= $(wildcard *.cpp)
OBJS 	:= $(SRCS:.cpp=.o)
TGT	 	:= test
CFLAGS	:= -std=c++11 -I .

all:	$(TGT)

$(TGT):	$(OBJS)
	g++ -o $@  $^

%.o:	%.cpp
	g++ -c $(CFLAGS) -o $@ $<

clean:
	rm -f $(TGT) $(OBJS)

Here is the initial output:

Memory Manager (v1)

address: 1042
        tag: 2
        offset: 18

Adding Fast Memory

The important idea is very simple. If we have a large block of slow memory where all of our data sits, and we want to speed things up, we can add a small block of fast memory between main memory and the processor. Main memory is then treated as a sequence of blocks of data, using the scheme shown above. As the user accesses data anywhere in memory, we determine which block that item is located in, then make sure that block has been copied from slow memory into the faster cache memory. We then return the data item from fast memory.

Future accesses, which we hope obey the principle of locality, will come from the cache. That holds until a requested item lives in a different block; at that point, we need to pitch the current block and load the new one from main memory, as sketched below.
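
As a rough sketch of that policy, here is what a read could look like with a single cached block. The names here (slow_memory, cache, cached_tag, cached_read) are illustrative stand-ins, not yet part of the Manager class; they just make the hit/miss decision explicit:

#include <cstdint>
#include <cstring>

static const uint16_t offset_bits = 9;
static const uint16_t block_size  = 1 << offset_bits;
static const uint16_t num_blocks  = 256;

uint8_t slow_memory[num_blocks][block_size];  // the large, slow store
uint8_t cache[block_size];                    // one fast block
int32_t cached_tag = -1;                      // block now in the cache (-1 = empty)

uint8_t cached_read( uint64_t addr ) {
    uint32_t tag    = (addr >> offset_bits) & (num_blocks - 1);
    uint16_t offset = addr & (block_size - 1);

    if ( (int32_t)tag != cached_tag ) {
        // miss: pitch the current block and pull the right one from slow memory
        std::memcpy( cache, slow_memory[tag], block_size );
        cached_tag = tag;
    }
    // hit (or freshly filled): serve the item from fast memory
    return cache[offset];
}

Later accesses that land in the same block skip the copy entirely, which is exactly where the speedup comes from.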

Modeling the Memory Unit with Cache

We can now expand this to allow the Manager to sit between the user and the physical memory. The user still hands us “logical addresses”, but we translate each one into a physical location behind the scenes.

We can model main memory as a large static array with slow access times, then add a smaller array that holds one or more blocks of faster memory. The code can be extended to report a total delay time, which determines when the user can access the requested data. Calls to the memory unit’s ready function will use this number to control the timing in that unit.
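
As a sketch of how that timing might be modeled, assuming illustrative latencies and a simple cycle counter (none of these names or numbers come from the text above):

#include <cstdint>

// illustrative latencies, in cycles
static const uint32_t fast_delay = 1;    // access that hits the cache
static const uint32_t slow_delay = 100;  // block fill from main memory

uint64_t cycle      = 0;   // current simulation time
uint64_t busy_until = 0;   // when the outstanding request completes

// charge a request: a miss pays for the block fill plus the fast access
void charge( bool hit ) {
    busy_until = cycle + ( hit ? fast_delay : slow_delay + fast_delay );
}

// the requested data may be used once the accumulated delay has elapsed
bool ready( void ) {
    return cycle >= busy_until;
}

The caller would advance cycle on every simulated tick and poll ready() until it returns true, at which point the value read from the cache can be consumed.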