A hardware component in the processor that is small, but fast!

To achieve a high cache hit rate, data blocks need to be dynamically transferred between the cache and main memory (i.e. principle of locality)


--> The tag is often the rest of the bits that weren't used to select the cache index

Cache Too Small?

If the cache is too small / block size is too big - we may run into a 'thrashing' issue- where the contents in the cache may start to constantly change -> negative impact as there will be a ~100% conflict miss.

i.e. if there are 4 cache blocks, but 5 items that need to be accessed. After the first four cache items populate, the fifth block will replace the first item; but then the first block will need to be read from the cache

  • Solution 1 - Increase the cache size
  • Solution 2 - Multiple entries for a memory block (a memory block can map to multiple cache blocks) -> refer to associative cache

Cache Structures

Direct Mapped Cache

A memory block can map to only one cache location

For a cache of 2^m blocks, a memory block with address X maps to the cache location X mod 2^m --> The least significant m bits of the memory block address

Multiple addresses will map to the same cache block (and compete for use)

Set Associative Cache

n-way set associative cache

The cache is divided into sets, each that contains n blocks. A memory block is mapped to a set, and can be stored in any location in the set.

  • n comparators are required to search a block in a set

Given an 8-block cache, how is the memory block 12 placed in the cache using a 2-way set associative cache?

There are 4 sets of 2-block caches, so 12 % 4 = 0 -> Set 0 will be used

  • 2 -> Block size is 4 byte = 2^2
  • 3 -> 64 byte cache / (2 * 4 byte blocks) = 8 sets of 2 blocks; 8 = 2^3

Fully Associative Cache

A memory block can be mapped to any location in the cache; however this increases the cache search time, as the full cache needs to be searched for a memory block

Cache Concerns

Where to put a memory block in cache?

Block Placement Strategy

  • Direct mapped - only one location
  • Fully associative - Anywhere
  • Set associative - Any location in a set

How to find a memory block in cache?

Block Identification

( 2^(n-2) )

If there are no free spaces, which block will be replaced?

Block Replacement

  • For a direct mapped cache - as only one location can be used, the contents at that location must be replaced
  • For set associative / fully associative
    • Random - random location [of the set] chosen
    • Least Recently Used
    • "optimal"

  • The larger the cache size, the lower the cache miss rate (more data can be cached)
  • The higher the associativity (2 way -> 4 way -> 8 way -> etc) the lower the cache miss rate
  • The cache miss rate for LRU is less than the miss rate for random replacement

When memory data is updated, how is the cache involved?

Write Strategy

  • Cache hit -> Write through
    • Written to both cache and memory
    • (+) Read misses don't result in writes
    • (-) Potentially high memory traffic
  • Cache hit -> Write back
    • Written only to cache
    • The modified cache block is written to memory only when it is replaced
    • Dirty bit is required to facilitate this functionality
    • (+) memory accesses
    • (-) cache coherence issue
  • Cache miss -> Write allocate
    • Fetch-on-write
      • Fetch data from memory
      • Write to the cache
    • Not-fetch-on-write
      • Do not write to the memory
      • Write to cache
      • The cache block is invalid, except for the data word that is written
  • Cache miss -> Write not allocate
    • Skip the cache

Cold Start Miss

Compulsory miss that occurs when a block is first accessed (hence has never been cached)

Conflict Miss (Collision Miss)

For associative caches, occurs when blocks must evict another block in a set

Capacity Miss

Fully associative cache is filled completely

Write Buffer

To reduce the impact of slow memory access, a write buffer (FIFO) can be used.

  • Processor writes data to the write buffer
  • Memory controller writes contents of the buffer to the memory
  • Typically has 4 entries
  • Works if write frequency << 1 / DRAM write cycle

Buffer Overflow

If write frequency > 1 / DRAM write cycle then a buffer overflow may occur (the write buffer is filled up and more data comes in)

A solution is to implement a second cache (L2)

Cache Design

Note: The Compare Tag stage might take time - could split it into smaller stages, so that the clock cycle time can be reduced