| Index: > A B C D E F G H I J K L M N O P Q R S T U V W X Y Z |
|
|||||
| First Prev [ 1 2 3 4 5 6 ] Next Last |
The diagram to the right shows two memories. Each location in each memory has a datum, which in different designs ranges in size from 8 to 128 bytes. Each location in each memory also has an index, which is a unique number used to refer to that location. The index for locations in main memory is called the address. Each location in the cache has a tag, which contains the address of a datum in main memory which has been cached. In a CPU's data cache, these entries are called cache lines or cache blocks. No two entries of the cache can hold the same tag.
When the processor wishes to read or write a location in main memory, it first checks the main memory address against the tags in the cache. If one of the tag entries matches (a cache hit), the processor reads or writes the corresponding data in that entry rather than doing the same to main memory. The time taken by this operation is known as the cache's latency.
If none of the tag entries match, the reference is a miss. Most caches allocate a new entry on a miss, and fill it with the tag just missed, and a copy of the data from memory, after which the reference can be applied to the new entry just as in the case of a hit. Misses are slow because they require the data from main memory, and the slowness of that memory is, of course, the reason for the cache in the first place. The proportion of accesses that hit in the cache is known as the hit rate.
In order to make room for the new entry on a miss, the cache generally has to evict one of the existing entries. The heuristic that it uses to choose the entry to evict is called the replacement policy. The fundamental problem with any replacement policy is that it must predict which existing cache entry is least likely to be used in the future. Predicting the future is hard, especially for hardware caches which use simple rules amenable to implementation in circuitry, so there are a variety of replacement policies to choose from and no perfect way to decide among them. One popular replacement policy, LRU, is to replace the least recently used entry.
When data is written to the cache, it must at some point be written to main memory as well. The timing of this write is controlled by what is known as the write policy. In a write-through cache, every write to the cache causes a write to main memory. Alternatively, in a write-back cache, writes are not immediately mirrored to memory. Instead, the cache tracks which locations have been written over (these locations are marked dirty). The data in these locations is written back to main memory when that data is evicted from the cache. For this reason, a miss in a write-back cache will often require two memory accesses to service.
There are intermediate policies as well. The cache may be write-through, but the writes may be held in a store data queue temporarily, usually so that multiple stores can be processed together (which can reduce bus turnarounds and so improve bus utilization).
The data in main memory being cached may be changed by other entities, in which case the copy in the cache may become out-of-date or stale. Alternatively, when the CPU updates the data in the cache, copies of that data in other caches will become stale. Communication protocols between the cache managers which keep the data consistent are known as coherency protocols.
Modern CPUs can execute hundreds of instructions in the time taken to fetch a datum from memory. Various techniques have been employed to keep the CPU busy during this time. Out-of-order CPUs ( Pentium Pro and later Intel designs, for example) attempt to execute independent instructions after the instruction which is waiting for the cache miss data. The Pentium 4 uses Simultaneous multithreading ( HyperThreading in Intel's terminology) to allow a second program to use the CPU while a first program waits for data to come from main memory.
Recall that the replacement policy decides where in the cache a copy
of a particular entry of main memory will go. If the replacement
policy is free to choose any entry in the cache to hold the copy, the
cache is called fully associative. At the other extreme, if each
entry in main memory can go in just one place in the cache, the cache
is direct mapped. Many caches implement a compromise, and are
described as set associative. For example, the level-1 data cache
in an AMD Athlon is 2-way set associative, which means that a location
in main memory can be cached in any of 2 locations in the level-1 data
cache.
Associativity is a tradeoff. If there are ten places the replacement policy can put a new cache entry, then when the cache is checked for a hit, all ten places must be checked. Checking more places takes more power and area. On the other hand, caches with more associativity suffer fewer misses (see conflict misses, below), but generally have larger latency and more power dissipation. The rule of thumb is that doubling the associativity has about the same effect on hit rate as doubling the cache size, from 1-way (direct mapped) to 4-way. Associativity increases beyond 4-way have much less effect on the hit rate, and are generally done for other reasons (see virtual aliasing, below).
One of the advantages of a direct mapped cache is that it allows simple and fast speculation. Once the address has been computed, the one cache index which might have a copy of that datum is known. That cache entry can be read, and the processor can continue to work with that data before it finishes checking that the tag actually matches the requested address.
The idea of having the processor use the cached data before the tag match completes can be applied to associative caches as well. A subset of the tag, called a hint, can be used to pick just one of the possible cache entries mapping to the requested address. This datum can then be used in parallel with checking the full tag. The hint technique works best when used in the context of address translation, as explained below.