Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction
- 2 The Basics
- 3 Superscalar Processors
- 4 Front-End: Branch Prediction, Instruction Fetching, and Register Renaming
- 5 Back-End: Instruction Scheduling, Memory Access Instructions, and Clusters
- 6 The Cache Hierarchy
- 7 Multiprocessors
- 8 Multithreading and (Chip) Multiprocessing
- 9 Current Limitations and Future Challenges
- Bibliography
- Index
6 - The Cache Hierarchy
Published online by Cambridge University Press: 05 June 2012
Summary
We reviewed the basics of caches in Chapter 2. In subsequent chapters, when we looked at instruction fetch in the front-end and data load–store operations in the back-end, we assumed most of the time that accesses hit in the respective first-level instruction and data caches. It is now time to look at the memory hierarchy in a more realistic fashion. In this chapter, our focus is principally on the cache hierarchy.
The challenge for an effective memory hierarchy can be summarized by two technological constraints:
- With processors running at a few gigahertz, main memory latencies are now of the order of several hundred cycles.
- In order to access first-level caches in 1 or 2 cycles, their size and associativity must be severely limited (see the address-decomposition sketch below).
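To illustrate the second constraint, the minimal sketch below shows how a memory address decomposes into line offset, set index, and tag for a hypothetical L1 configuration (32 KB, 4-way set-associative, 64-byte lines; these parameters are illustrative assumptions, not figures from the text). The more sets to index and the more tags to compare in parallel, the longer the access path, which is why a 1- or 2-cycle L1 must stay small and of low associativity.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical L1 parameters (illustrative assumptions):
   32 KB capacity, 4-way set-associative, 64-byte lines. */
#define CACHE_SIZE (32 * 1024)
#define WAYS       4
#define LINE_SIZE  64
#define NUM_SETS   (CACHE_SIZE / (WAYS * LINE_SIZE)) /* 128 sets */

int main(void) {
    uint64_t addr = 0x7ffd12345678; /* arbitrary example address */

    /* Low bits select the byte within a line; the next bits select the
       set; the remaining high bits form the tag, which must be compared
       against all WAYS tags stored in that set. A larger cache means
       more index bits to decode; higher associativity means more tag
       comparisons in parallel -- both lengthen the critical path. */
    uint64_t offset = addr % LINE_SIZE;
    uint64_t set    = (addr / LINE_SIZE) % NUM_SETS;
    uint64_t tag    = addr / ((uint64_t)LINE_SIZE * NUM_SETS);

    printf("offset=%llu set=%llu tag=0x%llx\n",
           (unsigned long long)offset,
           (unsigned long long)set,
           (unsigned long long)tag);
    return 0;
}
```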
These two facts point to a hierarchy of caches: relatively small, low-associativity first-level instruction and data caches (L1 caches); a large second-level on-chip cache (L2), generally unified (i.e., holding both instructions and data), with an access time an order of magnitude slower than an L1 access; often, in high-performance servers, an off-chip third-level cache (L3) with latencies approaching 100 cycles; and then main memory, with latencies of a few hundred cycles. The goal in designing a cache hierarchy is to keep a latency of one or two cycles for the L1 caches and to hide as much as possible of the latencies of the higher cache levels and of main memory.
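To make the latency-hiding goal concrete, here is a minimal sketch of the standard average memory access time (AMAT) recurrence for such a hierarchy. The latencies follow the ranges quoted above; the (local) miss rates are purely illustrative assumptions, not measurements from the chapter.

```c
#include <stdio.h>

/* AMAT for a three-level hierarchy:
   AMAT = t_L1 + m_L1 * (t_L2 + m_L2 * (t_L3 + m_L3 * t_mem))
   where each m is a local miss rate: the fraction of accesses reaching
   that level which miss there. All numbers below are assumptions. */
int main(void) {
    double t_l1 = 2.0,   m_l1 = 0.05; /* 1-2 cycle L1, ~5% miss rate   */
    double t_l2 = 15.0,  m_l2 = 0.20; /* order of magnitude slower L2  */
    double t_l3 = 100.0, m_l3 = 0.30; /* off-chip L3 near 100 cycles   */
    double t_mem = 300.0;             /* main memory: a few hundred    */

    double amat = t_l1 + m_l1 * (t_l2 + m_l2 * (t_l3 + m_l3 * t_mem));
    printf("AMAT = %.2f cycles\n", amat); /* ~4.65 cycles here */
    return 0;
}
```

Even with these modest miss rates, most of the average latency beyond the 2-cycle L1 hit time comes from the rare accesses that fall through to L3 and memory, which is why the rest of the chapter concentrates on reducing and hiding miss penalties.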
- Type: Chapter
- Information: Microprocessor Architecture: From Simple Pipelines to Chip Multiprocessors, pp. 208–259
- Publisher: Cambridge University Press
- Print publication year: 2009