The Pentium 4 Architecture Design
paper link: Pentium IV Architecture
This paper presents the micro-architecture (called NetBurst) of the Pentium 4 processor. NetBurst has four main sections:
- In-order front end
- Out-of-order execution engine
- Integer and floating-point execution units
- Memory subsystem
In-order front end
- Highly accurate branch prediction logic
- The predicted instruction address is used to fetch instructions from the L2 cache.
- IA-32 instructions are decoded into uops (micro-operations).
- The Trace Cache (L1) sits between the instruction decode logic and the execution cores.
  - It stores already decoded uops.
  - This helps remove decoding from the main execution loop.
  - The IA-32 decoder is only used when the machine misses the trace cache.
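The trace cache behaviour described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class `ToyTraceCache`, the table `DECODE_TABLE`, and the uop strings are hypothetical, and the model caches uops per instruction address, whereas the real trace cache is organized differently. It only shows the point from the notes: the decoder is invoked on a miss, and repeated code is served as already decoded uops.

```python
from typing import Dict, List

# Hypothetical decode table (illustrative, not real IA-32 decodings).
DECODE_TABLE: Dict[str, List[str]] = {
    "add eax, [mem]": ["load tmp0, [mem]", "add eax, tmp0"],
    "inc ebx": ["add ebx, 1"],
}


class ToyTraceCache:
    """Caches decoded uops by instruction address (a simplification)."""

    def __init__(self) -> None:
        self.lines: Dict[int, List[str]] = {}
        self.hits = 0
        self.misses = 0

    def fetch_uops(self, address: int, instruction: str) -> List[str]:
        if address in self.lines:
            # Hit: deliver the stored uops, the decoder is not used.
            self.hits += 1
            return self.lines[address]
        # Miss: fall back to the decoder, then fill the cache.
        self.misses += 1
        uops = DECODE_TABLE[instruction]
        self.lines[address] = uops
        return uops


cache = ToyTraceCache()
loop_body = [(0x100, "add eax, [mem]"), (0x104, "inc ebx")]
for _ in range(3):  # run the same two-instruction loop three times
    for address, instruction in loop_body:
        cache.fetch_uops(address, instruction)
print(cache.misses, cache.hits)
```

In this run the decoder is used only for the two misses in the first loop iteration; the remaining four fetches hit the cache.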
Out-of-order (OOO) execution logic
- The retirement logic reorders instructions, which execute out of order, back into the original program order.
- The Pentium 4 can retire up to 3 uops per clock cycle.
- The retirement logic reports branch history information to the branch predictor in the front end.
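In-order retirement with a width of 3 uops per clock can be sketched as follows. This is a minimal model, not the actual retirement hardware: the function `retire_in_order`, the uop names, and the completion cycles are hypothetical, and a uop is assumed to be retirable in the same cycle in which it completes. Only the retire width of 3 comes from the notes.

```python
from collections import deque
from typing import Dict, List, Tuple

RETIRE_WIDTH = 3  # from the notes: up to 3 uops retire per clock cycle


def retire_in_order(
    program_order: List[str], completion_cycle: Dict[str, int]
) -> List[Tuple[int, List[str]]]:
    """Return (cycle, retired uops) pairs for an in-order retirement stage."""
    pending = deque(program_order)
    schedule = []
    cycle = 0
    while pending:
        cycle += 1
        retired = []
        # Only the oldest uop may retire, and only once it has completed.
        while (
            pending
            and len(retired) < RETIRE_WIDTH
            and completion_cycle[pending[0]] <= cycle
        ):
            retired.append(pending.popleft())
        schedule.append((cycle, retired))
    return schedule


# Hypothetical out-of-order completion times: u1 and u3 finish before u0.
order = ["u0", "u1", "u2", "u3", "u4"]
done = {"u0": 3, "u1": 1, "u2": 2, "u3": 1, "u4": 4}
schedule = retire_in_order(order, done)
for cycle, retired in schedule:
    print(cycle, retired)
```

Here u1, u2, and u3 complete early but must wait for u0, so nothing retires in cycles 1 and 2; u0, u1, and u2 retire together in cycle 3, and u3 and u4 follow in cycle 4.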
Integer and Floating-Point Execution Units
- Contains the ALUs and the L1 data cache.
Memory Subsystem
- Contains the L2 cache and the system bus, which connects to main memory.
Clock rate
- Higher clock rates come with tradeoffs:
  - Higher clock rates need deeper pipelines, and deeper pipelines make many operations take more clock cycles.
  - A 50%+ frequency increase yields roughly a 30%+ performance gain.
  - The achievable clock rate also depends on circuit design techniques, design methodology, design tools, silicon process technology, and power and thermal constraints.
  - Deeper pipelines make the machine more complicated and require more buffering.
  - The 286, 386, 486, and P5 have similar pipeline depths; P6 doubles the number of stages, and NetBurst lengthens the pipeline further (1.5x the frequency of the Pentium III).
  - The P6 micro-architecture has a 10-stage misprediction pipeline, while NetBurst has a 20-stage one.
- Different parts of the chip run at different clock frequencies:
  - The highest-frequency section (the fast clock) is set to match the speed of the ALU-bypass execution loop.
  - Most other parts run at half of the 3 GHz fast clock, some run even slower, and the bus logic runs at 100 MHz.
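The reason a frequency gain does not translate one-to-one into performance can be illustrated with a simple back-of-the-envelope model. Everything here is an assumption for illustration: the function `time_per_instruction`, the base CPI, and the misprediction rate are hypothetical, and the misprediction penalty is taken to equal the misprediction pipeline depth (10 stages for a P6-like design, 20 for a NetBurst-like one). Only the stage counts and the 1.5x frequency ratio come from the notes.

```python
def time_per_instruction(
    base_cpi: float,
    mispredicts_per_instr: float,
    penalty_cycles: int,
    frequency: float,
) -> float:
    """Average time per instruction, in units of 1 / (reference frequency)."""
    # Each mispredicted branch costs roughly one pipeline refill.
    cpi = base_cpi + mispredicts_per_instr * penalty_cycles
    return cpi / frequency


BASE_CPI = 1.0                # assumed, not from the paper
MISPREDICTS_PER_INSTR = 0.01  # assumed, not from the paper

# P6-like: 10-stage misprediction pipeline at the reference frequency.
p6_time = time_per_instruction(BASE_CPI, MISPREDICTS_PER_INSTR, 10, 1.0)
# NetBurst-like: 20-stage misprediction pipeline at 1.5x the frequency.
netburst_time = time_per_instruction(BASE_CPI, MISPREDICTS_PER_INSTR, 20, 1.5)

speedup = p6_time / netburst_time
print(f"speedup: {speedup:.3f}")
```

With these assumed numbers the deeper pipeline raises CPI from 1.1 to 1.2, so the 50% higher clock yields a speedup of about 1.375 rather than 1.5. The exact figure depends entirely on the assumed CPI and misprediction rate.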
Some other notable features:
- Enables real-time MPEG2 video encoding and near real-time MPEG4 encoding.
- Introduces SSE2, a new set of 128-bit SIMD instructions.
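To make "128-bit SIMD" concrete, the sketch below emulates two SSE2-style packed operations in plain Python, treating a 16-byte value as either two 64-bit doubles or four 32-bit integers. The helper names `addpd` and `paddd` are borrowed from the SSE2 instruction mnemonics, but the functions themselves are hypothetical emulations for illustration, not a real API, and they say nothing about how the hardware implements these operations.

```python
import struct

REGISTER_BYTES = 16  # 128 bits


def addpd(x: bytes, y: bytes) -> bytes:
    """Emulate a packed add of two 64-bit doubles per 128-bit value."""
    x0, x1 = struct.unpack("<2d", x)
    y0, y1 = struct.unpack("<2d", y)
    return struct.pack("<2d", x0 + y0, x1 + y1)


def paddd(x: bytes, y: bytes) -> bytes:
    """Emulate a packed add of four 32-bit integers with wraparound."""
    xs = struct.unpack("<4I", x)
    ys = struct.unpack("<4I", y)
    # Each lane wraps independently at 32 bits; no carry crosses lanes.
    return struct.pack("<4I", *((a + b) & 0xFFFFFFFF for a, b in zip(xs, ys)))


a = struct.pack("<2d", 1.0, 2.0)
b = struct.pack("<2d", 3.0, 4.0)
double_sum = addpd(a, b)
print(len(double_sum), struct.unpack("<2d", double_sum))

c = struct.pack("<4I", 1, 2, 3, 0xFFFFFFFF)
d = struct.pack("<4I", 10, 20, 30, 1)
int_sum = paddd(c, d)
print(struct.unpack("<4I", int_sum))
```

One operation processes every lane of the 128-bit value at once: two doubles in the first case, four integers in the second, where the last lane wraps around to 0.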