The Pentium 4 Architecture Design

paper link: Pentium IV Architecture

This paper presents NetBurst, the micro-architecture of the Pentium 4 processor. NetBurst has four main sections:

  1. In-order front end
  2. Out-of-order execution engine
  3. Integer and floating-point execution units
  4. Memory subsystem

In-order front end

  • Highly accurate branch prediction logic
    • The predicted instruction address is used to fetch instructions from the L2 cache.
  • IA-32 instructions are decoded into uops (micro-operations).
  • The Trace Cache (the L1 instruction cache) sits between the instruction decode logic and the execution core.
    • It stores already-decoded uops.
    • This removes decoding from the main execution loop.
    • The IA-32 decoder is only used when the machine misses the Trace Cache.
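
The hit/miss behavior of the Trace Cache can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class `TraceCache`, the helper `toy_decode`, and the per-address lookup are hypothetical, and the real hardware stores traces of uops along the predicted path and has a finite capacity, neither of which is modeled here.

```python
from typing import Callable, Dict, List


class TraceCache:
    """Toy model: maps an instruction address to already-decoded uops."""

    def __init__(self, decode: Callable[[int], List[str]]) -> None:
        self._decode = decode                   # slow IA-32 decoder, used only on a miss
        self._lines: Dict[int, List[str]] = {}  # unbounded here; real hardware is finite
        self.hits = 0
        self.misses = 0

    def fetch(self, address: int) -> List[str]:
        uops = self._lines.get(address)
        if uops is not None:
            self.hits += 1                      # decoded uops delivered without decoding
            return uops
        self.misses += 1
        uops = self._decode(address)            # fall back to the decoder
        self._lines[address] = uops             # keep the decoded uops for next time
        return uops


def toy_decode(address: int) -> List[str]:
    # Hypothetical decoder: pretends every instruction becomes two uops.
    return [f"uop{address:#x}.0", f"uop{address:#x}.1"]


cache = TraceCache(toy_decode)
for addr in [0x10, 0x14, 0x10, 0x14, 0x10]:     # a small loop body executed repeatedly
    cache.fetch(addr)
print(cache.hits, cache.misses)                 # 3 hits, 2 misses
```

Only the first pass through the loop body pays for decoding; every later iteration is served from the cache, which is the point of taking the decoder out of the execution loop.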

OOO Execution logic

  • The retirement logic reorders instructions, which execute out of order, back into original program order.
    • Pentium 4 can retire up to 3 uops per clock cycle.
    • It reports branch history information to the branch predictor in the front end.
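
In-order retirement with a width of 3 uops per cycle can be sketched as follows. The function `retire_in_order` and the completion times are hypothetical; the sketch models only the ordering and width constraints, not the actual reorder buffer hardware.

```python
from collections import deque
from typing import Deque, Dict, List

RETIRE_WIDTH = 3  # up to 3 uops retired per clock cycle, as noted above


def retire_in_order(program_order: List[str],
                    completion_cycle: Dict[str, int]) -> Dict[str, int]:
    """Return the cycle in which each uop retires.

    A uop retires only once it has finished executing and every older uop
    has retired; at most RETIRE_WIDTH uops retire in one cycle.
    """
    pending: Deque[str] = deque(program_order)
    retired_at: Dict[str, int] = {}
    cycle = 0
    while pending:
        cycle += 1
        retired_this_cycle = 0
        while (pending and retired_this_cycle < RETIRE_WIDTH
               and completion_cycle[pending[0]] <= cycle):
            retired_at[pending.popleft()] = cycle
            retired_this_cycle += 1
    return retired_at


# Hypothetical completion times: "b" and "c" finish before the older "a".
done = {"a": 3, "b": 1, "c": 2, "d": 3, "e": 3}
result = retire_in_order(["a", "b", "c", "d", "e"], done)
print(result)  # {'a': 3, 'b': 3, 'c': 3, 'd': 4, 'e': 4}
```

Although "b" finishes in cycle 1, it cannot retire until the older "a" is done in cycle 3; "d" and "e" are also done by cycle 3 but wait one more cycle because of the width limit.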

Integer and Floating-Point Execution Units

  • ALUs and L1 data cache.

Memory Subsystem

  • L2 cache and system bus connected to main memory.

Clock rate

  • Higher clock rates have tradeoffs:
    • Higher clock rates need deeper pipelines, and deeper pipelines make many operations take more clock cycles.
      • A 50% increase in frequency may yield only around a 30% increase in performance.
    • The achievable clock rate also depends on circuit design techniques, design methodology, design tools, silicon process technology, and power and thermal constraints.
    • Deeper pipelines make the machine more complicated and require more buffering.
    • The 286, 386, 486, and P5 had similar pipeline depths; P6 doubled the number of stages; NetBurst lengthens the pipeline further (1.5x the frequency of the Pentium III).
      • The P6 micro-architecture has a 10-stage misprediction pipeline, while NetBurst has 20 stages.
    • Different parts of the chip run at different clock frequencies:
      • The clock rate of the highest-frequency section was set equal to the speed of the ALU-bypass execution loop.
      • Other parts run at half of the 3GHz fast clock and some run even slower; the bus logic runs at 100MHz.
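
The gap between a 50% frequency gain and a 30% performance gain can be explored with a small calculation. This is an Amdahl-style model chosen for illustration, not something taken from the paper: it assumes run time splits into a part that shrinks with clock frequency and a part that does not (for example, delays fixed in real time). The function names are hypothetical.

```python
def speedup(freq_gain: float, fixed_fraction: float) -> float:
    """Speedup when only (1 - fixed_fraction) of run time scales with the clock."""
    scaled = (1.0 - fixed_fraction) / freq_gain
    return 1.0 / (scaled + fixed_fraction)


def fixed_fraction_for(freq_gain: float, observed_speedup: float) -> float:
    """Invert speedup(): the non-scaling share implied by an observed speedup."""
    return (1.0 / observed_speedup - 1.0 / freq_gain) / (1.0 - 1.0 / freq_gain)


# 1.5x frequency giving 1.3x performance, the ratio quoted in the notes above.
share = fixed_fraction_for(1.5, 1.3)
print(round(share, 3))                # 0.308
print(round(speedup(1.5, share), 3))  # 1.3
```

Under this assumed model, roughly 31% of the original run time would have to be insensitive to clock frequency for a 1.5x clock to deliver only a 1.3x speedup.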

Some other notable features:

  • Real-time MPEG2 video encoding and near real-time MPEG4 encoding.
  • Introduces SSE2, a new set of 128-bit SIMD instructions.

Author: expye(Zihao YE)



Last modified: 2022-12-27 Tue 07:18

Licensed under CC BY-NC 4.0