Cyclone @ ISCA 2003
link: https://dl.acm.org/doi/10.1145/859618.859647
The primary disadvantage of compile-time scheduling is that it lacks an accurate assessment of dependencies caused by branches and memory operations.
Related techniques:
- Branch boosting/predication
- Advanced loads
- Run-time disambiguation
- List scheduling: schedules a prioritized list of jobs on a set of \(m\) machines.
- Take the first job in the list (the one with the highest priority).
- If no suitable machine is available for it, select the next job in the list.
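The list-scheduling steps above can be sketched in Python. This is a minimal illustration, not code from the paper: the `Job` fields, the notion of a machine "kind" (standing in for "suitable machine"), and the cycle-by-cycle loop are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    duration: int  # cycles the job occupies a machine (assumed field)
    kind: str      # hypothetical: the machine type able to run this job


def list_schedule(jobs, machines):
    """Greedy list scheduling.

    jobs: list of Job, already sorted by priority (highest first).
    machines: list of machine kinds, e.g. ["alu", "mem"].
    Returns {job name: (machine index, start time)}.
    """
    kinds = set(machines)
    for job in jobs:
        if job.kind not in kinds:
            raise ValueError(f"no machine can ever run {job.name}")

    free_at = [0] * len(machines)  # time at which each machine becomes idle
    pending = list(jobs)
    schedule = {}
    time = 0
    while pending:
        remaining = []
        for job in pending:  # walk the list in priority order
            slot = next(
                (i for i, kind in enumerate(machines)
                 if kind == job.kind and free_at[i] <= time),
                None,
            )
            if slot is None:
                # No suitable machine is free: move on to the next job.
                remaining.append(job)
                continue
            schedule[job.name] = (slot, time)
            free_at[slot] = time + job.duration
        pending = remaining
        time += 1
    return schedule


jobs = [Job("a", 2, "alu"), Job("b", 1, "alu"), Job("c", 1, "mem")]
result = list_schedule(jobs, ["alu", "mem"])
# "b" waits for the single ALU, while "c" bypasses it and starts at time 0.
```

The sketch ignores dependencies between jobs; a scheduler for real instructions would also have to check that a job's inputs are available before placing it.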
Dynamic scheduling in hardware is complex and slow.
- larger instruction window -> more ILP but slower clock speed.
- scheduler broadcast logic dominates scheduler circuit performance.
- Wire latencies grow
Contribution:
- Cyclone scheduler: combines the advantages of compile-time and run-time scheduling techniques.
- Hardware-based list scheduling.
- Instructions are scheduled by predicted latency: each one is delayed until its operands are expected to be ready.
- Broadcast-free dynamic scheduling.
- Efficient dependency-based variable-latency instruction replay.
- First-class scheduling of memory dependencies.
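To make the "scheduled by predicted latency" idea concrete, here is a rough Python sketch. It is not the paper's hardware design: the per-register table `ready_in`, the instruction tuple layout, and the fixed `latency` map are hypothetical, and the sketch assumes all instructions are pre-scheduled at the same cycle. Mispredicted latencies (for example a cache miss), which the notes say are handled by replay, are not modeled.

```python
def predict_issue_delays(instructions, latency):
    """Predict how many cycles each instruction should wait before issuing.

    instructions: list of (dest_reg, [src_regs], op) in program order.
    latency: dict mapping op name to its predicted latency in cycles.
    Returns a list of predicted delays, one per instruction.
    """
    ready_in = {}  # register -> predicted cycles until its value is ready
    delays = []
    for dest, srcs, op in instructions:
        # Wait for the slowest source operand; unknown registers are
        # treated as already available.
        delay = max((ready_in.get(reg, 0) for reg in srcs), default=0)
        delays.append(delay)
        # The result is predicted to be ready once the op completes.
        ready_in[dest] = delay + latency[op]
    return delays


program = [
    ("r1", [], "load"),
    ("r2", ["r1"], "add"),
    ("r3", ["r1", "r2"], "mul"),
    ("r4", [], "add"),
]
delays = predict_issue_delays(program, {"load": 3, "add": 1, "mul": 2})
# delays == [0, 3, 4, 0]: the independent final add need not wait.
```

Because each delay is computed from a table lookup rather than from a wakeup signal sent by the producer, no result-tag broadcast is needed in this sketch, which is the property the contribution list highlights.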