Vijayaraghavan Soundararajan and Anant Agarwal. Dribbling Registers: A Mechanism for Reducing Context Switch Latency in Large-Scale Multiprocessors. MIT/LCS Technical Memo TM-474.
As parallel machines grow in scale and complexity, tolerating the latency of synchronization faults and remote memory accesses becomes increasingly important. One method for tolerating latency is multithreading, in which the processor rapidly context switches among a few threads on cache misses and synchronization faults. A few threads (say, 4) suffice to hide all of the latency when each latency being tolerated is short compared to the combined run lengths of the processor-resident threads. When this condition does not hold, as is often the case for synchronization latencies, many more threads are needed.
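The condition above can be made concrete with a common back-of-the-envelope multithreading model. This model is an assumption brought in for illustration, not one taken from this memo: each thread runs R cycles between faults, each fault takes L cycles to resolve, and each context switch costs C cycles. Latency is fully hidden once N threads cover a whole round, i.e. N(R + C) >= R + C + L; below that, utilization grows linearly with N. The function names are hypothetical.

```python
import math


def utilization(n_threads: int, run_length: float, latency: float,
                switch_cost: float) -> float:
    """Fraction of cycles spent on useful work in a simple deterministic model.

    Assumed model: every thread runs `run_length` cycles, then faults and
    waits `latency` cycles; every context switch costs `switch_cost` cycles.
    """
    # Linear regime: too few threads to cover the latency, so the processor
    # idles for part of each round of R + C + L cycles.
    linear = n_threads * run_length / (run_length + switch_cost + latency)
    # Saturated regime: latency fully hidden; only switch overhead remains.
    saturated = run_length / (run_length + switch_cost)
    return min(linear, saturated)


def threads_to_saturate(run_length: float, latency: float,
                        switch_cost: float) -> int:
    """Smallest thread count whose combined run lengths cover one latency."""
    return math.ceil((run_length + switch_cost + latency)
                     / (run_length + switch_cost))


if __name__ == "__main__":
    # Illustrative parameters only: a short remote miss vs. a long sync wait.
    for lat in (200, 5000):
        print(lat, threads_to_saturate(100, lat, 10),
              round(utilization(4, 100, lat, 10), 3))
```

With the illustrative values R = 100 and C = 10, a 200-cycle remote miss is hidden by 3 threads, so 4 threads already saturate the processor; a 5000-cycle synchronization wait needs 47, and 4 threads keep the processor busy less than 8% of the time.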
This paper proposes dribbling registers (D-registers) as a mechanism for fast switching among a large number of threads, and compares its performance against other methods, including software context switches, multiple register sets, and context caches. The idea behind D-registers is to implement a few threads (say, 2) in hardware while maintaining a supply of threads to switch to on synchronization faults, by continually loading and unloading contexts over an extra register-file read-write port during free cache cycles. Although SPICE analysis of a preliminary VLSI implementation indicates that the extra port makes D-registers 3-5% slower, they yield higher processor utilization than the other methods for typical workload parameters.
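The dribbling idea can be illustrated with a toy cycle-level sketch. It is not the memo's simulator, and its simplifications are assumptions of our own: every cycle counts as a free cache cycle, unloading one context and loading another takes a fixed `dribble_cycles`, a hardware switch costs a fixed `switch_cost`, and there is no software-switch fallback. All names (`simulate`, `dribble_cycles`, and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    ready_at: int   # cycle at which the thread's current fault resolves
    remaining: int  # cycles left in its current run


def simulate(n_threads, hw_contexts, run_length, latency, switch_cost,
             dribble_cycles, cycles, dribble=True):
    """Return utilization of a toy processor with `hw_contexts` hardware
    contexts, optionally refilled in the background (assumed model)."""
    threads = [Thread(0, run_length) for _ in range(n_threads)]
    resident = list(range(min(hw_contexts, n_threads)))  # loaded contexts
    current = None   # index of the thread owning the pipeline
    switch_left = 0  # cycles left in a hardware context switch
    swap = None      # (out_tid, in_tid, cycles_left) while dribbling
    busy = 0

    for t in range(cycles):
        # Processor: pick a ready resident thread, never one being unloaded.
        if current is None and switch_left == 0:
            leaving = swap[0] if swap else None
            ready = [i for i in resident
                     if threads[i].ready_at <= t and i != leaving]
            if ready:
                current, switch_left = ready[0], switch_cost
        if switch_left > 0:
            switch_left -= 1          # cycle spent switching contexts
        elif current is not None:
            th = threads[current]
            th.remaining -= 1         # one cycle of useful work
            busy += 1
            if th.remaining == 0:     # synchronization fault
                th.ready_at = t + 1 + latency
                th.remaining = run_length
                current = None

        # Dribbler: swap a blocked resident context for a ready one
        # held in memory, using the extra register-file port.
        if not dribble:
            continue
        if swap is not None:
            out_tid, in_tid, left = swap
            if left == 1:
                resident[resident.index(out_tid)] = in_tid
                swap = None
            else:
                swap = (out_tid, in_tid, left - 1)
        else:
            blocked = [i for i in resident
                       if i != current and threads[i].ready_at > t]
            in_memory = sorted((i for i in range(n_threads)
                                if i not in resident
                                and threads[i].ready_at <= t),
                               key=lambda i: threads[i].ready_at)
            if blocked and in_memory:
                swap = (blocked[0], in_memory[0], dribble_cycles)
    return busy / cycles


if __name__ == "__main__":
    args = (16, 2, 100, 1000, 1, 64, 20000)  # illustrative parameters
    print(simulate(*args, dribble=False), simulate(*args, dribble=True))
```

With two hardware contexts and a 1000-cycle synchronization latency, the processor without dribbling idles most of the time. With dribbling, as long as a context can be swapped faster than one run length completes, the background port keeps a ready thread loaded and utilization approaches R / (R + C). The sketch deliberately ignores the 3-5% cycle-time penalty of the extra port and any cache-port contention.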
email@example.com, last modified 1998/01/06 16:49:48