J. W. Haskins, Jr., K. R. Hirst, and K. Skadron.
In Proc. of the 20th IEEE International Performance,
Computing, and Communications Conference, April, 2001.
Abstract
This paper examines differential multithreading
(dMT) as an attractive organization for coping with pipeline stalls in
small-scale processors like those used in embedded environments. The
paper proposes extensions to block multithreading to cope with data-
and instruction-cache misses, and then explores some of the design
tradeoffs that this enables. Results show that dMT boosts throughput
substantially and can in fact replace dynamic branch
prediction or data forwarding, or can be used to reduce the sizes of
the instruction and data caches.
Block multithreading, described by Farrens and Pleszkun, is a technique to achieve high throughput from a single-issue microarchitecture by switching among multiple instruction streams in response to pipeline stalls. Although single-issue organizations are no longer used in high-performance processors, they remain common even in newly designed processors for small-scale, embedded devices. As in the original description of block multithreading, dMT uses auxiliary pipeline registers to save the state of in-flight instructions. By also coping with data- and instruction-cache misses, however, our implementation can attack all the major sources of pipeline stalls. Overall, we find that dMT can substantially lower the cost and complexity of microprocessors for embedded environments, especially environments for which throughput rather than speed is the primary concern. In addition, dMT is an attractive prospect for use in chip-multiprocessing environments.
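
The basic idea of switching instruction streams on a stall can be illustrated with a toy cycle-count model. This is only a sketch under simplifying assumptions that are not taken from the paper: each instruction is represented by the number of stall cycles it causes after issue, a thread switch costs nothing, and the function names `run_single_threaded` and `run_block_multithreaded` are hypothetical. It does not model the dMT pipeline itself.

```python
def run_single_threaded(threads):
    """Cycles needed when the threads run back to back with no switching.

    Each thread is a list of stall counts: 0 means the instruction issues
    without stalling, n > 0 means the pipeline waits n extra cycles
    (for example, on a cache miss).
    """
    return sum(1 + stall for thread in threads for stall in thread)


def run_block_multithreaded(threads):
    """Cycles needed when a single-issue pipeline switches threads on a stall.

    Assumption (illustrative only): switching is free, and the pipeline
    stays on the current thread until that thread stalls or finishes.
    """
    n = len(threads)
    pc = [0] * n          # next instruction index per thread
    ready_at = [0] * n    # first cycle at which each thread may issue again
    cycle = 0
    current = 0

    while any(pc[i] < len(threads[i]) for i in range(n)):
        # Prefer the current thread; otherwise take the next ready one.
        chosen = None
        for k in range(n):
            i = (current + k) % n
            if pc[i] < len(threads[i]) and ready_at[i] <= cycle:
                chosen = i
                break

        if chosen is None:
            cycle += 1    # every unfinished thread is stalled: idle cycle
            continue

        stall = threads[chosen][pc[chosen]]
        pc[chosen] += 1
        cycle += 1                         # one cycle to issue
        ready_at[chosen] = cycle + stall   # thread is blocked while stalled
        current = chosen

    # Account for a stall that is still outstanding after the last issue.
    return max([cycle] + ready_at)


if __name__ == "__main__":
    # Two identical threads, each with one 3-cycle stall in the middle.
    workload = [[0, 3, 0], [0, 3, 0]]
    print("single-threaded cycles:   ", run_single_threaded(workload))
    print("block-multithreaded cycles:", run_block_multithreaded(workload))
```

In this toy workload, part of each thread's stall time is covered by useful work from the other thread, which is the source of the throughput gain. Any real design would also have to account for the cost of saving and restoring in-flight state.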