D. Parikh, K. Skadron, Y. Zhang, M. Barcella, and M. Stan
Univ. of Virginia Dept. of Computer Science Technical Report CS-2001-25, Nov. 2001.
(This is an extended version of a conference paper with the same title and authors appearing in Proc. of the 2002 International Symposium on High-Performance Computer Architecture, February, 2002, Cambridge, MA.)
Abstract
This paper explores the role of branch predictor organization in power/energy/performance
tradeoffs for processor design. We find that, as a general rule, to reduce overall energy consumption in the
processor it is worthwhile to spend more power in the branch predictor if this results in more accurate
predictions that improve running time. Two techniques, however, provide substantial reductions in power
dissipation without harming accuracy. First, banking reduces the portion of the branch predictor that is active at
any one time. Second, a new on-chip structure, the prediction probe detector (PPD), can use pre-decode bits to
entirely eliminate unnecessary predictor and BTB accesses. Despite the extra power that must be spent
accessing the PPD, it reduces local predictor power and energy dissipation by about 45% and overall
processor power and energy dissipation by 5-6%.
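
To make the PPD idea concrete, the following is a minimal sketch of how pre-decode information could gate predictor and BTB lookups. It is an illustration under stated assumptions, not the design evaluated in this report: the two flags per entry, the one-entry-per-cache-line indexing, the 32-byte line size, and all names (PPDEntry, PredictionProbeDetector, fill, probe, count_accesses) are hypothetical. The sketch also assumes each entry is rewritten whenever its instruction cache line is refilled, so an entry always describes the line currently resident.

```python
from dataclasses import dataclass


@dataclass
class PPDEntry:
    # Conservative defaults: with no pre-decode information yet,
    # both structures are accessed, as they would be without a PPD.
    needs_predictor: bool = True  # assumed flag: line holds a conditional branch
    needs_btb: bool = True        # assumed flag: line holds any control-flow instruction


class PredictionProbeDetector:
    """Hypothetical PPD model: one entry per instruction cache line."""

    def __init__(self, num_entries: int, line_bytes: int = 32):
        self.line_bytes = line_bytes
        self.entries = [PPDEntry() for _ in range(num_entries)]

    def _index(self, pc: int) -> int:
        # Index by cache line address; assumes entries map 1:1 to cache lines.
        return (pc // self.line_bytes) % len(self.entries)

    def fill(self, pc: int, has_cond_branch: bool, has_control_flow: bool) -> None:
        # Called on a cache line refill with the pre-decode results for that line.
        entry = self.entries[self._index(pc)]
        entry.needs_predictor = has_cond_branch
        entry.needs_btb = has_control_flow

    def probe(self, pc: int) -> tuple[bool, bool]:
        # Returns (access direction predictor?, access BTB?) for this fetch.
        entry = self.entries[self._index(pc)]
        return entry.needs_predictor, entry.needs_btb


def count_accesses(ppd: PredictionProbeDetector, fetch_pcs: list[int]) -> tuple[int, int]:
    """Count predictor and BTB accesses that remain after PPD gating."""
    predictor_accesses = 0
    btb_accesses = 0
    for pc in fetch_pcs:
        use_predictor, use_btb = ppd.probe(pc)
        predictor_accesses += use_predictor
        btb_accesses += use_btb
    return predictor_accesses, btb_accesses


if __name__ == "__main__":
    ppd = PredictionProbeDetector(num_entries=4)
    ppd.fill(0, has_cond_branch=False, has_control_flow=False)   # straight-line code
    ppd.fill(32, has_cond_branch=True, has_control_flow=True)    # conditional branch
    ppd.fill(64, has_cond_branch=False, has_control_flow=True)   # e.g. unconditional jump
    print(count_accesses(ppd, [0, 32, 64, 0]))
```

In this toy trace of four fetches, an ungated front end would access the predictor and the BTB four times each, whereas the gated version skips every lookup for lines known to contain no relevant instructions. The sketch does not model the power spent accessing the PPD itself, which the report accounts for in its results.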