D. Parikh, K. Skadron, Y. Zhang, and M. Stan.
IEEE Transactions on Computers.
Abstract
This paper uses Wattch and the SPEC 2000 integer and floating-point benchmarks to explore the role of branch predictor organization in
power/energy/performance tradeoffs for processor design. Even though the direction predictor by itself represents less than 1% of the processor’s
total power dissipation, prediction accuracy is nevertheless a powerful lever on processor behavior and program execution time. A thorough
study of branch predictor organizations shows that, as a general rule, to reduce overall energy consumption in the processor it is worthwhile
to spend more power in the branch predictor if this results in more accurate predictions that improve running time. This not only improves
performance, but can also improve the energy-delay product by up to 20%. Three techniques, however, can reduce power dissipation without
harming accuracy. Banking reduces the portion of the branch predictor that is active at any one time. A new on-chip structure, the prediction
probe detector (PPD), uses pre-decode bits to entirely eliminate unnecessary predictor and branch target buffer (BTB) accesses. Despite the
extra power that must be spent accessing it, the PPD reduces the branch predictor’s own power and energy dissipation by about 31%, and overall processor
power and energy dissipation by 3%. These savings can be further improved by using profiling to annotate branches, identifying those that are
highly biased and therefore do not require dynamic prediction. Finally, the paper explores the effectiveness of a previously proposed technique, pipeline
gating, and finds that even with adaptive control based on recent predictor accuracy, pipeline gating yields little or no energy savings.
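
To make the PPD idea concrete, the following minimal C sketch shows the gating logic it implies: pre-decode bits recorded when an I-cache line is filled mark which instruction slots hold branches, and the fetch stage consults those bits before touching the BTB or direction predictor. All names here (ppd_entry_t, fetch_cycle, the lookup stubs) are illustrative assumptions, not the paper’s implementation.

    #include <stdint.h>

    /* Hypothetical PPD entry: one pre-decode record per I-cache line,
     * written at fill time.  Two bitmasks, one bit per instruction slot:
     * "slot holds a branch" and "slot holds a conditional branch".      */
    typedef struct {
        uint8_t is_branch;
        uint8_t is_conditional;
    } ppd_entry_t;

    static unsigned btb_accesses, predictor_accesses;

    /* Stand-ins for the power-hungry structures the PPD protects. */
    static void lookup_btb(uint32_t line, uint8_t slot)
    {
        (void)line; (void)slot;
        btb_accesses++;                 /* would probe the BTB here */
    }

    static void lookup_direction_predictor(uint32_t line, uint8_t slot)
    {
        (void)line; (void)slot;
        predictor_accesses++;           /* would probe the predictor here */
    }

    /* Per-cycle fetch decision: probe the small PPD first, and only burn
     * power in the BTB and direction predictor when a branch is present. */
    void fetch_cycle(const ppd_entry_t *ppd, uint32_t line_index, uint8_t slot)
    {
        const ppd_entry_t *e = &ppd[line_index];
        uint8_t mask = (uint8_t)(1u << slot);

        if (!(e->is_branch & mask))
            return;                     /* no branch: skip both lookups   */
        lookup_btb(line_index, slot);   /* branch present: need a target  */
        if (e->is_conditional & mask)
            lookup_direction_predictor(line_index, slot); /* taken or not */
    }

The energy win comes from the asymmetry: the PPD is a tiny table, so a PPD probe on every fetch is far cheaper than the large predictor and BTB arrays it lets the processor skip on branch-free fetch blocks.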
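
For contrast, the pipeline-gating result concerns a mechanism of the following shape: fetch stalls when too many low-confidence branches are in flight, and an adaptive controller retunes the gating threshold from recent prediction accuracy. This C sketch is a loose illustration under assumed names, window size, and thresholds, not the paper’s implementation; the paper’s finding is that even this adaptive form saves little or no energy.

    #include <stdbool.h>

    typedef struct {
        int low_conf_in_flight;  /* unresolved low-confidence branches     */
        int gate_threshold;      /* stall fetch when count exceeds this    */
        int recent_mispredicts;  /* mispredictions in the current window   */
        int recent_branches;     /* branches resolved in the current window */
    } gating_state_t;

    /* Gate (stall) fetch when too many low-confidence branches are in
     * flight, since newly fetched work would likely be squashed anyway.  */
    bool should_gate_fetch(const gating_state_t *s)
    {
        return s->low_conf_in_flight > s->gate_threshold;
    }

    /* Adaptive control: every WINDOW resolved branches, loosen the gate
     * when the predictor has been accurate (gating would mostly cost
     * performance) and tighten it when accuracy drops.  The window size
     * and the 20/60 mispredictions-per-1000 bounds are assumptions.     */
    #define WINDOW 1024

    void update_threshold(gating_state_t *s)
    {
        if (s->recent_branches < WINDOW)
            return;
        int mispredicts_per_1k = s->recent_mispredicts * 1000 / s->recent_branches;
        if (mispredicts_per_1k < 20 && s->gate_threshold < 8)
            s->gate_threshold++;        /* accurate: gate less aggressively */
        else if (mispredicts_per_1k > 60 && s->gate_threshold > 1)
            s->gate_threshold--;        /* inaccurate: gate sooner          */
        s->recent_mispredicts = 0;
        s->recent_branches = 0;
    }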