J. W. Haskins and K. Skadron
Tech Report CS-2002-19, Univ. of Virginia Dept. of Computer Science, July, 2002.
Abstract
This paper explores techniques for speeding up sampled microprocessor simulations by exploiting the
observation that, among the memory references that precede a sample, those occurring nearest to the sample are
the most likely to be germane during the sample itself. Accurately warming up simulated cache and branch
predictor state therefore requires modeling only a subset of the memory references and control-flow instructions
that immediately precede a simulation sample. Our technique measures memory reference reuse latencies
(MRRLs) and uses these data to choose a point before each sample at which to engage cache hierarchy and
branch predictor modeling. By starting cache and branch predictor modeling late in the pre-sample instruction
stream, rather than modeling cache and branch predictor interactions for all pre-sample instructions, we save the
time that would otherwise be spent modeling them. This savings reduces overall simulation running times by an
average of 25% while generating an average error in IPC of less than 0.7%.
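
To make the idea concrete, the following is a minimal sketch, not the report's implementation, of how reuse
latencies might be histogrammed and a warm-up point chosen before a sample. The 64-byte block size, the
1024-entry latency cap, and the 99.9% coverage target are illustrative assumptions, as is the toy reference
stream; a real simulator would stream references from a trace and begin cache and branch predictor modeling
that many references before the sample.

    /* Illustrative sketch of MRRL-style warm-up selection (assumptions noted above). */
    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SHIFT 6      /* assume 64-byte cache blocks */
    #define MAX_LAT     1024   /* largest reuse latency tracked in this example */

    /* Smallest latency L such that at least `coverage` of all observed
     * reuse latencies are <= L; warm-up starts this many references
     * before the sample. */
    static long warmup_length(const long hist[], long max_lat, double coverage)
    {
        long total = 0, running = 0, lat;
        for (lat = 0; lat <= max_lat; lat++)
            total += hist[lat];
        for (lat = 0; lat <= max_lat; lat++) {
            running += hist[lat];
            if ((double)running >= coverage * (double)total)
                return lat;
        }
        return max_lat;
    }

    int main(void)
    {
        /* Toy reference stream standing in for the pre-sample + sample region. */
        unsigned long refs[] = { 0x1000, 0x2040, 0x1008, 0x3000, 0x2048, 0x1010 };
        long n = (long)(sizeof refs / sizeof refs[0]);
        long hist[MAX_LAT + 1];
        memset(hist, 0, sizeof hist);

        /* Build the reuse-latency histogram with a naive last-touch search:
         * the latency of a reference is the distance back to the previous
         * reference to the same block. */
        for (long i = 0; i < n; i++) {
            for (long j = i - 1; j >= 0; j--) {
                if ((refs[j] >> BLOCK_SHIFT) == (refs[i] >> BLOCK_SHIFT)) {
                    long lat = i - j;
                    hist[lat < MAX_LAT ? lat : MAX_LAT]++;
                    break;
                }
            }
        }

        printf("warm-up length: %ld references before the sample\n",
               warmup_length(hist, MAX_LAT, 0.999));
        return 0;
    }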