Your name is on this list because you have, at some point, expressed interest
in the STREAM benchmark or the general goal of memory system benchmarking.
In the last few years STREAM has had a significant influence on the industry
(as measured by the amount of cursing that occurred at many vendor locations
when either the word STREAM or my name was mentioned), but I feel it is time
to try to get more out of STREAM than the single figure of main memory
bandwidth that it currently provides.
I have created a mailing list for discussions relating to the STREAM benchmark,
and would like to take this opportunity to invite you to participate -- or just
lurk. Even two years ago I would have done this in the open on comp.benchmarks,
but wading through the current content of USENET groups is too painful an
option to consider these days....
If you have gotten a piece of mail saying "Welcome to the list", then
you are on the mailing list already -- if not, then feel free to
subscribe (or not) using the directions at the bottom of this note.
Here are some issues that I would like to discuss with other interested people:
(1) Should STREAM be extended to automagically measure bandwidths at each
level of a memory hierarchy? What is a robust way of doing this with
a single, portable piece of source code? (A rough sketch of one
approach appears after this list.)
(2) Should STREAM be extended to automagically measure latencies at each
level of a memory hierarchy? How would this be different from what
Larry McVoy does with lmbench/lat_mem_rd? Is it possible or desirable
to measure a "different kind" of latency than lmbench measures?
(Again, see the sketch after this list.)
(3) Should STREAM be extended to automagically measure bandwidths and
latencies across distributed memory systems, such as the Convex cc-NUMA
machines and the future SGI "Scalable Node" products? What about
distributed memory machines without global addressability? (I am not
interested in an MPI "solution"!!!!!)
(4) If we are looking at multiprocessor machines with some kind of network
rather than a shared bus, do we want to look at "peak point-to-point"
performance, or performance in the presence of contention? How do we
want to define the communication patterns that will create that contention?
Can this be done in a way that is not biased for or against any particular
network architecture? (Unless that bias seems "reasonable"?)
(5) In a completely different direction -- how useful can a memory system
characterization be for application performance prediction? How much
detail is needed to predict SPECfp95 performance to within acceptable
margins of error, for example? I have done a bit of this using ad hoc
curve fitting of available numbers (illustrated below), but a
"first principles" approach would be preferred.
Several example codes have already been created to perform some of the above
tasks. One example is Larry McVoy's extensive lmbench suite, available at
<http://reality.sgi.com/employees/lm/lmbench/lmbench.html>
A second example is Nick Maclaren's code that I call "nmm.c", which I have
placed in the /pub/stream/Code/Contrib/ area of the anonymous ftp server at the
STREAM home at the University of Virginia. The usual access is via the web, at
<http://www.cs.virginia.edu/stream/>
Thanks for your patience ----------------------------------------------
Administrative Info:
To *subscribe* to the STREAM mailing list send email to:
external-majordomo@postofc.corp.sgi.com
with the following line in the _body_ of the message:
subscribe stream _your_email_address_here_
To *unsubscribe* from the STREAM mailing list send email to:
external-majordomo@postofc.corp.sgi.com
with the following line in the _body_ of the message:
unsubscribe stream _your_email_address_here_
To post a message to the STREAM mailing list use this address:
stream@postofc.corp.sgi.com
john
--
John D. McCalpin, Ph.D.          Supercomputing Performance Analyst
Advanced Systems Division        http://reality.sgi.com/employees/mccalpin
Silicon Graphics, Inc.           mccalpin@asd.sgi.com   415-933-7407