John,
Many thanks for releasing "stream" to the world - it has done much to
make vendors aware of a very important issue in large-scale scientific
computing. I enclose some results from a slightly unusual system, a
Hitachi SR2201 (single node). It is a RISC-based MPP with some
pseudovectorisation features which enhance the memory bandwidth and hide
latency. If you want any more details about the system, do ask, or see
http://www.hpcf.cam.ac.uk/tech.html.
Michael Rutter
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 2000000
Offset = 0
The total memory requirement is 45 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 2 microseconds
The tests below will each take a time on the order
of 40948 microseconds
(= 20474 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:           775.6824     0.0413     0.0413     0.0413
Scale:          736.0721     0.0435     0.0435     0.0436
Add:            829.4023     0.0580     0.0579     0.0584
Triad:          826.9161     0.0581     0.0580     0.0581
Sum of a is = 0.230660156249616742E+019
Sum of b is = 461320312502628096.
Sum of c is = 615093750008502400.
[Compiled with: xf90 -W0,'OPT(O(SS))' stream_d.f second_cpu.f
second_cpu.f was completely rewritten; stream_d.f was untouched
(revision 4.1, June 4, 1996).]
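
Note: the reported rates can be cross-checked from the array size and the printed minimum times. The sketch below is not part of the original run. It assumes the usual STREAM accounting, in which Copy and Scale move two arrays per pass and Add and Triad move three, and it takes 1 MB as 10^6 bytes, which is what the printed figures appear to be consistent with. The names in it (ARRAYS_TOUCHED, rate_mb_per_s, and so on) are illustrative only. Because the times are printed to four decimal places, the recomputed rates should agree with the table only to within about 1 MB/s.

```python
# Cross-check of the STREAM figures above (illustrative sketch, not part of the run).
# Assumptions: Copy/Scale touch 2 arrays per pass, Add/Triad touch 3;
# rates use 1 MB = 10**6 bytes; times are the rounded "Min time" values as printed.

N = 2_000_000        # "Array size" from the output
BYTES_PER_WORD = 8   # "8 bytes per DOUBLE PRECISION word"

ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
MIN_TIME = {"Copy": 0.0413, "Scale": 0.0435, "Add": 0.0579, "Triad": 0.0580}  # seconds
REPORTED = {"Copy": 775.6824, "Scale": 736.0721, "Add": 829.4023, "Triad": 826.9161}


def rate_mb_per_s(kernel):
    """Bandwidth in MB/s implied by the printed minimum time for one kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * N
    return bytes_moved / MIN_TIME[kernel] / 1e6


def total_array_bytes():
    """Storage for the three arrays a, b and c."""
    return 3 * BYTES_PER_WORD * N


if __name__ == "__main__":
    for kernel in ARRAYS_TOUCHED:
        print(f"{kernel:6s} recomputed {rate_mb_per_s(kernel):8.1f}"
              f"  reported {REPORTED[kernel]:8.1f}")
    # 48,000,000 bytes is about 45.8 * 2**20 bytes, in line with the "45 MB" line.
    print(f"arrays: {total_array_bytes()} bytes")
```

The same arithmetic applies to the timer check in the output: 40948 microseconds at a granularity of 2 microseconds is 20474 clock ticks.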