To whom it may concern,
I would like to submit STREAM benchmark results for an
IBM RS/6000 SP 222 MHz POWER3smp. Please add the following
summary to the web page of standard STREAM results:
Machine ID               ncpus     COPY    SCALE      ADD    TRIAD
IBM-SP_222MHz-POWER3smp      1    790.4    794.1    951.0    953.3
IBM-SP_222MHz-POWER3smp      2   1549.2   1524.4   1843.5   1844.5
IBM-SP_222MHz-POWER3smp      4   2677.0   2630.9   3091.3   3101.2
IBM-SP_222MHz-POWER3smp      8   4332.9   4211.4   4843.3   4937.8
Each CPU on this machine has a private 4 MB L2 cache. At each CPU count,
the array size exceeds the aggregate available L2. However, the arrays
were kept as small as possible so that the system tables mapping virtual
to real memory have a chance to reside in the L2. When data in these
tables has to come from main memory, it generates additional memory
traffic that STREAM does not count.
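The sizing argument above can be checked with a little arithmetic. The following is only a sketch: the array sizes and the 8-byte word are taken from the run outputs in the P.S., while reading "4 MB" as 4 MiB and the helper name `footprint` are assumptions made for illustration.

```python
# Sketch: compare each STREAM array's footprint with the aggregate L2 capacity.
BYTES_PER_WORD = 8            # "Assuming 8 bytes per DOUBLE PRECISION word"
L2_PER_CPU = 4 * 1024**2      # assumption: "4 MB" is read as 4 MiB

# Array sizes as reported in the run outputs, keyed by CPU count.
ARRAY_SIZES = {1: 524300, 2: 1048600, 4: 2097200, 8: 4194400}


def footprint(ncpus):
    """Return (bytes in one array, bytes in all three arrays, aggregate L2 bytes)."""
    one_array = ARRAY_SIZES[ncpus] * BYTES_PER_WORD
    aggregate_l2 = ncpus * L2_PER_CPU
    return one_array, 3 * one_array, aggregate_l2


for ncpus in sorted(ARRAY_SIZES):
    one, total, l2 = footprint(ncpus)
    print(f"{ncpus} CPU(s): one array {one} B, three arrays {total} B, "
          f"aggregate L2 {l2} B, excess per array {one - l2} B")
```

Under these assumptions a single array is already slightly larger than the combined L2 at every CPU count, and the three arrays together come to roughly the 12, 24, 48 and 96 MB reported by the runs (the offsets add a little on top).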
In addition to changing the array size and offset, the following
changes were made to the Fortran STREAM code:
1) Increase "ntimes" to 1000 in order to reduce run-to-run fluctuations.
2) Change the "scalar" constant to sqrt(2) - 1 to prevent overflows (INFs).
3) Add compiler directives to allow the code to run in parallel on multiple
CPUs and to allow certain arrays to be "prefetched" by the hardware.
Here is an example (the COPY loop):
!smp$ parallel do private(j) , schedule(static)
      DO 30 j = 1,n
!ibm* prefetch_by_load( c(j) )
          c(j) = a(j)
   30 CONTINUE
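To illustrate change 2, here is a minimal Python sketch of why sqrt(2) - 1 keeps the values bounded over 1000 repetitions. It assumes the usual STREAM kernel order (COPY, SCALE, ADD, TRIAD) applied to one array element, a starting value of 2.0 for a (chosen because the reported "Sum of a" is twice the array size), and a comparison scalar of 3.0. The function names are hypothetical.

```python
import math


def stream_pass(a, scalar):
    """Apply the four kernels once to a single element, in the assumed STREAM order."""
    c = a                   # COPY:  c = a
    b = scalar * c          # SCALE: b = scalar * c
    c = a + b               # ADD:   c = a + b
    a = b + scalar * c      # TRIAD: a = b + scalar * c
    return a, b, c


def run(scalar, ntimes=1000, a=2.0):
    """Repeat the four kernels ntimes and return the final (a, b, c)."""
    b = c = 0.0
    for _ in range(ntimes):
        a, b, c = stream_pass(a, scalar)
    return a, b, c


# With scalar = 3.0 (assumed comparison value), each pass multiplies a by
# 3 + 3*(1 + 3) = 15, so the value overflows long before 1000 passes.
a_big, _, _ = run(3.0)

# With scalar = sqrt(2) - 1, each pass multiplies a by
# 2*s + s**2 = (s + 1)**2 - 1 = 1, so a stays at its starting value
# up to rounding error.
a, b, c = run(math.sqrt(2.0) - 1.0)

print("scalar 3.0 overflows:", math.isinf(a_big))
print("scalar sqrt(2)-1:", a, b, c)
```

Multiplying the per-element values of a, b and c by the array size gives numbers in line with the "Sum of a", "Sum of b" and "Sum of c" lines in the outputs below, which is consistent with this reading.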
If you have any questions, please let me know.
Frank Johnston
(fjohn@us.ibm.com, xxx-xxx-xxxx)
P.S. Here are the actual outputs:
output for 1 CPU:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 524300
Offset = 330
The total memory requirement is 12 MB
You are running each test 1000 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 5 microseconds
The tests below will each take a time on the order
of 8259 microseconds
(= 1652 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            790.4472      .0107      .0106      .0116
Scale:           794.1492      .0106      .0106      .0109
Add:             951.0364      .0133      .0132      .0144
Triad:           953.3298      .0133      .0132      .0137
Sum of a is = 1048600.00000001118
Sum of b is = 434344.341503158561
Sum of c is = 1482944.34150735638
output for 2 CPUs:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 1048600
Offset = 404
The total memory requirement is 24 MB
You are running each test 1000 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 5 microseconds
The tests below will each take a time on the order
of 15856 microseconds
(= 3171 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           1549.2400      .0109      .0108      .0201
Scale:          1524.3722      .0111      .0110      .0120
Add:            1843.4750      .0137      .0137      .0147
Triad:          1844.5380      .0137      .0136      .0143
Sum of a is = 2097200.00000001118
Sum of b is = 868688.683004021645
Sum of c is = 2965888.68302347837
output for 4 CPUs:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 2097200
Offset = 260
The total memory requirement is 48 MB
You are running each test 1000 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 5 microseconds
The tests below will each take a time on the order
of 32019 microseconds
(= 6404 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           2676.9512      .0133      .0125      .0329
Scale:          2630.9133      .0133      .0128      .0229
Add:            3091.3001      .0169      .0163      .0181
Triad:          3101.2452      .0168      .0162      .0180
Sum of a is = 4194400.00000001118
Sum of b is = 1737377.36602994637
Sum of c is = 5931777.36605572235
output for 8 CPUs:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 4194400
Offset = 468
The total memory requirement is 96 MB
You are running each test 1000 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 5 microseconds
The tests below will each take a time on the order
of 64196 microseconds
(= 12839 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           4332.9165      .0166      .0155      .0576
Scale:          4211.3665      .0170      .0159      .0588
Add:            4843.2744      .0217      .0208      .0607
Triad:          4937.7796      .0213      .0204      .0486
Sum of a is = 8388800.00000001118
Sum of b is = 3474754.73209443502
Sum of c is = 11863554.7321202103
This archive was generated by hypermail 2b29 : Tue Apr 18 2000 - 05:23:08 CDT