From: Frank Johnston on 10/12/99 04:49 PM
To: John D Mccalpin/Austin/IBM@IBMUS
John,
Here are STREAM results without the compiler directive for a
Nighthawk1 (IBM RS/6000 SP 222 MHz POWER3smp).
Would it be possible to publish both these and the results
obtained with the directive in the table of "Standard Results",
using slightly different machine IDs? I still believe that use
of a simple Fortran compiler directive is much closer in spirit
to the "Standard Results" than to the "Experimental Results"
obtained with hand-coded assembler.
Machine ID               ncpus    COPY   SCALE     ADD   TRIAD
IBM-SP_222MHz-POWER3smp      1   413.0   421.2   587.4   614.2
IBM-SP_222MHz-POWER3smp      2   820.7   800.4  1182.1  1165.4
IBM-SP_222MHz-POWER3smp      4  1603.5  1535.9  2218.3  2187.5
IBM-SP_222MHz-POWER3smp      8  2954.1  2821.0  3889.0  3872.2
Frank Johnston
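As a rough reading of the summary table, the Python sketch below computes speedup and parallel efficiency relative to the 1-CPU run. The rates are copied from the table; the function name `scaling` and the dictionary layout are illustrative assumptions and are not part of STREAM.

```python
# Rates in MB/s copied from the summary table, keyed by CPU count.
RESULTS = {
    1: {"COPY": 413.0, "SCALE": 421.2, "ADD": 587.4, "TRIAD": 614.2},
    2: {"COPY": 820.7, "SCALE": 800.4, "ADD": 1182.1, "TRIAD": 1165.4},
    4: {"COPY": 1603.5, "SCALE": 1535.9, "ADD": 2218.3, "TRIAD": 2187.5},
    8: {"COPY": 2954.1, "SCALE": 2821.0, "ADD": 3889.0, "TRIAD": 3872.2},
}


def scaling(results, kernel):
    """Return {ncpus: (speedup, efficiency)} relative to the 1-CPU rate."""
    base = results[1][kernel]
    return {
        n: (row[kernel] / base, row[kernel] / base / n)
        for n, row in results.items()
    }


if __name__ == "__main__":
    for kernel in ("COPY", "SCALE", "ADD", "TRIAD"):
        for n, (speedup, eff) in sorted(scaling(RESULTS, kernel).items()):
            print(f"{kernel:5s} ncpus={n}  speedup={speedup:5.2f}  "
                  f"efficiency={eff:6.1%}")
```

Efficiency here is simply speedup divided by the CPU count, so a value below 100% indicates that aggregate bandwidth grows more slowly than the number of processors.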
output for 1 CPU:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 524300
Offset = 260
The total memory requirement is 12 MB
You are running each test 1000 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 5 microseconds
The tests below will each take a time on the order
of 8319 microseconds
(= 1664 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:          412.9788      .0204      .0203      .0213
Scale:         421.1776      .0200      .0199      .0210
Add:           587.4248      .0216      .0214      .0226
Triad:         614.2449      .0205      .0205      .0216
Sum of a is = 1048600.00000001118
Sum of b is = 434344.341503158561
Sum of c is = 1482944.34150735638
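The figures in this output can be cross-checked by hand. The sketch below assumes the usual STREAM accounting: 8-byte words, Copy and Scale touching two arrays per iteration, Add and Triad touching three, and rates reported with 1 MB = 10^6 bytes. The memory figure is assumed to be in units of 2^20 bytes, which is what reproduces the 12 MB and 96 MB values printed in these runs. The helper names are hypothetical, and the array offset is ignored.

```python
WORD_BYTES = 8  # "Assuming 8 bytes per DOUBLE PRECISION word"

# Number of arrays read or written per loop iteration (assumed accounting).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def rate_mb_per_s(kernel, n, seconds):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for one pass over n elements."""
    return ARRAYS_TOUCHED[kernel] * WORD_BYTES * n / seconds / 1e6


def memory_mib(n, arrays=3):
    """Footprint of the work arrays in units of 2**20 bytes."""
    return arrays * WORD_BYTES * n / 2**20


if __name__ == "__main__":
    n = 524300  # array size reported for the 1-CPU run
    min_times = {"Copy": 0.0203, "Scale": 0.0199, "Add": 0.0214, "Triad": 0.0205}
    for kernel, t in min_times.items():
        print(f"{kernel:6s} {rate_mb_per_s(kernel, n, t):8.1f} MB/s")
    print(f"memory: {memory_mib(n):.1f}")
    # Estimated test time divided by the 5-microsecond clock granularity.
    print(f"ticks:  {8319 / 5:.0f}")
```

Because the printed minimum times are rounded to four decimal places, the recomputed rates agree with the reported ones only to within about half a percent.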
output for 2 CPUs:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 1048600
Offset = 360
The total memory requirement is 24 MB
You are running each test 750 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 5 microseconds
The tests below will each take a time on the order
of 8704 microseconds
(= 1741 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:          820.6599      .0205      .0204      .0217
Scale:         800.4499      .0210      .0210      .0222
Add:          1182.0659      .0214      .0213      .0226
Triad:        1165.4323      .0217      .0216      .0229
Sum of a is = 2097200.00000000373
Sum of b is = 868688.683004021295
Sum of c is = 2965888.68302347790
output for 4 CPUs:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 2097200
Offset = 272
The total memory requirement is 48 MB
You are running each test 1000 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 5 microseconds
The tests below will each take a time on the order
of 11072 microseconds
(= 2214 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:         1603.5355      .0219      .0209      .0232
Scale:        1535.8922      .0225      .0218      .0235
Add:          2218.2522      .0235      .0227      .0249
Triad:        2187.5434      .0236      .0230      .0247
Sum of a is = 4194400.00000001118
Sum of b is = 1737377.36602994637
Sum of c is = 5931777.36605572235
output for 8 CPUs:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 4194400
Offset = 224
The total memory requirement is 96 MB
You are running each test 1000 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 5 microseconds
The tests below will each take a time on the order
of 13120 microseconds
(= 2624 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:         2954.0690      .0240      .0227      .0347
Scale:        2820.9498      .0247      .0238      .0724
Add:          3889.0287      .0271      .0259      .0456
Triad:        3872.2120      .0270      .0260      .0612