Hi John,
I just tried to get STREAM data for our IBM RS/6k 43-260 Power3 machine,
but that didn't work very well: so far only the xlf 4.x compiler is
installed, and it doesn't yet know about the Power3 machines. Hence, the
numbers are not representative (in fact, they are smaller than the ones
you list in the experimental results section).
I am planning to have the compiler upgraded, and then I will make another
attempt. However, while I was at it I decided to benchmark our IBM RS/6k
595 (AIX 4.1, 2 GB RAM) and found the fastest results so far for this
machine, with these compiler options:
xlf -O3 -qnozerosize -qarch=pwr2 -qtune=pwr2 stream_d.f second_cpu.f \
-bmaxdata:500000000 -qhot -qhot=arraypad -qalign=4k -o strem_595
to be in the following range:
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            847.0588      .4002      .3400      .4500
Scale:           900.0000      .3965      .3200      .4400
Add:             881.6327      .5402      .4900      .6600
Triad:           960.0000      .5273      .4500      .6000
(The Copy result is from a re-run of the same binary, well within the 5%
threshold you also mentioned). The complete log is appended for your
records. This is faster than the currently listed 595 entry (probably
still with xlf 3.x?) and also faster than I can get the Power3 machine
to run with this compiler. Maybe you could also list the dates of the
entries so one can deduce such things?
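As a rough cross-check of the table above, here is a minimal Python sketch of how the rates relate to the array size and the minimum times. It assumes the usual STREAM accounting (Copy and Scale touch two arrays, Add and Triad touch three, 8 bytes per word, 1 MB = 10^6 bytes for rates); the function names are made up for illustration and are not part of the benchmark. The truncation of the memory figure is also an assumption, chosen because it matches the 411 MB in the log.

```python
# Sketch: reproduce STREAM rates from the array size and the best (minimum) time.
# Assumptions: 8-byte words; Copy/Scale touch 2 arrays, Add/Triad touch 3.
N = 18_000_000          # "Array size" from the log
BYTES_PER_WORD = 8      # "Assuming 8 bytes per DOUBLE PRECISION word"

ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_mb_s(kernel, min_time_s, n=N):
    """Rate in MB/s (1 MB = 10**6 bytes) for one kernel, from its minimum time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * n
    return bytes_moved / min_time_s / 1e6


def total_memory_mib(n=N):
    """Memory for the three arrays a, b, c in units of 2**20 bytes."""
    return 3 * BYTES_PER_WORD * n / 2**20


# Minimum times taken from the table above.
min_times = {"Copy": 0.34, "Scale": 0.32, "Add": 0.49, "Triad": 0.45}
for kernel, t in min_times.items():
    print(f"{kernel + ':':<8}{stream_rate_mb_s(kernel, t):10.4f}")

# Truncated to whole units, as the log appears to do.
print(f"memory: {int(total_memory_mib())} MB")
```

With the minimum times listed, this gives the same four rates as the table, so the reported numbers depend only on n and the best time per kernel.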
You will notice that I had to increase n quite a bit since the box was
too fast. For fun I'll build an OS/2 version tonight for Pentium and
P6-class machines.
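The reason for increasing n is the benchmark's own guideline of at least 20 clock ticks per test. A small Python sketch of that check follows; the helper names are hypothetical, and the linear scaling of test time with n in `min_array_size` is an assumption, not something the benchmark states.

```python
# Sketch of the "at least 20 clock ticks per test" guideline from the log.
GRANULARITY_US = 10_000   # clock granularity reported in the log
MIN_TICKS = 20            # threshold quoted in the benchmark's message


def clock_ticks(test_time_us, granularity_us=GRANULARITY_US):
    """Whole clock ticks covered by one test."""
    return test_time_us // granularity_us


def long_enough(test_time_us, granularity_us=GRANULARITY_US, min_ticks=MIN_TICKS):
    """True if one test spans at least min_ticks clock ticks."""
    return clock_ticks(test_time_us, granularity_us) >= min_ticks


def min_array_size(n, test_time_us, granularity_us=GRANULARITY_US, min_ticks=MIN_TICKS):
    """Smallest n that reaches min_ticks, ASSUMING time scales linearly with n."""
    target_us = min_ticks * granularity_us
    return -((-n * target_us) // test_time_us)  # integer ceiling division


print(clock_ticks(220_000))   # 22, as in the log
print(long_enough(220_000))   # True
```

With n = 18000000 and an estimated 220000 microseconds per test, the run sits just above the guideline, which is why a much smaller n would not do on this machine.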
Cheers, Stefan
Anyway, here is the complete log:
Concord: (IBM 595 2 GB RAM) xlf 4.x
xlf -O3 -qnozerosize -qarch=pwr2 -qtune=pwr2 stream_d.f second_cpu.f \
-bmaxdata:500000000 -qhot -o strem_595
timex ./strem_595
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 18000000
Offset = 0
The total memory requirement is 411 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 10000 microseconds
The tests below will each take a time on the order
of 220000 microseconds
(= 22 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            738.4615      .4071      .3900      .4200
Scale:           738.4615      .3981      .3900      .4100
Add:             830.7692      .5381      .5200      .5600
Triad:           847.0588      .5333      .5100      .5600
Sum of a is = 0.207594140609274962E+20
Sum of b is = 0.415188281269478605E+19
Sum of c is = 0.553584374975187046E+19
real 46.64
user 20.25
sys 2.65
xlf -O3 -qnozerosize -qarch=pwr2 -qtune=pwr2 stream_d.f second_cpu.f \
-bmaxdata:500000000 -qhot -qhot=arraypad -o strem_595
concord pts/4 [169] /lscrtch/sad/stream> timex ./strem_595
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 18000000
Offset = 0
The total memory requirement is 411 MB
You are running each test 10 times
The *best* time for each test is used
--------------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            757.8947      .3991      .3800      .4100
Scale:           757.8947      .4002      .3800      .4300
Add:             864.0000      .5382      .5000      .5600
Triad:           864.0000      .5212      .5000      .5400
Sum of a is = 0.207594140609274962E+20
Sum of b is = 0.415188281269478605E+19
Sum of c is = 0.553584374975187046E+19
real 46.46
user 19.93
sys 2.89
xlf -O3 -qnozerosize -qarch=pwr2 -qtune=pwr2 stream_d.f second_cpu.f \
-bmaxdata:500000000 -qhot -qhot=arraypad -qalign=4k -o strem_595
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 18000000
Offset = 0
The total memory requirement is 411 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 10000 microseconds
The tests below will each take a time on the order
of 220000 microseconds
(= 22 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            738.4615      .4092      .3900      .4300
Scale:           757.8947      .3991      .3800      .4100
Add:             864.0000      .5262      .5000      .5500
Triad:           881.6327      .5253      .4900      .5500
Sum of a is = 0.207594140609274962E+20
Sum of b is = 0.415188281269478605E+19
Sum of c is = 0.553584374975187046E+19
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 18000000
Offset = 0
The total memory requirement is 411 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 10000 microseconds
The tests below will each take a time on the order
of 220000 microseconds
(= 22 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            757.8947      .3971      .3800      .4100
Scale:           757.8947      .3911      .3800      .4000
Add:             847.0588      .5372      .5100      .5500
Triad:           847.0588      .5241      .5100      .5500
Sum of a is = 0.207594140609274962E+20
Sum of b is = 0.415188281269478605E+19
Sum of c is = 0.553584374975187046E+19
real 46.24
user 19.90
sys 2.74
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 18000000
Offset = 0
The total memory requirement is 411 MB
You are running each test 80 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 10000 microseconds
The tests below will each take a time on the order
of 260000 microseconds
(= 26 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            800.0000      .4022      .3600      .4500
Scale:           900.0000      .3965      .3200      .4400
Add:             881.6327      .5402      .4900      .6600
Triad:           960.0000      .5273      .4500      .6000
Sum of a is = 0.440152552974613180E+102
Sum of b is = 0.880305106010380985E+101
Sum of c is = 0.117374014130075336E+102
real 187.16
user 150.50
sys 2.99
* Another run with the same binary resulted in:
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            847.0588      .4002      .3400      .4500
Scale:           778.3784      .3996      .3700      .5200
Add:             919.1489      .5340      .4700      .5800
Triad:           900.0000      .5306      .4800      .5700
Sum of a is = 0.440152552974613180E+102
Sum of b is = 0.880305106010380985E+101
Sum of c is = 0.117374014130075336E+102
real 167.85
user 150.43
sys 3.25
--
=========================================================================
Dr. Stefan A. Deutscher                   | (+33-(0)1)  voice      fax
Laboratoire des Collisions Atomiques et   | LCAM : 6915-7699  6915-7671
Mol\'{e}culaires (LCAM), B\^{a}timent 351 | home : 5624-0992  call first
Universit\'{e} de Paris-Sud               | email: sad@utk.edu
91405 Orsay Cedex, France (Europe)        | (forwarded to France)
=========================================================================
Do you know what they call a quarter-pounder with cheese in Paris?
This archive was generated by hypermail 2b29 : Tue Apr 18 2000 - 05:23:08 CDT