Hello John,
I have run your benchmark on our Dell 8100 (P4 1500 MHz, 1 GByte
Dual-Channel Rambus-DRAM RIMM 800) Linux box, running kernel 2.4.0.
PGI FORTRAN-77 compiler (pgf77 3.2-3):
pgf77 -fast -Mvect=sse -Mcache_align -Mdalign -Munroll -Minline
No P4 optimizations! pgf77 version 3.2-3 supports only PIII- and
Athlon-specific optimizations.
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 25012345
Offset = 0
The total memory requirement is 572 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 10000 microseconds
The tests below will each take a time on the order
of 200000 microseconds
(= 20 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:          2106.3028     0.1941     0.1900     0.2000
Scale:         2106.3028     0.2203     0.1900     0.3700
Add:           2143.9153     0.2850     0.2800     0.2900
Triad:         2143.9153     0.2830     0.2800     0.2900
Sum of a is = 2.8846757034495990E+019
Sum of b is = 5.7693514069685250E+018
Sum of c is = 7.6924685388864070E+018
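
For reference, the figures in the output above can be reproduced from the
array size and the minimum times. The following is only a minimal Python
sketch, not part of the benchmark. It assumes the usual STREAM byte
counting (two arrays touched per element for Copy and Scale, three for
Add and Triad), rates in units of 10^6 bytes/s, and the memory total in
units of 2^20 bytes. All function and variable names are illustrative.

```python
# Values taken from the benchmark output above.
N = 25012345          # array size
BYTES_PER_WORD = 8    # DOUBLE PRECISION word size

# Arrays read or written per element by each kernel (assumed STREAM counting).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Minimum times in seconds, from the "Min time" column.
MIN_TIME = {"Copy": 0.19, "Scale": 0.19, "Add": 0.28, "Triad": 0.28}


def total_memory_mib(n, bytes_per_word=BYTES_PER_WORD, arrays=3):
    """Memory for the three arrays, truncated to whole units of 2**20 bytes."""
    return arrays * bytes_per_word * n // (1024 * 1024)


def rate_mb_per_s(kernel, n=N):
    """Bandwidth in 10**6 bytes/s, based on the best (minimum) time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * n
    return bytes_moved / MIN_TIME[kernel] / 1e6


def clock_ticks(test_time_us, granularity_us):
    """Number of timer ticks that fit into one test."""
    return test_time_us // granularity_us


if __name__ == "__main__":
    print("memory (MB):", total_memory_mib(N))
    print("clock ticks:", clock_ticks(200000, 10000))
    for kernel in ARRAYS_TOUCHED:
        print(f"{kernel}: {rate_mb_per_s(kernel):.4f} MB/s")
```

Because the rate is computed from the minimum time, and the timer only
resolves 0.01 s, a change of a single clock tick in the Copy time would
shift the reported rate by roughly 5%.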
Regards
/Herbert