John:
I figured it might be useful to have STREAM numbers for
the same machine, but generated with executables from two
different compilers.
1. The following results are from my home system. It has
an Asus K7V motherboard with Via KX133 chipset and a
750 MHz Athlon Classic, 256MB of Crucial PC133-CAS2
SDRAM. It is running WinNT4/SP6 and I used Compaq
Visual Fortran 6.5 as the compiler. The compiler was invoked
like so: df -unroll:8 -arch:k7
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 9012345
Offset = 0
The total memory requirement is 206 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 10000 microseconds
The tests below will each take a time on the order
of 210000 microseconds
(= 21 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:           576.7901     0.2580     0.2500     0.2600
Scale:          576.7901     0.2540     0.2500     0.2600
Add:            655.4433     0.3310     0.3300     0.3400
Triad:          636.1655     0.3430     0.3400     0.3500
Sum of a is = 1.039394452812601E+019
Sum of b is = 2.078788905585281E+018
Sum of c is = 2.771718540568361E+018
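
For readers who want to check the table, the rates can be reproduced from the array size and the "Min time" column. The following is a minimal sketch, not part of STREAM itself. It assumes the usual STREAM counting convention (Copy and Scale touch 2 arrays per iteration, Add and Triad touch 3) and that MB means 10^6 bytes; the function names are made up for illustration.

```python
# Sketch: recompute STREAM rates from array size and best (minimum) time.
# Assumption: Copy/Scale move 2 words per iteration, Add/Triad move 3,
# and a "MB" in the Rate column is 10**6 bytes.
WORDS_PER_ITERATION = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
BYTES_PER_WORD = 8  # "Assuming 8 bytes per DOUBLE PRECISION word"


def stream_rate_mb_s(kernel, array_size, min_time_s):
    """Return the rate in MB/s for one kernel, given its best time in seconds."""
    bytes_moved = WORDS_PER_ITERATION[kernel] * BYTES_PER_WORD * array_size
    return bytes_moved / min_time_s / 1.0e6


def total_memory_mib(array_size, arrays=3):
    """Return the size of the three work arrays in units of 2**20 bytes."""
    return arrays * BYTES_PER_WORD * array_size / 2**20


if __name__ == "__main__":
    n = 9012345
    best_times = [("Copy", 0.25), ("Scale", 0.25), ("Add", 0.33), ("Triad", 0.34)]
    print("Total memory: %d MB" % total_memory_mib(n))
    for kernel, seconds in best_times:
        print("%-6s %10.4f MB/s" % (kernel + ":", stream_rate_mb_s(kernel, n, seconds)))
```

With the CVF minimum times this gives the same figures as the Rate column above. The printed minimum times are rounded to four decimals, so rates recomputed from other runs may differ in the last digits.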
2. The following results are also from my home system.
Again, Asus K7V motherboard with Via KX133 chipset
and 750 MHz Athlon Classic, 256MB of Crucial PC133-CAS2
SDRAM. It is running WinNT4/SP6 and I used Lahey Fujitsu Fortran
95 5.6 as the compiler. The compiler was invoked like so:
lf95 -o1 -tpp
---------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
---------------------------------------------
Array size = 9012345
Offset = 0
The total memory requirement is 206 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 10014 microseconds
The tests below will each take a time on the order
of 280403 microseconds
(= 28 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:           496.5194     0.2914     0.2904     0.3004
Scale:          553.8091     0.2614     0.2604     0.2704
Add:            654.5015     0.3365     0.3305     0.3405
Triad:          635.2507     0.3475     0.3405     0.3505
Sum of a is = 1.039394452812601E+19
Sum of b is = 2.078788905585281E+18
Sum of c is = 2.771718540568361E+18
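As one can see, the results match nicely with the exception of
"Copy" (496.5 MB/s with LF95 versus 576.8 MB/s with CVF) and, to a
lesser degree, "Scale" (553.8 versus 576.8 MB/s). LF95 doesn't offer
an assembly language listing, so I have not figured out why its copy
loop is slower than CVF's.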
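
To put numbers on the comparison, here is a small sketch that computes the relative difference of the LF95 rates against the CVF rates. The rates are copied from the two tables in this message; the dictionary and function names are only illustrative.

```python
# Sketch: percent difference of LF95 rates relative to CVF rates (MB/s).
# The values are taken from the two result tables in this message.
cvf = {"Copy": 576.7901, "Scale": 576.7901, "Add": 655.4433, "Triad": 636.1655}
lf95 = {"Copy": 496.5194, "Scale": 553.8091, "Add": 654.5015, "Triad": 635.2507}


def percent_difference(baseline, other):
    """Return {kernel: percent change of `other` relative to `baseline`}."""
    return {k: 100.0 * (other[k] - baseline[k]) / baseline[k] for k in baseline}


if __name__ == "__main__":
    for kernel, pct in percent_difference(cvf, lf95).items():
        print("%-6s %+7.2f %%" % (kernel + ":", pct))
```

On these figures Copy is roughly 14% slower with LF95 and Scale roughly 4% slower, while Add and Triad agree to within about 0.15%.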
-- Norbert
This archive was generated by hypermail 2b29 : Sun Jan 28 2001 - 18:38:10 CST