Hi,
I took my chances and tried your stream code on my PC: Intel P133, 512 KB
PB cache, 64 MB EDO RAM, running Linux RedHat 4.0, kernel 2.0.18.
Compilers used: GCC 2.7.2.f.1, G77 0.5.18
Here are the compiler command lines and the results for both the C and the
F77 versions of stream (best results out of 10 runs):
C:
gcc -O2 -fomit-frame-pointer -finline-functions -funroll-all-loops -fstrength-reduce -fexpensive-optimizations -malign-jumps=2 -malign-loops=2 -malign-functions=2 -o stream.LINUX.GCC.full_opt second_cpu.c stream_d.c -lm
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 220000 microseconds.
(= 22 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:           88.8889       0.3610       0.3600       0.3700
Scale:         118.5185       0.2770       0.2700       0.2800
Add:           126.3158       0.3810       0.3800       0.3900
Triad:         117.0732       0.4170       0.4100       0.4200
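
As a cross-check, the Rate column above can be reproduced from the Min time column. The sketch below is only an illustration, not part of stream_d.c: it assumes STREAM's usual byte counting (Copy and Scale move two arrays per iteration, Add and Triad three), 1 MB = 10^6 bytes for the rates, and 1 MB = 2^20 bytes for the "Total memory required" line. All names in it are made up for this example.

```python
# Sketch: reproduce STREAM's "Rate (MB/s)" column from the "Min time" column.
# Assumptions (not taken from the post): Copy/Scale touch 2 arrays per
# iteration, Add/Triad touch 3; rates use 1 MB = 10**6 bytes.
# All names below are illustrative only.

ARRAY_SIZE = 2_000_000      # "Array size" from the output above
BYTES_PER_WORD = 8          # "8 bytes per DOUBLE PRECISION word"

ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def rate_mb_per_s(kernel, min_time_s):
    """Bytes moved by one pass of the kernel, divided by its best time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * ARRAY_SIZE
    return bytes_moved / min_time_s / 1e6


def total_memory_mib():
    """Three arrays of doubles, expressed in units of 2**20 bytes."""
    return 3 * BYTES_PER_WORD * ARRAY_SIZE / 2**20


if __name__ == "__main__":
    c_min_times = {"Copy": 0.36, "Scale": 0.27, "Add": 0.38, "Triad": 0.41}
    for kernel, seconds in c_min_times.items():
        print(f"{kernel:6s} {rate_mb_per_s(kernel, seconds):9.4f} MB/s")
    print(f"Total memory: {total_memory_mib():.1f} MB")
```

Under these assumptions the computed rates agree with the four printed values, which suggests the rates are derived from the best (minimum) time rather than from the average.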
F77:
g77 -O1 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -finline-functions -fstrength-reduce -funroll-all-loops -o stream.LINUX.g77.opt etime.o second_cpu.f stream_d.f
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 2000000
Offset = 0
The total memory requirement is 45 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 10000 microseconds
The tests below will each take a time on the order
of 230000 microseconds
(= 23 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:           86.4865       0.3606       0.3700       0.4000
Scale:         118.5187       0.2625       0.2700       0.2900
Add:           123.0771       0.3732       0.3900       0.4000
Triad:         117.0732       0.3921       0.4100       0.4200
Sum of a is = 2.30660156E+18
Sum of b is = 4.61320312E+17
Sum of c is = 6.1509375E+17
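
A note on reading the two tables side by side: the C and F77 runs differ only for Copy and Add, and in both cases the minimum times differ by exactly one 10 ms timer tick (0.36 vs 0.37 s and 0.38 vs 0.39 s). The sketch below, which again assumes STREAM's usual byte counting and 1 MB = 10^6 bytes and uses made-up names, shows how much a single tick moves the reported rate at this array size.

```python
# Sketch: effect of one 10 ms timer tick on the reported STREAM rate.
# Assumptions (not taken from the post): Copy/Scale move 2 arrays per
# iteration, Add/Triad move 3; 1 MB = 10**6 bytes. Names are illustrative.

ARRAY_SIZE = 2_000_000
BYTES_PER_WORD = 8
TICK_S = 0.01               # clock granularity reported above (~10000 us)


def rate(arrays, seconds):
    """Rate in MB/s for a kernel that moves `arrays` full arrays."""
    return arrays * BYTES_PER_WORD * ARRAY_SIZE / seconds / 1e6


def tick_spread(arrays, min_time_s):
    """Rates for the measured time and for a time one tick longer."""
    return rate(arrays, min_time_s), rate(arrays, min_time_s + TICK_S)


if __name__ == "__main__":
    for name, arrays, seconds in [("Copy", 2, 0.36), ("Add", 3, 0.38)]:
        fast, slow = tick_spread(arrays, seconds)
        print(f"{name}: {fast:.4f} vs {slow:.4f} MB/s")
```

With these inputs one tick accounts for roughly 2 to 3 MB/s, which is about the size of the Copy and Add differences between the two compilers, so those differences may be within timer resolution.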
Apart from Copy, these numbers are a bit better than what's in
your results table.
Ciao, Ralf Loesche