Hi,
I ran STREAM on a 1-CPU 750MHz Sun Blade 1000 tonight, and thought
you might be interested in the numbers. I compiled it with
the Sun Forte 6 maintenance update 1 compilers, using the UltraSPARC-III
architecture flags so the compiler could apply whatever tricks it has.
I've included results for both the C and Fortran versions of STREAM.
They differ: the C version is a lot slower (roughly 430-490 MB/s versus
770-930 MB/s for Fortran), despite very aggressive optimization flags.
As I remember it, the C and Fortran results were closer with older
compilers/machines from Sun. Anyway, here are the numbers.
John Stone
johns@ks.uiuc.edu
**
** 32-bit Fortran results (-xarch=v8plusb)
**
johns@sundemo[21] f77 second_cpu.f stream_d.f -fast -xarch=v8plusb -fsimple=2
second_cpu.f:
second:
stream_d.f:
MAIN stream:
realsize:
confuse:
checktick:
checksums:
Linking:
johns@sundemo[22] a.out
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 20000000
Offset = 0
The total memory requirement is 457 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity/precision appears to be 7 microseconds
----------------------------------------------------
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:            827.6625     0.3869     0.3866     0.3872
Scale:           768.9331     0.4163     0.4162     0.4164
Add:             827.3550     0.5832     0.5802     0.5840
Triad:           845.0919     0.5682     0.5680     0.5685
----------------------------------------------------
Solution Validates!
----------------------------------------------------
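For readers checking the table above, here is a minimal sketch of how the Rate column follows from the Min time column. It assumes the usual STREAM byte counts (two 8-byte words per element for Copy and Scale, three for Add and Triad) and 1 MB = 10**6 bytes. The function name and dictionary are illustrative only and are not taken from the STREAM source.

```python
# Bytes moved per array element for each STREAM kernel, with 8-byte doubles:
# Copy and Scale read one array and write one; Add and Triad read two and write one.
BYTES_PER_ELEMENT = {"Copy": 16, "Scale": 16, "Add": 24, "Triad": 24}


def stream_rate_mb_s(kernel, array_size, min_time_s):
    """Return bandwidth in MB/s (1 MB = 10**6 bytes) from the best time."""
    return BYTES_PER_ELEMENT[kernel] * array_size / min_time_s / 1.0e6


if __name__ == "__main__":
    # 32-bit Fortran run: 20000000 elements and the printed Min times.
    fortran_min_times = [
        ("Copy", 0.3866),
        ("Scale", 0.4162),
        ("Add", 0.5802),
        ("Triad", 0.5680),
    ]
    for kernel, t in fortran_min_times:
        rate = stream_rate_mb_s(kernel, 20_000_000, t)
        print(f"{kernel:6s} {rate:9.1f} MB/s")
```

Because the printed Min times are rounded to four digits, the recomputed rates agree with the reported ones only to within roughly 0.1 MB/s.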
**
** 64-bit Fortran results (-xarch=v9b)
**
johns@sundemo[17] f77 second_cpu.f stream_d.f -fast -xarch=v9b -fsimple=2
second_cpu.f:
second:
stream_d.f:
MAIN stream:
realsize:
confuse:
checktick:
checksums:
Linking:
johns@sundemo[18] a.out
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 20000000
Offset = 0
The total memory requirement is 457 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity/precision appears to be 7 microseconds
----------------------------------------------------
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:            809.2464     0.3957     0.3954     0.3959
Scale:           814.7661     0.3933     0.3928     0.3936
Add:             931.4449     0.5158     0.5153     0.5160
Triad:           890.7752     0.5392     0.5389     0.5395
----------------------------------------------------
Solution Validates!
----------------------------------------------------
**
** 32-bit C results (-xarch=v8plusb)
**
johns@sundemo[33] cc -fast -xO5 -native -xarch=v8plusb -xrestrict -fsimple=2 stream_d.c second_cpu.c
stream_d.c:
second_cpu.c:
johns@sundemo[34] a.out
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 0
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 239999 microseconds.
(= 24 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            492.3091     0.2680     0.2600     0.2700
Scale:           474.0749     0.2700     0.2700     0.2700
Add:             426.6668     0.4550     0.4500     0.4600
Triad:           426.6678     0.4520     0.4500     0.4600
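The C build reports a clock granularity of 9999 microseconds, so every Min time in its table is a multiple of about 0.01 s. The sketch below shows how much a one-tick error could move each rate. The one-tick error model and the helper name are assumptions made for illustration; STREAM itself only prints the "at least 20 clock ticks" guideline.

```python
def rate_bounds_mb_s(bytes_moved, min_time_s, tick_s):
    """Bandwidth range (MB/s) if the true time is within one tick of min_time_s."""
    low = bytes_moved / (min_time_s + tick_s) / 1.0e6
    high = bytes_moved / (min_time_s - tick_s) / 1.0e6
    return low, high


if __name__ == "__main__":
    ARRAY_SIZE = 8_000_000  # from the C run's "Array size" line
    TICK_S = 0.01           # reported granularity, rounded from 9999 microseconds
    # (kernel, bytes per element, printed Min time) for the 32-bit C run
    c_results = [
        ("Copy", 16, 0.26),
        ("Scale", 16, 0.27),
        ("Add", 24, 0.45),
        ("Triad", 24, 0.45),
    ]
    for kernel, bytes_per_elem, t in c_results:
        low, high = rate_bounds_mb_s(bytes_per_elem * ARRAY_SIZE, t, TICK_S)
        print(f"{kernel:6s} {low:7.1f} to {high:7.1f} MB/s")
```

Under this assumption the uncertainty is a few percent per kernel, which is small next to the gap between the C and Fortran rates.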
**
** 64-bit C results (-xarch=v9b)
**
johns@sundemo[35] cc -fast -xO5 -native -xarch=v9b -xrestrict -fsimple=2 stream_d.c second_cpu.c
stream_d.c:
second_cpu.c:
johns@sundemo[36] a.out
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 0
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 250000 microseconds.
(= 25 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            492.3091     0.2670     0.2600     0.2700
Scale:           474.0749     0.2720     0.2700     0.2800
Add:             426.6668     0.4570     0.4500     0.4600
Triad:           426.6668     0.4530     0.4500     0.4600
--
NIH Resource for Macromolecular Modeling and Bioinformatics
Beckman Institute for Advanced Science and Technology
University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
Email: johns@ks.uiuc.edu                 Phone: 217-244-3349
WWW: http://www.ks.uiuc.edu/~johns/      Fax: 217-244-6078
This archive was generated by hypermail 2b29 : Wed Dec 06 2000 - 08:27:00 CST