Hi McCalpin-san,
Thank you for your reply and for suggesting that I
check the numbers.
I checked the source again and found that I had made a
mistake in the timing function.
Now I get reasonable, expected numbers: 42 GB/s.
With these we can set a reasonable criterion for the procurement.
I also submitted bug #676604 for the problem I experienced
with STREAM.
Thank you again,
ashizawa
typhoon 37% ./go
+ export OMP_NUM_THREADS=256
+ export OMP_DYNAMIC=FALSE
+ export MPC_GANG=OFF
+ export _DSM_PPM=2
+ export _DSM_MUSTRUN=1
+ ./stream
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 160000000
Offset = 0
The total memory requirement is 3662 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 4 microseconds
The tests below will each take a time on the order
of 34356 microseconds
(= 8589 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:        42825.1375     0.0618     0.0598     0.0721
Scale:       45299.3503     0.0622     0.0565     0.0739
Add:         51802.3491     0.0982     0.0741     0.2121
Triad:       52977.9809     0.0738     0.0725     0.0765
Sum of a is = 1.8452812500173852E+20
Sum of b is = 3.6905625000360264E+19
Sum of c is = 4.9207500000800375E+19
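As a rough cross-check of the run above, the reported rates can be recomputed from the array size and the minimum times. The sketch below is not part of STREAM. It assumes the usual STREAM accounting (8 bytes per word, two words moved per element for Copy and Scale, three for Add and Triad, and 1 MB = 10**6 bytes), and the function names are only illustrative. Because the minimum times are printed to four decimal places, the recomputed rates should agree with the reported ones only to within a fraction of a percent.

```python
# Cross-check of the STREAM rates reported above (array size 160000000).
# Assumptions (not stated in the output itself): 2 words per element are
# moved for Copy/Scale, 3 for Add/Triad, and 1 MB = 10**6 bytes.
N = 160_000_000
BYTES_PER_WORD = 8
WORDS_PER_ELEMENT = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Values copied from the "Min time" and "Rate (MB/s)" columns above.
MIN_TIME = {"Copy": 0.0598, "Scale": 0.0565, "Add": 0.0741, "Triad": 0.0725}
REPORTED = {"Copy": 42825.1375, "Scale": 45299.3503,
            "Add": 51802.3491, "Triad": 52977.9809}


def rate_mb_s(kernel, n, seconds):
    """Bandwidth in MB/s for one kernel, from bytes moved and elapsed time."""
    bytes_moved = WORDS_PER_ELEMENT[kernel] * BYTES_PER_WORD * n
    return bytes_moved / seconds / 1.0e6


def relative_error(kernel):
    """Relative difference between the recomputed and the reported rate."""
    computed = rate_mb_s(kernel, N, MIN_TIME[kernel])
    return abs(computed - REPORTED[kernel]) / REPORTED[kernel]


if __name__ == "__main__":
    for kernel in WORDS_PER_ELEMENT:
        computed = rate_mb_s(kernel, N, MIN_TIME[kernel])
        print(f"{kernel:6s} recomputed {computed:10.1f} MB/s, "
              f"reported {REPORTED[kernel]:10.1f} MB/s")
```

Agreement at this level suggests the rates and times in the table are self-consistent, which is the kind of check that would expose a timing error.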
Large Case:
typhoon 21% ./go
+ export OMP_NUM_THREADS=256
+ export OMP_DYNAMIC=FALSE
+ export MPC_GANG=OFF
+ export _DSM_PPM=2
+ export _DSM_MUSTRUN=1
+ ./stream
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 400000000
Offset = 0
The total memory requirement is 9155 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 4 microseconds
The tests below will each take a time on the order
of 90979 microseconds
(= 22745 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:        42824.2493     0.1505     0.1494     0.1525
Scale:       43213.4826     0.1638     0.1481     0.2423
Add:         48285.8374     0.2003     0.1988     0.2025
Triad:       49275.5196     0.1982     0.1948     0.2171
Sum of a is = 4.6132031251661586E+20
Sum of b is = 9.2264062499237265E+19
Sum of c is = 1.2301875000272059E+20
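The memory requirement and clock-tick figures printed in both runs can also be checked by hand. The sketch below is not taken from the STREAM source. It assumes three arrays of 8-byte words, 1 MB = 2**20 bytes with truncation to an integer, and ticks rounded to the nearest whole number, all of which are inferred from the printed values. The helper names are hypothetical.

```python
# Sanity check of the header values printed by both runs above.
# Assumptions inferred from the output, not from the STREAM source:
# three arrays, 8 bytes per word, 1 MB = 2**20 bytes, MB truncated,
# clock ticks rounded to the nearest integer.
BYTES_PER_WORD = 8
NUM_ARRAYS = 3
CLOCK_GRANULARITY_US = 4  # "clock granularity/precision" from the output


def memory_requirement_mb(array_size):
    """Total size of the three arrays, in whole MB (2**20 bytes)."""
    total_bytes = NUM_ARRAYS * BYTES_PER_WORD * array_size
    return total_bytes // 2**20


def clock_ticks(test_time_us):
    """Number of timer ticks spanned by one test of the given duration."""
    return round(test_time_us / CLOCK_GRANULARITY_US)


if __name__ == "__main__":
    # (array size, estimated time per test in microseconds) for each run
    for array_size, test_time_us in ((160_000_000, 34356),
                                     (400_000_000, 90979)):
        print(f"N = {array_size}: "
              f"{memory_requirement_mb(array_size)} MB, "
              f"{clock_ticks(test_time_us)} clock ticks")
```

Both runs span thousands of clock ticks per test, far above the 20-tick guideline in the output, so timer granularity should not limit the accuracy of these measurements.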
--
------------------------------------------------------
Yoshio Ashizawa
Scalable Systems Technology Center
Silicon Graphics Inc. Japan
E-mail: ashizawa@nsg.sgi.com
PHONE : +81-3-5488-1838
FAX : +81-3-5420-2397
This archive was generated by hypermail 2b29 : Tue Apr 18 2000 - 05:23:08 CDT