From: McCalpin (mccalpin@austin.ibm.com)
Date: Tue May 27 2003 - 10:48:27 CDT
Here are results for STREAM on the IBM eServer pSeries 655/4p/1.7GHz
(aka p655+).
These are all four-CPU results using large pages.
The first set uses the standard code, while the second
set uses the DCBZ-tuned code.
Base Offset = 67108864
Incremental Offset = 1536
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67108864
The total memory requirement is 1536 MB
You are running each test 5 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:           8369.7740      .1320      .1283      .1392
Scale:          8354.4727      .1287      .1285      .1288
Add:           11914.0431      .1354      .1352      .1357
Triad:         12038.7274      .1340      .1338      .1343
----------------------------------------------------
Solution Validates!
----------------------------------------------------
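The reported rates can be cross-checked from the array size and the minimum times. The sketch below assumes the usual STREAM counting convention (Copy and Scale touch two arrays per element, Add and Triad touch three), rates in units of 10^6 bytes/s, and the memory requirement in units of 2^20 bytes. The function and variable names are illustrative and do not come from the benchmark source.

```python
# Sketch: recompute STREAM rates from array size and minimum time.
# Assumption: standard STREAM byte counting; names here are hypothetical.

N = 67108864          # "Array size" from the output above
BYTES_PER_WORD = 8    # "8 bytes per DOUBLE PRECISION word"

# Arrays read or written per element by each kernel.
ARRAYS_PER_KERNEL = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_mb_s(kernel, min_time_s, n=N):
    """Rate in MB/s (1 MB = 10**6 bytes) for one kernel."""
    bytes_moved = ARRAYS_PER_KERNEL[kernel] * BYTES_PER_WORD * n
    return bytes_moved / min_time_s / 1.0e6


def memory_requirement_mb(n=N, arrays=3):
    """Total size of the three arrays in MB (1 MB = 2**20 bytes)."""
    return arrays * BYTES_PER_WORD * n / 2**20


# Minimum times (seconds) from the standard-code run above.
min_times = {"Copy": 0.1283, "Scale": 0.1285, "Add": 0.1352, "Triad": 0.1338}

for kernel, t in min_times.items():
    print(f"{kernel}: {stream_rate_mb_s(kernel, t):.1f} MB/s")
print(f"Memory requirement: {memory_requirement_mb():.0f} MB")
```

The recomputed values land close to the reported rates but not exactly on them, because the printed minimum times are rounded to four decimal places.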
Base Offset = 67108864
Incremental Offset = 3072
Number of Threads = 4
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67108864
Offset = 0
The total memory requirement is 1536 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 59712 microseconds
(= 59712 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:          17661.9023      .2132      .0608      .0764
Scale:         17839.1467      .2125      .0602      .0748
Add:           19901.3958      .2372      .0809      .1211
Triad:         19973.0932      .2368      .0806      .1194
Sum of a is = 101921363943750.000
Sum of b is = 20384272788750.0000
Sum of c is = 27179030385000.0000
bindprocessor successful: thread_self() 36811 cpu_id 1
bindprocessor successful: thread_self() 36539 cpu_id 0
bindprocessor successful: thread_self() 38515 cpu_id 3
bindprocessor successful: thread_self() 33177 cpu_id 2
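For a quick comparison of the two runs, the sketch below divides each DCBZ-tuned rate by the corresponding standard-code rate. The rates are copied from the two tables in this message; the names are illustrative. The post does not explain the gain. A plausible reading, offered here only as an assumption, is that dcbz establishes the destination cache line without first reading it from memory, which removes store-miss read traffic that the STREAM byte count does not include. Note that the two runs also report different incremental offsets.

```python
# Sketch: ratio of DCBZ-tuned to standard STREAM rates (MB/s).
# The rates are taken from the tables in this message; names are hypothetical.

STANDARD = {"Copy": 8369.7740, "Scale": 8354.4727,
            "Add": 11914.0431, "Triad": 12038.7274}
TUNED = {"Copy": 17661.9023, "Scale": 17839.1467,
         "Add": 19901.3958, "Triad": 19973.0932}


def speedups(standard, tuned):
    """Return tuned/standard rate ratio for each kernel."""
    return {kernel: tuned[kernel] / standard[kernel] for kernel in standard}


ratios = speedups(STANDARD, TUNED)
for kernel, ratio in ratios.items():
    print(f"{kernel}: {ratio:.2f}x")
```

The ratios come out at roughly 2.1x for Copy and Scale and about 1.66x to 1.67x for Add and Triad.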