From: Frank Johnston (fjohn@us.ibm.com)
Date: Mon Oct 04 2004 - 16:20:55 CDT
These are tuned STREAM results on an IBM eServer p5 550 Express
with four 1500 MHz CPUs (36 MB L3 cache). This is a POWER5 SMP machine.
Large pages were used in all cases.
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:             6336.64        .17        .17        .19
Scale:            6203.58        .17        .17        .17
Add:              9044.43        .18        .18        .18
Triad:            9260.01        .17        .17        .17
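
As a rough cross-check of these figures: STREAM conventionally counts 16 bytes per array element for Copy and Scale (one 8-byte read, one 8-byte write) and 24 bytes for Add and Triad (two reads, one write), and reports rates in decimal MB/s (10^6 bytes). The sketch below inverts that relation to recover the loop time implied by each rate. It assumes an array size of 67077120 elements, taken from the run in the full output whose rates match this table; the function and variable names are illustrative only.

```python
# Bytes moved per array element for each STREAM kernel (8-byte words).
BYTES_PER_ELEMENT = {"Copy": 16, "Scale": 16, "Add": 24, "Triad": 24}

# Rates from the summary table, in MB/s (1 MB = 10**6 bytes).
RATES_MB_S = {"Copy": 6336.64, "Scale": 6203.58, "Add": 9044.43, "Triad": 9260.01}

# Assumption: array size of the run in the log that matches the table.
ARRAY_SIZE = 67077120


def implied_time(kernel, rate_mb_s, n=ARRAY_SIZE):
    """Return the loop time in seconds implied by a reported rate."""
    return BYTES_PER_ELEMENT[kernel] * n / (rate_mb_s * 1.0e6)


def implied_times():
    """Return the implied time for every kernel in the summary table."""
    return {k: implied_time(k, r) for k, r in RATES_MB_S.items()}


if __name__ == "__main__":
    for kernel, seconds in implied_times().items():
        print(f"{kernel:6s} {seconds:.4f} s")
```

Rounded to two decimals, the implied times agree with the Min time column (.17 s for Copy, Scale and Triad, .18 s for Add).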
Here is the full output file:
------------------------------------------------------------
Requesting Large Pages
Setting up for 2 CPUs per module
Number of segments per array = 2
CPU binding list : 0 2
Shared Segment Pointer = 504403158265495552
Shared Segment Pointer = 504403158802366464
Shared Segment Pointer = 504403159339237376
Segment Size (B) = 268435456 (MB = 256 )
Array Size (B) = 536870912 (MB = 512 )
Array Size (DW) = 67108864
Num_threads = 4
Num_threads = 4
Num_threads = 4
Num_threads = 4
rebind: num_parthds is 4
Starting Initialization
Done With Initialization
a(1) 1.00000000000000000
b(M) 1.00000000000000000
c(M) 1.00000000000000000
Incremental Offset = 512
Number of Threads = 4
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67079168
Offset = 0
The total memory requirement is 1535 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 173001 microseconds
(= 173001 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:             6332.81        .17        .17        .19
Scale:            6202.76        .17        .17        .17
Add:              9044.40        .18        .18        .18
Triad:            9241.50        .17        .17        .17
Sum of a is = 101876486400000.000
Sum of b is = 20375297280000.0000
Sum of c is = 27167063040000.0000
Incremental Offset = 1536
Number of Threads = 4
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67079168
Offset = 0
The total memory requirement is 1535 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 173075 microseconds
(= 173075 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:             6334.93        .17        .17        .19
Scale:            6202.84        .17        .17        .17
Add:              9041.60        .18        .18        .18
Triad:            9244.06        .17        .17        .17
Sum of a is = 101876486400000.000
Sum of b is = 20375297280000.0000
Sum of c is = 27167063040000.0000
Incremental Offset = 2560
Number of Threads = 4
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67079168
Offset = 0
The total memory requirement is 1535 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 173063 microseconds
(= 173063 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:             6335.98        .17        .17        .19
Scale:            6202.82        .17        .17        .17
Add:              9037.42        .18        .18        .18
Triad:            9237.77        .17        .17        .17
Sum of a is = 101876486400000.000
Sum of b is = 20375297280000.0000
Sum of c is = 27167063040000.0000
Incremental Offset = 512
Number of Threads = 4
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67077120
Offset = 0
The total memory requirement is 1535 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 173122 microseconds
(= 173122 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:             6336.64        .17        .17        .19
Scale:            6203.58        .17        .17        .17
Add:              9044.43        .18        .18        .18
Triad:            9260.01        .17        .17        .17
Sum of a is = 101873376000000.000
Sum of b is = 20374675200000.0000
Sum of c is = 27166233600000.0000
Incremental Offset = 1536
Number of Threads = 4
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67077120
Offset = 0
The total memory requirement is 1535 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 173064 microseconds
(= 173064 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:             6333.57        .17        .17        .19
Scale:            6203.03        .17        .17        .17
Add:              9042.34        .18        .18        .18
Triad:            9237.94        .17        .17        .17
Sum of a is = 101873376000000.000
Sum of b is = 20374675200000.0000
Sum of c is = 27166233600000.0000
Incremental Offset = 2560
Number of Threads = 4
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67077120
Offset = 0
The total memory requirement is 1535 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 173082 microseconds
(= 173082 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:             6336.72        .17        .17        .19
Scale:            6204.31        .17        .17        .17
Add:              9038.50        .18        .18        .18
Triad:            9235.56        .17        .17        .17
Sum of a is = 101873376000000.000
Sum of b is = 20374675200000.0000
Sum of c is = 27166233600000.0000
Incremental Offset = 512
Number of Threads = 4
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67075072
Offset = 0
The total memory requirement is 1535 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 173082 microseconds
(= 173082 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:             6334.12        .17        .17        .19
Scale:            6202.30        .17        .17        .17
Add:              9043.60        .18        .18        .18
Triad:            9253.06        .17        .17        .18
Sum of a is = 101870265600000.000
Sum of b is = 20374053120000.0000
Sum of c is = 27165404160000.0000
Incremental Offset = 1536
Number of Threads = 4
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67075072
Offset = 0
The total memory requirement is 1535 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 173061 microseconds
(= 173061 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:             6333.75        .17        .17        .19
Scale:            6203.76        .17        .17        .17
Add:              9043.13        .18        .18        .18
Triad:            9247.27        .17        .17        .17
Sum of a is = 101870265600000.000
Sum of b is = 20374053120000.0000
Sum of c is = 27165404160000.0000
Incremental Offset = 2560
Number of Threads = 4
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67075072
Offset = 0
The total memory requirement is 1535 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 173057 microseconds
(= 173057 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:             6334.65        .17        .17        .19
Scale:            6203.77        .17        .17        .17
Add:              9041.13        .18        .18        .18
Triad:            9241.83        .17        .17        .17
Sum of a is = 101870265600000.000
Sum of b is = 20374053120000.0000
Sum of c is = 27165404160000.0000
Incremental Offset = 512
Number of Threads = 4
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67073024
Offset = 0
The total memory requirement is 1535 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 173041 microseconds
(= 173041 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:             6334.66        .17        .17        .19
Scale:            6202.59        .17        .17        .17
Add:              9042.47        .18        .18        .18
Triad:            9261.63        .17        .17        .17
Sum of a is = 101867155200000.000
Sum of b is = 20373431040000.0000
Sum of c is = 27164574720000.0000
Incremental Offset = 1536
Number of Threads = 4
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67073024
Offset = 0
The total memory requirement is 1535 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 173152 microseconds
(= 173152 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:             6334.58        .17        .17        .19
Scale:            6204.01        .17        .17        .17
Add:              9043.97        .18        .18        .18
Triad:            9243.32        .17        .17        .17
Sum of a is = 101867155200000.000
Sum of b is = 20373431040000.0000
Sum of c is = 27164574720000.0000
Incremental Offset = 2560
Number of Threads = 4
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67073024
Offset = 0
The total memory requirement is 1535 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 172962 microseconds
(= 172962 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:             6337.22        .17        .17        .19
Scale:            6201.97        .17        .17        .17
Add:              9035.72        .18        .18        .18
Triad:            9238.51        .17        .17        .17
Sum of a is = 101867155200000.000
Sum of b is = 20373431040000.0000
Sum of c is = 27164574720000.0000
GETSHRSEG: requesting large pages
GETSHRSEG ENTRY: shmgetflag -2147481216
bindprocessor successful: thread_self() 659609 cpu_id 0
bindprocessor successful: thread_self() 659609 cpu_id 2
GETSHRSEG: requesting large pages
GETSHRSEG ENTRY: shmgetflag -2147481216
bindprocessor successful: thread_self() 659609 cpu_id 0
bindprocessor successful: thread_self() 659609 cpu_id 2
GETSHRSEG: requesting large pages
GETSHRSEG ENTRY: shmgetflag -2147481216
bindprocessor successful: thread_self() 659609 cpu_id 0
bindprocessor successful: thread_self() 659609 cpu_id 2
bindprocessor successful: thread_self() 803005 cpu_id 2
bindprocessor successful: thread_self() 815321 cpu_id 3
bindprocessor successful: thread_self() 786609 cpu_id 1
bindprocessor successful: thread_self() 659609 cpu_id 0
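
The "Sum of a/b/c" lines in the output can be cross-checked by hand, because every element of a given array holds the same value. The sketch below replays the standard STREAM kernel sequence (Copy c = a, Scale b = 3c, Add c = a + b, Triad a = b + 3c) for the 5 repetitions reported in the log, then multiplies by the array size. The starting value a = 2.0 is an assumption inferred from the reported sums, not something the log states, and the function names are illustrative only.

```python
def stream_values(a0=2.0, scalar=3.0, ntimes=5):
    """Per-element values of a, b, c after ntimes passes of the four kernels.

    a0 = 2.0 is an assumption chosen to match the sums in the log.
    """
    a, b, c = a0, 0.0, 0.0
    for _ in range(ntimes):
        c = a                # Copy
        b = scalar * c       # Scale
        c = a + b            # Add
        a = b + scalar * c   # Triad
    return a, b, c


def expected_sums(n, **kwargs):
    """Expected array sums when all n elements hold the same value."""
    a, b, c = stream_values(**kwargs)
    return n * a, n * b, n * c


if __name__ == "__main__":
    # The four array sizes that appear in the log.
    for n in (67079168, 67077120, 67075072, 67073024):
        print(n, *(f"{s:.1f}" for s in expected_sums(n)))
```

Under these assumptions each pass multiplies a by 15, so the per-element values end at 1518750, 303750 and 405000, and the products reproduce the sums printed for each array size.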