From: Frank Johnston (fjohn@us.ibm.com)
Date: Mon Oct 04 2004 - 16:18:43 CDT
These are tuned STREAM results on an IBM eServer p5 520 Express
with two 1500 MHz CPUs (36 MB L3 cache). This is a POWER5 SMP machine.
Large pages were used in all cases.
Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:        3721.04        .15        .14        .16
Scale:       3632.69        .15        .15        .15
Add:         5252.87        .15        .15        .15
Triad:       5359.51        .15        .15        .15
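As a rough cross-check of the table above, the sketch below converts each reported rate back into the time per pass that it implies. It rests on assumptions that are not stated in the message itself: that Copy and Scale touch two arrays per element while Add and Triad touch three, that "MB" means 10^6 bytes, and that the summary corresponds to the run in the output file with Array size = 33539072 (the run whose four rates match these figures exactly). The names ARRAYS_TOUCHED and implied_seconds are illustrative only.

```python
# Sketch: recover the per-pass time implied by each reported STREAM rate.
# Assumptions (not from the original message): 2 arrays touched for
# Copy/Scale, 3 for Add/Triad, and 1 MB = 10**6 bytes.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
BYTES_PER_WORD = 8  # "Assuming 8 bytes per DOUBLE PRECISION word"

# Array size of the run whose rates match the summary table.
N = 33539072

# Rates copied from the summary table, in MB/s.
REPORTED = {"Copy": 3721.04, "Scale": 3632.69, "Add": 5252.87, "Triad": 5359.51}


def implied_seconds(kernel, rate_mb_s, n):
    """Return the seconds per pass implied by a rate in MB/s."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * n
    return bytes_moved / (rate_mb_s * 1.0e6)


for kernel, rate in REPORTED.items():
    print(f"{kernel:6s} {implied_seconds(kernel, rate, N):.4f} s")
```

Under these assumptions every kernel comes out between 0.14 and 0.16 seconds, which agrees with the two-digit time columns and with the roughly 147,000 microsecond estimate printed in the output file.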
Here is the full output file:
---------------------------------------------------
Requesting Large Pages
Setting up for 2 CPUs per module
Number of segments per array = 1
CPU binding list : 0
Shared Segment Pointer = 504403158265495552
Shared Segment Pointer = 504403158533931008
Shared Segment Pointer = 504403158802366464
Segment Size (B) = 268435456 (MB = 256 )
Array Size (B) = 268435456 (MB = 256 )
Array Size (DW) = 33554432
Num_threads = 2
Num_threads = 2
rebind: num_parthds is 2
Starting Initialization
Done With Initialization
a(1) 1.00000000000000000
b(M) 1.00000000000000000
c(M) 1.00000000000000000
Incremental Offset = 512
Number of Threads = 2
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 33541120
Offset = 0
The total memory requirement is 767 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 147644 microseconds
(= 147644 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 3721.34 .15 .14 .16
Scale: 3635.18 .15 .15 .15
Add: 5238.18 .15 .15 .15
Triad: 5351.20 .15 .15 .15
Sum of a is = 50940501581250.0000
Sum of b is = 10188100316250.0000
Sum of c is = 13584133755000.0000
Incremental Offset = 1536
Number of Threads = 2
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 33541120
Offset = 0
The total memory requirement is 767 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 147532 microseconds
(= 147532 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 3721.58 .15 .14 .16
Scale: 3638.01 .15 .15 .15
Add: 5224.91 .15 .15 .15
Triad: 5343.64 .15 .15 .15
Sum of a is = 50940501581250.0000
Sum of b is = 10188100316250.0000
Sum of c is = 13584133755000.0000
Incremental Offset = 2560
Number of Threads = 2
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 33541120
Offset = 0
The total memory requirement is 767 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 147470 microseconds
(= 147470 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 3721.17 .15 .14 .16
Scale: 3633.30 .15 .15 .15
Add: 5252.42 .15 .15 .15
Triad: 5360.53 .15 .15 .15
Sum of a is = 50940501581250.0000
Sum of b is = 10188100316250.0000
Sum of c is = 13584133755000.0000
Incremental Offset = 512
Number of Threads = 2
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 33539072
Offset = 0
The total memory requirement is 767 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 147671 microseconds
(= 147671 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 3721.45 .15 .14 .16
Scale: 3633.24 .15 .15 .15
Add: 5256.05 .15 .15 .15
Triad: 5346.22 .15 .15 .15
Sum of a is = 50937391181250.0000
Sum of b is = 10187478236250.0000
Sum of c is = 13583304315000.0000
Incremental Offset = 1536
Number of Threads = 2
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 33539072
Offset = 0
The total memory requirement is 767 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 147570 microseconds
(= 147570 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 3720.40 .15 .14 .16
Scale: 3634.51 .15 .15 .15
Add: 5248.29 .15 .15 .15
Triad: 5350.85 .15 .15 .15
Sum of a is = 50937391181250.0000
Sum of b is = 10187478236250.0000
Sum of c is = 13583304315000.0000
Incremental Offset = 2560
Number of Threads = 2
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 33539072
Offset = 0
The total memory requirement is 767 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 147591 microseconds
(= 147591 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 3721.04 .15 .14 .16
Scale: 3632.69 .15 .15 .15
Add: 5252.87 .15 .15 .15
Triad: 5359.51 .15 .15 .15
Sum of a is = 50937391181250.0000
Sum of b is = 10187478236250.0000
Sum of c is = 13583304315000.0000
Incremental Offset = 512
Number of Threads = 2
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 33537024
Offset = 0
The total memory requirement is 767 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 147622 microseconds
(= 147622 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 3722.54 .15 .14 .16
Scale: 3637.51 .15 .15 .15
Add: 5227.95 .15 .15 .16
Triad: 5353.06 .15 .15 .15
Sum of a is = 50934280781250.0000
Sum of b is = 10186856156250.0000
Sum of c is = 13582474875000.0000
Incremental Offset = 1536
Number of Threads = 2
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 33537024
Offset = 0
The total memory requirement is 767 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 147660 microseconds
(= 147660 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 3718.99 .15 .14 .16
Scale: 3634.71 .15 .15 .15
Add: 5230.82 .15 .15 .15
Triad: 5336.16 .15 .15 .15
Sum of a is = 50934280781250.0000
Sum of b is = 10186856156250.0000
Sum of c is = 13582474875000.0000
Incremental Offset = 2560
Number of Threads = 2
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 33537024
Offset = 0
The total memory requirement is 767 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 147555 microseconds
(= 147555 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 3721.74 .15 .14 .16
Scale: 3633.74 .15 .15 .15
Add: 5258.08 .15 .15 .15
Triad: 5356.05 .15 .15 .15
Sum of a is = 50934280781250.0000
Sum of b is = 10186856156250.0000
Sum of c is = 13582474875000.0000
Incremental Offset = 512
Number of Threads = 2
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 33534976
Offset = 0
The total memory requirement is 767 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 147649 microseconds
(= 147649 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 3722.11 .15 .14 .16
Scale: 3633.78 .15 .15 .15
Add: 5262.85 .15 .15 .15
Triad: 5355.36 .15 .15 .15
Sum of a is = 50931170381250.0000
Sum of b is = 10186234076250.0000
Sum of c is = 13581645435000.0000
Incremental Offset = 1536
Number of Threads = 2
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 33534976
Offset = 0
The total memory requirement is 767 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 147545 microseconds
(= 147545 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 3720.23 .15 .14 .16
Scale: 3634.44 .15 .15 .15
Add: 5227.36 .15 .15 .15
Triad: 5341.53 .15 .15 .15
Sum of a is = 50931170381250.0000
Sum of b is = 10186234076250.0000
Sum of c is = 13581645435000.0000
Incremental Offset = 2560
Number of Threads = 2
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 33534976
Offset = 0
The total memory requirement is 767 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 147579 microseconds
(= 147579 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 3721.60 .15 .14 .16
Scale: 3633.22 .15 .15 .15
Add: 5251.38 .15 .15 .15
Triad: 5341.64 .15 .15 .15
Sum of a is = 50931170381250.0000
Sum of b is = 10186234076250.0000
Sum of c is = 13581645435000.0000
GETSHRSEG: requesting large pages
GETSHRSEG ENTRY: shmgetflag -2147481216
bindprocessor successful: thread_self() 295117 cpu_id 0
GETSHRSEG: requesting large pages
GETSHRSEG ENTRY: shmgetflag -2147481216
bindprocessor successful: thread_self() 295117 cpu_id 0
GETSHRSEG: requesting large pages
GETSHRSEG ENTRY: shmgetflag -2147481216
bindprocessor successful: thread_self() 295117 cpu_id 0
bindprocessor successful: thread_self() 295117 cpu_id 0
bindprocessor successful: thread_self() 475217 cpu_id 1
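The output file contains twelve runs (four array sizes, three incremental offsets each). A small sketch like the following can show how much the rates move between runs. The rates are copied from the twelve result tables in the order they appear; the spread_percent helper and its definition (range divided by mean) are illustrative choices, not part of STREAM.

```python
from statistics import mean

# Rates in MB/s from the twelve result tables, in order of appearance.
RUNS = {
    "Copy": [3721.34, 3721.58, 3721.17, 3721.45, 3720.40, 3721.04,
             3722.54, 3718.99, 3721.74, 3722.11, 3720.23, 3721.60],
    "Scale": [3635.18, 3638.01, 3633.30, 3633.24, 3634.51, 3632.69,
              3637.51, 3634.71, 3633.74, 3633.78, 3634.44, 3633.22],
    "Add": [5238.18, 5224.91, 5252.42, 5256.05, 5248.29, 5252.87,
            5227.95, 5230.82, 5258.08, 5262.85, 5227.36, 5251.38],
    "Triad": [5351.20, 5343.64, 5360.53, 5346.22, 5350.85, 5359.51,
              5353.06, 5336.16, 5356.05, 5355.36, 5341.53, 5341.64],
}


def spread_percent(rates):
    """Range of the rates as a percentage of their mean (illustrative metric)."""
    return 100.0 * (max(rates) - min(rates)) / mean(rates)


for kernel, rates in RUNS.items():
    print(f"{kernel:6s} min {min(rates):8.2f}  max {max(rates):8.2f}  "
          f"spread {spread_percent(rates):.2f}%")
```

With this metric each kernel varies by less than 1% across the twelve runs. The summary at the top of the message matches the sixth run (Array size = 33539072, Incremental Offset = 2560), which is not the per-kernel maximum over all runs.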