From: Frank Johnston (fjohn@us.ibm.com)
Date: Tue Nov 02 2004 - 09:57:57 CST
These are tuned STREAM results for an IBM eServer p5 595
with sixty-four 1.9 GHz CPUs (36 MB L3 cache). This is a POWER5 SMP machine.
Large pages were used in all cases.
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:         158176.44      .03        .03        .03
Scale:        153812.38      .03        .03        .03
Add:          169687.38      .04        .04        .04
Triad:        174567.44      .04        .04        .04
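
As a rough cross-check of the table above, the sketch below recomputes the time for one pass of each kernel from the reported rate. It assumes the usual STREAM accounting (Copy and Scale touch two arrays per iteration, Add and Triad touch three, 8 bytes per word, 1 MB = 10^6 bytes) and takes the array size of 267912192 elements from the run in the output file whose rates match this table. The function and variable names are illustrative only and are not part of the benchmark.

```python
# Arrays read or written per element for each STREAM kernel
# (assumed standard STREAM counting, 8-byte DOUBLE PRECISION words).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
BYTES_PER_WORD = 8


def implied_seconds(kernel, rate_mb_s, n_elements):
    """Time for one pass implied by a reported rate (1 MB = 1e6 bytes)."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * n_elements
    return bytes_moved / (rate_mb_s * 1e6)


# Rates from the summary table; array size from the matching run.
reported = {
    "Copy": 158176.44,
    "Scale": 153812.38,
    "Add": 169687.38,
    "Triad": 174567.44,
}
n = 267912192

for kernel, rate in reported.items():
    ms = implied_seconds(kernel, rate, n) * 1e3
    print(f"{kernel:6s} {ms:6.2f} ms")
```

Under these assumptions the implied times are roughly 27 to 28 ms for Copy and Scale and 37 to 38 ms for Add and Triad, which is consistent with the .03 and .04 shown in the Min time column.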
Here is the full output file:
----------------------------------------------
Requesting Large Pages
Setting up for 8 CPUs per module
Number of segments per array = 8
CPU binding list : 0 8 16 24 32 40 48 56
Shared Segment Pointer = 504403158265495552
Shared Segment Pointer = 504403160412979200
Shared Segment Pointer = 504403162560462848
Segment Size (B) = 268435456 (MB = 256 )
Array Size (B) = 2147483648 (MB = 2048 )
Array Size (DW) = 268435456
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
Num_threads = 64
rebind: num_parthds is 64
GETSHRSEG: requesting large pages
GETSHRSEG ENTRY: shmgetflag -2147481216
bindprocessor successful: thread_self() 2511067 cpu_id 0
bindprocessor successful: thread_self() 2511067 cpu_id 8
bindprocessor successful: thread_self() 2511067 cpu_id 16
bindprocessor successful: thread_self() 2511067 cpu_id 24
bindprocessor successful: thread_self() 2511067 cpu_id 32
bindprocessor successful: thread_self() 2511067 cpu_id 40
bindprocessor successful: thread_self() 2511067 cpu_id 48
bindprocessor successful: thread_self() 2511067 cpu_id 56
GETSHRSEG: requesting large pages
GETSHRSEG ENTRY: shmgetflag -2147481216
bindprocessor successful: thread_self() 2511067 cpu_id 0
bindprocessor successful: thread_self() 2511067 cpu_id 8
bindprocessor successful: thread_self() 2511067 cpu_id 16
bindprocessor successful: thread_self() 2511067 cpu_id 24
bindprocessor successful: thread_self() 2511067 cpu_id 32
bindprocessor successful: thread_self() 2511067 cpu_id 40
bindprocessor successful: thread_self() 2511067 cpu_id 48
bindprocessor successful: thread_self() 2511067 cpu_id 56
GETSHRSEG: requesting large pages
GETSHRSEG ENTRY: shmgetflag -2147481216
bindprocessor successful: thread_self() 2511067 cpu_id 0
bindprocessor successful: thread_self() 2511067 cpu_id 8
bindprocessor successful: thread_self() 2511067 cpu_id 16
bindprocessor successful: thread_self() 2511067 cpu_id 24
bindprocessor successful: thread_self() 2511067 cpu_id 32
bindprocessor successful: thread_self() 2511067 cpu_id 40
bindprocessor successful: thread_self() 2511067 cpu_id 48
bindprocessor successful: thread_self() 2511067 cpu_id 56
bindprocessor successful: thread_self() 2793567 cpu_id 1
bindprocessor successful: thread_self() 2502873 cpu_id 14
bindprocessor successful: thread_self() 2859137 cpu_id 26
bindprocessor successful: thread_self() 2707509 cpu_id 28
bindprocessor successful: thread_self() 2805859 cpu_id 16
bindprocessor successful: thread_self() 2506965 cpu_id 48
bindprocessor successful: thread_self() 2490573 cpu_id 38
bindprocessor successful: thread_self() 2809957 cpu_id 3
bindprocessor successful: thread_self() 2596865 cpu_id 61
bindprocessor successful: thread_self() 2715853 cpu_id 51
bindprocessor successful: thread_self() 2818153 cpu_id 4
bindprocessor successful: thread_self() 2613259 cpu_id 60
bindprocessor successful: thread_self() 2592769 cpu_id 29
bindprocessor successful: thread_self() 2474181 cpu_id 40
bindprocessor successful: thread_self() 2535651 cpu_id 41
bindprocessor successful: thread_self() 2453591 cpu_id 49
bindprocessor successful: thread_self() 2543847 cpu_id 39
bindprocessor successful: thread_self() 2641943 cpu_id 63
bindprocessor successful: thread_self() 2637845 cpu_id 62
bindprocessor successful: thread_self() 2785369 cpu_id 17
bindprocessor successful: thread_self() 2584829 cpu_id 13
bindprocessor successful: thread_self() 2678977 cpu_id 20
bindprocessor successful: thread_self() 2564339 cpu_id 21
bindprocessor successful: thread_self() 2572537 cpu_id 25
bindprocessor successful: thread_self() 2773085 cpu_id 24
bindprocessor successful: thread_self() 2740297 cpu_id 35
bindprocessor successful: thread_self() 2478279 cpu_id 34
bindprocessor successful: thread_self() 2703413 cpu_id 11
bindprocessor successful: thread_self() 2486477 cpu_id 50
bindprocessor successful: thread_self() 2764889 cpu_id 12
bindprocessor successful: thread_self() 2605063 cpu_id 58
bindprocessor successful: thread_self() 2629651 cpu_id 59
bindprocessor successful: thread_self() 2498769 cpu_id 45
bindprocessor successful: thread_self() 2531553 cpu_id 44
bindprocessor successful: thread_self() 2867327 cpu_id 30
bindprocessor successful: thread_self() 2527455 cpu_id 32
bindprocessor successful: thread_self() 2494671 cpu_id 33
bindprocessor successful: thread_self() 2560239 cpu_id 54
bindprocessor successful: thread_self() 2461887 cpu_id 55
bindprocessor successful: thread_self() 2830447 cpu_id 10
bindprocessor successful: thread_self() 2822251 cpu_id 5
bindprocessor successful: thread_self() 2826349 cpu_id 15
bindprocessor successful: thread_self() 2855033 cpu_id 3
Starting Initialization
Done With Initialization
a(1) 1.00000000000000000
b(M) 1.00000000000000000
c(M) 1.00000000000000000
Incremental Offset = 512
Number of Threads = 64
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 267914240
Offset = 0
The total memory requirement is 6132 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 20269 microseconds
(= 20269 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 158382.48 .03 .03 .03
Scale: 152918.74 .03 .03 .03
Add: 169487.13 .04 .04 .04
Triad: 173851.94 .04 .04 .04
Sum of a is = 406877816418750.000
Sum of b is = 81375563283750.0000
Sum of c is = 108500751045000.000
Incremental Offset = 1536
Number of Threads = 64
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 267914240
Offset = 0
The total memory requirement is 6132 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 20562 microseconds
(= 20562 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 158275.12 .03 .03 .03
Scale: 154048.14 .03 .03 .03
Add: 170215.60 .04 .04 .04
Triad: 174275.48 .04 .04 .04
Sum of a is = 406877816418750.000
Sum of b is = 81375563283750.0000
Sum of c is = 108500751045000.000
Incremental Offset = 2560
Number of Threads = 64
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 267914240
Offset = 0
The total memory requirement is 6132 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 20980 microseconds
(= 20980 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 157782.03 .03 .03 .03
Scale: 152296.98 .03 .03 .03
Add: 168906.49 .04 .04 .04
Triad: 174002.25 .04 .04 .04
Sum of a is = 406877816418750.000
Sum of b is = 81375563283750.0000
Sum of c is = 108500751045000.000
Incremental Offset = 512
Number of Threads = 64
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 267912192
Offset = 0
The total memory requirement is 6132 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 20865 microseconds
(= 20865 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 156829.81 .03 .03 .03
Scale: 152168.21 .03 .03 .03
Add: 169285.82 .04 .04 .04
Triad: 173655.83 .04 .04 .04
Sum of a is = 406874706018750.000
Sum of b is = 81374941203750.0000
Sum of c is = 108499921605000.000
Incremental Offset = 1536
Number of Threads = 64
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 267912192
Offset = 0
The total memory requirement is 6132 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 20919 microseconds
(= 20919 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 158176.44 .03 .03 .03
Scale: 153812.38 .03 .03 .03
Add: 169687.38 .04 .04 .04
Triad: 174567.44 .04 .04 .04
Sum of a is = 406874706018750.000
Sum of b is = 81374941203750.0000
Sum of c is = 108499921605000.000
Incremental Offset = 2560
Number of Threads = 64
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 267912192
Offset = 0
The total memory requirement is 6132 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 20027 microseconds
(= 20027 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 156806.56 .03 .03 .03
Scale: 153200.32 .03 .03 .03
Add: 169332.59 .04 .04 .04
Triad: 173341.07 .04 .04 .04
Sum of a is = 406874706018750.000
Sum of b is = 81374941203750.0000
Sum of c is = 108499921605000.000
Incremental Offset = 512
Number of Threads = 64
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 267910144
Offset = 0
The total memory requirement is 6131 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 20166 microseconds
(= 20166 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 158603.60 .03 .03 .03
Scale: 152932.01 .03 .03 .03
Add: 169236.72 .04 .04 .04
Triad: 173923.28 .04 .04 .04
Sum of a is = 406871595618750.000
Sum of b is = 81374319123750.0000
Sum of c is = 108499092165000.000
Incremental Offset = 1536
Number of Threads = 64
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 267910144
Offset = 0
The total memory requirement is 6131 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 20604 microseconds
(= 20604 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 157792.08 .03 .03 .03
Scale: 153395.21 .03 .03 .03
Add: 169663.66 .04 .04 .04
Triad: 173924.40 .04 .04 .04
Sum of a is = 406871595618750.000
Sum of b is = 81374319123750.0000
Sum of c is = 108499092165000.000
Incremental Offset = 2560
Number of Threads = 64
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 267910144
Offset = 0
The total memory requirement is 6131 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 20299 microseconds
(= 20299 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 156556.85 .03 .03 .03
Scale: 153242.24 .03 .03 .03
Add: 169466.43 .04 .04 .04
Triad: 173797.75 .04 .04 .04
Sum of a is = 406871595618750.000
Sum of b is = 81374319123750.0000
Sum of c is = 108499092165000.000
Incremental Offset = 512
Number of Threads = 64
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 267908096
Offset = 0
The total memory requirement is 6131 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 20354 microseconds
(= 20354 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 157714.75 .03 .03 .03
Scale: 152767.11 .03 .03 .03
Add: 169418.29 .04 .04 .04
Triad: 173775.14 .04 .04 .04
Sum of a is = 406868485218750.000
Sum of b is = 81373697043750.0000
Sum of c is = 108498262725000.000
Incremental Offset = 1536
Number of Threads = 64
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 267908096
Offset = 0
The total memory requirement is 6131 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 20715 microseconds
(= 20715 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 158698.98 .03 .03 .03
Scale: 153576.16 .04 .03 .06
Add: 169291.73 .04 .04 .04
Triad: 173827.79 .08 .04 .16
Sum of a is = 406868485218750.000
Sum of b is = 81373697043750.0000
Sum of c is = 108498262725000.000
Incremental Offset = 2560
Number of Threads = 64
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 267908096
Offset = 0
The total memory requirement is 6131 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 20393 microseconds
(= 20393 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 156882.15 .03 .03 .03
Scale: 153576.16 .03 .03 .03
Add: 169742.46 .04 .04 .04
Triad: 173661.00 .04 .04 .04
Sum of a is = 406868485218750.000
Sum of b is = 81373697043750.0000
Sum of c is = 108498262725000.000
6
bindprocessor successful: thread_self() 2470083 cpu_id 37
bindprocessor successful: thread_self() 2801763 cpu_id 2
bindprocessor successful: thread_self() 2871425 cpu_id 31
bindprocessor successful: thread_self() 2723907 cpu_id 18
bindprocessor successful: thread_self() 2781279 cpu_id 19
bindprocessor successful: thread_self() 2736197 cpu_id 56
bindprocessor successful: thread_self() 2777177 cpu_id 57
bindprocessor successful: thread_self() 2760789 cpu_id 7
bindprocessor successful: thread_self() 2814057 cpu_id 6
bindprocessor successful: thread_self() 2687021 cpu_id 9
bindprocessor successful: thread_self() 2699315 cpu_id 8
bindprocessor successful: thread_self() 2511067 cpu_id 0
bindprocessor successful: thread_self() 2539749 cpu_id 47
bindprocessor successful: thread_self() 2482377 cpu_id 46
bindprocessor successful: thread_self() 2552043 cpu_id 53
bindprocessor successful: thread_self() 2465987 cpu_id 52
bindprocessor successful: thread_self() 2547945 cpu_id 43
bindprocessor successful: thread_self() 2719807 cpu_id 42
bindprocessor successful: thread_self() 2556147 cpu_id 23
bindprocessor successful: thread_self() 2600973 cpu_id 22
bindprocessor successful: thread_self() 2797663 cpu_id 27
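
To compare the twelve runs in the output above without reading each table by hand, a small parser along the following lines could collect the highest rate per kernel. This is only a sketch: the function name and the regular expression are illustrative, and they assume result lines of the form "Copy: 158382.48 .03 .03 .03" as they appear in this output.

```python
import re

# Matches result lines such as "Triad: 174567.44 .04 .04 .04".
RESULT_LINE = re.compile(r"^(Copy|Scale|Add|Triad):\s+([0-9.]+)")


def best_rates(log_text):
    """Return the highest rate (MB/s) seen for each kernel in the log."""
    best = {}
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            kernel, rate = match.group(1), float(match.group(2))
            best[kernel] = max(rate, best.get(kernel, 0.0))
    return best


# Two result tables abbreviated from the output above.
sample = """\
Copy: 158382.48 .03 .03 .03
Triad: 173851.94 .04 .04 .04
Copy: 158176.44 .03 .03 .03
Triad: 174567.44 .04 .04 .04
"""
print(best_rates(sample))
```

Read by eye, the highest Triad rate in the full output is 174567.44 MB/s (array size 267912192, incremental offset 1536), and that run's four rates are the ones quoted in the summary table at the top. The highest Copy rate, 158698.98 MB/s, comes from a different run, so the summary reflects a single run rather than per-kernel maxima.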