From: John D Mccalpin (mccalpin@us.ibm.com)
Date: Thu Feb 24 2005 - 10:14:37 CST
To: mccalpin@us.ibm.com
cc:
From: Ly Vu/Austin/IBM@IBMUS
Subject: tuned STREAM on IBM eServer p5 575 (1900 MHz, 8cpu)
These are tuned STREAM results on an IBM eServer p5 575
with eight 1900 MHz CPUs. This is a POWER5 SMP machine.
Large pages were used in all cases.
Function    Rate (MB/s)   RMS time (s)   Min time (s)   Max time (s)
Copy:         52098.39        .08            .08            .09
Scale:        52173.72        .08            .08            .08
Add:          54863.67        .12            .12            .12
Triad:        55732.80        .12            .12            .12
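The rates can be cross-checked against the array size in the full output (268305408 elements of 8 bytes). The sketch below assumes the tuned code uses the public STREAM byte-counting convention: 2 words per element for Copy and Scale, 3 for Add and Triad, and rates in units of 10^6 bytes/s, with the total memory requirement given in units of 2^20 bytes. It recovers the 6141 MB memory figure and best times that round to the .08 and .12 s in the table. The helper names are illustrative only.

```python
# Hypothetical cross-check of the STREAM numbers in this message.
# Assumes the public STREAM convention: 8-byte words, Copy/Scale count
# 2 arrays per element, Add/Triad count 3, and MB/s means 10**6 bytes/s.
WORD_BYTES = 8
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Array size (elements) and best rates (MB/s) copied from the output file.
N = 268305408
RATES_MB_S = {"Copy": 52098.39, "Scale": 52173.72, "Add": 54863.67, "Triad": 55732.80}


def total_memory_mib(n: int) -> float:
    """Memory for the three arrays a, b, c, in units of 2**20 bytes."""
    return 3 * WORD_BYTES * n / 2**20


def bytes_moved(kernel: str, n: int) -> int:
    """Bytes credited to one pass of the given kernel."""
    return ARRAYS_TOUCHED[kernel] * WORD_BYTES * n


def implied_min_time(kernel: str, n: int, rate_mb_s: float) -> float:
    """Best time in seconds that would produce the reported rate."""
    return bytes_moved(kernel, n) / (rate_mb_s * 1e6)


if __name__ == "__main__":
    print(f"total memory: {total_memory_mib(N):.0f} MB")
    for kernel, rate in RATES_MB_S.items():
        print(f"{kernel:6s} {implied_min_time(kernel, N, rate):.4f} s")
```

Because the byte count is fixed by N, a higher rate for Add and Triad than for Copy and Scale still corresponds to a longer time per pass: those kernels move 50% more data.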
Here is the full output file:
--------------------------------------------------
Requesting Large Pages
Setting up for 2 CPUs per module
Number of segments per array = 8
CPU binding list : 0 2 4 6 8 10 12 14
Shared Segment Pointer = 504403158265495552
Shared Segment Pointer = 504403160412979200
Shared Segment Pointer = 504403162560462848
Segment Size (B) = 268435456 (MB = 256 )
Array Size (B) = 2147483648 (MB = 2048 )
Array Size (DW) = 268435456
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
rebind: num_parthds is 16
Starting Initialization
Done With Initialization
a(1) 1.00000000000000000
b(M) 1.00000000000000000
c(M) 1.00000000000000000
Incremental Offset = 512
Number of Threads = 16
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 268305408
Offset = 0
The total memory requirement is 6141 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 83800 microseconds
(= 83800 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 52098.39 .08 .08 .09
Scale: 52173.72 .08 .08 .08
Add: 54863.67 .12 .12 .12
Triad: 55732.80 .12 .12 .12
Sum of a is = 407484806118750.000
Sum of b is = 81496961223750.0000
Sum of c is = 108662614965000.000
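The three checksums can be cross-checked against the kernel sequence. This sketch assumes the tuned code follows the ordering in the public stream.c: a starts at 1.0 (as the a(1) line shows) and is doubled once before timing, scalar = 3.0, and each kernel runs 5 times, as the output states. These are assumptions about code not shown here. Under them every element ends at a = 1518750, b = 303750, c = 405000. Each reported sum divides exactly by its per-element value, and the quotient is the same for all three arrays: 268302753. That is a little below the reported array size of 268305408, and the sketch does not try to account for the difference.

```python
# Hypothetical re-derivation of the STREAM checksum values, assuming the
# tuned code follows the public stream.c sequence: a initialized to 1.0 and
# doubled once before timing, scalar = 3.0, and each kernel run NTIMES = 5 times.
SCALAR = 3.0
NTIMES = 5

# Checksums copied from the output file above.
REPORTED_SUMS = {"a": 407484806118750, "b": 81496961223750, "c": 108662614965000}


def final_values(a0: float, ntimes: int = NTIMES, scalar: float = SCALAR):
    """Value every element of a, b, c holds after the timing loop.

    Only a's starting value matters: Copy overwrites c and Scale
    overwrites b before either is read.
    """
    a = a0
    b = c = 0.0
    for _ in range(ntimes):
        c = a               # Copy:  c = a
        b = scalar * c      # Scale: b = scalar*c
        c = a + b           # Add:   c = a + b
        a = b + scalar * c  # Triad: a = b + scalar*c
    return a, b, c


def implied_counts(sums, per_element):
    """(quotient, remainder) of each checksum divided by its per-element value."""
    return {k: divmod(sums[k], int(per_element[k])) for k in sums}


if __name__ == "__main__":
    a, b, c = final_values(2.0)  # 1.0 doubled once, per the assumption above
    print(a, b, c)
    print(implied_counts(REPORTED_SUMS, {"a": a, "b": b, "c": c}))
```

The values are exact integers well below 2**53, so integer division gives an exact consistency check with no floating-point tolerance needed.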
______________________________________________
Ly Vu
IBM Corp. - Austin, Texas.
RS/6000 Performance Analysis.