Tuned STREAM on IBM eServer p5 575 (1500 MHz, 16 CPUs)

From: Ly Vu (lyvu@us.ibm.com)
Date: Tue Aug 09 2005 - 10:05:51 CDT


    These are tuned STREAM results for an IBM eServer p5 575
    with sixteen 1.5 GHz CPUs. This is a POWER5 SMP machine.
    Large pages were used in all cases.

    Function   Rate (MB/s)   RMS time (s)   Min time (s)   Max time (s)
    Copy:         50173.70            .09            .09            .09
    Scale:        49010.25            .09            .09            .09
    Add:          55184.19            .12            .12            .12
    Triad:        55870.61            .12            .12            .12
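    As a cross-check on the table: STREAM derives each rate from the bytes a
    kernel moves divided by its best (minimum) time. The Python sketch below
    assumes the usual STREAM accounting (8-byte words, 10**6 bytes per MB for
    rates, two arrays touched per element by Copy and Scale, three by Add and
    Triad) and the 268303360-element array length printed in the full output.
    The helper implied_min_time is hypothetical, not part of STREAM.

```python
# Sketch: relate the reported STREAM rates to per-test times.
# Assumes STREAM's usual accounting: 8-byte words, 1 MB = 10**6 bytes for rates,
# Copy/Scale touch 2 words per element and Add/Triad touch 3.

N = 268303360   # "Array size" from the full output
WORD = 8        # bytes per DOUBLE PRECISION word

WORDS_PER_ELEMENT = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
REPORTED_MB_S = {"Copy": 50173.70, "Scale": 49010.25, "Add": 55184.19, "Triad": 55870.61}

def implied_min_time(kernel: str) -> float:
    """Seconds per pass that the reported MB/s implies for `kernel` (hypothetical helper)."""
    bytes_moved = WORDS_PER_ELEMENT[kernel] * WORD * N
    return bytes_moved / (REPORTED_MB_S[kernel] * 1e6)

# The "total memory requirement" line appears to use 2**20 bytes per MB.
TOTAL_MB = 3 * N * WORD // 2**20

for kernel in REPORTED_MB_S:
    print(f"{kernel:6s} {implied_min_time(kernel):.4f} s")
print(f"3 arrays: {TOTAL_MB} MB")
```

    The implied times round to the .09 s (Copy, Scale) and .12 s (Add,
    Triad) shown above, and the three-array footprint comes out to the
    6140 MB reported in the output.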

    Here is the full output file:
    --------------------------------------------------

    Requesting Large Pages
     Setting up for 2 CPUs per module
     Number of segments per array = 8
     CPU binding list : 0 2 4 6 8 10 12 14
     Shared Segment Pointer = 504403158265495552
     Shared Segment Pointer = 504403160412979200
     Shared Segment Pointer = 504403162560462848
     Segment Size (B) = 268435456 (MB = 256 )
     Array Size (B) = 2147483648 (MB = 2048 )
     Array Size (DW) = 268435456
     Num_threads = 16
     Num_threads = 16
     Num_threads = 16
     Num_threads = 16
     Num_threads = 16
     Num_threads = 16
     Num_threads = 16
     Num_threads = 16
     Num_threads = 16
     Num_threads = 16
     Num_threads = 16
     Num_threads = 16
     Num_threads = 16
     Num_threads = 16
     Num_threads = 16
     Num_threads = 16
     rebind: num_parthds is 16
     Starting Initialization
     Done With Initialization
     a(1) 1.00000000000000000
     b(M) 1.00000000000000000
     c(M) 1.00000000000000000
     Incremental Offset = 512
     Number of Threads = 16

    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 268303360
     Offset = 0
     The total memory requirement is 6140 MB
     You are running each test 5 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     The tests below will each take a time on the order
     of 83383 microseconds
        (= 83383 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function   Rate (MB/s)   RMS time   Min time   Max time
    Copy:         50173.70        .09        .09        .09
    Scale:        49010.25        .09        .09        .09
    Add:          55184.19        .12        .12        .12
    Triad:        55870.61        .12        .12        .12
     Sum of a is = 407485728000000.000
     Sum of b is = 81497145600000.0000
     Sum of c is = 108662860800000.000
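    The three sums above can be checked by hand. A minimal Python sketch,
    assuming the tuned code ends up with the reference STREAM starting values
    (a = 1.0 doubled to 2.0 by the timer-granularity test, b = 2.0, c = 0.0)
    and scalar = 3.0, together with the 5 iterations and 268303360-element
    arrays reported in the output. These starting values are an assumption;
    the tuned run's own initialization printout shows a(1) = 1.0. The name
    expected_values is hypothetical.

```python
# Sketch: recompute the expected STREAM checksums for this run.
# Assumptions (not taken from the tuned code itself): reference STREAM
# initialization a=1.0 -> doubled to 2.0 by the timer test, b=2.0, c=0.0,
# and scalar=3.0.

N = 268303360      # "Array size" from the output
NTIMES = 5         # "running each test 5 times"
SCALAR = 3.0

def expected_values(ntimes: int, a0: float = 2.0) -> tuple[float, float, float]:
    """Per-element values of a, b, c after `ntimes` passes of the four kernels."""
    a, b, c = a0, 2.0, 0.0
    for _ in range(ntimes):
        c = a               # Copy
        b = SCALAR * c      # Scale
        c = a + b           # Add
        a = b + SCALAR * c  # Triad
    return a, b, c

a, b, c = expected_values(NTIMES)
expected = {"a": a * N, "b": b * N, "c": c * N}
reported = {"a": 407485728000000.0, "b": 81497145600000.0, "c": 108662860800000.0}

rel_err = {k: abs(reported[k] - expected[k]) / expected[k] for k in expected}
for k in expected:
    print(f"sum of {k}: expected {expected[k]:.1f}, relative error {rel_err[k]:.1e}")
```

    Each pass sets c = 4a and b = 3a, then a = 3a + 12a = 15a, so after five
    passes a = 2 * 15**5 = 1518750 per element, b = 303750 and c = 405000.
    Multiplied by the array length, these match the reported sums exactly.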
    ______________________________________________
    Ly Vu
    IBM Corp. - Austin, Texas.
    RS/6000 Performance Analysis.
    Phone : (512) 838-8228
    Email : lyvu@us.ibm.com


