tuned STREAM on IBM eServer p5 570 (1900 MHz, 16 cpu)

From: Frank Johnston (fjohn@us.ibm.com)
Date: Mon Jul 12 2004 - 16:09:54 CDT

    IBM eServer p5 570 (1900 MHz, 16 cpu, 36MB L3 cache) with DDR2 memory.

    Tuned STREAM Benchmark Output for Submission:
    Requesting Large Pages
    Setting up for 2 CPUs per module
    Number of segments per array = 8
    CPU binding list : 0 2 4 6 8 10 12 14
    Shared Segment Pointer = 504403158265495552
    Shared Segment Pointer = 504403160412979200
    Shared Segment Pointer = 504403162560462848
    Segment Size (B) = 268435456 (MB = 256 )
    Array Size (B) = 2147483648 (MB = 2048 )
    Array Size (DW) = 268435456
    Num_threads = 16
    Num_threads = 16
    Num_threads = 16
    Num_threads = 16
    Num_threads = 16
    Num_threads = 16
    Num_threads = 16
    Num_threads = 16
    Num_threads = 16
    Num_threads = 16
    Num_threads = 16
    Num_threads = 16
    Num_threads = 16
    Num_threads = 16
    Num_threads = 16
    Num_threads = 16
    rebind: num_parthds is 16
    Starting Initialization
    Done With Initialization
    a(1) 1.00000000000000000
    b(M) 1.00000000000000000
    c(M) 1.00000000000000000
    Incremental Offset = 1536
    Number of Threads = 16
    ----------------------------------------------
    Double precision appears to have 16 digits of accuracy
    Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
    Array size = 268305408
    Offset = 0
    The total memory requirement is 6141 MB
    You are running each test 5 times
    The *best* time for each test is used
    ----------------------------------------------------
    Your clock granularity appears to be less than one microsecond
    Your clock granularity/precision appears to be 1 microseconds
    The tests below will each take a time on the order
    of 73990 microseconds
    (= 73990 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    ----------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    ----------------------------------------------------
    Function   Rate (MB/s)   RMS time   Min time   Max time
    Copy:      40465.2741    .2417      .1061      .1099
    Scale:     40132.7332    .2418      .1070      .1073
    Add:       44906.5518    .2722      .1434      .1436
    Triad:     45186.8670    .2714      .1425      .1427
    Sum of a is = 407484806118750.000
    Sum of b is = 81496961223750.0000
    Sum of c is = 108662614965000.000
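
The reported rates can be cross-checked from the array size and the minimum times printed above. The sketch below is not part of the original submission. It assumes the usual STREAM counting convention (Copy and Scale touch two arrays per iteration, Add and Triad touch three) and that "MB" means 10^6 bytes for rates and 2^20 bytes for the memory requirement. The names `rate_mb_per_s` and `total_memory_mb` are illustrative only. Because the printed minimum times are rounded to four decimals, the recomputed rates are only expected to agree with the reported ones to within roughly 0.1%.

```python
# Figures copied from the STREAM output above.
ARRAY_SIZE = 268_305_408   # "Array size" (elements per array)
BYTES_PER_WORD = 8         # "Assuming 8 bytes per DOUBLE PRECISION word"

# Assumption: standard STREAM byte counting, i.e. the number of arrays
# read or written per loop iteration for each kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column in seconds, as printed (rounded to four decimals).
MIN_TIME = {"Copy": 0.1061, "Scale": 0.1070, "Add": 0.1434, "Triad": 0.1425}

# "Rate (MB/s)" column as reported, used only for comparison.
REPORTED = {
    "Copy": 40465.2741,
    "Scale": 40132.7332,
    "Add": 44906.5518,
    "Triad": 45186.8670,
}


def rate_mb_per_s(kernel: str) -> float:
    """Recompute the rate from bytes moved and the best (minimum) time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * ARRAY_SIZE
    return bytes_moved / MIN_TIME[kernel] / 1e6  # assumption: MB = 10^6 bytes


def total_memory_mb() -> float:
    """Memory for the three arrays a, b, c (assumption: MB = 2^20 bytes)."""
    return 3 * BYTES_PER_WORD * ARRAY_SIZE / 2**20


if __name__ == "__main__":
    for kernel in ("Copy", "Scale", "Add", "Triad"):
        recomputed = rate_mb_per_s(kernel)
        rel_diff = abs(recomputed - REPORTED[kernel]) / REPORTED[kernel]
        print(f"{kernel:6s} recomputed {recomputed:12.1f} MB/s "
              f"(relative difference {rel_diff:.2e})")
    print(f"Total memory for three arrays: {total_memory_mb():.2f} MB")
```

Under these assumptions the memory figure comes out at just over 6141 MB, consistent with the "total memory requirement" line in the output.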


