STREAM on p655+/4p/1.7GHz

From: McCalpin (mccalpin@austin.ibm.com)
Date: Tue May 27 2003 - 10:48:27 CDT


    Here are results for STREAM on the IBM eServer pSeries 655/4p/1.7GHz
    (aka p655+)

    These are all four-CPU results using large pages.

    The first set is from the standard code, while the second
    set uses the DCBZ-tuned code.

     Base Offset = 67108864
     Incremental Offset = 1536
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 67108864
     The total memory requirement is 1536 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function      Rate (MB/s)   Avg time   Min time   Max time
    Copy:           8369.7740      .1320      .1283      .1392
    Scale:          8354.4727      .1287      .1285      .1288
    Add:           11914.0431      .1354      .1352      .1357
    Triad:         12038.7274      .1340      .1338      .1343
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
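
The rates and the memory figure in the standard-code output above can be cross-checked from the array size and the minimum times. The sketch below assumes STREAM's usual byte-counting convention, which the post itself does not spell out: Copy and Scale count two arrays per iteration, Add and Triad count three, rates use MB = 10^6 bytes, and the memory requirement uses MB = 2^20 bytes. The function and constant names are illustrative, not taken from the STREAM source.

```python
# Cross-check of the reported STREAM figures (a sketch, not the benchmark itself).
# Assumption: standard STREAM byte counting; names here are hypothetical.

ARRAY_SIZE = 67_108_864      # "Array size" from the output
BYTES_PER_WORD = 8           # "Assuming 8 bytes per DOUBLE PRECISION word"

# Arrays counted per iteration for each kernel (assumed convention).
ARRAYS_COUNTED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def memory_requirement_mib(n=ARRAY_SIZE, word=BYTES_PER_WORD):
    """Total footprint of the three arrays a, b, c in units of 2**20 bytes."""
    return 3 * n * word / 2**20


def rate_mb_per_s(kernel, min_time_s, n=ARRAY_SIZE, word=BYTES_PER_WORD):
    """Bandwidth in units of 10**6 bytes per second, from the minimum time."""
    bytes_moved = ARRAYS_COUNTED[kernel] * n * word
    return bytes_moved / min_time_s / 1e6


if __name__ == "__main__":
    print(f"memory: {memory_requirement_mib():.0f} MB")
    # Minimum times copied from the standard-code table.
    for kernel, t in [("Copy", 0.1283), ("Scale", 0.1285),
                      ("Add", 0.1352), ("Triad", 0.1338)]:
        print(f"{kernel:6s} {rate_mb_per_s(kernel, t):10.1f} MB/s")
```

Because the printed minimum times are rounded to four decimal places, the recomputed rates agree with the reported ones only to within a small fraction of a percent.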

     Base Offset = 67108864
     Incremental Offset = 3072
     Number of Threads = 4
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 67108864
     Offset = 0
     The total memory requirement is 1536 MB
     You are running each test 5 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     The tests below will each take a time on the order
     of 59712 microseconds
        (= 59712 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function      Rate (MB/s)   RMS time   Min time   Max time
    Copy:          17661.9023      .2132      .0608      .0764
    Scale:         17839.1467      .2125      .0602      .0748
    Add:           19901.3958      .2372      .0809      .1211
    Triad:         19973.0932      .2368      .0806      .1194
     Sum of a is = 101921363943750.000
     Sum of b is = 20384272788750.0000
     Sum of c is = 27179030385000.0000
    bindprocessor successful: thread_self() 36811 cpu_id 1
    bindprocessor successful: thread_self() 36539 cpu_id 0
    bindprocessor successful: thread_self() 38515 cpu_id 3
    bindprocessor successful: thread_self() 33177 cpu_id 2
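
To compare the two runs, the reported rates can be divided kernel by kernel. This is a minimal sketch that uses only the Rate (MB/s) values printed in the two tables; the dictionary and function names are illustrative.

```python
# Ratio of DCBZ-tuned to standard STREAM rates, using the values from the post.
# Names are hypothetical; only the numbers come from the output above.

STANDARD_MB_S = {
    "Copy": 8369.7740,
    "Scale": 8354.4727,
    "Add": 11914.0431,
    "Triad": 12038.7274,
}

DCBZ_MB_S = {
    "Copy": 17661.9023,
    "Scale": 17839.1467,
    "Add": 19901.3958,
    "Triad": 19973.0932,
}


def speedups(baseline=STANDARD_MB_S, tuned=DCBZ_MB_S):
    """Return tuned/baseline rate for every kernel present in both tables."""
    return {k: tuned[k] / baseline[k] for k in baseline if k in tuned}


if __name__ == "__main__":
    for kernel, ratio in speedups().items():
        print(f"{kernel:6s} {ratio:.2f}x")
```

On these figures the DCBZ-tuned code comes out at roughly 2.1x the standard rate for Copy and Scale, and roughly 1.7x for Add and Triad.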


