STREAM Results on the NEC SX-6

From: h-takahara@bc.jp.nec.com
Date: Wed Oct 16 2002 - 03:51:19 CDT

    Dear Dr. McCalpin,

    Please find attached the STREAM runs we obtained on the NEC SX-6.
    The machine used is the SX-6/8A, which consists of 8 processors.

    The performance figures are reported for 1, 2, 4, and 8-processor
    configurations of this system.
    The digit after the "/" in each label in the attached table is the
    number of processors actually used for that run, e.g., SX-6/2A denotes
    a 2-processor configuration.

    We would appreciate it if you could update the STREAM Web site
    with these data.

    Thank you.

    Best regards,

    Hiroshi Takahara

       
       Hiroshi Takahara
       Senior Manager, Scientific Software Department
       HPC Marketing Promotion Division, NEC Corporation

       1-10, Nisshin-cho, Fuchu, Tokyo 183-8501, Japan
       Tel/Fax: +81-42-333-6389 /6382
       E-mail: h-takahara@bc.jp.nec.com

     
    -------
    1. Summary

    [1cpu]

    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:         31959.2652     0.0401     0.0401     0.0401
    Scale:        31920.2167     0.0401     0.0401     0.0401
    Add:          31983.0006     0.0600     0.0600     0.0600
    Triad:        31982.9371     0.0600     0.0600     0.0600

    [2cpus]

    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:         63770.4794     0.0201     0.0201     0.0202
    Scale:        63665.7411     0.0201     0.0201     0.0202
    Add:          63908.6389     0.0301     0.0300     0.0301
    Triad:        63908.3853     0.0301     0.0300     0.0301

    [4cpus]

    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:        126895.8381     0.0101     0.0101     0.0101
    Scale:       126620.4981     0.0101     0.0101     0.0102
    Add:         127643.0474     0.0150     0.0150     0.0150
    Triad:       127633.9437     0.0150     0.0150     0.0150

    [8cpus]

    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:        202627.2054     0.0064     0.0063     0.0065
    Scale:       192306.2280     0.0067     0.0067     0.0068
    Add:         190231.3486     0.0102     0.0101     0.0104
    Triad:       213024.2882     0.0092     0.0090     0.0093
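
The summary figures can be turned into speedup and parallel efficiency relative to the 1-processor run. The following is a minimal sketch, not part of the original submission; the Triad rates are copied from the tables above, and the helper name `scaling` is made up for illustration.

```python
# Triad rates in MB/s, copied from the Summary tables (1, 2, 4, 8 CPUs).
triad_mb_s = {1: 31982.9371, 2: 63908.3853, 4: 127633.9437, 8: 213024.2882}


def scaling(rates):
    """Return {cpus: (speedup, efficiency)} relative to the 1-CPU rate.

    speedup    = rate(n) / rate(1)
    efficiency = speedup / n
    """
    base = rates[1]
    return {n: (rate / base, rate / (base * n)) for n, rate in rates.items()}


if __name__ == "__main__":
    for n, (speedup, eff) in sorted(scaling(triad_mb_s).items()):
        print(f"{n} CPU(s): speedup {speedup:.2f}x, efficiency {eff:.1%}")
```

By this measure Triad scales almost linearly up to 4 processors, while the 8-processor run reaches roughly 6.7 times the single-processor rate (about 83% efficiency).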

    2. Details

    Machine: SX-6/8A 64GB

    [SX-6/1A]

    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 80000000
     Offset = 0
     The total memory requirement is 1831 MB
     You are running each test 10 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity/precision appears to be 28 microseconds
     The tests below will each take a time on the order
     of 40113 microseconds
        (= 1433 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:         31959.2652     0.0401     0.0401     0.0401
    Scale:        31920.2167     0.0401     0.0401     0.0401
    Add:          31983.0006     0.0600     0.0600     0.0600
    Triad:        31982.9371     0.0600     0.0600     0.0600
     Sum of a is = 9.226406249984801D+19
     Sum of b is = 1.845281249988019D+19
     Sum of c is = 2.460375000016015D+19
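
The reported memory requirement and the relation between rate and time can be cross-checked from the array size. This is a sketch under stated assumptions rather than anything from the original post: it assumes the benchmark's three arrays (a, b, c), STREAM's usual byte-counting convention (Copy and Scale touch two arrays, Add and Triad three), 1 MB = 2**20 bytes for the memory figure, and 1 MB = 10**6 bytes for the rates. The names below are illustrative.

```python
N = 80_000_000   # "Array size" from the output above
WORD = 8         # bytes per DOUBLE PRECISION word, as reported

# Assumed convention: number of arrays read or written per loop iteration.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def total_memory_mb(n=N, word=WORD, arrays=3):
    """Memory for the three arrays, assuming 1 MB = 2**20 bytes."""
    return arrays * n * word / 2**20


def implied_time(kernel, rate_mb_s, n=N, word=WORD):
    """Seconds implied by a reported rate, assuming 1 MB = 10**6 bytes."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * word * n
    return bytes_moved / (rate_mb_s * 1e6)


if __name__ == "__main__":
    print(f"total memory: {total_memory_mb():.2f} MB")
    print(f"Copy time:    {implied_time('Copy', 31959.2652):.4f} s")
    print(f"Add time:     {implied_time('Add', 31983.0006):.4f} s")
```

Under these assumptions the arithmetic reproduces the 1831 MB figure and the 0.0401 s and 0.0600 s times listed for the 1-processor run.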

    [SX-6/2A]

    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 80000000
     Offset = 0
     The total memory requirement is 1831 MB
     You are running each test 10 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity/precision appears to be 28 microseconds
     The tests below will each take a time on the order
     of 20146 microseconds
        (= 720 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:         63770.4794     0.0201     0.0201     0.0202
    Scale:        63665.7411     0.0201     0.0201     0.0202
    Add:          63908.6389     0.0301     0.0300     0.0301
    Triad:        63908.3853     0.0301     0.0300     0.0301
     Sum of a is = 9.226406249985600D+19
     Sum of b is = 1.845281249991997D+19
     Sum of c is = 2.460375000000000D+19

    [SX-6/4A]

    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 80000000
     Offset = 0
     The total memory requirement is 1831 MB
     You are running each test 10 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity/precision appears to be 28 microseconds
     The tests below will each take a time on the order
     of 10132 microseconds
        (= 362 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:        126895.8381     0.0101     0.0101     0.0101
    Scale:       126620.4981     0.0101     0.0101     0.0102
    Add:         127643.0474     0.0150     0.0150     0.0150
    Triad:       127633.9437     0.0150     0.0150     0.0150
     Sum of a is = 9.226406249987199D+19
     Sum of b is = 1.845281249999994D+19
     Sum of c is = 2.460375000000000D+19

    [SX-6/8A]

    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 80000000
     Offset = 0
     The total memory requirement is 1831 MB
     You are running each test 10 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity/precision appears to be 29 microseconds
     The tests below will each take a time on the order
     of 6047 microseconds
        (= 209 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:        202627.2054     0.0064     0.0063     0.0065
    Scale:       192306.2280     0.0067     0.0067     0.0068
    Add:         190231.3486     0.0102     0.0101     0.0104
    Triad:       213024.2882     0.0092     0.0090     0.0093
     Sum of a is = 9.226406249990398D+19
     Sum of b is = 1.845281250000000D+19
     Sum of c is = 2.460375000000000D+19
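
The checksums printed by the four runs differ slightly in their trailing digits. The sketch below, which is not part of the original submission, compares the 1-processor and 8-processor sums. The post does not explain the differences; a different accumulation order in the parallel runs is one plausible cause, but that is an assumption. The helper names are illustrative.

```python
def parse_fortran_double(text):
    """Convert Fortran D-exponent notation (e.g. 1.5D+19) to a Python float."""
    return float(text.strip().replace("D", "E"))


def relative_difference(x, y):
    """Absolute difference scaled by the larger magnitude."""
    return abs(x - y) / max(abs(x), abs(y))


# Checksums copied from the [SX-6/1A] and [SX-6/8A] outputs.
sums_1cpu = {
    "a": "9.226406249984801D+19",
    "b": "1.845281249988019D+19",
    "c": "2.460375000016015D+19",
}
sums_8cpu = {
    "a": "9.226406249990398D+19",
    "b": "1.845281250000000D+19",
    "c": "2.460375000000000D+19",
}

if __name__ == "__main__":
    for name in sums_1cpu:
        x = parse_fortran_double(sums_1cpu[name])
        y = parse_fortran_double(sums_8cpu[name])
        print(f"Sum of {name}: relative difference {relative_difference(x, y):.1e}")
```

For these values the relative differences are all below 1e-11, so the runs agree to about 11 significant digits.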

    -------



    This archive was generated by hypermail 2.1.4 : Tue Oct 29 2002 - 15:11:18 CST