STREAM results for the HP ProLiant BL45p and DL585 with AMD 880 Opteron CPUs

From: Schmidt, David (Performance Eng.) (D.Schmidt@hp.com)
Date: Wed Sep 28 2005 - 18:39:51 CDT


    John,
    Below are standard STREAM results for the HP ProLiant BL45p (4 CPUs, 8
    cores) and the HP ProLiant DL585 (4 CPUs, 8 cores) using 2.4GHz AMD 880
    Opteron dual-core processors. The configurations are described below
    with the results. Please post them to the STREAM website. Thanks,

    HP ProLiant BL45p
    4x2.4GHz /1MB L2 (per core) dual core 880 Opteron processors
    32GB PC3200 memory (16x2GB DIMMs)
    SuSE Linux Enterprise Server 9 (x86_64) SP1

    I used Revision 5.3 of the stream code and compiled with PathScale EKO
    C++ compiler v.2.1:

       pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c

    For 4x2.4GHz Opteron 880 processors (8 cores):
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 8000000, Offset = 49152
    Total memory required = 183.1 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 7460 microseconds.
       (= 7460 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function    Rate (MB/s)   Avg time   Min time   Max time
    Copy:       17613.8751    0.0066     0.0073     0.0074
    Scale:      17682.3303    0.0065     0.0072     0.0074
    Add:        17782.6783    0.0098     0.0108     0.0109
    Triad:      17786.2130    0.0098     0.0108     0.0109
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
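As a cross-check on the BL45p figures above, the sketch below recomputes the "Total memory required" line and the minimum times implied by the reported rates. It assumes the usual STREAM accounting (Copy and Scale touch two arrays, Add and Triad touch three; rates use 1 MB = 10^6 bytes, the memory line uses 2^20 bytes). The function and variable names are illustrative and do not come from the stream code itself.

```python
# Sketch: relate the reported STREAM rates to array size and time.
# Assumptions (not stated in the post): Copy/Scale move 2 arrays,
# Add/Triad move 3 arrays, and rates are in units of 10**6 bytes/s.

ARRAY_SIZE = 8_000_000   # elements, from "Array size = 8000000"
BYTES_PER_WORD = 8       # from "8 bytes per DOUBLE PRECISION word"

ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Rates reported for the BL45p, in MB/s.
BL45P_RATES = {
    "Copy": 17613.8751,
    "Scale": 17682.3303,
    "Add": 17782.6783,
    "Triad": 17786.2130,
}


def bytes_moved(kernel):
    """Bytes read plus written by one pass of the given kernel."""
    return ARRAYS_TOUCHED[kernel] * ARRAY_SIZE * BYTES_PER_WORD


def implied_time(kernel, rate_mb_s):
    """Seconds per pass implied by a reported rate in MB/s."""
    return bytes_moved(kernel) / (rate_mb_s * 1e6)


def total_memory_mib():
    """Memory for the three arrays, in units of 2**20 bytes."""
    return 3 * ARRAY_SIZE * BYTES_PER_WORD / 2**20


if __name__ == "__main__":
    print(f"Total memory: {total_memory_mib():.1f} MB")
    for kernel, rate in BL45P_RATES.items():
        print(f"{kernel:6s} implied time: {implied_time(kernel, rate):.4f} s")
```

Under these assumptions the implied times line up with the "Min time" column, which matches the log's statement that only the best time for each test is used.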
    ------------------------------------------------------------------------------------------
    HP ProLiant DL585
    4x2.4GHz /1MB L2 (per core) dual core 880 Opteron processors
    32GB PC3200 memory (16x2GB DIMMs)
    SuSE Linux Enterprise Server 9 (x86_64) SP1

    I used Revision 5.3 of the stream code and compiled with PathScale EKO
    C++ compiler v.2.1:

       pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c

    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 8000000, Offset = 41472
    Total memory required = 183.1 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 7652 microseconds.
       (= 7652 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function    Rate (MB/s)   Avg time   Min time   Max time
    Copy:       16892.8263    0.0069     0.0076     0.0078
    Scale:      16913.0489    0.0068     0.0076     0.0076
    Add:        16493.3922    0.0105     0.0116     0.0119
    Triad:      16507.5920    0.0105     0.0116     0.0117
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
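For readers comparing the two systems, the sketch below puts the reported rates side by side and derives two simple figures: the BL45p's percentage advantage over the DL585 for each kernel, and the aggregate rate divided evenly across the 8 cores. The per-core figure is only an arithmetic average and says nothing about how bandwidth was actually distributed among threads; the helper names are illustrative.

```python
# Sketch: compare the reported STREAM rates of the two systems.
# All rates are copied from the results in this post (MB/s).

BL45P = {
    "Copy": 17613.8751,
    "Scale": 17682.3303,
    "Add": 17782.6783,
    "Triad": 17786.2130,
}

DL585 = {
    "Copy": 16892.8263,
    "Scale": 16913.0489,
    "Add": 16493.3922,
    "Triad": 16507.5920,
}

CORES = 8  # 4 dual-core processors in each system


def percent_advantage(rate_a, rate_b):
    """How much higher rate_a is than rate_b, in percent."""
    return (rate_a / rate_b - 1.0) * 100.0


def per_core(rate, cores=CORES):
    """Aggregate rate divided evenly across cores (an average only)."""
    return rate / cores


if __name__ == "__main__":
    for kernel in BL45P:
        adv = percent_advantage(BL45P[kernel], DL585[kernel])
        print(
            f"{kernel:6s} BL45p {per_core(BL45P[kernel]):8.1f} MB/s/core, "
            f"DL585 {per_core(DL585[kernel]):8.1f} MB/s/core, "
            f"BL45p advantage {adv:4.1f}%"
        )
```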

    David Schmidt
    Hewlett-Packard Company
    (281) 514-5039
    D.Schmidt@hp.com


