From: Schmidt, David (Performance Eng.) (D.Schmidt@hp.com)
Date: Wed Sep 28 2005 - 18:46:47 CDT
John,
Below are standard STREAM results for the HP ProLiant BL25p (2 CPUs),
the HP ProLiant BL45p (4 CPUs), the HP ProLiant DL385 (2 CPUs), and the
HP ProLiant DL585 (4 CPUs) using 2.8GHz AMD Opteron processors. The
configurations are described below with the results. Please post them to
the STREAM website. Thanks,
HP ProLiant BL25p
2x2.8GHz/1MB L2 Opteron 254 processors
16GB memory (8x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1 - kernel 2.6.5-7.139-smp
I used Revision 5.3 of the stream code and compiled with PathScale EKO
Compiler Suite 2.1 and used the following:
/opt/pathscale/bin/pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4
-mp -o ompstream stream_omp.c
For 2x2.8GHz Opteron 254 processors:
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2500000, Offset = 26112
Total memory required = 57.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 2
Number of Threads requested = 2
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 5116 microseconds.
(= 5116 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function Rate (MB/s) Avg time Min time Max time
Copy: 9368.0362 0.0039 0.0043 0.0043
Scale: 9206.1106 0.0040 0.0043 0.0047
Add: 9457.6361 0.0057 0.0063 0.0064
Triad: 9499.4051 0.0057 0.0063 0.0064
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
-------------------------------------------------------------------------------
HP ProLiant BL45p
4x2.8GHz/1MB L2 Opteron 854 processors
32GB PC3200 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1
I used Revision 5.3 of the stream code and compiled with PathScale EKO
C++ compiler v.2.1:
pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream
stream_omp.c
For 4x2.8GHz Opteron 854 processors:
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 49152
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 8164 microseconds.
(= 8164 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function Rate (MB/s) Avg time Min time Max time
Copy: 18515.9825 0.0062 0.0069 0.0070
Scale: 17664.8760 0.0066 0.0072 0.0074
Add: 18595.7227 0.0093 0.0103 0.0104
Triad: 18689.8062 0.0093 0.0103 0.0103
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
-------------------------------------------------------------------------------
HP ProLiant DL385
2x2.8GHz/1MB L2 Opteron 254 processors
16GB PC3200 DDR memory (8x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64)
I used Revision 5.3 of the stream code and compiled with PathScale C/C++
for Linux v.2.1:
pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream
stream_omp.c
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 59392
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 2
Number of Threads requested = 2
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 2033 microseconds.
(= 2033 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function Rate (MB/s) Avg time Min time Max time
Copy: 9400.3171 0.0015 0.0017 0.0017
Scale: 9205.6055 0.0016 0.0017 0.0017
Add: 9638.3853 0.0022 0.0025 0.0025
Triad: 9721.2261 0.0022 0.0025 0.0025
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
-------------------------------------------------------------------------------
HP ProLiant DL585
4x2.8GHz/1MB L2 Opteron 854 processors
32GB PC3200 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1
I used Revision 5.3 of the stream code and compiled with PathScale EKO
C++ compiler v.2.1:
pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream
stream_omp.c
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 7000000, Offset = 58880
Total memory required = 160.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 7424 microseconds.
(= 7424 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function Rate (MB/s) Avg time Min time Max time
Copy: 18285.7940 0.0055 0.0061 0.0062
Scale: 18223.3706 0.0055 0.0061 0.0062
Add: 18346.7355 0.0082 0.0092 0.0092
Triad: 18386.9497 0.0082 0.0091 0.0093
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
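For readers comparing the 2-CPU and 4-CPU systems, the following sketch computes per-CPU Triad bandwidth and the 2-to-4 CPU ratio from the four tables above. The rates are copied from the results; the dictionary layout and helper names are illustrative only. The runs used different array sizes, so the ratios are a rough guide rather than a controlled comparison.

```python
# Triad rates (MB/s) copied from the four result tables above,
# stored as (number of CPUs, Triad rate).
triad = {
    "BL25p": (2, 9499.4051),
    "BL45p": (4, 18689.8062),
    "DL385": (2, 9721.2261),
    "DL585": (4, 18386.9497),
}


def per_cpu(system):
    """Triad bandwidth divided by the number of CPUs in the system."""
    cpus, rate = triad[system]
    return rate / cpus


def scaling(small, large):
    """Ratio of Triad rates between two systems."""
    return triad[large][1] / triad[small][1]


for name in triad:
    print(f"{name}: {per_cpu(name):.1f} MB/s per CPU")
print(f"BL25p -> BL45p: {scaling('BL25p', 'BL45p'):.2f}x")
print(f"DL385 -> DL585: {scaling('DL385', 'DL585'):.2f}x")
```

Both 4-CPU systems land a little under twice the Triad rate of their 2-CPU counterparts.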
David Schmidt
Hewlett-Packard Company
(281) 514-5039
D.Schmidt@hp.com