From: Schmidt, David (Performance Eng.) (d.schmidt@hp.com)
Date: Mon Jun 06 2005 - 14:26:37 CDT
John,
Below are standard STREAM results for the HP ProLiant DL585 (4 CPU) and
the HP ProLiant BL45p (4 CPU), both using 2.6GHz Opteron 852 processors.
Each configuration is listed with its results:
HP ProLiant DL585
4x2.6GHz/1MB L2 852 Opteron processors
32GB PC3200 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1
I used revision 5.3 of the STREAM code and compiled it with the PathScale
EKO C++ compiler v.2.1:
pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 41472
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 8920 microseconds.
(= 8920 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       17091.8122       0.0067       0.0075       0.0075
Scale:      16353.5567       0.0071       0.0078       0.0080
Add:        17225.8047       0.0100       0.0111       0.0112
Triad:      17233.9147       0.0101       0.0111       0.0113
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
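For anyone checking the figures above, here is a rough sketch of how the memory footprint and the rates follow from the array size and the minimum times. It assumes the usual STREAM accounting (Copy and Scale touch two arrays, Add and Triad touch three; rates use 1 MB = 10^6 bytes, the footprint uses 2^20). The function names are mine, not anything from the STREAM source, and the printed min times are rounded to four decimals, so the recomputed rates only agree to within about 1%.

```python
# Sketch: re-derive the DL585 figures from the array size and min times.
# Assumes standard STREAM byte accounting; helper names are illustrative only.

N = 8_000_000        # "Array size = 8000000"
BYTES_PER_WORD = 8   # "8 bytes per DOUBLE PRECISION word"

# Number of arrays read or written by each kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# (min time in seconds, reported rate in MB/s) as printed above.
REPORTED_DL585 = {
    "Copy":  (0.0075, 17091.8122),
    "Scale": (0.0078, 16353.5567),
    "Add":   (0.0111, 17225.8047),
    "Triad": (0.0111, 17233.9147),
}


def total_memory_mib(n, bytes_per_word=BYTES_PER_WORD):
    """Footprint of the three arrays, in units of 2**20 bytes."""
    return 3 * bytes_per_word * n / 2**20


def rate_mb_per_s(kernel, min_time, n=N, bytes_per_word=BYTES_PER_WORD):
    """Bandwidth in 10**6 bytes per second, based on the best (minimum) time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * bytes_per_word * n
    return 1.0e-6 * bytes_moved / min_time


if __name__ == "__main__":
    print(f"Total memory required = {total_memory_mib(N):.1f} MB")
    for kernel, (min_time, reported) in REPORTED_DL585.items():
        recomputed = rate_mb_per_s(kernel, min_time)
        print(f"{kernel:6s} recomputed {recomputed:11.4f}  reported {reported:11.4f}")
```

The small gap between the recomputed and reported rates comes only from the rounding of the min times in the printed table.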
=============================================================
HP ProLiant BL45p
4x2.6GHz/1MB L2 852 Opteron processors
32GB PC3200 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1
I used revision 5.3 of the STREAM code and compiled it with the PathScale
EKO C++ compiler v.2.1:
pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 49152
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 8783 microseconds.
(= 8783 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       17172.1760       0.0067       0.0075       0.0076
Scale:      16433.1470       0.0070       0.0078       0.0078
Add:        17261.6202       0.0100       0.0111       0.0113
Triad:      17322.1417       0.0100       0.0111       0.0111
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
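To put the two result sets side by side, a quick sketch like the following computes the per-kernel difference between the BL45p and the DL585 from the reported rates. The dictionary and function names are just for illustration.

```python
# Sketch: relative difference between the two systems' reported rates (MB/s).
# The numbers are copied from the two result tables; names are illustrative.

DL585 = {"Copy": 17091.8122, "Scale": 16353.5567,
         "Add": 17225.8047, "Triad": 17233.9147}
BL45P = {"Copy": 17172.1760, "Scale": 16433.1470,
         "Add": 17261.6202, "Triad": 17322.1417}


def percent_delta(base, other):
    """Percent change of `other` relative to `base`, per kernel."""
    return {k: 100.0 * (other[k] - base[k]) / base[k] for k in base}


if __name__ == "__main__":
    for kernel, delta in percent_delta(DL585, BL45P).items():
        print(f"{kernel:6s} BL45p vs DL585: {delta:+.2f}%")
```

On these numbers the BL45p comes out slightly ahead on every kernel, by well under 1% in each case.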
David Schmidt
Hewlett-Packard Company
(281) 514-5039
D.Schmidt@hp.com