From: Schmidt, David (Performance Eng.) (D.Schmidt@hp.com)
Date: Wed Sep 28 2005 - 18:32:21 CDT
John,
Below are standard STREAM results for the HP ProLiant BL20p G3 (1 CPU),
the HP ProLiant DL140 G2 (1 CPU), the HP ProLiant DL360 G4p (1 CPU), the
HP ProLiant DL380 G4 (1 CPU), and the HP ProLiant ML370 G4 (4 CPU) using
3.8GHz Intel Xeon processors. The configurations are described below
with the results. Please post them to the STREAM website. Thanks,
-------------------------------------------------------------
Editor's Note: In a separate note, the submitter clarified that
all results presented below were run with a single thread (set by
the "OMP_NUM_THREADS" environment variable).
-------------------------------------------------------------
HP ProLiant BL20p G3
2x3.8GHz/2MB L2 Xeon processors
8GB PC3200 memory (4x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1
Intel(R) C++ Compiler for Intel(R) EM64T-based Applications, Version
8.1, Build 20040812
icc -O3 -xP -parallel -o ompstream stream_d.c second_wall.c
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 3500000, Offset = 14336
Total memory required = 80.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 15160 microseconds.
(= 15160 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        3767.2584       0.0150       0.0149       0.0156
Scale:       3730.8759       0.0151       0.0150       0.0152
Add:         4020.1456       0.0209       0.0209       0.0209
Triad:       3899.9074       0.0216       0.0215       0.0216
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
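Editor's Note (illustrative, not part of the submission): The Rate
column above can be cross-checked against the Min time column. The
sketch below assumes the byte accounting of the standard STREAM source:
Copy and Scale move two 8-byte words per array element, Add and Triad
move three, and 1 MB/s means 10^6 bytes per second. The function and
variable names are hypothetical and are not taken from stream_d.c.

```python
BYTES_PER_WORD = 8  # "This system uses 8 bytes per DOUBLE PRECISION word."

# Words moved per array element for each kernel (assumed STREAM accounting).
WORDS_PER_ELEMENT = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def implied_min_time(kernel, array_size, rate_mb_s):
    """Return the best-case time in seconds implied by a reported rate."""
    bytes_moved = WORDS_PER_ELEMENT[kernel] * BYTES_PER_WORD * array_size
    return bytes_moved / (rate_mb_s * 1.0e6)


# Reported rates for the BL20p G3 run (array size 3500000).
bl20p_rates = {"Copy": 3767.2584, "Scale": 3730.8759,
               "Add": 4020.1456, "Triad": 3899.9074}

for kernel, rate in bl20p_rates.items():
    seconds = implied_min_time(kernel, 3500000, rate)
    print(f"{kernel:6s} {seconds:.4f} s")
```

Under these assumptions each printed value agrees with the reported
Min time to four decimal places, which suggests the rates were derived
from the best (minimum) time of the 10 runs, as the output states.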
------------------------------------------------------------------------
HP ProLiant DL140 G2
2x3.8GHz/2MB L2 Xeon processors
8GB memory (8x1024MB)
SuSE Linux Enterprise Server 9 (x86_64) SP1
Intel(R) C++ Compiler for Intel(R) EM64T-based applications, Version 8.1
Build 20040812
icc -O3 -xP -parallel -ip -o ompstream stream_d.c second_wall.c
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 9000000, Offset = 20480
Total memory required = 206.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 44713 microseconds.
(= 44713 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        3917.2916       0.0369       0.0368       0.0375
Scale:       3760.3727       0.0383       0.0383       0.0384
Add:         4086.4114       0.0529       0.0529       0.0533
Triad:       4004.8345       0.0540       0.0539       0.0540
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
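Editor's Note (illustrative, not part of the submission): The "Total
memory required" figures reported for the two array sizes seen so far
(3500000 and 9000000) follow from the array size alone. The sketch
below assumes three working arrays of 8-byte words, ignores the Offset
padding, and assumes that "MB" on that line means 2^20 bytes. The
function name is hypothetical.

```python
BYTES_PER_WORD = 8  # from the benchmark output
NUM_ARRAYS = 3      # assumption: STREAM's three working arrays a, b, c


def stream_memory_mb(array_size):
    """Return the array footprint in units of 2**20 bytes."""
    return NUM_ARRAYS * BYTES_PER_WORD * array_size / 1048576.0


for array_size in (3500000, 9000000):
    print(f"Array size = {array_size}: {stream_memory_mb(array_size):.1f} MB")
```

Under these assumptions the two sizes come out at 80.1 MB and 206.0 MB,
matching the reported values. Both footprints are far larger than the
2MB L2 cache listed in the configurations.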
------------------------------------------------------------------------
HP ProLiant DL360 G4p
2x3.8GHz/2MB L2 Xeon processors
8GB memory (4x2048MB Dual Rank DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1
Intel(R) C++ Compiler for Intel(R) EM64T-based applications, Version 8.1
Build 20040812
icc -O3 -xP -parallel -ip -o ompstream stream_d.c second_wall.c
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 9000000, Offset = 16896
Total memory required = 206.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 40524 microseconds.
(= 40524 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        3912.0902       0.0369       0.0368       0.0375
Scale:       3768.9375       0.0382       0.0382       0.0383
Add:         4149.2957       0.0521       0.0521       0.0521
Triad:       4045.5549       0.0534       0.0534       0.0535
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
------------------------------------------------------------------------
HP ProLiant DL380 G4
2x3.8GHz/2MB L2 Xeon processors
8GB memory (4x2048MB Dual Rank DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1
Intel(R) C++ Compiler for Intel(R) EM64T-based applications, Version 8.1
Build 20040812
icc -O3 -xP -parallel -ip -o ompstream stream_d.c second_wall.c
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 9000000, Offset = 24576
Total memory required = 206.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 40484 microseconds.
(= 40484 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        3919.6559       0.0369       0.0367       0.0375
Scale:       3775.2747       0.0382       0.0381       0.0382
Add:         4151.7676       0.0521       0.0520       0.0521
Triad:       4049.7142       0.0534       0.0533       0.0535
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
------------------------------------------------------------------------
HP ProLiant ML370 G4
1x3.8GHz/2MB L2 Xeon processor
8GB memory (8x1024MB)
SuSE Linux Enterprise Server 9 (x86_64) SP1
Intel(R) C++ Compiler for Intel(R) EM64T-based applications, Version 8.1
Build 20040812
icc -O3 -xP -parallel -ip -o ompstream stream_d.c second_wall.c
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 9000000, Offset = 2048
Total memory required = 206.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 40999 microseconds.
(= 40999 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        3893.0521       0.0371       0.0370       0.0377
Scale:       3750.0064       0.0384       0.0384       0.0384
Add:         4119.2423       0.0525       0.0524       0.0525
Triad:       4013.6703       0.0539       0.0538       0.0539
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
David Schmidt
Hewlett-Packard Company
(281) 514-5039
D.Schmidt@hp.com
This archive was generated by hypermail 2.1.4 : Mon Oct 03 2005 - 20:46:12 CDT