From: Schmidt, David (Performance Eng.) (D.Schmidt@hp.com)
Date: Wed Sep 28 2005 - 18:23:33 CDT
John,
Below are standard STREAM results for the HP ProLiant BL20p G3 (1 CPU),
the HP ProLiant DL140 G2 (1 CPU), and the HP ProLiant ML350 G4p (1 CPU)
using 3.6GHz Intel Xeon processors. The configurations are described
below with the results. Please post them to the STREAM website. Thanks,
-----------------------------------------------------------------------
[Editor's Note: In a separate e-mail, the submitter clarified that
all results presented here were run with a single thread specified
using the "OMP_NUM_THREADS" environment variable.]
-----------------------------------------------------------------------
HP ProLiant BL20p G3
1x3.6GHz/2MB L2 Xeon processor
8GB PC3200 memory (4x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1
Intel(R) C++ Compiler for Intel(R) EM64T-based Applications, Version
8.1, Build 20040812
icc -O3 -xP -parallel -o ompstream stream_d.c second_wall.c
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 3500000, Offset = 5120
Total memory required = 80.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 15221 microseconds.
(= 15221 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:          3764.4809     0.0150     0.0149     0.0155
Scale:         3723.8961     0.0151     0.0150     0.0151
Add:           4028.6037     0.0209     0.0209     0.0209
Triad:         3893.5719     0.0216     0.0216     0.0217
-------------------------------------------------------------
Solution Validates
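
As a cross-check on the BL20p G3 table above, the reported rates can be approximately reproduced from the array size and the minimum times. The sketch below assumes the usual STREAM counting convention (two 8-byte words moved per element for Copy and Scale, three for Add and Triad, with 1 MB = 10^6 bytes). The name `stream_rate_mb_s` is illustrative and not part of STREAM. Because the printed times are rounded to four decimals, the recomputed rates should agree with the reported ones only to within about one percent.

```python
# Words moved per array element, assuming the standard STREAM convention.
WORDS_PER_ELEMENT = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
BYTES_PER_WORD = 8  # "This system uses 8 bytes per DOUBLE PRECISION word."


def stream_rate_mb_s(kernel, array_size, min_time_s):
    """Return the bandwidth in MB/s (1 MB = 1e6 bytes) for one kernel."""
    bytes_moved = WORDS_PER_ELEMENT[kernel] * BYTES_PER_WORD * array_size
    return 1.0e-6 * bytes_moved / min_time_s


# BL20p G3 values copied from the output: kernel -> (min time in s, reported MB/s).
BL20P_G3 = {
    "Copy": (0.0149, 3764.4809),
    "Scale": (0.0150, 3723.8961),
    "Add": (0.0209, 4028.6037),
    "Triad": (0.0216, 3893.5719),
}

for kernel, (min_time, reported) in BL20P_G3.items():
    estimate = stream_rate_mb_s(kernel, 3_500_000, min_time)
    print(f"{kernel:6s} recomputed {estimate:8.1f} MB/s, reported {reported:8.1f} MB/s")
```

The same function can be applied to the other two systems by substituting their array sizes and minimum times.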
HP ProLiant DL140 G2
1x3.6GHz/2MB L2 Xeon processor
8GB memory (8x1024MB)
SuSE Linux Enterprise Server 9 (x86_64) SP1
Intel(R) C++ Compiler for Intel(R) EM64T-based Applications, Version
8.1, Build 20040812
icc -O3 -xP -parallel -o ompstream stream_d.c second_wall.c
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 9000000, Offset = 512
Total memory required = 206.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 41546 microseconds.
(= 41546 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:          3800.3837     0.0380     0.0379     0.0386
Scale:         3657.1367     0.0395     0.0394     0.0398
Add:           3994.7514     0.0541     0.0541     0.0542
Triad:         3914.8118     0.0552     0.0552     0.0553
-------------------------------------------------------------
Solution Validates
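
The "Total memory required" lines in this message follow directly from the array sizes. A minimal sketch, assuming STREAM's three arrays of 8-byte words and 1 MB = 2^20 bytes for this particular figure; the reported values are consistent with the offset not being counted. The helper name `stream_memory_mb` is hypothetical.

```python
def stream_memory_mb(array_size, bytes_per_word=8, n_arrays=3):
    """Memory footprint of the STREAM arrays in MB (1 MB = 1048576 bytes)."""
    return n_arrays * bytes_per_word * array_size / 1048576.0


# Array sizes copied from the three runs reported in this message.
ARRAY_SIZES = {
    "BL20p G3": 3_500_000,
    "DL140 G2": 9_000_000,
    "ML350 G4p": 4_500_000,
}

for system, n in ARRAY_SIZES.items():
    print(f"{system}: {stream_memory_mb(n):.1f} MB")
# BL20p G3: 80.1 MB
# DL140 G2: 206.0 MB
# ML350 G4p: 103.0 MB
```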
HP ProLiant ML350 G4p
1x3.6GHz/2MB L2 Xeon processor
8GB PC3200 memory (4x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1
Intel(R) C++ Compiler for Intel(R) EM64T-based Applications, Version
8.1, Build 20040812
icc -O3 -xP -parallel -o ompstream stream_d.c second_wall.c
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 4500000, Offset = 16896
Total memory required = 103.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 19776 microseconds.
(= 19776 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:          3834.5001     0.0189     0.0188     0.0195
Scale:         3756.0464     0.0192     0.0192     0.0192
Add:           4100.0781     0.0264     0.0263     0.0264
Triad:         3968.6773     0.0273     0.0272     0.0273
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
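
Each run above reports an estimated test duration and a timer granularity, with the guideline that a test should span at least 20 clock ticks. The sketch below applies that guideline to the three reported runs. The helper `ticks_per_test` and the constant `MIN_TICKS` are illustrative names, not STREAM identifiers, and the check is only as rough as the guideline itself.

```python
MIN_TICKS = 20  # threshold quoted in the STREAM output


def ticks_per_test(test_time_us, granularity_us=1):
    """Number of timer ticks that one test is expected to span."""
    return test_time_us / granularity_us


# Estimated per-test durations in microseconds, copied from the output.
ESTIMATED_TEST_TIME_US = {
    "BL20p G3": 15221,
    "DL140 G2": 41546,
    "ML350 G4p": 19776,
}

for system, test_time_us in ESTIMATED_TEST_TIME_US.items():
    ticks = ticks_per_test(test_time_us)
    verdict = "ok" if ticks >= MIN_TICKS else "increase array size"
    print(f"{system}: {ticks:.0f} ticks per test ({verdict})")
```

With a 1 microsecond timer, all three runs exceed the 20-tick guideline by a wide margin.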
David Schmidt
Hewlett-Packard Company
(281) 514-5039
D.Schmidt@hp.com
This archive was generated by hypermail 2.1.4 : Mon Oct 03 2005 - 20:46:12 CDT