From: Schmidt, David (Performance Eng.) (d.schmidt@hp.com)
Date: Fri Dec 17 2004 - 13:39:24 CST
Below are 4-CPU STREAM results for an HP ProLiant DL585, configured as
follows:
HP ProLiant DL585
4x2.4GHz 850 Opteron processors
32GB PC2700 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 for AMD64
I used revision 5.3 of the STREAM code, compiled with PGI C/C++ for
Linux v5.2-4:
pgcc -O2 -Mvect=sse -Mnontemporal -mp -o ompstream stream_omp.c
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 9500000, Offset = 0
Total memory required = 217.4 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 11195 microseconds.
(= 11195 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:           9515.4359     0.0144       0.0160       0.0160
Scale:          9486.4104     0.0144       0.0160       0.0161
Add:           11207.2251     0.0183       0.0203       0.0204
Triad:         11233.1592     0.0183       0.0203       0.0204
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
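For anyone who wants to cross-check the figures above, here is a minimal sketch of the arithmetic. It is not part of the original run. It assumes the usual STREAM accounting: Copy and Scale move two arrays per iteration, Add and Triad move three, rates are reported with 1 MB = 10^6 bytes, and the memory total is reported in units of 2^20 bytes. The names N, WORD, ARRAYS_TOUCHED, REPORTED and implied_time are illustrative only.

```python
# Cross-check of the reported STREAM figures (a sketch, not the benchmark itself).
# Assumption: standard STREAM byte counting, which the post does not spell out.

N = 9_500_000   # "Array size" from the output
WORD = 8        # bytes per DOUBLE PRECISION word, from the output

# Three arrays of N doubles, expressed in units of 2**20 bytes.
footprint_mb = 3 * N * WORD / 2**20

# Number of arrays read or written per element by each kernel (assumed convention).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# kernel: (reported rate in MB/s, reported min time in seconds)
REPORTED = {
    "Copy": (9515.4359, 0.0160),
    "Scale": (9486.4104, 0.0160),
    "Add": (11207.2251, 0.0203),
    "Triad": (11233.1592, 0.0203),
}


def implied_time(kernel):
    """Best time in seconds implied by the reported rate (1 MB = 1e6 bytes)."""
    rate_mb_s, _ = REPORTED[kernel]
    bytes_moved = ARRAYS_TOUCHED[kernel] * N * WORD
    return bytes_moved / (rate_mb_s * 1e6)


if __name__ == "__main__":
    print(f"footprint: {footprint_mb:.1f} MB")
    for kernel, (rate, min_time) in REPORTED.items():
        print(f"{kernel:6s} implied {implied_time(kernel):.4f} s, "
              f"reported {min_time:.4f} s")
```

Under those assumptions the footprint comes out at 217.4 MB and each implied time rounds to the reported Min time, so the rate and Min time columns agree with each other. The Avg time column is lower than Min time by a factor of about 0.9; that would be consistent with the times of nine counted iterations being summed and divided by ten, but this is a guess about the code and not something stated in the post.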
David Schmidt
Hewlett-Packard Company
(281) 514-5039
D.Schmidt@hp.com