STREAM results on a dual Opteron 246
From: Paul Saxe (PSaxe@MaterialsDesign.com)
Date: Mon May 09 2005 - 23:48:27 CDT
Dear Dr. McCalpin,
I ran the 01-stream_d_c_omp_x86_64 executable from your 2004-11-17 "New
Opteron Binary" note on a dual-processor Opteron 246 (2 GHz) machine
with 2 GB of PC3200 memory and a Tyan 2882 motherboard, running Fedora 2
for AMD64. Perhaps these results are of use to you; they seem to
be considerably different from similar ones on the web site.
If you need any other information or would like me to do anything else,
please let me know.
Paul Saxe.
[psaxe@opteron1 psaxe]$ export OMP_NUM_THREADS=1
[psaxe@opteron1 psaxe]$ ./01*
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 20000000, Offset = 0
Total memory required = 457.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 507503 microseconds.
(= 507503 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:           4163.9075     0.0769       0.0769       0.0772
Scale:          4326.7150     0.0740       0.0740       0.0742
Add:            4271.1491     0.1125       0.1124       0.1126
Triad:          4339.7302     0.1107       0.1106       0.1108
[psaxe@opteron1 psaxe]$ export OMP_NUM_THREADS=2
[psaxe@opteron1 psaxe]$ ./01*
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 20000000, Offset = 0
Total memory required = 457.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 256726 microseconds.
(= 256726 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:           8052.1296     0.0398       0.0397       0.0402
Scale:          8491.1922     0.0378       0.0377       0.0381
Add:            8369.8108     0.0574       0.0573       0.0578
Triad:          8431.9977     0.0570       0.0569       0.0570
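
For anyone wanting to cross-check the numbers above, here is a minimal sketch in Python. It is not part of STREAM itself. It assumes the usual STREAM accounting (Copy and Scale touch two arrays per iteration, Add and Triad touch three; rates use 1 MB = 10^6 bytes, while the "Total memory required" line uses 2^20 bytes). The helper names are made up for illustration. The rates recomputed from the printed minimum times differ slightly from the printed rates because the times are rounded to four decimals.

```python
# Cross-check of the STREAM figures quoted in the two runs above.
# Assumptions (not stated in the output itself): Copy/Scale move 2 arrays
# per iteration, Add/Triad move 3, and MB/s means 10**6 bytes per second.

N = 20_000_000        # "Array size = 20000000"
BYTES_PER_WORD = 8    # "8 bytes per DOUBLE PRECISION word"

ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# (rate in MB/s, min time in seconds) exactly as printed.
ONE_THREAD = {
    "Copy": (4163.9075, 0.0769),
    "Scale": (4326.7150, 0.0740),
    "Add": (4271.1491, 0.1124),
    "Triad": (4339.7302, 0.1106),
}
TWO_THREADS = {
    "Copy": (8052.1296, 0.0397),
    "Scale": (8491.1922, 0.0377),
    "Add": (8369.8108, 0.0573),
    "Triad": (8431.9977, 0.0569),
}


def total_memory_mib(n, word_bytes, arrays=3):
    """Footprint of the three STREAM arrays, in units of 2**20 bytes."""
    return arrays * n * word_bytes / 2**20


def rate_mb_per_s(kernel, min_time, n=N, word_bytes=BYTES_PER_WORD):
    """Bandwidth implied by the best time for one kernel, in 10**6 bytes/s."""
    return ARRAYS_TOUCHED[kernel] * word_bytes * n / min_time / 1e6


def speedup(kernel):
    """Ratio of the two-thread rate to the one-thread rate."""
    return TWO_THREADS[kernel][0] / ONE_THREAD[kernel][0]


if __name__ == "__main__":
    print(f"Total memory: {total_memory_mib(N, BYTES_PER_WORD):.1f} MB")
    for kernel in ARRAYS_TOUCHED:
        printed, t_min = ONE_THREAD[kernel]
        recomputed = rate_mb_per_s(kernel, t_min)
        print(f"{kernel:6s} printed {printed:10.1f}  "
              f"recomputed {recomputed:10.1f}  "
              f"2-thread speedup {speedup(kernel):.2f}x")
```

With the figures above, the two-thread rates come out at roughly 1.93 to 1.96 times the one-thread rates for all four kernels.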