Hi,
I did some benchmarking on dual-core Opterons (Socket 1207) with 667 MHz DDR2. Going from 2 to 4 processors, I only got about a 20% increase over the previous
dual single-core Opterons with 400 MHz DDR (about 11 GB/s with the new dual cores versus about 9 GB/s with two older (254) single cores). I don't have the output, since I did the benchmarking at Reclamation and have since left.
Here's a decent floating point box for now although it clearly only has enough memory bandwidth for about two processors. AMD really needs to update to HT 2/3, although maybe the memory hardware/modules aren't ready yet...
Avadirect AMD Quad FX Workstation (two AMD FX-72 dual-core processors).
8 GB of 800 MHz DDR2 memory (4 2GB modules).
OpenSuse 10.2
Linux 2.6.18.8-0.3-default #1 SMP Tue Apr 17 08:42:35 UTC 2007 x86_64 x86_64 x86_64 GNU/Linux
Pathscale 2.4
pathf95 -Ofast -CG:load_exe=2 -LNO:blocking=off -msse -msse2 -m3dnow -mp basic_stream.f -DUNDERSCORE second_wall.c -o ps2p4_openmp_stream -static
4 processor result
** OpenMP warning: requested pthread stack too large, using 8388608 bytes instead
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 16000000
Offset = 0
The total memory requirement is 366 MB
You are running each test 20 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
Number of Threads = 4
Number of Threads = 4
Number of Threads = 4
Number of Threads = 4
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:          14810.6406     0.0332     0.0173     0.0523
Scale:         15003.0995     0.0300     0.0171     0.0504
Add:           15325.0115     0.0406     0.0251     0.0800
Triad:         15498.2846     0.0390     0.0248     0.0783
----------------------------------------------------
Solution Validates!
----------------------------------------------------
real 0m3.186s
user 0m5.944s
sys 0m0.272s
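
For readers checking the numbers, here is a minimal sketch of how the rates in the table follow from the array size and the minimum times. It assumes the usual STREAM accounting (Copy and Scale touch two arrays per element, Add and Triad touch three, MB means 10^6 bytes, and the memory figure is three arrays counted in units of 2^20 bytes). The function and variable names are made up for illustration and are not taken from basic_stream.f.

```python
# Sketch only: assumes standard STREAM byte counting; names are illustrative.
BYTES_PER_WORD = 8          # "Assuming 8 bytes per DOUBLE PRECISION word"
ARRAY_SIZE = 16_000_000     # "Array size = 16000000"

# Number of arrays read or written per element by each kernel (assumption:
# the conventional STREAM accounting, which ignores write-allocate traffic).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_mb_s(kernel, min_time_s):
    """Rate in MB/s (10^6 bytes/s) implied by the best time for a kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * ARRAY_SIZE
    return bytes_moved / min_time_s / 1.0e6


def total_memory_mib():
    """Footprint of the three work arrays in units of 2**20 bytes."""
    return 3 * BYTES_PER_WORD * ARRAY_SIZE / 2**20


# Min times copied from the 4-thread table above (rounded to 4 decimals,
# so the recomputed rates only agree with the table to a fraction of a percent).
min_times = {"Copy": 0.0173, "Scale": 0.0171, "Add": 0.0251, "Triad": 0.0248}

for kernel, t in min_times.items():
    print(f"{kernel:6s} {stream_rate_mb_s(kernel, t):10.1f} MB/s")
print(f"memory: {total_memory_mib():.0f}")
```

The small differences from the printed rates come from the min times being rounded to four decimal places in the output.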
Single processor
pathf95 -Ofast -CG:load_exe=2 -LNO:blocking=off -msse -msse2 -m3dnow basic_stream.f -DUNDERSCORE second_wall.c -o ps2p4_scalar_stream -static
numactl --localalloc ./ps2p4_scalar_stream
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 16000000
Offset = 0
The total memory requirement is 366 MB
You are running each test 20 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:           8232.2594     0.0311     0.0311     0.0313
Scale:          8065.0605     0.0318     0.0317     0.0319
Add:            7576.9651     0.0508     0.0507     0.0510
Triad:          7587.7452     0.0507     0.0506     0.0512
----------------------------------------------------
Solution Validates!
----------------------------------------------------
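
To put the "enough memory bandwidth for about two processors" remark in numbers, here is a rough sketch that divides the 4-thread rates by the single-processor rates from the two tables above. The names are illustrative only. Note that the two runs as posted are not launched identically (only the single-processor run is shown under numactl --localalloc), so treat the ratios as indicative.

```python
# Sketch only: rates (MB/s) copied from the two STREAM tables in this post.
four_threads = {"Copy": 14810.6406, "Scale": 15003.0995,
                "Add": 15325.0115, "Triad": 15498.2846}
one_thread = {"Copy": 8232.2594, "Scale": 8065.0605,
              "Add": 7576.9651, "Triad": 7587.7452}


def speedup(kernel):
    """Bandwidth ratio of the 4-thread run to the single-processor run."""
    return four_threads[kernel] / one_thread[kernel]


def parallel_efficiency(kernel, threads=4):
    """Speedup divided by thread count; 1.0 would be perfect scaling."""
    return speedup(kernel) / threads


for kernel in four_threads:
    print(f"{kernel:6s} speedup {speedup(kernel):.2f}x, "
          f"efficiency {parallel_efficiency(kernel):.0%}")
```

All four kernels come out at roughly 1.8x to 2.0x, i.e. four threads deliver about the bandwidth that two perfectly scaling processors would.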
Cheers, DRHO
Daniel R.H. O'Connell, Ph.D.
Senior Geophysicist
William Lettis and Associates, Inc.
433 Park Point Drive, Suite 250
Golden, CO 80401
oconnell@lettis.com
Received on Fri Jun 08 10:57:09 2007