From: Kevin Ball (kball@pathscale.com)
Date: Fri Feb 04 2005 - 18:00:53 UTC
Hi John,
I'm a benchmark engineer at PathScale, working under Tom Elken. We'd
like to make a STREAM submission using the latest released compiler
(EKOPath 2.0) on AMD Opteron. We have serial results from one machine
and OpenMP results with up to 4 threads from another. Details are
below. Thanks much!
-Kevin Ball
System info (serial run):
Compiler: PathScale EKO Compiler Suite, Release 2.0
Model Name: ASUS SK8N Motherboard, AMD Opteron (TM) Model 248
CPU: AMD Opteron 248
CPU MHz: 2200
FPU: Integrated
CPU(s) enabled: 1 core, 1 chip, 1 core/chip
Memory: 4x512MB, DDR400, PC3200, Corsair, CL2
Operating System: SuSE Linux 9.0 (AMD64) 2.4.21-102-default
> pathcc -Ofast -LNO:prefetch_ahead=10 -lm stream.c mysecond.c -o path2.0
stream.c:
mysecond.c:
> ./path2.0
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 9317 microseconds.
(= 9317 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function Rate (MB/s) Avg time Min time Max time
Copy: 4811.3611 0.0060 0.0067 0.0067
Scale: 4782.3883 0.0060 0.0067 0.0068
Add: 4684.7375 0.0092 0.0102 0.0103
Triad: 4681.5783 0.0092 0.0103 0.0103
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
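As a cross-check on the table above, the Rate column can be related to the time columns through the bytes each kernel moves: Copy and Scale touch two arrays per element, Add and Triad touch three, and STREAM reports rates with 1 MB = 10^6 bytes. The following is only a minimal sketch of that arithmetic in Python, not part of the STREAM sources; the names `ARRAYS_TOUCHED` and `implied_best_time` are made up for illustration.

```python
# Arrays touched per element by each STREAM kernel (assumed 8-byte words):
# Copy and Scale read one array and write one; Add and Triad read two and write one.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def implied_best_time(kernel, rate_mb_s, array_size, word_bytes=8):
    """Seconds per iteration implied by a reported rate (1 MB = 10**6 bytes)."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * word_bytes * array_size
    return bytes_moved / (rate_mb_s * 1e6)


# Rates copied from the serial run above (array size 2000000).
serial_rates = {
    "Copy": 4811.3611,
    "Scale": 4782.3883,
    "Add": 4684.7375,
    "Triad": 4681.5783,
}

for kernel, rate in serial_rates.items():
    seconds = implied_best_time(kernel, rate, 2_000_000)
    print(f"{kernel:6s} {seconds:.4f} s")
```

Rounded to four decimals, the implied times agree with the Min time column (0.0067, 0.0067, 0.0102, 0.0103), since the rate is computed from the best time.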
System info (OpenMP runs):
Compiler: PathScale EKO Compiler Suite, Release 2.0
Model Name: 4-way, 2.2 GHz AMD Opteron (TM) Model 848
CPU: AMD Opteron 848
CPU MHz: 2200
FPU: Integrated
CPU(s) enabled: 4 cores, 4 chips, 1 core/chip
Memory: 16x1024MB, DDR400
Operating System: Fedora Core 2; 2.6.8-1.521smp kernel
> pathcc -Ofast -LNO:prefetch_ahead=10 -DUNDERSCORE -c mysecond.c
> pathf90 -Ofast -LNO:prefetch_ahead=10 -DUNDERSCORE -mp -c stream.f
> pathf90 -Ofast -LNO:prefetch_ahead=10 -DUNDERSCORE -mp mysecond.o stream.o -o path2.0mp
> export OMP_NUM_THREADS=1
> ./path2.0mp
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 4000000
Offset = 0
The total memory requirement is 91 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
Number of Threads = 1
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function Rate (MB/s) Avg time Min time Max time
Copy: 4455.9519 0.0144 0.0144 0.0144
Scale: 4503.5728 0.0142 0.0142 0.0143
Add: 4326.2548 0.0222 0.0222 0.0222
Triad: 4401.4471 0.0218 0.0218 0.0219
----------------------------------------------------
Solution Validates!
----------------------------------------------------
> export OMP_NUM_THREADS=2
> ./path2.0mp
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 4000000
Offset = 0
The total memory requirement is 91 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
Number of Threads = 2
2
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function Rate (MB/s) Avg time Min time Max time
Copy: 8736.1427 0.0073 0.0073 0.0073
Scale: 8876.5403 0.0072 0.0072 0.0072
Add: 8521.9409 0.0113 0.0113 0.0113
Triad: 8660.5120 0.0111 0.0111 0.0111
----------------------------------------------------
Solution Validates!
----------------------------------------------------
> export OMP_NUM_THREADS=4
> ./path2.0mp
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 4000000
Offset = 0
The total memory requirement is 91 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
Number of Threads = 4
4
4 Number of Threads = 4 44 Number of Threads = 4
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function Rate (MB/s) Avg time Min time Max time
Copy: 15377.8332 0.0042 0.0042 0.0042
Scale: 15845.3135 0.0040 0.0040 0.0040
Add: 15617.6086 0.0062 0.0061 0.0062
Triad: 15920.8091 0.0061 0.0060 0.0061
----------------------------------------------------
Solution Validates!
----------------------------------------------------
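To summarize how the OpenMP runs scale, the reported rates at 2 and 4 threads can be divided by the 1-thread rate. The sketch below does this for Triad in Python, using only the numbers printed above; the function name `scaling` is hypothetical and the script is not part of the submission.

```python
# Triad rates (MB/s) copied from the three OpenMP runs above, keyed by thread count.
triad_mb_s = {1: 4401.4471, 2: 8660.5120, 4: 15920.8091}


def scaling(rates):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread rate."""
    base = rates[1]
    return {n: (rate / base, rate / base / n) for n, rate in rates.items()}


for threads, (speedup, efficiency) in scaling(triad_mb_s).items():
    print(f"{threads} thread(s): speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

For Triad this works out to a speedup of roughly 1.97 at 2 threads and 3.62 at 4 threads.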