From: James Cook (cook@ned1.sims.nrc.ca)
Date: Fri Aug 09 2002 - 10:21:27 CDT
I have some interesting STREAM results for dual-processor machines. You
don't seem to have many results for 2-CPU machines, so I thought I would
send these in.
The first machine is:
Dual PIII 933 MHz
1 GB of PC133 RAM
Motherboard: ABIT VP6
Running Linux kernel 2.4.19-rc3 (SMP, of course)
Compiler: PGI 3.2-4
For a single processor, the compile commands are:
pgcc -fast -tp p6 -Minline -Mvect=assoc,sse,cachesize:327680
-Mcache_align second_wall_pg.c -c
pgf77 -fast -tp p6 -Minline -Mvect=assoc,prefetch,cachesize:327680
-Mcache_align second_wall_pg.o stream_d.f -o stream_pgi
Results for 1 CPU:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 1000000
Offset = 0
The total memory requirement is 22 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:           583.7067     0.0274     0.0274     0.0275
Scale:          669.3169     0.0239     0.0239     0.0240
Add:            652.0326     0.0368     0.0368     0.0369
Triad:          716.8468     0.0335     0.0335     0.0336
----------------------------------------------------
Solution Validates!
----------------------------------------------------
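For reference, the rates above can be recomputed from the minimum times. This is a minimal sketch, assuming STREAM's usual byte counting (Copy and Scale move two 8-byte words per array element, Add and Triad move three) and 1 MB = 10^6 bytes. The function name stream_rate_mb_s is only illustrative. Because the times are printed to four decimals, the recomputed rates agree with the reported ones only approximately.

```python
# Words moved per array element for each kernel (assumed standard STREAM counting).
WORDS_PER_ELEMENT = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_mb_s(kernel, min_time_s, n=1_000_000, bytes_per_word=8):
    """Return the bandwidth in MB/s (1 MB = 10**6 bytes) for one kernel."""
    bytes_moved = WORDS_PER_ELEMENT[kernel] * bytes_per_word * n
    return bytes_moved / min_time_s / 1e6


# Min times (seconds) from the 1 CPU run on the Dual PIII.
min_times = {"Copy": 0.0274, "Scale": 0.0239, "Add": 0.0368, "Triad": 0.0335}

rates = {kernel: stream_rate_mb_s(kernel, t) for kernel, t in min_times.items()}
for kernel, rate in rates.items():
    print(f"{kernel:6s} {rate:9.1f} MB/s")
```

The same function applies to the other runs in this message, since they all use an array size of 1000000.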
Next: 2 CPUs. I added the compiler flag -mp to the commands above
to enable the OpenMP directives.
The results for OpenMP:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 1000000
Offset = 0
The total memory requirement is 22 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:           643.3466     0.0249     0.0249     0.0250
Scale:          685.2555     0.0234     0.0233     0.0234
Add:            641.6766     0.0374     0.0374     0.0375
Triad:          611.3099     0.0393     0.0393     0.0394
----------------------------------------------------
Solution Validates!
----------------------------------------------------
OpenMP doesn't seem to be any help here: Copy and Scale are slightly
faster, but Add and Triad are slower.
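To put a number on that, the two Dual PIII runs can be compared kernel by kernel. This is a small sketch using only the rates reported above. The helper name speedups is illustrative.

```python
# Rates in MB/s copied from the two Dual PIII runs above.
one_cpu = {"Copy": 583.7067, "Scale": 669.3169, "Add": 652.0326, "Triad": 716.8468}
openmp = {"Copy": 643.3466, "Scale": 685.2555, "Add": 641.6766, "Triad": 611.3099}


def speedups(baseline, parallel):
    """Return the ratio parallel/baseline for every kernel in baseline."""
    return {kernel: parallel[kernel] / baseline[kernel] for kernel in baseline}


piii_speedup = speedups(one_cpu, openmp)
for kernel, ratio in piii_speedup.items():
    print(f"{kernel:6s} {ratio:.2f}x")
```

A ratio near 2.0 would suggest that both processors are contributing. Here the ratios run from about 0.85 (Triad) to about 1.10 (Copy).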
The next machine is a Dual Athlon 1900+ (1.6 GHz)
1 GB of RAM (266 MHz)
Motherboard: Asus A7M266-D
Running SuSE SMP Linux kernel 2.4.18
Compiler: PGI 3.2-4
The compile commands I used:
pgcc -fast -tp athlon -Minline -Mvect=assoc,sse,cachesize:393216
-Mcache_align second_wall_pg.c -c
pgf77 -fast -tp athlon -Minline -Mvect=assoc,sse,cachesize:393216
-Mcache_align second_wall_pg.o stream_d.f -o stream_pgi
Without OpenMP:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 1000000
Offset = 0
The total memory requirement is 22 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:           719.6844     0.0223     0.0222     0.0225
Scale:          766.1756     0.0209     0.0209     0.0209
Add:            844.4471     0.0285     0.0284     0.0286
Triad:          849.6478     0.0283     0.0282     0.0283
----------------------------------------------------
Solution Validates!
----------------------------------------------------
With OpenMP:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 1000000
Offset = 0
The total memory requirement is 22 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:           712.4733     0.0225     0.0225     0.0225
Scale:          741.1513     0.0216     0.0216     0.0216
Add:            867.1146     0.0277     0.0277     0.0277
Triad:          863.7760     0.0278     0.0278     0.0278
----------------------------------------------------
Solution Validates!
----------------------------------------------------
I can't seem to achieve good results with the OpenMP code. It doesn't seem
to be doing anything at all.
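One way to judge how much headroom a second processor could have is to compare the single-CPU rates with a nominal peak memory bandwidth. This is a rough sketch, assuming a 64-bit memory bus and one transfer per quoted clock, so PC133 is taken as 133 * 8 = 1064 MB/s and the "266 MHz" memory as 266 * 8 = 2128 MB/s. Both peak figures are assumptions about the hardware, not measurements from this message.

```python
# Nominal peak in MB/s: millions of transfers per second * 8 bytes (64-bit bus).
# Both entries are assumptions about the memory, not values from the post.
ASSUMED_PEAK_MB_S = {
    "Dual PIII (PC133)": 133 * 8,
    "Dual Athlon (266 MHz)": 266 * 8,
}

# Best single-CPU Triad rates reported above, in MB/s.
TRIAD_ONE_CPU_MB_S = {
    "Dual PIII (PC133)": 716.8468,
    "Dual Athlon (266 MHz)": 849.6478,
}


def fraction_of_peak(measured, peak):
    """Return measured/peak for every machine listed in measured."""
    return {name: measured[name] / peak[name] for name in measured}


fractions = fraction_of_peak(TRIAD_ONE_CPU_MB_S, ASSUMED_PEAK_MB_S)
for name, fraction in fractions.items():
    print(f"{name:22s} {100 * fraction:5.1f}% of assumed peak")
```

Under these assumptions, a single PIII already reaches roughly two thirds of the nominal peak, while a single Athlon reaches roughly 40%.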
I may be able to run the MPI version of STREAM soon on the cluster
here.
Hope you enjoy.
James Cook
Steacie Institute for Molecular Sciences
National Research Council Canada