Dear Dr. McCalpin,
Here are the results I got on our 2-CPU DEC 2100/233 MHz. I believe it uses
the older 21064 chips. The benchmark effectively used only one CPU; I do not
know how to make it run in parallel.
I tried using threads manually, but I did not get a better result :( It is
not easy to bind a thread to a CPU; sometimes the system simply does not
honor the binding.
I checked your FTP archive. Someone from DEC simply added the figures from
2 separate runs to get a result for a 2-CPU system, the figures from 4 runs
for a 4-CPU system, and so on.
Nice approach! By that logic, a 100-CPU DEC system would reach about 10 GB/s
(100 MB/s * 100) without any trouble.
In my opinion, that is not a valid result :(
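The objection above comes down to how an aggregate figure is measured. A meaningful multi-CPU number comes from one run in which all CPUs stream memory at the same time: the total bytes moved by every thread divided by the wall-clock time of that concurrent run. Adding up figures from separate runs does not measure that. Below is a minimal sketch with POSIX threads, assuming gettimeofday() for timing; wall_seconds, triad_worker and parallel_triad_mbs are hypothetical names, not part of the benchmark. It does not bind threads to CPUs, it times a single pass (the benchmark itself keeps the best of several trials), and thread start-up is included in the timed region.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/time.h>

#define MAX_THREADS 64

/* Wall-clock time in seconds (hypothetical helper, gettimeofday-based). */
static double wall_seconds(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* One thread's share of the triad kernel: a[i] = b[i] + q * c[i]. */
typedef struct {
    double *a;
    const double *b, *c;
    double q;
    size_t begin, end;
} slice_t;

static void *triad_worker(void *arg) {
    slice_t *s = arg;
    for (size_t i = s->begin; i < s->end; i++)
        s->a[i] = s->b[i] + s->q * s->c[i];
    return NULL;
}

/* Run the triad once with all threads working concurrently and return the
   aggregate rate in MB/s: bytes moved by ALL threads divided by the
   wall-clock time of that single concurrent run. Threads are NOT bound to
   CPUs; placement is left to the OS scheduler. */
double parallel_triad_mbs(double *a, const double *b, const double *c,
                          double q, size_t n, int nthreads) {
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    slice_t slice[MAX_THREADS];
    size_t chunk = n / (size_t)nthreads;

    double t0 = wall_seconds();
    for (int t = 0; t < nthreads; t++) {
        slice[t].a = a;
        slice[t].b = b;
        slice[t].c = c;
        slice[t].q = q;
        slice[t].begin = (size_t)t * chunk;
        /* The last thread also takes any leftover elements. */
        slice[t].end = (t == nthreads - 1) ? n : (size_t)(t + 1) * chunk;
        int rc = pthread_create(&tid[t], NULL, triad_worker, &slice[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(tid[t], NULL);
    double elapsed = wall_seconds() - t0;

    /* Triad reads b and c and writes a: 3 doubles per element. */
    double bytes = 3.0 * (double)sizeof(double) * (double)n;
    return bytes / elapsed / 1e6;
}
```

Calling parallel_triad_mbs(a, b, c, 3.0, n, 2) on a 2-CPU machine gives one aggregate figure for both CPUs; comparing it with the 1-thread figure shows how much the second CPU actually adds, instead of assuming it doubles.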
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 3000000, Offset = 0
Total memory required = 68.7 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 16666 microseconds.
Each test below will take on the order of 399999 microseconds.
(= 24 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
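The tick estimate above is just the expected test time divided by the timer granularity: 399999 / 16666 is about 24 ticks (16666 us is roughly 1/60 s), just above the suggested 20. A rough sketch of how such a granularity figure can be measured, assuming gettimeofday() is the timer: spin until the reported time changes and keep the smallest jump over several samples. This is similar in idea to the benchmark's own check but is not its code; now_us, estimate_granularity_us and ticks_per_test are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Current wall-clock time in microseconds, via gettimeofday(). */
static long long now_us(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000000LL + (long long)tv.tv_usec;
}

/* Estimate the timer granularity: spin until the reported time changes,
   record the size of that jump, and keep the smallest positive jump seen
   over `samples` attempts. Returns -1 if no positive jump was observed. */
long long estimate_granularity_us(int samples) {
    assert(samples > 0);
    long long best = -1;
    for (int k = 0; k < samples; k++) {
        long long t0 = now_us();
        long long t1;
        while ((t1 = now_us()) == t0)
            ; /* busy-wait for the clock to advance */
        long long step = t1 - t0;
        if (step > 0 && (best < 0 || step < best))
            best = step;
    }
    return best;
}

/* How many whole clock ticks a test of `test_us` microseconds spans. */
long long ticks_per_test(long long test_us, long long granularity_us) {
    assert(granularity_us > 0);
    return test_us / granularity_us;
}
```

With the figures printed here, ticks_per_test(399999, 16666) is 24.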
-------------------------------------------------------------
WARNING: The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:       80.0000     0.6135     0.6000     0.6500
Scaling   :       77.8378     0.6385     0.6167     0.6667
Summing   :       80.0000     0.9134     0.9000     0.9333
SAXPYing  :       80.0000     0.9084     0.9000     0.9333
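Each rate in the table is the number of bytes a kernel moves divided by its best (minimum) time. For Assignment, 2 arrays * 8 bytes * 3,000,000 elements = 48,000,000 bytes in 0.6000 s gives 80 MB/s; the 68.7 MB in the header is 3 * 8 * 3,000,000 bytes expressed in units of 2^20 bytes. A small sketch that reproduces these numbers; the function and enum names are hypothetical, and the per-kernel byte counts are the usual accounting, checked here only against the figures above.

```c
#include <assert.h>

/* Kernel indices in the order the output lists them (hypothetical enum). */
enum { ASSIGNMENT, SCALING, SUMMING, SAXPYING, N_KERNELS };

/* 8-byte words moved per array element by each kernel:
   Assignment and Scaling read one array and write one (2 words);
   Summing and SAXPYing read two arrays and write one (3 words). */
static const int words_per_element[N_KERNELS] = {2, 2, 3, 3};

/* Rate in MB/s (1 MB = 10^6 bytes) from array length and best time. */
double stream_rate_mbs(int kernel, long n, double min_time_s) {
    assert(kernel >= 0 && kernel < N_KERNELS && min_time_s > 0.0);
    double bytes = 8.0 * words_per_element[kernel] * (double)n;
    return bytes / min_time_s / 1e6;
}

/* Memory for the three arrays, in MB of 2^20 bytes as the header reports. */
double stream_memory_mb(long n) {
    return 3.0 * 8.0 * (double)n / 1048576.0;
}
```

With the printed Scaling time of 0.6167 s this gives about 77.83 MB/s rather than 77.8378, presumably because the printed time is rounded.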
The executable was built with the cc compiler using the -O4 flag.
Have a nice day,
Igor Tatarinov
CS Dept, NDSU
This archive was generated by hypermail 2b29 : Tue Apr 18 2000 - 05:23:05 CDT