Your program was using cm_timer_cm_read_busy, presumably inserted by Alex.
Unfortunately, this function and the other functions that return a CM time
to the caller are broken on CMSS 6.1, so I had to revert to the usual
CM_timer_print, which prints to standard error.
Also, the optimizer optimized away all of your code.
So I returned to my own program and did basically the same thing, using
primitives that were *not* optimized away.
I can't give you a spread with RMS error on the times, but judging from
consecutive runs, the number of digits I supply indicates the accuracy.
The following are for N=1,000,000 with repeat count NITER=125, making for
exactly 1 GB (decimal, that is) per stream (double precision).
Times are in seconds.
                  CM-2 8K 8MHz  CM-2 4K 8MHz  CM-2 4K 7MHz  bytes/tick
                  (256 PEs)     (128 PEs)     (128 PEs)     (approx)
*optimized*
a = a + 1.d0      0.523         1.045         1.201         2
a = a + b         0.713         1.425         1.633         2
a = a + 2.d0*b    0.717         1.434         1.642         2
*unoptimized*
a = 2.d0          0.364         0.727         0.833         1.333
a = b             0.543         1.085         1.242         2
a = 2.d0*a        0.523         1.046         1.201         2
a = b + c         0.716         1.432         1.642         2
a = c + 2.d0*b    0.750         1.500         1.719         2
This looks like a speed of 2 bytes per PE per clock tick, which is less
than the nominal rate of 1 slice per PE per tick.
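
For anyone who wants to redo the arithmetic behind the bytes/tick column,
here is a rough sketch in Python. It is not the program that produced the
timings. The function name and the way streams are counted (one stream per
array read or written by the statement, so a = a + b counts as 3) are my
assumptions; the times, PE counts and clock rates are taken from the
figures in this message.

```python
# Sketch: effective bytes moved per PE per clock tick.
# Assumption: each array read or written by a statement is one stream.

N = 1_000_000          # elements per array
NITER = 125            # repeat count
BYTES_PER_ELEMENT = 8  # double precision
STREAM_BYTES = N * NITER * BYTES_PER_ELEMENT  # 1e9 bytes, i.e. 1 GB decimal


def bytes_per_pe_per_tick(seconds, streams, pes, clock_hz):
    """Total bytes moved, divided by (clock ticks elapsed * number of PEs)."""
    ticks = seconds * clock_hz
    return streams * STREAM_BYTES / (ticks * pes)


# (statement, assumed stream count, seconds on the 8K 8MHz machine, 256 PEs)
rows_8k_8mhz = [
    ("a = a + 1.d0",   2, 0.523),
    ("a = a + b",      3, 0.713),
    ("a = 2.d0",       1, 0.364),
    ("a = b",          2, 0.543),
    ("a = c + 2.d0*b", 3, 0.750),
]

for statement, streams, seconds in rows_8k_8mhz:
    rate = bytes_per_pe_per_tick(seconds, streams, pes=256, clock_hz=8e6)
    print(f"{statement:16s} {rate:.2f} bytes/PE/tick")
```

With this counting the read-and-write statements land between about 1.8
and 2.05, and the store-only case a = 2.d0 lands near 1.33, which is
consistent with the approximate column above.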
Daan Sandee sandee@think.com
Thinking Machines Corporation
Cambridge, Mass 02142