Hello John;
I'm still trying to figure out the CM-2 stream code. I now strongly
suspect that the compiler has done some dead code elimination.
The original (from TMC) source code has additional "do i=1,100" loops
around each kernel and the TIMES array is adjusted accordingly. I ran
the KAP output (from TMC) on a CRAY Y-MP (this code is attached
below). The results are -low- by a factor of 100. The KAP output does
not have the correction for the factor of 100.
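
To make the factor-of-100 point concrete, here is a minimal sketch of the rate arithmetic for the assignment kernel. The elapsed time t is a made-up placeholder, not a measurement, and nbpw = 8 assumes 64-bit default REALs as on the Y-MP. The names calib, nrep, nwords, raw and fixed are illustrative and do not come from the TMC source.

```fortran
      PROGRAM calib
C     Sketch of the missing factor-of-100 correction.
C     All names here are illustrative, not from the TMC source.
      INTEGER n, nrep, nbpw, nwords
      PARAMETER (n=4000*256, nrep=100)
      REAL t, raw, fixed
C     Assumption: 8 bytes per default REAL (64-bit words)
      nbpw = 8
C     The assignment kernel moves 2 words per element (bytes(1) = 2)
      nwords = 2
C     Hypothetical elapsed time for all nrep repetitions, in seconds
      t = 1.0
C     Rate as the attached code computes it: one pass is counted
      raw = REAL(n)*nwords*nbpw/t/1.0e6
C     Rate with the repetition count restored
      fixed = REAL(nrep)*REAL(n)*nwords*nbpw/t/1.0e6
      PRINT *, 'Uncorrected rate (MB/s): ', raw
      PRINT *, 'Corrected rate   (MB/s): ', fixed
      PRINT *, 'Ratio                  : ', fixed/raw
      END
```

Whatever value of t is actually measured, the two rates differ by exactly the repetition count, which is the size of the discrepancy described above. Dividing each entry of TIMES by the repetition count before computing the rate would be an equivalent way to apply the correction.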
I'm disappointed that the CF77 compiler did not see the dead
code in the "do i=1,100" loops. Just the same, the calibration
in this program is incorrect.
I suspect that the CM-5 compiler's code optimizer eventually kicked in
and deleted something. Otherwise, if it really ran the program correctly,
the bandwidth is 8,000,000 Mbyte/sec!
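
For illustration, one common way to keep a repetition loop from being treated as dead code is to place an opaque subroutine call inside it, so the compiler cannot prove that the repeated assignments are redundant. The sketch below assumes the compiler does not inline the call or analyze across it (for example because the subroutine is compiled separately). The names nodead and touch are hypothetical, and the array size is kept small for the example.

```fortran
      PROGRAM nodead
C     Sketch: repetition loop with an opaque call inside it.
C     Names and sizes are illustrative, not from the TMC source.
      INTEGER n, nrep, i, j
      PARAMETER (n=1000, nrep=100)
      REAL a(n), c(n)
      DO 10 j = 1, n
         a(j) = 1.0
         c(j) = 0.0
   10 CONTINUE
      DO 30 i = 1, nrep
         DO 20 j = 1, n
            c(j) = a(j)
   20    CONTINUE
C        Opaque call: a and c may be read or changed here,
C        so iteration i cannot simply be dropped
         CALL touch(a, c, n)
   30 CONTINUE
      PRINT *, 'c(1) = ', c(1)
      END

      SUBROUTINE touch(a, c, n)
C     Does nothing; exists only to be opaque to the optimizer
      INTEGER n
      REAL a(n), c(n)
      RETURN
      END
```

The attached code calls dummysub once per outer iteration, outside the 100-repetition loops, so it offers no such protection to the individual repetitions. The extra call adds overhead of its own, which matters only if the kernel is very short.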
Regards,
--
Charles M. Grassl
Cray Research, Inc.
(612) 683-3531
cmg@cray.com

C Source code output from KAP:
      PROGRAM stream
C     .. Parameters ..
      INTEGER n,ntimes
      PARAMETER (p=256,n=4000*p,ntimes=10)
C     ..
C     .. Local Scalars ..
      INTEGER j,k,nbpw
C     ..
C     .. Local Arrays ..
      real a(n),b(n),c(n),maxtime(4),mintime(4),rmstime(4),
     $     times(4,ntimes)
      INTEGER bytes(4)
      CHARACTER label(4)*11
C     ..
C     .. External Functions ..
      INTEGER realsize
C     ..
C     .. Intrinsic Functions ..
      INTRINSIC dble,max,min,sqrt
C     ..
C     .. Data statements ..
      DATA rmstime/4*0.0/,mintime/4*1.0E+36/,maxtime/4*0.0/
      DATA label/' Assignment:',' Scaling :',' Summing :',
     $     ' SAXPYing :'/
      DATA bytes/2,2,3,3/
      etime()=second()
C     ..

*       --- SETUP --- determine precision and check timing ---

      nbpw = realsize()

      t = etime()
      A = 1.0e0
      B = 2.0e0
      C = 0.0e0
      t=etime()-t
      PRINT *,'Timing calibration ; time = ',t*100,' hundredths',
     $  ' of a second'
      PRINT *,'Increase the size of the arrays if this is <30 ',
     $  ' and your clock precision is =<1/100 second'
      PRINT *,'---------------------------------------------------'

*       --- MAIN LOOP --- repeat test cases NTIMES times ---

      DO 60 k = 1,ntimes

          t=etime()
          DO 20 I=1,100
              C = A
   20     CONTINUE
          t=etime()-t
          times(1,k) = t

          t=etime()
          DO 30 I=1,100
              C = 3.0 * A
   30     CONTINUE
          t=etime()-t
          times(2,k) = t

          t=etime()
          DO 40 I=1,100
              C = A + B
   40     CONTINUE
          t=etime()-t
          times(3,k) = t

          t=etime()
          DO 50 I=1,100
              C = A + 3.0 * B
   50     CONTINUE
          t=etime()-t
          times(4,k) = t

          call dummysub(a,b,c,n)
   60 CONTINUE

*       --- SUMMARY ---

C*$*NOVECTORIZE
      DO 80 k = 1,ntimes
          DO 70 j = 1,4
              rmstime(j) = rmstime(j) + times(j,k)**2
              mintime(j) = min(mintime(j),times(j,k))
              maxtime(j) = max(maxtime(j),times(j,k))
   70     CONTINUE
   80 CONTINUE
      WRITE (*,FMT=9000)
      DO 90 j = 1,4
          rmstime(j) = sqrt(rmstime(j)/float(ntimes))
          WRITE (*,FMT=9010) label(j),n*bytes(j)*nbpw/mintime(j)/1.0e6,
     $      rmstime(j),mintime(j),maxtime(j)
   90 CONTINUE
 9000 FORMAT (' Function',5x,'Rate (MB/s) RMS time Min time Max time'
     $       )
 9010 FORMAT (a,4 (f10.4,2x))
      END

*-------------------------------------
* INTEGER FUNCTION realsize()
*
* A semi-portable way to determine the precision of default REAL
* in Fortran.
* Here used to guess how many bytes of storage a real number occupies.
*
      integer function realsize()
      double precision ref(30)
      real test
      double precision pi

C Test #1 - compare double precision pi to acos(-1.0e0)

      pi = 3.14159 26535 89793 23846 26433 83279 50288 d0
      picalc = acos(-1.0e0)
      diff = abs(picalc-pi)
      if (diff.eq.0.0) then
          print *,'Test #1 Failed = picalc=piexact'
          print *,'Apparently Single=Double Precision'
          print *,'Proceeding to Test #2'
          print *,' '
          goto 200
      else
          ndigits = -log10(abs(diff))+0.5
          goto 1000
      endif

C Test #2 - compare single(1.0d0+delta) to 1.0e0

  200 do 10 j=1,30
          ref(j) = 1.0d0+10.0d0**(-j)
   10 continue

      do 20 j=1,30
          test = ref(j)
          ndigits = j
          call dummy(test,result)
          if (test.eq.1.0e0) then
              goto 1000
          endif
   20 continue
      print *,'Test #2 failed - Precision appears to exceed 30 digits'
      print *,'Proceeding to Test #3'
      goto 300

C Test #3 - abs(sqrt(1.0d0)-sqrt(1.0e0))

  300 diff = abs(sqrt(1.0d0)-sqrt(1.0e0))
      if (diff.eq.0.0) then
          print *,'Test Failed - sqrt(1.0e0)=sqrt(1.0d0)'
          print *,'Apparently Single=Double Precision'
          print *,'Giving up'
          goto 400
      else
          ndigits = -log10(abs(diff))+0.5
          goto 1000
      endif

 1000 write (*,'(a)') '--------------------------------------'
      write (*,'(1x,a,i2,a)') 'Single precision appears to have ',
     $      ndigits,' digits of accuracy'
      if (ndigits.le.8) then
          realsize = 4
      else
          realsize = 8
      endif
      write (*,'(1x,a,i1,a)') 'Assuming ',realsize,
     $      ' bytes per default REAL word'
      write (*,'(a)') '--------------------------------------'
      return

  400 print *,'Hmmmm. I am unable to determine the size of a REAL'
      print *,'Please enter the number of Bytes per REAL number : '
      read (*,*) realsize
      if (realsize.ne.4.and.realsize.ne.8) then
          print *,'Your answer ',sizeof,' does not make sense!'
          print *,'Try again!'
          print *,'Please enter the number of Bytes per ',
     $      'REAL number : '
          read (*,*) realsize
      endif
      print *,'You have manually entered a size of ',realize,
     $      ' bytes per REAL number'
      write (*,'(a)') '--------------------------------------'
      end

      subroutine dummy(q,r)
      r = cos(q)
      return
      end

      subroutine dummysub(a,b,c,n)
      return
      end

This archive was generated by hypermail 2b29 : Tue Apr 18 2000 - 05:23:02 CDT