>L2$_stream_pe_code_3:
> dflodv [aP3+0]2++ aV0
> # C = A + B
> dfaddv [aP2+0]2++ aV0 aV1
> dfstrv aV1 [aP4+0]2++
> jnz aC2 L2$_stream_pe_code_3 ! this is an iteration counter
Okay, I've calmed down, and now I *do* recognize the CM-2.
Every instruction is a macro which expands into actual CMIS code.
So, in the loop above:
 - load into the register file from the memory address in P3, strided by 2 (slices), using register-file (RF) address V0;
 - load (ditto, P2) and add to the contents of RF address V0, putting the result at RF address V1;
 - store (ditto, P4) from RF address V1.
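For concreteness, here is a small Python model of that Add loop. It rests on some assumptions. Memory is a flat list indexed by address, each pointer post-increments by a stride of 2, and the register file is a dict. The names `stream_add`, `mem` and `rf` are hypothetical. This is not CMIS, and it ignores slices and timing entirely.

```python
# Hypothetical model of the Add loop (C = A + B); not real CMIS semantics.
def stream_add(mem, p3, p2, p4, count, stride=2):
    """Mimic dflodv / dfaddv / dfstrv with post-incremented strided pointers."""
    rf = {}                              # assumed register file, keyed by RF address
    while count:                         # jnz aC2 ...: repeat until the counter is zero
        rf["V0"] = mem[p3]               # dflodv [aP3+0]2++ aV0: load operand into V0
        p3 += stride
        rf["V1"] = rf["V0"] + mem[p2]    # dfaddv [aP2+0]2++ aV0 aV1: V1 = V0 + loaded word
        p2 += stride
        mem[p4] = rf["V1"]               # dfstrv aV1 [aP4+0]2++: store V1 to memory
        p4 += stride
        count -= 1                       # assumed: the counter in C2 counts down by one
    return mem
```

For example, with `mem = [1.0, 0.0, 2.0, 0.0, 10.0, 0.0, 20.0, 0.0, 0.0, 0.0, 0.0, 0.0]`, calling `stream_add(mem, 0, 4, 8, 2)` writes 11.0 to address 8 and 22.0 to address 10 and leaves the odd addresses untouched.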
> # C = A + 3.0D0*B
> dflodc $3.000000000000000000d+00 aS28
>L2$_stream_pe_code_4:
> dflodv [aP2+0]2++ aV0
> # C = A + 3.0D0*B
> dfmuladdv aS28 [aP3+0]2++ aV1 aV0 aV1
> dfstrv aV1 [aP4+0]2++
> jnz aC2 L2$_stream_pe_code_4
This appears to be the same thing, except that the Weitek chip does a multiply-add with a constant (in one hardware instruction; I don't know the opcodes). The constant 3.0 is loaded into S28 once, before the loop.
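Here is a rough Python sketch of the second loop, which keeps the same structure as the Add loop: the scalar is loaded once outside the loop, and each iteration does one load, one multiply-add and one store. The function name `stream_triad` and the plain-list arrays are hypothetical. Python evaluates `s28 * bi + v0` as two separately rounded operations. The hardware multiply-add may round differently, so this only models the data flow, not the arithmetic.

```python
# Hypothetical model of the C = A + 3.0D0*B loop; data flow only, not bit-exact.
def stream_triad(a, b, scalar=3.0):
    s28 = scalar                 # dflodc: constant loaded into S28 once, before the loop
    c = []
    for ai, bi in zip(a, b):     # one iteration per output word
        v0 = ai                  # dflodv: load the first operand into V0
        v1 = s28 * bi + v0       # dfmuladdv: multiply by the constant, then add V0
        c.append(v1)             # dfstrv: store the result
    return c
```

With these inputs every intermediate is exact, so `stream_triad([1.0, 2.0], [1.0, 0.5])` gives `[4.0, 3.5]`.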
So what has this to do with memory bandwidth? Nothing. In both cases it moves 3 data words (two loads and one store) per 6 cycles; in the second case it does two flops instead of one per 6 cycles (and per output word). I'd have to go to the chip manual to see whether there is anything more to be got out of the timing specs.
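The bandwidth argument reduces to simple per-iteration bookkeeping. The sketch below assumes the 6-cycle figure quoted above applies to both loops and counts a multiply-add as two flops. The table name `KERNELS` and the function `per_iteration` are illustrative only.

```python
CYCLES_PER_ITER = 6  # figure quoted above; assumed identical for both loops

# Words moved and flops per output word, read off the two loops above.
KERNELS = {
    "add":   {"loads": 2, "stores": 1, "flops": 1},  # C = A + B
    "triad": {"loads": 2, "stores": 1, "flops": 2},  # C = A + 3.0D0*B
}

def per_iteration(k):
    """Return memory traffic and flop rates for one loop iteration."""
    words = k["loads"] + k["stores"]
    return {
        "words_per_cycle": words / CYCLES_PER_ITER,
        "flops_per_cycle": k["flops"] / CYCLES_PER_ITER,
        "flops_per_word": k["flops"] / words,
    }
```

Both loops come out at 0.5 words per cycle. Only the flop rate doubles, which is the point: the memory traffic, and hence the bandwidth being measured, is the same.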
This archive was generated by hypermail 2b29 : Tue Apr 18 2000 - 05:23:02 CDT