In article fsf@cfm.brown.edu, ce107@cfm.brown.edu (C. Evangelinos) writes:
>
> Which in turn poses the question: On systems where non-cacheable
> load/stores (like VIS) can be used or on others where certain stores
> can avoid write-allocate traffic (eg. the new PIII SIMD extensions
> unless I'm mistaken) are there compilers that will actually generate
> the code (preferably in the absence of non-portable directives) or
> does one always need to go and use assembler or special tools like
> Intel's Vtune? STREAM experimental numbers contain very impressive
> results for Suns using VIS but at least the 4.2 compilers didn't
> appear capable to generate such code even for a simple copy loop with
> a hard-coded loop length... No idea if the 5.0 compilers are any
> better.
>
The 5.0 compilers can generate this kind of code for the copy loop, but
apparently not for the other kernels (SCALE, SUM, TRIAD).
They can now also generate prefetch instructions.
[In addition, the processors are now faster (400 MHz) and the system
speed has also improved to 100 MHz]
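
For reference, the four STREAM kernels are sketched below as plain C loops.
This is only an illustration of the loops under discussion, not the
benchmark source: the function and parameter names are made up here, and
the "restrict" qualifiers are an assumption added so the compiler may
treat the arrays as non-overlapping.

```c
#include <stddef.h>

/* Illustrative versions of the four STREAM kernels.
   Names and signatures are hypothetical, not taken from the benchmark. */

/* COPY: c[i] = a[i] */
void stream_copy(double *restrict c, const double *restrict a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i];
}

/* SCALE: b[i] = s * c[i] */
void stream_scale(double *restrict b, const double *restrict c,
                  double s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = s * c[i];
}

/* SUM: c[i] = a[i] + b[i] */
void stream_sum(double *restrict c, const double *restrict a,
                const double *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* TRIAD: a[i] = b[i] + s * c[i] */
void stream_triad(double *restrict a, const double *restrict b,
                  const double *restrict c, double s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}
```

COPY is the only kernel with no arithmetic between the load and the store;
the other three do a multiply, an add, or both.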
On an 8-way 400 MHz E3500, here is what we get (STREAM bandwidth in MB/s,
compiled with the 5.0 compilers, no assembly):
CPUs     COPY    SCALE      SUM    TRIAD
   1    577.7    369.1    375.9    431.7
   2   1160.9    721.9    690.5    847.4
   4   1884.1    984.3   1043.9   1189.9
   8   3021.7   1538.5   1512.1   1768.8
(With only 4 boards, this box runs out of memory bank bandwidth before it
runs out of backplane bandwidth in the 8-CPU case.)
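
To put a number on that, the small sketch below computes speedup and
per-CPU efficiency relative to the 1-CPU result. The helper names are
made up for illustration; the inputs are meant to be the figures from the
table above.

```c
/* Hypothetical helpers for reading the scaling out of the table.
   bw_n is the bandwidth measured on ncpus CPUs, bw_1 the 1-CPU figure. */

/* Speedup relative to the single-CPU run. */
double speedup(double bw_n, double bw_1)
{
    return bw_n / bw_1;
}

/* Fraction of ideal (linear) scaling achieved on ncpus CPUs. */
double efficiency(double bw_n, double bw_1, int ncpus)
{
    return speedup(bw_n, bw_1) / (double)ncpus;
}
```

For COPY, speedup(3021.7, 577.7) is about 5.2 on 8 CPUs, or roughly 65% of
linear scaling; for TRIAD, speedup(1768.8, 431.7) is about 4.1, or roughly
51%.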
Ashok Singhal
Sun Microsystems