In article <4lanqc$k0t@strauss.udel.edu> you write:
>In article <4l6hjo$8ok@engnews2.Eng.Sun.COM>, ASHOK SINGHAL <ashok.singhal@Sun.COM> writes:
>
>> Here are the McCalpin Streams numbers for a
>> 16 CPU Ultra Enterprise 6000:
>>
>> ----------------------------------------------
>>          SUSTAINED BANDWIDTH (MB/s)
>> cpus     COPY     SCALE    ADD      TRIAD
>> ----------------------------------------------
>>   16     1753.1   1746.6   1919.5   1945.1
>>   12     1641.9   1647.1   1710.4   1749.1
>>    8     1237.2   1248.9   1256.5   1299.6
>>    4      674.2    681.0    680.3    711.3
>>    2      365.7    371.9    379.8    386.4
>>    1      196.1    198.8    202.8    208.6
>> ----------------------------------------------

I tried the COPY kernel from your STREAM benchmark using the block load
and store instructions (bld/bst) available on UltraSPARC. With them, the
COPY rate goes from about 200 MB/s to about 361 MB/s on my 167 MHz
UltraSPARC-1 system with a 512 KB L2 cache (not that the cache size
matters here).

I have forwarded this information to our software guys, and we should
soon be able to deliver this number directly from the compiler. It
should also be possible to use bld/bst in the other kernels (SCALE, ADD,
TRIAD) at the cost of slightly more complexity, which would push those
numbers up as well. I don't know the details of the multiprocessor
system above, or whether it could sustain this increase in bandwidth
all the way up to 16 processors.
I thought you might be interested.
Thanks,
Partha Tirumalai
$ a.out
Results with copy
Measured time 44.209003448486 ms
Copy rate 361.91722843614 Mb/sec
1.0000000000000 1.0000000000000
Measured time 44.252038002014 ms
Copy rate 361.56526845773 Mb/sec
1.0000000000000 1.0000000000000
Measured time 44.214010238647 ms
Copy rate 361.87624496487 Mb/sec
1.0000000000000 1.0000000000000

This is for copying 1 million double values (8 MB read and 8 MB
written), which exceeds the size of the L2 cache by a wide margin, so
the test runs out of main memory, as you intended.
This archive was generated by hypermail 2b29 : Tue Apr 18 2000 - 05:23:05 CDT