Hi John,
As I mentioned in my last email, I've been testing our new
Sun Ultra Enterprise 6000 machine, and STREAM was among the tests I ran.
I took your advice about the parallel tests and parallelized the
STREAM code rather than taking aggregate measurements. I did both F77 and C
and got about equal results with both, so I've decided to settle on the C
results, since they more accurately reflect our environment (and that of
most people using Sun machines, I suspect).
Here, then, are the results.
Compiler: SunPro C 4.0 with iMPact MP compiler
Flags: -xcg92 -xO4 -xexplicitpar -xloopinfo
Tests: Best of 10 runs, 10 tests (NTIMES) per run
[Note: I had to insert some concurrency pragmas to keep the compiler from
optimizing (reordering) the timing calls in the outer (k) test loop.
Accordingly, parallelization was limited to the loops explicitly marked
with pragmas (the test kernels themselves).]
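For illustration, here is a minimal sketch of what an explicitly parallelized Triad kernel could look like. The pragmas actually used with the SunPro compiler are not shown in this message, so the sketch uses an OpenMP pragma as a stand-in; the function names `triad` and `triad_mbps` are hypothetical. The bandwidth helper follows the usual STREAM convention of three 8-byte words per Triad iteration and 1 MB = 10^6 bytes.

```c
#include <stddef.h>

/* STREAM Triad kernel: a[j] = b[j] + scalar * c[j].
 * ASSUMPTION: the OpenMP pragma stands in for the compiler-specific
 * pragma used in the original runs. Without OpenMP support the pragma
 * is skipped and the loop simply runs serially. */
void triad(double *a, const double *b, const double *c,
           double scalar, size_t n)
{
    long j;
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (j = 0; j < (long)n; j++)
        a[j] = b[j] + scalar * c[j];
}

/* Triad touches three 8-byte words per iteration (read b, read c,
 * write a). Returns MB/s with 1 MB = 10^6 bytes. */
double triad_mbps(size_t n, double seconds)
{
    return 3.0 * sizeof(double) * (double)n / seconds / 1.0e6;
}
```

The timing calls would sit outside `triad`, around the call itself, so only the kernel loop is marked for parallel execution.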
1] Sun SparcCenter 2000E
Uname: SunOS asmodeus 5.4 Generic_101945-36 sun4d sparc
Processors: 20
Speed: 85 MHz SuperSparc+
ECache: 2MB per processor
Bus speed: 50 MHz
Memory: 4608 MB (4-way interleave)
Additional Cflags: -xcache=16/32/4:2048/32/1 -xchip=super
                        Bytes           Bandwidth (MB/s)            Speedup
Machine           NCPUs /word      Copy    Scale      Sum    Triad Copy Triad
----------------  ----- -----  -------- -------- -------- -------- ---- -----
Sun_SC2000E_spro     20     8     426.9    364.8    456.2    408.2 13.3  12.2
Sun_SC2000E_spro     16     8     330.2    289.5    351.1    323.2 10.3   9.7
Sun_SC2000E_spro     14     8     283.5    252.3    307.5    286.3  8.8   8.6
Sun_SC2000E_spro     12     8     237.2    214.3    257.6    240.9  7.4   7.2
Sun_SC2000E_spro      8     8     140.9    131.9    153.1    147.8  4.4   4.4
Sun_SC2000E_spro      6     8     130.2    128.0    150.0    135.4  4.0   4.0
Sun_SC2000E_spro      4     8     111.6    118.8    128.8    125.7  3.5   3.8
Sun_SC2000E_spro      2     8      66.4     53.6     57.0     70.1  2.0   2.1
Sun_SC2000E_spro      1     8      32.2     29.5     36.5     33.4  1.0   1.0
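The Speedup columns are each bandwidth divided by the single-CPU bandwidth for the same kernel. A small sketch of that arithmetic follows; the function names are hypothetical, and the parallel efficiency figure (speedup divided by CPU count) is an added convenience, not something reported in the table.

```c
/* Speedup relative to the single-CPU run of the same kernel. */
double speedup(double bw_n, double bw_1)
{
    return bw_n / bw_1;
}

/* Parallel efficiency: fraction of ideal linear scaling achieved.
 * ASSUMPTION: this metric is not in the original table; it is
 * derived here only to make the scaling easier to compare. */
double efficiency(double bw_n, double bw_1, int ncpus)
{
    return speedup(bw_n, bw_1) / (double)ncpus;
}
```

For the 20-CPU Copy row, `speedup(426.9, 32.2)` gives about 13.3, matching the table, and `efficiency(426.9, 32.2, 20)` gives about 0.66.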
2] Sun Ultra Enterprise 6000 (SunFire)
Uname: SunOS mephisto 5.5.1 BETA_REFRESH sun4u sparc SUNW,Ultra-Enterprise
Processors: 24
Speed: 167 MHz Ultrasparc I
ECache: 512KB per processor
Bus speed: 83 MHz
Memory: 3072 MB (see **note)
Additional CFlags: -xcache=16/32/1:512/64/1 -xchip=ultra
                        Bytes           Bandwidth (MB/s)            Speedup
Machine           NCPUs /word      Copy    Scale      Sum    Triad Copy Triad
----------------  ----- -----  -------- -------- -------- -------- ---- -----
Sun_UE6000_spro      24     8    1760.6   1745.4   1950.1   1981.8 10.7  10.8
Sun_UE6000_spro      20     8    1752.2   1740.6   1938.2   1957.1 10.6  10.7
Sun_UE6000_spro      16     8    1732.4   1738.9   1854.5   1916.4 10.5  10.4
Sun_UE6000_spro      12     8    1475.2   1567.5   1538.9   1510.8  8.9   8.2
Sun_UE6000_spro       8     8    1069.1   1175.4   1110.4   1201.8  6.5   6.6
Sun_UE6000_spro       6     8     800.4    882.3    847.1    865.4  4.9   4.7
Sun_UE6000_spro       4     8     556.1    609.8    584.5    635.4  3.4   3.5
Sun_UE6000_spro       2     8     309.5    314.3    314.3    343.0  1.9   1.9
Sun_UE6000_spro       1     8     165.0    170.4    165.3    183.5  1.0   1.0
**NOTE: Due to the configuration of our UE6000 machine, the memory is a combination of 4-way and 8-way interleaved banks (the machine only interleaves adjacent N-way banks, and we don't have enough memory/boards to get up to full 8-way or 16-way interleaving). The current configuration is:
Memory Units: Size, Interleave Factor, Interleave With
           0:  MB   Factor:  With:    1:  MB   Factor:  With:
              -----  -------  -----      -----  -------  -----
Board 0:       256    8-way     A
Board 2:       256    8-way     A
Board 4:       256    8-way     A
Board 5:       256    8-way     A
Board 6:       256    8-way     A
Board 7:       256    8-way     A
Board 8:       256    8-way     A
Board 9:       256    8-way     A
Board10:       256    4-way     B
Board11:       256    4-way     B
Board12:       256    4-way     B
Board13:       256    4-way     B
This mixed interleaving may be partly responsible for the flattened speedup beyond 16 processors. It could also be because the UPA sustainable bandwidth is 1.3 GB/s (between each CPU/memory module), so the flattening may reflect saturation in the face of increased coherency traffic. I need to spend more time with the machine to figure this out.
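One way to put a number on the flattening is the marginal bandwidth gained per added CPU between two rows of the UE6000 table. The sketch below is only an illustration of that arithmetic; `marginal_bw` is a hypothetical helper name, and it says nothing about which of the two explanations above is the cause.

```c
/* Average bandwidth gained (MB/s) per additional CPU when going from
 * n_lo to n_hi processors. Inputs are taken from two table rows. */
double marginal_bw(double bw_hi, int n_hi, double bw_lo, int n_lo)
{
    return (bw_hi - bw_lo) / (double)(n_hi - n_lo);
}
```

Using the Triad column, `marginal_bw(1916.4, 16, 1510.8, 12)` is about 101 MB/s per added CPU, while `marginal_bw(1981.8, 24, 1916.4, 16)` is only about 8 MB/s per added CPU.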
Still, I think these are fine results from Sun (at last).
Regards, Fred
--
Fred True                             "My name is Ozymandias, King of Kings:
AT&T High Tech Database Development    Look on my works, ye Mighty,
ft@angels.att.com                      and despair!"
ftrue@research.att.com                         -P. B. Shelley