some STREAM results (fwd)

From: Fred True (ft@angels.att.com)
Date: Mon Apr 22 1996 - 19:25:57 CDT


John,

I'm wondering if you received the attached message. I noticed today that
Ashok Singhal at Sun has sent you some results (for 1-16 cpus on the
UE6000) and they seem to differ slightly from mine. I didn't work with
Ashok on these (although he has seen my results - I sent them to their
engineering group last week).

I'm suspecting that Ashok limited the results at 16 cpus for reasons
relating to what I noted in my message. Interesting :)

Just let me know if you received my message. I also included results for
1-20 cpus on a SparcCenter 2000E with 85MHz processors, which are not
currently in your table. No rush of course, just want to make sure you
got them.

Thanks,
Fred

---------- Forwarded message ----------
Date: Sun, 21 Apr 1996 16:51:12 -0400 (EDT)
From: Fred True <ft@angels.att.com>
To: mccalpin@udel.edu
Subject: some STREAM results

Hi John,

As I mentioned in my last email, I've been doing some testing of the new
Sun Ultra Enterprise 6000 machine we have, and among the tests I ran
STREAM. I took your advice about the parallel tests, and parallelized the
stream code rather than doing aggregate measures. I did both F77 and C,
and got about equal results with both, so I've decided to settle on the C
results since they more accurately reflect our environment (as well as
most people using Sun machines, I suspect).

Here, then, are the results.

Compiler: SunPro C 4.0 with iMPact MP compiler
Flags: -xcg92 -xO4 -xexplicitpar -xloopinfo
Tests: Best of 10 runs, 10 tests (NTIMES) per run

[Note: I had to insert some concurrency pragmas to keep the compiler from
optimizing (reordering) the timing calls in the outer (k) test loop!
Accordingly, parallelization was applied only where there were explicit
pragmas (the test kernels themselves).]
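
For readers unfamiliar with how the MB/s figures are derived, here is a minimal sketch in C. It is not the code that was actually run; the names kernel_bytes, triad and best_mb_per_s are hypothetical. It assumes the usual STREAM accounting: Copy and Scale touch two arrays per iteration, Sum and Triad touch three, words are 8 bytes, and 1 MB = 10^6 bytes. Taking the shortest timing of a run is an assumption about how "best" is chosen.

```c
#include <stddef.h>

/* Assumption: 8-byte words, matching the "Bytes/word" column. */
#define WORD_BYTES 8.0

enum kernel { COPY, SCALE, SUM, TRIAD };

/* Bytes moved by one pass of a kernel over arrays of n words.
   Copy and Scale read one array and write one; Sum and Triad
   read two arrays and write one. */
static double kernel_bytes(enum kernel k, size_t n)
{
    double arrays = (k == COPY || k == SCALE) ? 2.0 : 3.0;
    return arrays * WORD_BYTES * (double)n;
}

/* Triad kernel: a[j] = b[j] + q * c[j]. */
static void triad(double *a, const double *b, const double *c,
                  double q, size_t n)
{
    size_t j;
    for (j = 0; j < n; j++)
        a[j] = b[j] + q * c[j];
}

/* Bandwidth in MB/s (1 MB = 10^6 bytes), computed from the
   shortest of ntimes timings given in seconds. */
static double best_mb_per_s(double bytes, const double *times,
                            size_t ntimes)
{
    double best = times[0];
    size_t i;
    for (i = 1; i < ntimes; i++)
        if (times[i] < best)
            best = times[i];
    return 1.0e-6 * bytes / best;
}
```

As an illustration, a Triad pass over 1,000,000 words moves 24,000,000 bytes; if the shortest pass took 0.25 s, that would be reported as about 96 MB/s.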

1] Sun SparcCenter 2000E

Uname: SunOS asmodeus 5.4 Generic_101945-36 sun4d sparc
Processors: 20
Speed: 85 MHz SuperSparc+
ECache: 2MB per processor
Bus speed: 50 MHz
Memory: 4608MB (4-way interleave)

Additional Cflags: -xcache=16/32/4:2048/32/1 -xchip=super

                       Bytes          Bandwidth (MB/s)             Speedup
Machine          NCPUs /word     Copy    Scale      Sum    Triad  Copy Triad
---------------- ----- ----- -------- -------- -------- -------- ----- -----
Sun_SC2000E_spro    20     8    426.9    364.8    456.2    408.2  13.3  12.2
Sun_SC2000E_spro    16     8    330.2    289.5    351.1    323.2  10.3   9.7
Sun_SC2000E_spro    14     8    283.5    252.3    307.5    286.3   8.8   8.6
Sun_SC2000E_spro    12     8    237.2    214.3    257.6    240.9   7.4   7.2
Sun_SC2000E_spro     8     8    140.9    131.9    153.1    147.8   4.4   4.4
Sun_SC2000E_spro     6     8    130.2    128.0    150.0    135.4   4.0   4.0
Sun_SC2000E_spro     4     8    111.6    118.8    128.8    125.7   3.5   3.8
Sun_SC2000E_spro     2     8     66.4     53.6     57.0     70.1   2.0   2.1
Sun_SC2000E_spro     1     8     32.2     29.5     36.5     33.4   1.0   1.0

2] Sun Ultra Enterprise 6000 (SunFire)

Uname: SunOS mephisto 5.5.1 BETA_REFRESH sun4u sparc SUNW,Ultra-Enterprise
Processors: 24
Speed: 167 MHz Ultrasparc I
ECache: 512KB per processor
Bus speed: 83 MHz
Memory: 3072MB (see **note)

Additional CFlags: -xcache=16/32/1:512/64/1 -xchip=ultra

                       Bytes          Bandwidth (MB/s)             Speedup
Machine          NCPUs /word     Copy    Scale      Sum    Triad  Copy Triad
---------------- ----- ----- -------- -------- -------- -------- ----- -----
Sun_UE6000_spro     24     8   1760.6   1745.4   1950.1   1981.8  10.7  10.8
Sun_UE6000_spro     20     8   1752.2   1740.6   1938.2   1957.1  10.6  10.7
Sun_UE6000_spro     16     8   1732.4   1738.9   1854.5   1916.4  10.5  10.4
Sun_UE6000_spro     12     8   1475.2   1567.5   1538.9   1510.8   8.9   8.2
Sun_UE6000_spro      8     8   1069.1   1175.4   1110.4   1201.8   6.5   6.6
Sun_UE6000_spro      6     8    800.4    882.3    847.1    865.4   4.9   4.7
Sun_UE6000_spro      4     8    556.1    609.8    584.5    635.4   3.4   3.5
Sun_UE6000_spro      2     8    309.5    314.3    314.3    343.0   1.9   1.9
Sun_UE6000_spro      1     8    165.0    170.4    165.3    183.5   1.0   1.0
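
The Speedup columns in both tables are the N-CPU bandwidth divided by the 1-CPU bandwidth for the same kernel. A minimal sketch of that arithmetic in C follows. The function names speedup and efficiency are hypothetical, and the efficiency figure (speedup per CPU) is an added illustration, not a number reported in the tables.

```c
/* Speedup of an N-CPU run relative to the 1-CPU run,
   both given as bandwidths in MB/s. */
static double speedup(double bw_n, double bw_1)
{
    return bw_n / bw_1;
}

/* Parallel efficiency: the fraction of ideal linear scaling
   that was achieved (1.0 would be perfect scaling). */
static double efficiency(double bw_n, double bw_1, int ncpus)
{
    return speedup(bw_n, bw_1) / (double)ncpus;
}
```

For example, speedup(1981.8, 183.5) gives the 10.8 shown for Triad on 24 UE6000 CPUs, an efficiency of about 0.45; speedup(408.2, 33.4) gives the 12.2 shown for Triad on 20 SC2000E CPUs, an efficiency of about 0.61.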

----

**NOTE: due to the configuration of the UE6000 machine we have, the
memory is a combination of 4-way and 8-way interleaved (the machine only
interleaves adjacent N-way banks, and we don't have enough memory/boards
to get up to full 8-way or 16-way interleaving). The current
configuration is:

Memory Units: Size, Interleave Factor, Interleave With
          0: MB   Factor:  With:   1: MB   Factor:  With:
          -----   -------  -----   -----   -------  -----
Board 0:    256   8-way      A
Board 2:    256   8-way      A
Board 4:    256   8-way      A
Board 5:    256   8-way      A
Board 6:    256   8-way      A
Board 7:    256   8-way      A
Board 8:    256   8-way      A
Board 9:    256   8-way      A
Board10:    256   4-way      B
Board11:    256   4-way      B
Board12:    256   4-way      B
Board13:    256   4-way      B

This may be partly responsible for the flattened speedup above 16
processors. It could also be because the UPA sustainable bandwidth is
1.3 GB/s (between each CPU/memory module), so the flattening could be the
result of saturation in the face of increased coherency traffic. I need
to spend some more time with it to figure this out.
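
One way to put a number on the flattening is the bandwidth gained per added CPU between two rows of the UE6000 table. The C sketch below is only illustrative arithmetic on the figures above; the function name marginal_mb_per_cpu is hypothetical, and the calculation says nothing about which of the two possible causes is responsible.

```c
/* Average bandwidth gained per added CPU (MB/s per CPU) when
   going from n_lo CPUs at bw_lo MB/s to n_hi CPUs at bw_hi MB/s. */
static double marginal_mb_per_cpu(double bw_lo, int n_lo,
                                  double bw_hi, int n_hi)
{
    return (bw_hi - bw_lo) / (double)(n_hi - n_lo);
}
```

With the Triad column, going from 12 to 16 CPUs (1510.8 to 1916.4 MB/s) adds about 101 MB/s per CPU, while going from 16 to 24 CPUs (1916.4 to 1981.8 MB/s) adds only about 8 MB/s per CPU.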

Still, I think these are fine results from Sun (at last).

Regards,
Fred

--
Fred True                             "My name is Ozymandias, King of Kings:
AT&T High Tech Database Development    Look on my works, ye Mighty,
ft@angels.att.com                      and despair!"
ftrue@research.att.com                         -P. B. Shelley


