Hi there! :)
Yet another question about STREAM...
I asked you in the far-gone distant past:
"Is STREAM CPU-dependent? For example, I'm familiar with the existing
STREAM result for dual-CPU Octane (515); would/will this number go up
significantly if one were using two 250MHz R10000s or 300MHz R12000s?"
to which you replied:
"STREAM performance depends on three factors. The most obvious is
the peak carrying capacity of the system memory interface. The ability
of a system to use that bandwidth depends on the relation between the
main memory latency and the ability of the cpu to tolerate that latency.
Latency tolerance is usually quantified by:

                    (# outstanding misses) * (cache line size)
latency tolerance = ------------------------------------------
                         burst bandwidth on the interface"
From this I inferred that faster CPUs of the same chip design wouldn't
affect the results that much. However, I observed these numbers on
the STREAM results table:
                         COPY   SCALE     ADD   TRIAD
SGI_Octane_195      1   309.0   309.2   346.5   351.8
SGI_Octane_175      1   277.4   277.9   311.2   316.6
SGI_Indy_R4400_175  1    72.7    68.1    71.6    70.6
SGI_Indy_R4400_100  1    45.7    41.0    44.4    50.0
For these cases, if my inference was accurate, what has caused the
jump in the numbers? I ask because a customer wants me to test a
180MHz R5000SC O2, i.e. would I see higher numbers with an R5K/200SC?
Btw, I just ran the default STREAM on my office Indy (R4400SC/200MHz,
IRIX 6.2, MIPS Pro 7.1) and obtained:
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:           61.5657     0.2614     0.2599     0.2684
Scale:          60.1434     0.2665     0.2660     0.2680
Add:            65.4992     0.3671     0.3664     0.3679
Triad:          74.3992     0.3238     0.3226     0.3258
Is that pretty typical? NB: running the program several times, the
numbers never varied by more than 0.1MB/sec; then I compiled with:
cc -O -n32 stream_d.c second_wall.c -o stream_d -lm
the results changed somewhat to:
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:           69.1109     0.2495     0.2315     0.3199
Scale:          67.9333     0.2508     0.2355     0.3022
Add:            72.3061     0.3343     0.3319     0.3466
Triad:          70.5965     0.3430     0.3400     0.3589
which is rather odd since the compilation with -n32 gave this warning:
ld32: WARNING 84: /usr/lib32/mips3/libm.so is not used for resolving any symbol.
Naturally, I saw the same effect with -mips3.
I obtained the best results, after some fun experimentation, with:
cc -O3 -n32 -IPA -LNO:opt=1 -TARG:platform=ip22_4k \
    stream_d.c second_wall.c -o stream_d -lm
That gave:
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:           75.0595     0.2166     0.2132     0.2325
Scale:          71.4723     0.2244     0.2239     0.2252
Add:            74.7831     0.3218     0.3209     0.3257
Triad:          76.0398     0.3164     0.3156     0.3171
Want to add them to the results table? (full output given below).
Note that I didn't modify the source files before execution - are
they ok as is for a system like mine with 1MB L2? Btw, the numbers
jump by 1% if I use CC instead of cc! :D
It's a pity nobody has ever submitted Indy R5K results. I'd like to
see those.
I'm hoping to run the test on an O2 soon (R5KSC/180). Just need a
SCSI cable now to port over the stuff by DAT (awaiting delivery of a
cable from SGI). I think I mentioned before that SGI US is loaning me
an O2 for a few months, so I'll be able to test an O2 with R5K/200
too. They *may* swap it for an R10K/195 model half way through the
loan if one turns up, in which case I'll test that too.
Now to test that Indigo2 I bought. I think I need to change the
source file somewhere because the I2 has 2MB L2, yes? If so, what
should I change?
Cheers! :)
Ian.
SGI Network Admin, University of Central Lancashire, Preston, England, PR1 2HE.
mapleson@gamers.org | Tel: (+44 -0) 1772 893297, Fax: (+44 -0) 1772 892913
"There is no magic, only stuff." - Nakor, "The King's Buccaneer" (R.E. Feist)
Doom Help Service (DHS): http://doomgate.gamers.org/dhs/
SGI/N64/Future Technology: http://www.geocities.com/ResearchTriangle/2321/
BSc Dissertation: http://doomgate.gamers.org/dhs/diss/
CyberSurvival: http://www.007eleven.com/
********************************************************************************
MILAMBER 31# cc -O3 -n32 -IPA -LNO:opt=1 -TARG:platform=ip22_4k stream_d.c second_wall.c -o stream_d -lm
stream_d.c:
second_wall.c:
ld32: WARNING 84: /usr/lib32/mips3/libm.so is not used for resolving any symbol.
MILAMBER 30# ./stream_d
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 0
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 25 microseconds.
Each test below will take on the order of 141404 microseconds.
(= 5656 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:           75.0595     0.2166     0.2132     0.2325
Scale:          71.4723     0.2244     0.2239     0.2252
Add:            74.7831     0.3218     0.3209     0.3257
Triad:          76.0398     0.3164     0.3156     0.3171