John,
Thanks for the clear description of c = a + b.
I do not understand how to predict the SGI performance. (It doesn't
matter beyond my own curiosity!) On the Crimson, the
cache miss penalties (both) were quoted as 110 internal cycles with no write
back, and 119 with write back. These are consistent with some
measurements, and they apparently apply to the Challenge as well (too
bad). The latency is so large, it wouldn't matter if the bandwidth were
infinite! :-) The current Challenge is 100 MHz internal, 50 MHz external,
47.6 MHz bus. Cache lines are 16 bytes primary, 128 bytes secondary, and the
bus (256 bits wide) can deliver data on 4 out of 5 cycles on one transaction,
for (256/8 bytes)*(47.6MHz)*(4/5) = 1.218 GB/s. (The secondary cache is 1MB).
The next clock step goes to 150 MHz internal, 75 MHz external, with the bus still at 47.6 MHz. I have asked SGI
for an explanation of how the different parts of the system affect the latency,
and how it should change in the future. If I hear from them, I'll send you
the information.
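
As a rough check on the arithmetic above, here is a minimal Python sketch that recomputes the peak bus bandwidth from the quoted figures and compares it with a latency-bound ceiling. The ceiling rests on an assumption that is not in the figures quoted above: only one secondary-cache miss is outstanding at a time, so each 128-byte line costs the full miss penalty. The variable names are illustrative only, and this is not a model of the real memory system.

```python
# Figures quoted in the message above.
BUS_WIDTH_BITS = 256
BUS_CLOCK_HZ = 47.6e6
BUS_DUTY = 4 / 5            # data delivered on 4 out of 5 bus cycles
CPU_CLOCK_HZ = 100e6        # internal clock of the current Challenge
LINE_BYTES = 128            # secondary cache line
MISS_CYCLES_CLEAN = 110     # miss penalty, no write back
MISS_CYCLES_DIRTY = 119     # miss penalty, with write back


def peak_bus_mb_s():
    """Peak bus bandwidth in MB/s (1 MB = 1e6 bytes)."""
    return (BUS_WIDTH_BITS / 8) * BUS_CLOCK_HZ * BUS_DUTY / 1e6


def latency_bound_mb_s(miss_cycles):
    """Ceiling in MB/s if every cache line costs one full miss penalty.

    ASSUMPTION: one outstanding miss at a time, no overlap or prefetch.
    """
    seconds_per_line = miss_cycles / CPU_CLOCK_HZ
    return LINE_BYTES / seconds_per_line / 1e6


print(f"peak bus bandwidth      : {peak_bus_mb_s():8.2f} MB/s")
print(f"latency bound (110 cyc) : {latency_bound_mb_s(MISS_CYCLES_CLEAN):8.2f} MB/s")
print(f"latency bound (119 cyc) : {latency_bound_mb_s(MISS_CYCLES_DIRTY):8.2f} MB/s")
```

Under that assumption the latency-bound figure comes out roughly ten times lower than the bus peak, which is the sense in which the bandwidth would not matter. It is only an upper bound and makes no attempt to predict the measured rates.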
From other tests I have run, I believe the latency for the IBM 580 is about
15.5 cycles, with a TLB miss penalty of 38 cycles.
I have run streams on several machines here (included below). The
Challenge runs better when I compile on the Crimson (? - I'll track this down).
If you would like me to run some small benchmarks for you on the 580 (or
Challenge), let me know.
Thanks again,
Mark
stream_d
SGI Crimson R4000 os 4.0.5 ftn 3.10
f77 -O2 -mips2
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 91.99999682605267 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:      61.5385     0.2700     0.2600     0.2800
Scaling   :      59.2594     0.2750     0.2700     0.2800
Summing   :      58.5367     0.4140     0.4100     0.4200
SAXPYing  :      59.9999     0.4080     0.4000     0.4100
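
The rates in the Crimson table above appear to be the bytes moved divided by the minimum time. The sketch below reproduces them in Python under two assumptions that are inferred from the numbers rather than stated in the output: each array holds 1,000,000 elements, and Assignment and Scaling touch two arrays while Summing and SAXPYing touch three. The function and variable names are hypothetical.

```python
N_ELEMENTS = 1_000_000      # ASSUMPTION: inferred from the rates, not printed by the run
BYTES_PER_WORD = 8          # double precision, as reported in the output

# ASSUMPTION: number of arrays read or written by each kernel.
ARRAYS_TOUCHED = {
    "Assignment": 2,
    "Scaling": 2,
    "Summing": 3,
    "SAXPYing": 3,
}

# Minimum times (seconds) from the Crimson double-precision table.
MIN_TIME_S = {
    "Assignment": 0.26,
    "Scaling": 0.27,
    "Summing": 0.41,
    "SAXPYing": 0.40,
}


def rate_mb_s(kernel):
    """Rate in MB/s (1 MB = 1e6 bytes) from bytes moved and minimum time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * N_ELEMENTS
    return bytes_moved / MIN_TIME_S[kernel] / 1e6


for kernel in ARRAYS_TOUCHED:
    print(f"{kernel:<11s} {rate_mb_s(kernel):9.4f} MB/s")
```

The results agree with the reported rates to within the rounding of the printed times. The same relation fits the single-precision tables with 4 bytes per word.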
stream_d
SGI Challenge R4400 os 5.0 ftn 3.10
f77 -O2 -mips2
compiled on Challenge
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 122.9999981820583 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:      48.4847     0.3410     0.3300     0.3500
Scaling   :      47.0589     0.3451     0.3400     0.3600
Summing   :      50.0000     0.4921     0.4800     0.5000
SAXPYing  :      54.5456     0.4451     0.4400     0.4600
stream_d
SGI Challenge R4400 os 5.0 ftn 3.10
f77 -O2 -mips2
compiled on Crimson
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 117.0000027865171 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:      57.1429     0.2891     0.2800     0.3000
Scaling   :      55.1724     0.3001     0.2900     0.3100
Summing   :      53.3334     0.4641     0.4500     0.4800
SAXPYing  :      54.5456     0.4450     0.4400     0.4500
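
To put a number on the difference between the two double-precision Challenge runs above, this short Python sketch computes the ratio of the rates, binary compiled on the Crimson over binary compiled on the Challenge. The rates are copied from the two tables; the dictionary names are illustrative only.

```python
# Rates (MB/s) from the two stream_d runs on the Challenge.
COMPILED_ON_CHALLENGE = {
    "Assignment": 48.4847,
    "Scaling": 47.0589,
    "Summing": 50.0000,
    "SAXPYing": 54.5456,
}
COMPILED_ON_CRIMSON = {
    "Assignment": 57.1429,
    "Scaling": 55.1724,
    "Summing": 53.3334,
    "SAXPYing": 54.5456,
}


def speedup(kernel):
    """Ratio of the Crimson-compiled rate to the Challenge-compiled rate."""
    return COMPILED_ON_CRIMSON[kernel] / COMPILED_ON_CHALLENGE[kernel]


for kernel in COMPILED_ON_CHALLENGE:
    print(f"{kernel:<11s} {speedup(kernel):6.3f}x")
```

Assignment and Scaling gain roughly 17 to 18 percent, Summing about 7 percent, and SAXPYing is unchanged. The ratios describe the difference but say nothing about its cause.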
stream_s
SGI Crimson R4000 os 4.0.5 ftn 3.10
f77 -O2 -mips2
--------------------------------------
Single precision appears to have 7 digits of accuracy
Assuming 4 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 47.00000 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:      57.1429     0.1471     0.1400     0.1500
Scaling   :      53.3333     0.1571     0.1500     0.1700
Summing   :      54.5455     0.2270     0.2200     0.2300
SAXPYing  :      52.1739     0.2371     0.2300     0.2500
stream_s
SGI Challenge R4400 os 5.0 ftn 3.10
f77 -O2 -mips2
compiled on Challenge
--------------------------------------
Single precision appears to have 7 digits of accuracy
Assuming 4 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 61.00000 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:      40.0000     0.2061     0.2000     0.2100
Scaling   :      36.3636     0.2281     0.2200     0.2400
Summing   :      41.3793     0.2981     0.2900     0.3100
SAXPYing  :      48.0000     0.2550     0.2500     0.2600
stream_s
SGI Challenge R4400 os 5.0 ftn 3.10
f77 -O2 -mips2
compiled on Crimson
--------------------------------------
Single precision appears to have 7 digits of accuracy
Assuming 4 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 62.00000 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:      53.3333     0.1591     0.1500     0.1700
Scaling   :      47.0589     0.1741     0.1700     0.1900
Summing   :      50.0000     0.2440     0.2400     0.2500
SAXPYing  :      48.0000     0.2571     0.2500     0.2700
stream_d
SUN SS2 os 4.1.2 f77 SC1.0
f77 -O2 -cg89
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 382.99998268485 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:      20.5129     0.8168     0.7800     1.0100
Scaling   :      18.6047     0.8640     0.8600     0.8900
Summing   :      21.8182     1.1482     1.1000     1.5000
SAXPYing  :      22.4299     1.1078     1.0700     1.3400
stream_d
SUN SS10/41 os 4.1.3 f77 SC2.0.1
f77 -O2 -cg92
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 238.33334110677 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:      34.2857     0.4868     0.4667     0.5000
Scaling   :      38.4001     0.4267     0.4167     0.4333
Summing   :      36.9231     0.6534     0.6500     0.6667
SAXPYing  :      37.8947     0.6467     0.6333     0.6500
stream_s
SUN SS2 os 4.1.2 f77 SC1.0
f77 -O2 -cg89
--------------------------------------
Single precision appears to have 7 digits of accuracy
Assuming 4 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 191.000 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:      18.6046     0.4411     0.4300     0.4500
Scaling   :      17.7778     0.4640     0.4500     0.4700
Summing   :      19.3548     0.6270     0.6200     0.6300
SAXPYing  :      20.6897     0.6021     0.5800     0.6100
stream_s
SUN SS10/41 os 4.1.3 f77 SC2.0.1
f77 -O2 -cg92
--------------------------------------
Single precision appears to have 7 digits of accuracy
Assuming 4 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 131.667 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:      36.9232     0.2302     0.2167     0.2500
Scaling   :      34.2857     0.2418     0.2333     0.2500
Summing   :      37.8947     0.3301     0.3167     0.3333
SAXPYing  :      36.0000     0.3401     0.3333     0.3500
stream_s
RS/6000-580 xlf 2.03
f77 -O
Test #1 Failed = picalc=piexact
Apparently Single=Double Precision
Proceeding to Test #2
--------------------------------------
Single precision appears to have 7 digits of accuracy
Assuming 4 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 67.00000000 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:     177.7778      .1880      .1800      .1900
Scaling   :     118.5185      .2841      .2700      .2900
Summing   :     137.1429      .3551      .3500      .3800
SAXPYing  :     141.1765      .3551      .3400      .3700
stream_d
RS/6000-580 xlf 2.03
f77 -O
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 133.000000000000000 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:     266.6667      .2512      .2400      .2700
Scaling   :     246.1538      .2661      .2600      .2800
Summing   :     234.1463      .4261      .4100      .4500
SAXPYing  :     228.5714      .4352      .4200      .4600