RE: funny BW on 820-based system

From: Cabot, Mason (mason.cabot@intel.com)
Date: Wed Oct 27 1999 - 16:03:50 CDT


John,

I don't think that there are any issues with the board they are using. There
is a newer BIOS (P04) available for the CC820 board that fixes some things,
but I don't think that these fixes are performance related.

I think the main problem here is that they are using the DOS version of
STREAM rather than recompiling to fit the hardware/OS. Running the version
of STREAM for NT posted on the web site on the systems reviewed in that
article, I see (rates in MB/s):

*****i820 CuMine 733/133*****
Copy: 441.9930
Scale: 438.3188
Add: 553.0855
Triad: 544.3054

*****i820 Katmai 600/133*****
Copy: 379.1829
Scale: 386.3530
Add: 448.4531
Triad: 448.3660

*****BX Katmai 600/100*****
Copy: 342.2239
Scale: 340.2257
Add: 411.9562
Triad: 409.2117
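
For comparison with the article's claim that the 733EB+820 runs about 30% slower than the 600/440BX, a minimal Python sketch of the relative rates from the figures above follows. The dictionary layout and the helper name `relative_gain` are illustrative assumptions; only the numbers come from the runs listed here. With these numbers the i820 CuMine system comes out roughly 29% to 34% faster than the BX system, depending on the kernel.

```python
# STREAM rates in MB/s, copied from the three result blocks above.
results = {
    "i820 CuMine 733/133": {"Copy": 441.9930, "Scale": 438.3188,
                            "Add": 553.0855, "Triad": 544.3054},
    "i820 Katmai 600/133": {"Copy": 379.1829, "Scale": 386.3530,
                            "Add": 448.4531, "Triad": 448.3660},
    "BX Katmai 600/100":   {"Copy": 342.2239, "Scale": 340.2257,
                            "Add": 411.9562, "Triad": 409.2117},
}


def relative_gain(system, baseline, kernel):
    """Percentage by which `system` beats `baseline` on `kernel` (hypothetical helper)."""
    return 100.0 * (results[system][kernel] / results[baseline][kernel] - 1.0)


if __name__ == "__main__":
    for kernel in ("Copy", "Scale", "Add", "Triad"):
        gain = relative_gain("i820 CuMine 733/133", "BX Katmai 600/100", kernel)
        print(f"{kernel:6s} {gain:+.1f}%")
```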

I've attached the full output from these tests. I haven't sent you any of my
Athlon tests because I need to recompile to optimize for their FPU. Do you
know if the latest versions of GCC optimize for K7?

thanks,
Mason

-----Original Message-----
From: John D. McCalpin, Ph.D. [mailto:mccalpin@austin.ibm.com]
Sent: Wednesday, October 27, 1999 11:46 AM
To: mason.cabot@intel.com
Subject: funny BW on 820-based system

Hi Mason

A hardware review at:
        http://www.aceshardware.com/Spades/read.php?article_id=84
shows funny results for the 733EB+820 system with the Intel motherboard.
Specifically, the 733EB+820 runs 30% slower than the PIII-600/440BX
and more than 30% slower than the 600B+820 on a different vendor's
motherboard.

Are there any config issues that the authors need to be aware of?

-- 
John D. McCalpin, Ph.D.           mccalpin@austin.ibm.com
Senior Scientist                  IBM Future Systems
    "I am willing to make mistakes as long as
     someone else is willing to learn from them."

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 999936
Total memory required = 24.9 MB.
Each test is run 50 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 37238 microseconds.
   (= 37238 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:          342.2239     0.0470     0.0468     0.0519
Scale:         340.2257     0.0472     0.0470     0.0495
Add:           411.9562     0.0585     0.0583     0.0588
Triad:         409.2117     0.0589     0.0586     0.0591

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 999936
Total memory required = 24.9 MB.
Each test is run 50 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 31814 microseconds.
   (= 31814 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:          379.1829     0.0424     0.0422     0.0475
Scale:         386.3530     0.0416     0.0414     0.0443
Add:           448.4531     0.0536     0.0535     0.0540
Triad:         448.3660     0.0536     0.0535     0.0540

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 999936
Total memory required = 24.9 MB.
Each test is run 50 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 23162 microseconds.
   (= 23162 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:          441.9930     0.0364     0.0362     0.0394
Scale:         438.3188     0.0366     0.0365     0.0382
Add:           553.0855     0.0435     0.0434     0.0437
Triad:         544.3054     0.0442     0.0441     0.0445
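
As a rough cross-check on the attached output, the reported rates can be approximately reproduced from the array size and the minimum times. The sketch below assumes the usual STREAM accounting (Copy and Scale move two arrays per pass, Add and Triad move three, and 1 MB is 10^6 bytes); the function and constant names are illustrative and are not taken from the STREAM source. Because the printed minimum times are rounded to four decimal places, the recomputed rates match the reported ones only to within a fraction of a percent.

```python
BYTES_PER_WORD = 8     # "8 bytes per DOUBLE PRECISION word" in the output
ARRAY_SIZE = 999936    # "Array size" in the output

# Arrays read or written per element, per kernel (assumed STREAM convention).
ARRAYS_MOVED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_mb_s(kernel, min_time_s, n=ARRAY_SIZE):
    """Approximate STREAM rate in MB/s from the best (minimum) time in seconds."""
    bytes_moved = ARRAYS_MOVED[kernel] * BYTES_PER_WORD * n
    return 1.0e-6 * bytes_moved / min_time_s


if __name__ == "__main__":
    # Min times from the first attached run (the one reporting Copy: 342.2239).
    min_times = {"Copy": 0.0468, "Scale": 0.0470, "Add": 0.0583, "Triad": 0.0586}
    for kernel, t in min_times.items():
        print(f"{kernel:6s} {stream_rate_mb_s(kernel, t):9.2f} MB/s")
```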


