From: Choon Peng Chng (choonpeng@mac.com)
Date: Sat Jan 18 2003 - 02:09:07 CST
Hi,
I've compiled STREAM from source:
gcc stream_d.c second_wall.c -o stream_d_wall -lm -O3 -mcpu=7450 -funroll-all-loops
where gcc is Apple's GCC based on version 3.1.
Machine specs:
Apple Titanium PowerBook G4
PowerPC G4 7450 @ 1GHz
256KB L2 cache
1MB L3 cache
133MHz system bus
512MB SDRAM
Mac OS X 10.2.3
Darwin kernel version 6.3
Results:
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 400000, Offset = 0
Total memory required = 9.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 3 microseconds.
Each test below will take on the order of 133294 microseconds.
(= 44431 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          460.6278      0.0147       0.0139       0.0202
Scale:         463.5671      0.0141       0.0138       0.0155
Add:           465.4089      0.0266       0.0206       0.0565
Triad:         471.4218      0.0205       0.0204       0.0206
Strangely, these rates are lower than the submitted results for the
867MHz PowerMac?
Also, if I compile with second_cpu.c instead, I get better results, but
the timer check then reports:
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 10000 microseconds.
(= 1 clock ticks)
I notice that some submitted results report this. Isn't this wrong?
regards,
choon-peng