From: Craig Armour (Craig.Armour@ausbit.com.au)
Date: Tue Sep 17 2002 - 22:50:04 CDT
Hi,
I have some STREAM results for a dual-processor 450 MHz G4 system
(standard Apple hardware). The uname output of my box is as follows:
Linux formaldehyde 2.4.19 #11 SMP Sun Sep 8 09:14:19 EST 2002 ppc unknown
stats:
2 x G4 @ 450 MHz
128 MB RAM
The results I have are better than the ones currently listed, but not
necessarily mind-blowing. I've also attached the source of the pthread
code used to obtain the MP numbers.
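The attachment itself is not reproduced in this archive. Purely as an illustration, a two-thread split of the Triad kernel could look something like the sketch below. It is not the attached stream.c: the names `triad_slice`, `triad_mp` and `struct slice` are hypothetical, and the contiguous half-and-half split is an assumption. It would need to be built with -pthread, as in the compile line further down.

```c
#include <pthread.h>
#include <stddef.h>

#define N        400000   /* array size reported in the runs below */
#define NTHREADS 2        /* assumption: one thread per CPU on the dual G4 */

static double a[N], b[N], c[N];

/* Hypothetical per-thread work description: a half-open index range. */
struct slice {
    size_t begin;
    size_t end;
    double scalar;
};

/* Thread body: Triad (a = b + scalar * c) over one contiguous slice. */
static void *triad_slice(void *arg)
{
    const struct slice *s = arg;
    for (size_t j = s->begin; j < s->end; j++)
        a[j] = b[j] + s->scalar * c[j];
    return NULL;
}

/* Run one Triad pass split across NTHREADS threads.
 * Returns 0 on success, -1 if a thread could not be created. */
static int triad_mp(double scalar)
{
    pthread_t tid[NTHREADS];
    struct slice sl[NTHREADS];
    int started = 0;
    int rc = 0;

    for (int t = 0; t < NTHREADS; t++) {
        sl[t].begin  = (size_t)N * t / NTHREADS;
        sl[t].end    = (size_t)N * (t + 1) / NTHREADS;
        sl[t].scalar = scalar;
        if (pthread_create(&tid[t], NULL, triad_slice, &sl[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    /* Join only the threads that were actually started. */
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);

    return rc;
}
```

Each thread writes a disjoint range of `a`, so no locking is needed. Creating and joining threads inside every timed pass adds overhead, and a real benchmark would probably keep the threads alive between passes.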
My code isn't the best, but I'm not sure what explains the poor performance
compared to the other stats on your page. Perhaps the unified L2 cache
doesn't cope well with multithreading and you get a bit more cache
thrashing than preferred *shrug*. Also, gcc probably isn't the best
compiler, but I'd be interested in comparing the code the other guys got
from their compilers with the code gcc produces.
Cheers
Craig
craig@formaldehyde:~/src$ ./stream
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 400000, Offset = 0
Total memory required = 9.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 11773 microseconds.
(= 11773 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          298.4938      0.0217       0.0214       0.0226
Scale:         297.9653      0.0218       0.0215       0.0219
Add:           321.1885      0.0299       0.0299       0.0301
Triad:         327.1542      0.0297       0.0293       0.0298
craig@formaldehyde:~/src$ /export/local/bin/gcc -O3 -funroll-loops -fprefetch-loop-arrays -mcpu=604 -lm -o stream second_wall.c stream_d.c
craig@formaldehyde:~/src$ /export/local/bin/gcc -O3 -funroll-loops -fprefetch-loop-arrays -mcpu=604 -pthread -lm -o stream_mp second_wall.c stream.c
craig@formaldehyde:~/src$ /usr/bin/time ./stream_mp
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 400000, Offset = 0
Total memory required = 9.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 2 microseconds.
Each test below will take on the order of 11355 microseconds.
(= 5677 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          338.8759      0.0194       0.0189       0.0212
Scale:         325.3191      0.0201       0.0197       0.0212
Add:           341.7224      0.0284       0.0281       0.0286
Triad:         340.4137      0.0283       0.0282       0.0285
1.89user 0.09system 0:01.05elapsed 188%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (179major+2438minor)pagefaults 0swaps
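For anyone wanting to check the Rate column by hand, the arithmetic can be sketched as below. This assumes the usual STREAM accounting, where Copy and Scale touch two arrays per iteration and Add and Triad touch three, with 8-byte doubles as reported above. The helper names `stream_rate_mbs` and `speedup` are made up for illustration.

```c
#include <stddef.h>

/* Bandwidth in MB/s (1 MB = 10^6 bytes) for one kernel.
 * arrays_touched: 2 for Copy and Scale, 3 for Add and Triad.
 * n:              array length in elements (400000 in the runs above).
 * min_time_s:     best (minimum) time for the kernel, in seconds. */
static double stream_rate_mbs(int arrays_touched, size_t n, double min_time_s)
{
    double bytes = (double)arrays_touched * (double)sizeof(double) * (double)n;
    return 1.0e-6 * bytes / min_time_s;
}

/* Ratio of the two-thread rate to the single-thread rate. */
static double speedup(double rate_mp, double rate_single)
{
    return rate_mp / rate_single;
}
```

Plugging in the single-thread Copy line, stream_rate_mbs(2, 400000, 0.0214) comes out near 299 MB/s, which matches the printed 298.4938 to within the rounding of the Min time column. Comparing the two Rate columns gives a speedup of roughly 1.14 for Copy and roughly 1.04 for Triad, so the second CPU adds very little bandwidth on this box.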