From: Rico Pajarola (rico.pajarola@nexus-ag.com)
Date: Thu Oct 21 2004 - 11:45:07 CDT
Hi John,
Just thought I'd share these numbers with you (I noticed there are no
reasonably current Sun numbers on http://www.cs.virginia.edu/stream/).
The compiler was Sun Studio 8 (Sun C 5.5 2003/03/12).
One question though: I can't figure out (or find on the webpage) how to
convert these numbers to the equivalent MFLOPS and the relative cost...
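On the MFLOPS question, here is a minimal sketch of one possible conversion. It assumes the per-iteration counts usually quoted for the STREAM kernels with 8-byte words (Copy: 16 bytes and 0 FLOPs, Scale: 16 and 1, Add: 24 and 1, Triad: 24 and 2). The names `KERNELS` and `mflops` are made up for illustration and are not part of the benchmark.

```python
# Assumed per-iteration cost of each STREAM kernel with 8-byte doubles.
# "bytes" counts every array element read or written; "flops" counts the
# floating-point adds and multiplies in the loop body.
KERNELS = {
    "Copy":  {"bytes": 16, "flops": 0},  # c[j] = a[j]
    "Scale": {"bytes": 16, "flops": 1},  # b[j] = scalar * c[j]
    "Add":   {"bytes": 24, "flops": 1},  # c[j] = a[j] + b[j]
    "Triad": {"bytes": 24, "flops": 2},  # a[j] = b[j] + scalar * c[j]
}


def mflops(kernel, rate_mb_s):
    """Convert a reported STREAM rate in MB/s to MFLOPS for one kernel."""
    cost = KERNELS[kernel]
    iterations_per_us = rate_mb_s / cost["bytes"]  # millions of iterations/s
    return iterations_per_us * cost["flops"]


if __name__ == "__main__":
    # Example rate taken from a Triad line of STREAM output.
    print(f"Triad: {mflops('Triad', 859.7407):.1f} MFLOPS")
```

Under these assumptions a Triad rate of 859.7407 MB/s works out to roughly 71.6 MFLOPS, and Copy is always 0 because it does no arithmetic.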
The machine is:
Sun Fire V210, 1x UltraSPARC-IIIi 1GHz, 1GB Memory
SunOS test 5.8 Generic_108528-27 sun4u sparc SUNW,Sun-Fire-V210
Compiled with
/opt/SUNWspro/bin/cc -fast -xarch=v8plusa -o bin/stream_sunwspro stream_d.c second_wall.c -lm
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 20000000, Offset = 0
Total memory required = 457.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 214245 microseconds.
(= 214245 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          809.5735       0.4120       0.3953       0.4446
Scale:         958.0582       0.3423       0.3340       0.3786
Add:           839.4089       0.5893       0.5718       0.6175
Triad:         859.7407       0.5671       0.5583       0.6053
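The rates above can be cross-checked against the printed minimum times. The following is a rough sketch, assuming the rate is the number of bytes moved divided by the best time, with 1 MB taken as 10^6 bytes, and that Copy and Scale touch two arrays while Add and Triad touch three. The helper names are hypothetical.

```python
N = 20_000_000        # "Array size" from the output
BYTES_PER_WORD = 8    # "8 bytes per DOUBLE PRECISION word"

# Assumed number of arrays read or written by each kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def rate_mb_s(kernel, min_time_s):
    """Bandwidth in MB/s (1 MB = 1e6 bytes) implied by the best time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * N
    return 1.0e-6 * bytes_moved / min_time_s


def total_memory_mib():
    """Footprint of the three arrays in units of 2**20 bytes."""
    return 3 * BYTES_PER_WORD * N / 2**20


if __name__ == "__main__":
    # Min times copied from the table; results differ slightly from the
    # printed rates because the times are rounded to four decimals.
    for kernel, t in [("Copy", 0.3953), ("Scale", 0.3340),
                      ("Add", 0.5718), ("Triad", 0.5583)]:
        print(f"{kernel:6s} {rate_mb_s(kernel, t):9.2f} MB/s")
    print(f"Total memory: {total_memory_mib():.1f}")
```

The same arithmetic reproduces the "Total memory required" figure when the three arrays are counted in units of 2^20 bytes.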
Compiled with (64-bit)
/opt/SUNWspro/bin/cc -fast -xarch=v9a -xcache=native -o bin/stream_sunwspro stream_d.c second_wall.c -lm
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 20000000, Offset = 0
Total memory required = 457.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 211488 microseconds.
(= 211488 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          798.8637       0.4050       0.4006       0.4249
Scale:         951.8596       0.3426       0.3362       0.3651
Add:           852.3875       0.5755       0.5631       0.5948
Triad:         905.0286       0.5397       0.5304       0.5560
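To put the two builds side by side, a small sketch like the one below computes the percentage change in rate from the v8plusa build to the v9a build. The dictionary and function names are hypothetical; the rates are copied from the two tables.

```python
# Rates in MB/s copied from the two result tables.
RUN_V8PLUSA = {"Copy": 809.5735, "Scale": 958.0582,
               "Add": 839.4089, "Triad": 859.7407}
RUN_V9A = {"Copy": 798.8637, "Scale": 951.8596,
           "Add": 852.3875, "Triad": 905.0286}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    for kernel in RUN_V8PLUSA:
        change = percent_change(RUN_V8PLUSA[kernel], RUN_V9A[kernel])
        print(f"{kernel:6s} {change:+.2f}%")
```

With these numbers Triad gains a little over 5% and Add about 1.5% in the 64-bit build, while Copy and Scale drop by roughly 1%. This is a single run of each build, so small differences may be noise.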
regards
Rico
--
Rico Pajarola              mail: rico.pajarola@nexus-ag.com
Software Engineer          web:  http://www.nexus-ag.com/
Nexus Telecom AG           tel:  +41 76 355 66 51
Foerrlibuckstrasse 62      fax:  +41 1 355 66 12
CH-8037 Zurich