Dear Dr. McCalpin,
We have measured the STREAM benchmark on the Fujitsu SPARC M10-4S.
Please update the STREAM web site with the results below.
System Name: Fujitsu SPARC M10-4S
CPU Name: SPARC64 X
CPU MHz: 3000
CPU(s) enabled: 512 cores, 32 chips, 16 cores/chip, 2 threads/core
Primary Cache: 64 KB I + 64 KB D on chip per core
Secondary Cache: 24 MB I+D on chip per chip
L3 Cache: None
Other Cache: None
Memory: 4 TB (256 x 16 GB 2Rx4 PC3L-12800R-11, ECC)
Operating System: Solaris 11.1.8.4.0
Compiler: C/C++/Fortran: Version 12.3 of Oracle Solaris Studio,
    1/13 Platform Specific Enhancement
Compilation Flags: -fast -m64 -xopenmp -xtarget=sparc64x
    -fma=fused -xipo=2 -xpagesize=4M -xlinkopt
    -xvector -xprefetch_level=3 -xprefetch=latx:8.0
    -Qoption cg -Qlp-dl=1,-Qms_pipe-prefdl=1
    -xtypemap=integer:64
STREAM Source Code: Fortran version (v5.6) with format changes
    for large arrays.
OS Settings: (/etc/system parameters) lpg_alloc_prefer=1
Shell Environment: OMP_NUM_THREADS=1024
    SUNW_MP_PROCBIND="0-1023"
    SUNW_MP_THR_IDLE=spin
    LD_PRELOAD=madv.so.1
    LD_PRELOAD_64=madv.so.1
    MADV=access_lwp
Run: ppgsz -o heap=256M,stack=256M,anon=256M <stream>
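
As a quick consistency check on the configuration above, the following Python sketch recomputes the hardware thread count and the total memory from the listed figures. The function names are illustrative only and are not part of the submission; the inputs are the chip, core, thread, and DIMM counts stated above.

```python
def hardware_threads(chips, cores_per_chip, threads_per_core):
    """Total hardware threads across all chips."""
    return chips * cores_per_chip * threads_per_core


def total_memory_gb(dimm_count, dimm_size_gb):
    """Total installed memory in GB."""
    return dimm_count * dimm_size_gb


# Figures taken from the system description above.
threads = hardware_threads(chips=32, cores_per_chip=16, threads_per_core=2)
memory_gb = total_memory_gb(dimm_count=256, dimm_size_gb=16)

# Compare against OMP_NUM_THREADS=1024 and the stated 4 TB of memory.
print("hardware threads:", threads)
print("memory (TB):", memory_gb / 1024)
```

Under these figures the thread count agrees with OMP_NUM_THREADS=1024 and the SUNW_MP_PROCBIND range 0-1023, so the run appears to place one OpenMP thread on each hardware thread.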
Outputs:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
----------------------------------------------
STREAM Version $Revision: 5.6 $
----------------------------------------------
Array size = 20480000000
Offset = 1024
The total memory requirement is 468750 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------
Number of Threads = 1024
----------------------------------------------
Printing one line per active thread....
Printing one line per active thread....
(snip)
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function     Rate (MB/s)  Avg time  Min time  Max time
Copy:       1793975.2239    0.1829    0.1827    0.1842
Scale:      1832375.2328    0.1793    0.1788    0.1805
Add:        2070682.8827    0.2377    0.2374    0.2391
Triad:      2071808.6779    0.2381    0.2372    0.2422
----------------------------------------------------
Solution Validates!
----------------------------------------------------
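
The reported figures can be cross-checked with a short Python sketch. It assumes the usual STREAM accounting: three arrays of 8-byte words, Copy and Scale touching two arrays per iteration, Add and Triad touching three, rates expressed in units of 10^6 bytes per second, and the memory requirement expressed in units of 2^20 bytes. The function names are illustrative and do not come from the STREAM source.

```python
ARRAY_SIZE = 20_480_000_000   # "Array size" from the output above
BYTES_PER_WORD = 8            # "Assuming 8 bytes per DOUBLE PRECISION word"


def memory_requirement_mb(n, arrays=3, bytes_per_word=BYTES_PER_WORD):
    """Total size of the STREAM arrays, in units of 2**20 bytes (assumed)."""
    return arrays * bytes_per_word * n / 2**20


def rate_mb_per_s(n, arrays_touched, min_time_s, bytes_per_word=BYTES_PER_WORD):
    """Bandwidth in units of 10**6 bytes/s, derived from the best time."""
    return 1e-6 * arrays_touched * bytes_per_word * n / min_time_s


# (arrays touched, reported min time in seconds) for each kernel.
kernels = {
    "Copy": (2, 0.1827),
    "Scale": (2, 0.1788),
    "Add": (3, 0.2374),
    "Triad": (3, 0.2372),
}

print("memory requirement (MB):", memory_requirement_mb(ARRAY_SIZE))
for name, (arrays_touched, min_time) in kernels.items():
    print(name, rate_mb_per_s(ARRAY_SIZE, arrays_touched, min_time))
```

The recomputed rates differ slightly from the table because the minimum times are printed with only four decimal places, but they should agree with the reported values to well within 0.1%.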
--
Akihiro SENOO
PA PROJECT
NEXT GENERATION TECHNICAL COMPUTING UNIT
FUJITSU Limited
senoo.akihiro@jp.fujitsu.com
Received on Wed Sep 11 20:10:43 2013