Dear Dr. McCalpin,
We have run the STREAM benchmark on the "Fujitsu SPARC M10-4S".
Please update the STREAM Web site.
System Name: Fujitsu SPARC M10-4S
CPU Name: SPARC64 X
CPU MHz: 3000
CPU(s) enabled: 64 cores, 4 chips, 16 cores/chip, 2 threads/core
Primary Cache: 64 KB I + 64 KB D on chip per core
Secondary Cache: 24 MB I+D on chip per chip
L3 Cache: None
Other Cache: None
Memory: 512 GB (32 x 16 GB 2Rx4 PC3L-12800R-CL11, ECC,
        running at 1600 MHz)
Operating System: Solaris 11.1.6.4.0
Compiler: C/C++/Fortran: Version 12.3 of Oracle Solaris
          Studio, 1/13 Platform Specific Enhancement
Compilation Flags: -fast -m64 -xopenmp -xtarget=sparc64x
                   -fma=fused -xipo=2 -xpagesize=4M -xlinkopt
                   -xvector -xprefetch_level=3 -xprefetch=latx:8.0
                   -Qoption cg -Qlp-dl=1,-Qms_pipe-prefdl=1
                   -xtypemap=integer:64
STREAM Source Code: Fortran version (v5.6) with format changes
                    for large arrays.
OS Settings: (/etc/system parameters) lpg_alloc_prefer=1
Shell Environment: OMP_NUM_THREADS=128
                   SUNW_MP_PROCBIND="0-127"
                   SUNW_MP_THR_IDLE=spin
                   LD_PRELOAD=madv.so.1
                   LD_PRELOAD_64=madv.so.1
                   MADV=access_lwp
Run: ppgsz -o heap=256M,stack=256M,anon=256M <stream>
Outputs:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
----------------------------------------------
STREAM Version $Revision: 5.6 $
----------------------------------------------
Array size = 2560000000
Offset = 1024
The total memory requirement is 58593 MB
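
As a cross-check of the "total memory requirement" line, the Python sketch below (written for illustration, not part of STREAM or of this run) recomputes it. It assumes three DOUBLE PRECISION arrays of N elements each, 8 bytes per element, MB taken as 2^20 bytes, and the result truncated to an integer; the 1024-element offset is ignored. Under those assumptions, 3 x 8 x 2,560,000,000 = 61,440,000,000 bytes, or 58593.75 MB, which truncates to the 58593 MB shown.

```python
# Sketch: reproduce the "total memory requirement" figure printed by STREAM.
# Assumptions (not taken from the STREAM source): three arrays a, b, c of
# N elements, 8 bytes each, MB = 2**20 bytes, integer truncation, offset ignored.

N = 2_560_000_000       # "Array size" reported in the output
BYTES_PER_WORD = 8      # "Assuming 8 bytes per DOUBLE PRECISION word"
NUM_ARRAYS = 3          # a, b, c


def total_memory_mb(n: int) -> int:
    """Footprint of the three arrays in units of 2**20 bytes, truncated."""
    total_bytes = NUM_ARRAYS * BYTES_PER_WORD * n
    return total_bytes // (1024 * 1024)


if __name__ == "__main__":
    print(total_memory_mb(N))
```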
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------
Number of Threads = 128
----------------------------------------------
Printing one line per active thread....
Printing one line per active thread....
(snip)
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:         224696.8163     0.1827     0.1823     0.1846
Scale:        229333.0893     0.1789     0.1786     0.1806
Add:          259073.1983     0.2385     0.2372     0.2426
Triad:        259312.7784     0.2379     0.2369     0.2403
----------------------------------------------------
Solution Validates!
----------------------------------------------------
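
The Rate column follows from the Min time column. The Python sketch below is an illustration only, not part of the submitted run; it assumes the usual STREAM byte counting (two arrays touched per element for Copy and Scale, three for Add and Triad), 8-byte words, N = 2,560,000,000, and MB = 10^6 bytes. Because the printed min times are rounded to four decimals, the recomputed rates match the reported ones only to within about 0.02%.

```python
# Sketch: recompute STREAM's "Rate (MB/s)" column from its "Min time" column.
# Assumptions: N = 2,560,000,000, 8-byte words, MB = 10**6 bytes, and the
# usual STREAM per-element traffic:
#   Copy   c = a          -> 2 words
#   Scale  b = s*c        -> 2 words
#   Add    c = a + b      -> 3 words
#   Triad  a = b + s*c    -> 3 words

N = 2_560_000_000
BYTES_PER_WORD = 8

WORDS_PER_ELEMENT = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Min times in seconds, as printed above (rounded to 4 decimals).
MIN_TIME = {"Copy": 0.1823, "Scale": 0.1786, "Add": 0.2372, "Triad": 0.2369}

# Rates as printed by STREAM, for comparison.
REPORTED_RATE = {"Copy": 224696.8163, "Scale": 229333.0893,
                 "Add": 259073.1983, "Triad": 259312.7784}


def rate_mb_per_s(kernel: str, min_time: float) -> float:
    """Bytes moved by one pass of the kernel divided by its best time, in MB/s."""
    bytes_moved = WORDS_PER_ELEMENT[kernel] * BYTES_PER_WORD * N
    return bytes_moved / min_time / 1.0e6


if __name__ == "__main__":
    for k in WORDS_PER_ELEMENT:
        r = rate_mb_per_s(k, MIN_TIME[k])
        rel = abs(r - REPORTED_RATE[k]) / REPORTED_RATE[k]
        print(f"{k:6s} recomputed {r:12.1f}  reported {REPORTED_RATE[k]:12.1f}"
              f"  rel diff {rel:.1e}")
```

Note that the two figures use different units: the memory footprint is reported in units of 2^20 bytes, while the rates are consistent only with MB = 10^6 bytes per second.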
--
Akihiro SENOO
PA PROJECT
NEXT GENERATION TECHNICAL COMPUTING UNIT
FUJITSU Limited
senoo.akihiro@jp.fujitsu.com
Received on Mon May 20 19:39:40 2013