Dear Dr. McCalpin,
We are ready to publish the STREAM results for the Fujitsu SPARC M10-4S.
Please update the STREAM web site accordingly.
System Name: Fujitsu SPARC M10-4S
CPU Name: SPARC64 X
CPU MHz: 3000
CPU(s) enabled: 1024 cores, 64 chips, 16 cores/chip, 2 threads/core
Primary Cache: 64 KB I + 64 KB D on chip per core
Secondary Cache: 24 MB I+D on chip per chip
L3 Cache: None
Other Cache: None
Memory: 8320 GB (520 x 16 GB)
        chip#0: 256 GB (16 x 16 GB 2Rx4 PC3L-12800R-CL11, ECC, running at 1333 MHz)
        chip#1-#63: 8064 GB (504 x 16 GB 2Rx4 PC3L-12800R-CL11, ECC, running at 1600 MHz)
Operating System: Oracle Solaris 11.1
Compiler: C/C++: Version 12.3 of Oracle Solaris Studio,
          1/13 Platform Specific Enhancement
Compilation Flags: -fast -m64 -xopenmp -xtarget=sparc64x
                   -fma=fused -xipo=2 -xpagesize=4M -xlinkopt
                   -xvector -xprefetch_level=3 -xprefetch=latx:8.0
                   -Qoption cg -Qlp-dl=1,-Qms_pipe-prefdl=1
                   -xtypemap=integer:64
STREAM Source Code: Fortran version (v5.6) with format changes for large arrays.
OS Settings: (/etc/system parameters) lpg_alloc_prefer=1
             (change processor status) psradm -f 0-31
Shell Environment: OMP_NUM_THREADS=2016
                   SUNW_MP_PROCBIND="0-2015"
                   SUNW_MP_THR_IDLE=spin
                   LD_PRELOAD=madv.so.1
                   LD_PRELOAD_64=madv.so.1
                   MADV=access_lwp
Run: ppgsz -o heap=256M,stack=256M,anon=256M <stream>
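Note on the thread count: the settings above can be cross-checked with a little arithmetic. The sketch below is only an illustration. It takes the chip, core, and thread counts from the "CPU(s) enabled" line and assumes that `psradm -f 0-31` takes exactly 32 hardware threads offline (one chip's worth; that these IDs belong to chip#0 is an assumption, not something stated here). The function name `online_threads` is hypothetical.

```python
# Sketch: cross-check OMP_NUM_THREADS against the CPU configuration.
# Values come from the "CPU(s) enabled" line in this submission.
CHIPS = 64
CORES_PER_CHIP = 16
THREADS_PER_CORE = 2


def online_threads(offline_first: int, offline_last: int) -> int:
    """Hardware threads left online after `psradm -f first-last`.

    Assumes every ID in the inclusive range was online beforehand.
    """
    total = CHIPS * CORES_PER_CHIP * THREADS_PER_CORE
    offlined = offline_last - offline_first + 1
    return total - offlined


total_threads = CHIPS * CORES_PER_CHIP * THREADS_PER_CORE
remaining = online_threads(0, 31)  # psradm -f 0-31

print("total hardware threads:", total_threads)
print("online after psradm:   ", remaining)
```

Under these assumptions the remaining count matches OMP_NUM_THREADS=2016, and the 32 offlined threads equal the 16 cores x 2 threads of a single chip.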
Outputs:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
----------------------------------------------
STREAM Version $Revision: 5.6 $
----------------------------------------------
Array size = 40960000000
Offset = 1024
The total memory requirement is 937500 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------
Number of Threads = 2016
----------------------------------------------
Printing one line per active thread....
Printing one line per active thread....
(snip)
Printing one line per active thread....
Printing one line per active thread....
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:        3474998.0651     4.2926     0.1886    10.1007
Scale:       3500799.8989     3.1858     0.1872     9.8144
Add:         3956102.3998     3.5077     0.2485    10.7638
Triad:       4002703.2472     2.4675     0.2456    10.1980
----------------------------------------------------
Solution Validates!
----------------------------------------------------
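Note on the reported figures: the memory requirement and the rates above follow from the array size and the minimum times. The sketch below is a rough consistency check, not part of the benchmark. It assumes the usual STREAM accounting (three arrays of 8-byte words; Copy and Scale move 2 words per element, Add and Triad move 3; rate = 1.0e-6 x bytes / min time). The names `footprint_mib` and `rate_mb_s` are hypothetical helpers. Because the printed minimum times are rounded to four decimals, the recomputed rates agree with the reported ones only approximately.

```python
# Sketch: recompute the STREAM memory requirement and rates from the output above.
N = 40_960_000_000          # "Array size"
BYTES_PER_WORD = 8          # "Assuming 8 bytes per DOUBLE PRECISION word"

# Words moved per array element, per the usual STREAM byte counting (assumption).
WORDS_MOVED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" and "Rate (MB/s)" columns as printed in the results table.
MIN_TIME = {"Copy": 0.1886, "Scale": 0.1872, "Add": 0.2485, "Triad": 0.2456}
REPORTED = {"Copy": 3474998.0651, "Scale": 3500799.8989,
            "Add": 3956102.3998, "Triad": 4002703.2472}


def footprint_mib(n: int) -> int:
    """Total size of the three arrays in units of 1024*1024 bytes."""
    return 3 * BYTES_PER_WORD * n // (1024 * 1024)


def rate_mb_s(kernel: str) -> float:
    """Bandwidth in MB/s (10**6 bytes per second) from the printed min time."""
    bytes_moved = WORDS_MOVED[kernel] * BYTES_PER_WORD * N
    return 1.0e-6 * bytes_moved / MIN_TIME[kernel]


print("memory requirement (MB):", footprint_mib(N))
for kernel in WORDS_MOVED:
    recomputed = rate_mb_s(kernel)
    rel_diff = abs(recomputed - REPORTED[kernel]) / REPORTED[kernel]
    print(f"{kernel:6s} recomputed {recomputed:14.1f} MB/s, "
          f"relative difference {rel_diff:.1e}")
```

The small relative differences come from the rounding of the printed minimum times rather than from any inconsistency in the results.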
--
Akihiro SENOO
PA PROJECT
NEXT GENERATION TECHNICAL COMPUTING UNIT
FUJITSU Limited
senoo.akihiro@jp.fujitsu.com
Received on Tue Mar 26 19:59:08 2013