tuned STREAM results (Fujitsu SPARC M12-2, 12 cores)
Dear Dr. McCalpin,
We have measured the STREAM benchmark ("tuned" version) on the Fujitsu SPARC M12-2.
Please publish this score on the STREAM Web site on April 4, 2017 or later.
System Name: Fujitsu SPARC M12-2
CPU Name: SPARC64 XII
CPU MHz: 3900
CPU(s) enabled: 12 cores, 1 chip, 12 cores/chip, 8 threads/core
Primary Cache: 64 KB I + 64 KB D on chip per core
Secondary Cache: 512 KB I+D on chip per core
L3 Cache: 32 MB I+D on chip per chip
Other Cache: None
Memory: 512 GB (16 x 32 GB 2Rx4 PC4-2400T-R, ECC)
Operating System: Oracle Solaris 11.3 with the next SRU
Compiler: Version 12.6 of Oracle Developer Studio
Compilation Flags: -fast -m64 -xopenmp -xtarget=sparc64xplus -xipo=2
                   -xpagesize=4M -xlinkopt -xvector -xprefetch_level=3
                   -xprefetch=latx:8.0
STREAM Source Code: The following tuning was applied to the Fortran version (v5.6):
Tuning: xfill instructions (stxa with ASI_XFILL_P (0xf2)) are
used to reduce memory read transactions.
OS Settings: (/etc/system parameters)
autoup=86400 doiflush=0 dopageflush=0
zfs:zfs_arc_max=1073741824
Shell Environment: OMP_NUM_THREADS=24
SUNW_MP_PROCBIND="1 5 9 13 17 21 25 29 33 37 41 45
                  49 53 57 61 65 69 73 77 81 85 89 93"
Run: <stream>
Outputs:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
----------------------------------------------
STREAM Version $Revision: 5.6 $
----------------------------------------------
Array size = 2000000000
Offset = 48
The total memory requirement is 720 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------
Number of Threads = 24
----------------------------------------------
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:         122797.3307     0.2608     0.2606     0.2610
Scale:        122306.8732     0.2618     0.2616     0.2619
Add:          127416.6518     0.3779     0.3767     0.3801
Triad:        127750.2441     0.3775     0.3757     0.3801
----------------------------------------------------
Solution Validates!
----------------------------------------------------
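The derived figures in this report can be cross-checked from the stated parameters. The Python sketch below is illustrative only and not part of the submission. It recomputes each rate as the bytes STREAM counts (2 arrays for Copy and Scale, 3 for Add and Triad, 8 bytes per word) divided by the minimum time, with 1 MB = 10^6 bytes. It also shows that the printed "720 MB" memory requirement matches what 3 x 8 x 2,000,000,000 bytes would give if that product were computed in 32-bit signed integer arithmetic (an assumption about how the value was computed; the three arrays actually occupy about 48 GB). Finally, it maps the SUNW_MP_PROCBIND IDs to cores under the assumption that core c owns CPU IDs 8c..8c+7.

```python
"""Cross-check of figures in the report above (illustrative sketch only)."""

N = 2_000_000_000   # "Array size" from the STREAM output
WORD = 8            # bytes per DOUBLE PRECISION word

# STREAM counts 2 arrays of traffic for Copy/Scale and 3 for Add/Triad.
ARRAYS = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
# Min times (s) and reported rates (MB/s, 1 MB = 10**6 bytes) as printed above.
MIN_TIME = {"Copy": 0.2606, "Scale": 0.2616, "Add": 0.3767, "Triad": 0.3757}
REPORTED = {"Copy": 122797.3307, "Scale": 122306.8732,
            "Add": 127416.6518, "Triad": 127750.2441}


def rate_mb_s(kernel: str) -> float:
    """Counted bytes divided by the best time, in 10**6 bytes per second."""
    return ARRAYS[kernel] * WORD * N / MIN_TIME[kernel] / 1e6


def wrap_int32(x: int) -> int:
    """Reduce x to a signed 32-bit integer, as two's-complement overflow would."""
    return (x + 2**31) % 2**32 - 2**31


# Footprint of the three arrays: true value vs. an (assumed) 32-bit computation.
true_mb = 3 * WORD * N // (1024 * 1024)                  # 45776 MB, about 48 GB
wrapped_mb = wrap_int32(3 * WORD * N) // (1024 * 1024)   # 720 MB, as printed

# SUNW_MP_PROCBIND IDs; ASSUMPTION: CPU IDs 8c..8c+7 belong to core c.
PROCBIND = list(range(1, 96, 4))    # 1 5 9 ... 93 (24 IDs)
threads_per_core = {}
for cpu in PROCBIND:
    core = cpu // 8
    threads_per_core[core] = threads_per_core.get(core, 0) + 1

if __name__ == "__main__":
    for k in ARRAYS:
        print(f"{k:6s} computed {rate_mb_s(k):11.1f}  reported {REPORTED[k]:11.1f}")
    print("footprint:", true_mb, "MB true,", wrapped_mb, "MB with 32-bit wrap")
    print("threads per core:", threads_per_core)
```

The recomputed rates agree with the reported ones to within 0.02%; the residual difference comes from the minimum times being printed to only four decimal places. Under the stated core-numbering assumption, the binding list places two of the 24 OpenMP threads on each of the 12 cores.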
Received on Sat Apr 01 2017 - 16:15:34 CDT