STREAM results (Fujitsu SPARC M10-4S, 512 cores)

From: Akihiro SENOO <senoo.akihiro@jp.fujitsu.com>
Date: Wed Sep 11 2013 - 19:01:30 CDT

Dear Dr. McCalpin,

We have run the STREAM benchmark on the "Fujitsu SPARC M10-4S".
Please update the STREAM web site with the results below.

         System Name: Fujitsu SPARC M10-4S
            CPU Name: SPARC64 X
             CPU MHz: 3000
      CPU(s) enabled: 512 cores, 32 chips, 16 cores/chip, 2 threads/core
       Primary Cache: 64 KB I + 64 KB D on chip per core
     Secondary Cache: 24 MB I+D on chip per chip
            L3 Cache: None
         Other Cache: None
              Memory: 4 TB (256 x 16 GB 2Rx4 PC3L-12800R-11, ECC)
    Operating System: Solaris 11.1.8.4.0
            Compiler: C/C++/Fortran: Version 12.3 of Oracle Solaris
                      Studio, 1/13 Platform Specific Enhancement
   Compilation Flags: -fast -m64 -xopenmp -xtarget=sparc64x
                      -fma=fused -xipo=2 -xpagesize=4M -xlinkopt
                      -xvector -xprefetch_level=3 -xprefetch=latx:8.0
                      -Qoption cg -Qlp-dl=1,-Qms_pipe-prefdl=1
                      -xtypemap=integer:64
  STREAM Source Code: Fortran version (v5.6) with format changes
                      for large arrays.
         OS Settings: (/etc/system parameters) lpg_alloc_prefer=1
   Shell Environment: OMP_NUM_THREADS=1024
                      SUNW_MP_PROCBIND="0-1023"
                      SUNW_MP_THR_IDLE=spin
                      LD_PRELOAD=madv.so.1
                      LD_PRELOAD_64=madv.so.1
                      MADV=access_lwp
                 Run: ppgsz -o heap=256M,stack=256M,anon=256M <stream>

Outputs:
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 ----------------------------------------------
 STREAM Version $Revision: 5.6 $
 ----------------------------------------------
 Array size = 20480000000
 Offset = 1024
 The total memory requirement is 468750 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------
 Number of Threads = 1024
 ----------------------------------------------
 Printing one line per active thread....
 Printing one line per active thread....
(snip)
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:        1793975.2239     0.1829     0.1827     0.1842
Scale:       1832375.2328     0.1793     0.1788     0.1805
Add:         2070682.8827     0.2377     0.2374     0.2391
Triad:       2071808.6779     0.2381     0.2372     0.2422
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
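
The figures in the output above can be cross-checked with a little arithmetic. The following is a minimal Python sketch, not part of the original submission; it assumes the usual STREAM counting convention (two arrays moved per iteration for Copy and Scale, three for Add and Triad, rates in units of 10^6 bytes per second, and the memory requirement in units of 2^20 bytes). All names in it are illustrative. Because the printed minimum times are rounded to four decimals, the recomputed rates only agree with the reported ones to within a small fraction of a percent.

```python
# Cross-check of the reported STREAM figures (illustrative sketch only).
# All input values are copied from the configuration and output above.

ARRAY_SIZE = 20_480_000_000   # elements per array ("Array size")
BYTES_PER_WORD = 8            # bytes per DOUBLE PRECISION word
MIB = 1024 * 1024

# Assumed STREAM convention: arrays read or written per kernel iteration.
ARRAYS_MOVED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column (seconds), rounded to 4 decimals in the output.
MIN_TIME = {"Copy": 0.1827, "Scale": 0.1788, "Add": 0.2374, "Triad": 0.2372}

# "Rate (MB/s)" column as reported.
REPORTED = {
    "Copy": 1793975.2239,
    "Scale": 1832375.2328,
    "Add": 2070682.8827,
    "Triad": 2071808.6779,
}


def total_threads(chips=32, cores_per_chip=16, threads_per_core=2):
    """Hardware threads implied by the 'CPU(s) enabled' line."""
    return chips * cores_per_chip * threads_per_core


def total_memory_mib(n=ARRAY_SIZE, word=BYTES_PER_WORD):
    """Memory for the three STREAM arrays, in units of 2**20 bytes."""
    return 3 * n * word / MIB


def rate_mb_per_s(kernel, n=ARRAY_SIZE, word=BYTES_PER_WORD):
    """Bandwidth in 10**6 bytes/s, computed from the rounded minimum time."""
    bytes_moved = ARRAYS_MOVED[kernel] * word * n
    return 1.0e-6 * bytes_moved / MIN_TIME[kernel]


if __name__ == "__main__":
    print("threads:", total_threads())
    print("memory (MB):", total_memory_mib())
    for kernel in ARRAYS_MOVED:
        recomputed = rate_mb_per_s(kernel)
        rel_diff = abs(recomputed - REPORTED[kernel]) / REPORTED[kernel]
        print(f"{kernel:6s} recomputed {recomputed:14.1f} MB/s, "
              f"relative difference {rel_diff:.2e}")
```

Under these assumptions the thread count matches OMP_NUM_THREADS=1024 and the memory figure matches the 468750 MB printed by STREAM.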

--
Akihiro SENOO
PA PROJECT
NEXT GENERATION TECHNICAL COMPUTING UNIT
FUJITSU Limited
senoo.akihiro@jp.fujitsu.com

Received on Wed Sep 11 20:10:43 2013
