Stream Results

From: Brian Whitney <Brian.Whitney@Sun.COM>
Date: Mon Apr 16 2007 - 15:18:35 CDT

Greetings,

Attached are five files, each containing a result for the new
machine, a joint venture between Sun and Fujitsu. The machines
are sold as both Sun SPARC Enterprise servers and Fujitsu SPARC
Enterprise servers.

These results are submitted under the Sun SPARC Enterprise
server name.

Thanks,

Brian

Brian Whitney
Sun Microsystems

Dear Dr. McCalpin,

Please find below our STREAM results for the Sun SPARC Enterprise
M8000 system, for inclusion on the STREAM Web site.

         System Name: Sun SPARC Enterprise M8000
            CPU Name: SPARC64 VI
             CPU MHz: 2400
      CPU(s) enabled: 32 cores, 16 chips, 2 cores/chip, 2 threads/core
       Primary Cache: 128 KB I + 128 KB D on chip per core
     Secondary Cache: 6 MB I+D on chip per chip
            L3 Cache: None
         Other Cache: None
              Memory: 64 GB (1GB x 64), 8-way interleaved
    Operating System: Solaris 10 11/06
            Compiler: Sun Studio 11 with patch 2006/11/24
   Compilation Flags: -xarch=v9b -xO5 -xopenmp -xprefetch=auto,latx:12
  STREAM Source Code: the v5.6 f90 version with format changes
                      for large arrays.
         OS Settings: Odd-numbered processors were taken off-line with
                      "psradm -f" (multi-threading disabled).
  Shell Environments: MPSSHEAP=4MB MPSSSTACK=4MB MADV=access_lwp
                      LD_PRELOAD=mpss.so.1:madv.so.1
                      (Large page and locality settings)

Outputs:
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 ----------------------------------------------
 STREAM Version $Revision: 5.6 $
 ----------------------------------------------
 Array size = 200000032
 Offset = 0
 The total memory requirement is 4577 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------
 Number of Threads = 32
 ----------------------------------------------
 Printing one line per active thread....  (printed 32 times, once per thread)
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function      Rate (MB/s)   Avg time (s)   Min time (s)   Max time (s)
Copy:          60313.6386         0.0648         0.0531         0.1006
Scale:         60230.8166         0.0609         0.0531         0.0766
Add:           69301.0995         0.0756         0.0693         0.0998
Triad:         69629.9429         0.0755         0.0689         0.1199
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
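
The Rate column can be reproduced from the array size and the Min time
column. STREAM counts 2 words per element for Copy and Scale (one array
read, one written) and 3 for Add and Triad (two read, one written), and
reports MB/s with 1 MB = 10^6 bytes. The Python sketch below is a
hypothetical cross-check (the function name is illustrative, not part of
the STREAM source). Because the printed min times are rounded to four
decimals, the recomputed rates agree with the reported ones only to within
roughly 0.1%.

```python
# Words moved per array element for each STREAM kernel:
# Copy and Scale read one array and write one (2 words);
# Add and Triad read two arrays and write one (3 words).
WORDS_PER_ELEMENT = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
BYTES_PER_WORD = 8  # DOUBLE PRECISION, as reported in the output


def stream_rate_mb_s(kernel: str, n: int, min_time_s: float) -> float:
    """Bandwidth in MB/s (1 MB = 10**6 bytes) from array length and best time."""
    bytes_moved = WORDS_PER_ELEMENT[kernel] * BYTES_PER_WORD * n
    return bytes_moved / min_time_s / 1e6


if __name__ == "__main__":
    # M8000 run: array size 200000032, min times taken from the table above.
    n = 200_000_032
    for kernel, t_min, reported in [
        ("Copy", 0.0531, 60313.6386),
        ("Scale", 0.0531, 60230.8166),
        ("Add", 0.0693, 69301.0995),
        ("Triad", 0.0689, 69629.9429),
    ]:
        rate = stream_rate_mb_s(kernel, n, t_min)
        print(f"{kernel:6s} recomputed {rate:10.1f}  reported {reported:10.1f}")
```

The same function applies to the other four runs by substituting their
array sizes and min times.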

Dear Dr. McCalpin,

Please find below our STREAM results for the Sun SPARC Enterprise
M9000 system, for inclusion on the STREAM Web site.

         System Name: Sun SPARC Enterprise M9000
            CPU Name: SPARC64 VI
             CPU MHz: 2400
      CPU(s) enabled: 64 cores, 32 chips, 2 cores/chip, 2 threads/core
       Primary Cache: 128 KB I + 128 KB D on chip per core
     Secondary Cache: 6 MB I+D on chip per chip
            L3 Cache: None
         Other Cache: None
              Memory: 128 GB (1GB x 128), 8-way interleaved
    Operating System: Solaris 10 11/06
            Compiler: Sun Studio 11 with patch 2006/11/24
   Compilation Flags: -xarch=v9b -xO5 -xopenmp -xprefetch=auto,latx:16
  STREAM Source Code: the v5.6 f90 version with format changes
                      for large arrays.
         OS Settings: Odd-numbered processors were taken off-line with
                      "psradm -f" (multi-threading disabled).
  Shell Environments: MPSSHEAP=4MB MPSSSTACK=4MB MADV=access_lwp
                      LD_PRELOAD=mpss.so.1:madv.so.1
                      (Large page and locality settings)

Outputs:
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 ----------------------------------------------
 STREAM Version $Revision: 5.6 $
 ----------------------------------------------
 Array size = 400000032
 Offset = 0
 The total memory requirement is 9155 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------
 Number of Threads = 64
 ----------------------------------------------
 Printing one line per active thread....  (printed 64 times, once per thread)
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function      Rate (MB/s)   Avg time (s)   Min time (s)   Max time (s)
Copy:         114919.8268         0.0633         0.0557         0.0753
Scale:        114617.5625         0.0626         0.0558         0.0799
Add:          130035.3679         0.0837         0.0738         0.1200
Triad:        134369.1280         0.0854         0.0714         0.1052
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------

Dear Dr. McCalpin,

Please find below our STREAM results for the Sun SPARC Enterprise
M9000 system, for inclusion on the STREAM Web site.

         System Name: Sun SPARC Enterprise M9000
            CPU Name: SPARC64 VI
             CPU MHz: 2400
      CPU(s) enabled: 128 cores, 64 chips, 2 cores/chip, 2 threads/core
       Primary Cache: 128 KB I + 128 KB D on chip per core
     Secondary Cache: 6 MB I+D on chip per chip
            L3 Cache: None
         Other Cache: None
              Memory: 1 TB (2GB x 512)
    Operating System: Solaris 10 7/07
            Compiler: Sun Studio 12
   Compilation Flags: -fast -xvector=no -m64 -xopenmp \
                      -xprefetch=latx:3.4 -Qoption cg \
                      -xchip=sparc64vi,-m_arch=sparcfmaf,-fma=fused \
                      -xtypemap=integer:64
         OS Settings: default
  Shell Environments: OMP_NUM_THREADS=128
                      Threads bound to cores with
                      setenv SUNW_MP_PROCBIND " 0 2 ... 252 254 "
                 Run: ppgsz -o heap=4m,stack=4m <stream>

Outputs:
----------------------------------------------------
 Array size = 675000000
 Offset = 524288
 The total memory requirement is 15449 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function      Rate (MB/s)   Avg time (s)   Min time (s)   Max time (s)
Copy:         224400.9987         0.0492         0.0481         0.0500
Scale:        223113.3641         0.0494         0.0484         0.0504
Add:          224271.3817         0.0734         0.0722         0.0743
Triad:        227059.3074         0.0725         0.0713         0.0735
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
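
The "total memory requirement" line is three arrays of N eight-byte words
expressed in units of 2^20 bytes. The figures printed in these runs (4577,
9155 and 15449 MB) match that formula truncated to a whole number, with
the offset padding not included: 675000000 words plus the 524288-word
offset would otherwise come to 15461 MB. A minimal Python sketch of the
arithmetic, with a hypothetical function name:

```python
BYTES_PER_WORD = 8           # DOUBLE PRECISION
BYTES_PER_MIB = 1024 * 1024  # the memory line's "MB" is 2**20 bytes


def stream_memory_mib(n: int, arrays: int = 3) -> int:
    """Total size of STREAM's arrays of n words each, truncated to whole MiB."""
    return (arrays * BYTES_PER_WORD * n) // BYTES_PER_MIB


if __name__ == "__main__":
    # Array sizes used in the five runs above.
    for n in (200_000_032, 400_000_032, 675_000_000):
        print(f"Array size = {n}: {stream_memory_mib(n)} MB")
```

Note that this line uses binary megabytes, while the Rate column uses
10^6 bytes per MB.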

Dear Dr. McCalpin,

Please find below our STREAM results for the Sun SPARC Enterprise
M4000 system, for inclusion on the STREAM Web site.

         System Name: Sun SPARC Enterprise M4000
            CPU Name: SPARC64 VI
             CPU MHz: 2150
      CPU(s) enabled: 8 cores, 4 chips, 2 cores/chip, 2 threads/core
       Primary Cache: 128 KB I + 128 KB D on chip per core
     Secondary Cache: 5 MB I+D on chip per chip
            L3 Cache: None
         Other Cache: None
              Memory: 16 GB (1GB x 16), 8-way interleaved
    Operating System: Solaris 10 11/06
            Compiler: Sun Studio 11 with patch 2006/11/24
   Compilation Flags: -xarch=v9b -xO5 -xopenmp -xprefetch=auto,latx:6
  STREAM Source Code: the v5.6 f90 version with format changes
                      for large arrays.
         OS Settings: Odd-numbered processors were taken off-line with
                      "psradm -f" (multi-threading disabled).
  Shell Environments: MPSSHEAP=4MB MPSSSTACK=4MB MADV=access_lwp
                      LD_PRELOAD=mpss.so.1:madv.so.1
                      (Large page and locality settings)

Outputs:
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 ----------------------------------------------
 STREAM Version $Revision: 5.6 $
 ----------------------------------------------
 Array size = 200000032
 Offset = 0
 The total memory requirement is 4577 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------
 Number of Threads = 8
 ----------------------------------------------
 Printing one line per active thread....  (printed 8 times, once per thread)
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function      Rate (MB/s)   Avg time (s)   Min time (s)   Max time (s)
Copy:          12566.3696         0.2565         0.2546         0.2662
Scale:         12539.1797         0.2560         0.2552         0.2570
Add:           12708.7064         0.3782         0.3777         0.3793
Triad:         12709.2760         0.3804         0.3777         0.3869
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------

Dear Dr. McCalpin,

Please find below our STREAM results for the Sun SPARC Enterprise
M5000 system, for inclusion on the STREAM Web site.

         System Name: Sun SPARC Enterprise M5000
            CPU Name: SPARC64 VI
             CPU MHz: 2150
      CPU(s) enabled: 16 cores, 8 chips, 2 cores/chip, 2 threads/core
       Primary Cache: 128 KB I + 128 KB D on chip per core
     Secondary Cache: 5 MB I+D on chip per chip
            L3 Cache: None
         Other Cache: None
              Memory: 32 GB (1GB x 32), 8-way interleaved
    Operating System: Solaris 10 11/06
            Compiler: Sun Studio 11 with patch 2006/11/24
   Compilation Flags: -xarch=v9b -xO5 -xopenmp -xprefetch=auto,latx:8
  STREAM Source Code: the v5.6 f90 version with format changes
                      for large arrays.
         OS Settings: Odd-numbered processors were taken off-line with
                      "psradm -f" (multi-threading disabled).
  Shell Environments: MPSSHEAP=4MB MPSSSTACK=4MB MADV=access_lwp
                      LD_PRELOAD=mpss.so.1:madv.so.1
                      (Large page and locality settings)

Outputs:
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 ----------------------------------------------
 STREAM Version $Revision: 5.6 $
 ----------------------------------------------
 Array size = 200000032
 Offset = 0
 The total memory requirement is 4577 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------
 Number of Threads = 16
 ----------------------------------------------
 Printing one line per active thread....  (printed 16 times, once per thread)
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function      Rate (MB/s)   Avg time (s)   Min time (s)   Max time (s)
Copy:          24831.2272         0.1298         0.1289         0.1308
Scale:         24791.1867         0.1294         0.1291         0.1310
Add:           25239.2743         0.1909         0.1902         0.1918
Triad:         25292.3212         0.1903         0.1898         0.1909
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
Received on Mon Apr 16 18:52:57 2007
