Greetings,
Attached are five files, each containing a result for the new
machines developed as a joint venture between Sun and Fujitsu.
The machines are sold as Sun SPARC Enterprise servers and
Fujitsu SPARC Enterprise servers.
These results are submitted against the Sun SPARC Enterprise server
name.
Thanks,
Brian
Brian Whitney
Sun Microsystems
Dear Dr. McCalpin,
Please find our STREAM results for the Sun SPARC Enterprise
M8000 system, submitted to update the STREAM Web site.
System Name: Sun SPARC Enterprise M8000
CPU Name: SPARC64 VI
CPU MHz: 2400
CPU(s) enabled: 32 cores, 16 chips, 2 cores/chip, 2 threads/core
Primary Cache: 128 KB I + 128 KB D on chip per core
Secondary Cache: 6 MB I+D on chip per chip
L3 Cache: None
Other Cache: None
Memory: 64 GB (1GB x 64), 8-way interleaved
Operating System: Solaris 10 11/06
Compiler: Sun Studio 11 with patch 2006/11/24
Compilation Flags: -xarch=v9b -xO5 -xopenmp -xprefetch=auto,latx:12
STREAM Source Code: the v5.6 f90 version with format changes
for large arrays.
OS Settings: Odd-numbered processors were taken off-line with
"psradm -f". (Multithreading disabled)
Shell Environments: MPSSHEAP=4MB MPSSSTACK=4MB MADV=access_lwp
LD_PRELOAD=mpss.so.1:madv.so.1
(Large page and locality settings)
Outputs:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
----------------------------------------------
STREAM Version $Revision: 5.6 $
----------------------------------------------
Array size = 200000032
Offset = 0
The total memory requirement is 4577 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------
Number of Threads = 32
----------------------------------------------
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function   Rate (MB/s)  Avg time  Min time  Max time
Copy:       60313.6386    0.0648    0.0531    0.1006
Scale:      60230.8166    0.0609    0.0531    0.0766
Add:        69301.0995    0.0756    0.0693    0.0998
Triad:      69629.9429    0.0755    0.0689    0.1199
----------------------------------------------------
Solution Validates!
----------------------------------------------------
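As a cross-check on the output above, the memory requirement and the rates can be roughly reproduced from the array size and the minimum times. The sketch below assumes the usual STREAM byte counts (two arrays touched per iteration for Copy and Scale, three for Add and Triad), 8-byte words, rates in units of 10^6 bytes/s, and the memory line in units of 2^20 bytes. The helper names are illustrative and not part of STREAM. Because the printed minimum times are rounded to four decimals, the recomputed rates should only be expected to agree with the reported ones to within a fraction of a percent.

```python
# Rough cross-check of STREAM output; helper names are illustrative, not part of STREAM.
BYTES_PER_WORD = 8
# Assumed number of arrays read or written per loop iteration for each kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def memory_requirement_mb(n):
    """Three arrays of n words, in units of 2**20 bytes, truncated."""
    return 3 * BYTES_PER_WORD * n // (1024 * 1024)


def rate_mb_per_s(kernel, n, min_time_s):
    """Bytes moved per iteration divided by the best time, in 10**6 bytes/s."""
    return ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * n / min_time_s / 1.0e6


n = 200000032  # array size from the M8000 run
print("memory (MB):", memory_requirement_mb(n))
for kernel, t_min in [("Copy", 0.0531), ("Scale", 0.0531),
                      ("Add", 0.0693), ("Triad", 0.0689)]:
    print(f"{kernel:6s} {rate_mb_per_s(kernel, n, t_min):12.1f}")
```

The same two functions can be applied to the array sizes and minimum times of the other runs in this message.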
Dear Dr. McCalpin,
Please find our STREAM results for the Sun SPARC Enterprise
M9000 system, submitted to update the STREAM Web site.
System Name: Sun SPARC Enterprise M9000
CPU Name: SPARC64 VI
CPU MHz: 2400
CPU(s) enabled: 64 cores, 32 chips, 2 cores/chip, 2 threads/core
Primary Cache: 128 KB I + 128 KB D on chip per core
Secondary Cache: 6 MB I+D on chip per chip
L3 Cache: None
Other Cache: None
Memory: 128 GB (1GB x 128), 8-way interleaved
Operating System: Solaris 10 11/06
Compiler: Sun Studio 11 with patch 2006/11/24
Compilation Flags: -xarch=v9b -xO5 -xopenmp -xprefetch=auto,latx:16
STREAM Source Code: the v5.6 f90 version with format changes
for large arrays.
OS Settings: Odd-numbered processors were taken off-line with
"psradm -f". (Multithreading disabled)
Shell Environments: MPSSHEAP=4MB MPSSSTACK=4MB MADV=access_lwp
LD_PRELOAD=mpss.so.1:madv.so.1
(Large page and locality settings)
Outputs:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
----------------------------------------------
STREAM Version $Revision: 5.6 $
----------------------------------------------
Array size = 400000032
Offset = 0
The total memory requirement is 9155 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------
Number of Threads = 64
----------------------------------------------
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function   Rate (MB/s)  Avg time  Min time  Max time
Copy:      114919.8268    0.0633    0.0557    0.0753
Scale:     114617.5625    0.0626    0.0558    0.0799
Add:       130035.3679    0.0837    0.0738    0.1200
Triad:     134369.1280    0.0854    0.0714    0.1052
----------------------------------------------------
Solution Validates!
----------------------------------------------------
Dear Dr. McCalpin,
Please find our STREAM results for the Sun SPARC Enterprise
M9000 system, submitted to update the STREAM Web site.
System Name: Sun SPARC Enterprise M9000
CPU Name: SPARC64 VI
CPU MHz: 2400
CPU(s) enabled: 128 cores, 64 chips, 2 cores/chip, 2 threads/core
Primary Cache: 128 KB I + 128 KB D on chip per core
Secondary Cache: 6 MB I+D on chip per chip
L3 Cache: None
Other Cache: None
Memory: 1 TB (2GB x 512)
Operating System: Solaris 10 7/07
Compiler: Sun Studio 12
Compilation Flags: -fast -xvector=no -m64 -xopenmp \
-xprefetch=latx:3.4 -Qoption cg \
-xchip=sparc64vi,-m_arch=sparcfmaf,-fma=fused \
-xtypemap=integer:64
OS Settings: default
Shell Environments: OMP_NUM_THREADS=128
Threads bound to cores with
setenv SUNW_MP_PROCBIND " 0 2 ... 252 254 "
Run: ppgsz -o heap=4m,stack=4m <stream>
Outputs:
----------------------------------------------------
Array size = 675000000
Offset = 524288
The total memory requirement is 15449 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function   Rate (MB/s)  Avg time  Min time  Max time
Copy:      224400.9987    0.0492    0.0481    0.0500
Scale:     223113.3641    0.0494    0.0484    0.0504
Add:       224271.3817    0.0734    0.0722    0.0743
Triad:     227059.3074    0.0725    0.0713    0.0735
----------------------------------------------------
Solution Validates!
----------------------------------------------------
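The SUNW_MP_PROCBIND value in the 128-core run above is abbreviated as " 0 2 ... 252 254 ". A minimal sketch of how such a list could be generated follows. It assumes 256 contiguous virtual processor IDs (128 cores x 2 threads/core) and that the elided entries continue the same step-of-two pattern; neither assumption is confirmed by the submission, and the function name is hypothetical.

```python
# Sketch: rebuild an even-numbered processor list like the one abbreviated above.
# Assumes contiguous virtual processor IDs starting at 0, with the first hardware
# thread of each core on the even IDs. This is an assumption, not a fact from the run.
def even_processor_ids(cores, threads_per_core=2):
    """Return one processor ID per core, stepping over the sibling threads."""
    return list(range(0, cores * threads_per_core, threads_per_core))


ids = even_processor_ids(128)
procbind = " ".join(str(i) for i in ids)  # candidate SUNW_MP_PROCBIND value
print(len(ids), ids[:2], ids[-2:])
```

Under these assumptions the list has 128 entries, which matches OMP_NUM_THREADS=128, so each OpenMP thread would be bound to its own core.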
Dear Dr. McCalpin,
Please find our STREAM results for the Sun SPARC Enterprise
M4000 system, submitted to update the STREAM Web site.
System Name: Sun SPARC Enterprise M4000
CPU Name: SPARC64 VI
CPU MHz: 2150
CPU(s) enabled: 8 cores, 4 chips, 2 cores/chip, 2 threads/core
Primary Cache: 128 KB I + 128 KB D on chip per core
Secondary Cache: 5 MB I+D on chip per chip
L3 Cache: None
Other Cache: None
Memory: 16 GB (1GB x 16), 8-way interleaved
Operating System: Solaris 10 11/06
Compiler: Sun Studio 11 with patch 2006/11/24
Compilation Flags: -xarch=v9b -xO5 -xopenmp -xprefetch=auto,latx:6
STREAM Source Code: the v5.6 f90 version with format changes
for large arrays.
OS Settings: Odd-numbered processors were taken off-line with
"psradm -f". (Multithreading disabled)
Shell Environments: MPSSHEAP=4MB MPSSSTACK=4MB MADV=access_lwp
LD_PRELOAD=mpss.so.1:madv.so.1
(Large page and locality settings)
Outputs:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
----------------------------------------------
STREAM Version $Revision: 5.6 $
----------------------------------------------
Array size = 200000032
Offset = 0
The total memory requirement is 4577 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------
Number of Threads = 8
----------------------------------------------
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function   Rate (MB/s)  Avg time  Min time  Max time
Copy:       12566.3696    0.2565    0.2546    0.2662
Scale:      12539.1797    0.2560    0.2552    0.2570
Add:        12708.7064    0.3782    0.3777    0.3793
Triad:      12709.2760    0.3804    0.3777    0.3869
----------------------------------------------------
Solution Validates!
----------------------------------------------------
Dear Dr. McCalpin,
Please find our STREAM results for the Sun SPARC Enterprise
M5000 system, submitted to update the STREAM Web site.
System Name: Sun SPARC Enterprise M5000
CPU Name: SPARC64 VI
CPU MHz: 2150
CPU(s) enabled: 16 cores, 8 chips, 2 cores/chip, 2 threads/core
Primary Cache: 128 KB I + 128 KB D on chip per core
Secondary Cache: 5 MB I+D on chip per chip
L3 Cache: None
Other Cache: None
Memory: 32 GB (1GB x 32), 8-way interleaved
Operating System: Solaris 10 11/06
Compiler: Sun Studio 11 with patch 2006/11/24
Compilation Flags: -xarch=v9b -xO5 -xopenmp -xprefetch=auto,latx:8
STREAM Source Code: the v5.6 f90 version with format changes
for large arrays.
OS Settings: Odd-numbered processors were taken off-line with
"psradm -f". (Multithreading disabled)
Shell Environments: MPSSHEAP=4MB MPSSSTACK=4MB MADV=access_lwp
LD_PRELOAD=mpss.so.1:madv.so.1
(Large page and locality settings)
Outputs:
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
----------------------------------------------
STREAM Version $Revision: 5.6 $
----------------------------------------------
Array size = 200000032
Offset = 0
The total memory requirement is 4577 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------
Number of Threads = 16
----------------------------------------------
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function   Rate (MB/s)  Avg time  Min time  Max time
Copy:       24831.2272    0.1298    0.1289    0.1308
Scale:      24791.1867    0.1294    0.1291    0.1310
Add:        25239.2743    0.1909    0.1902    0.1918
Triad:      25292.3212    0.1903    0.1898    0.1909
----------------------------------------------------
Solution Validates!
----------------------------------------------------
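To compare the five submissions, it can help to normalize the Triad rate by the number of enabled cores. The sketch below uses only the core counts and Triad rates reported in this message; the tuple layout and the function name are illustrative. The runs differ in clock speed, compiler version, flags, and memory configuration, so the per-core figures are a rough guide rather than a like-for-like comparison.

```python
# Triad rate per enabled core, using the figures reported in this message.
# The tuple layout (system, cores, Triad MB/s) and names are illustrative only.
results = [
    ("M4000", 8, 12709.2760),
    ("M5000", 16, 25292.3212),
    ("M8000", 32, 69629.9429),
    ("M9000", 64, 134369.1280),
    ("M9000", 128, 227059.3074),
]


def per_core(rate_mb_s, cores):
    """Aggregate rate divided evenly across the enabled cores."""
    return rate_mb_s / cores


for system, cores, triad in results:
    print(f"{system:6s} {cores:4d} cores  {per_core(triad, cores):9.1f} MB/s per core")
```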
Received on Mon Apr 16 18:52:57 2007