[stream] Sun Fire E25K STREAM Results

From: Brian Whitney <Brian.Whitney@Sun.COM>
Date: Fri Feb 20 2004 - 10:14:01 CST

STREAM Editor's Note 2004-06-21: After much deliberation, I have decided to relabel these results. What Sun calls a "72 processor" system actually contains 72 dual-processor chips. It appears to the user as a 144-cpu system, and the STREAM benchmark was run on it using 144 OpenMP threads. To maintain correlation with the Sun nomenclature, I have modified the names to include the number of UltraSPARC IV chips. There is no way to avoid confusion when commonly used terms are arbitrarily redefined, as Sun has done here, but I think that this compromise is significantly clearer than the original publication.

Note that this is more than a relabeling, since the "Balance" values were calculated from the original (incorrect) processor count. Doubling the processor count doubles the peak floating-point rate while the measured bandwidth stays the same, so the "Balance" of the machine is now correctly reported as twice the value that appeared in the tables originally.
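A minimal sketch of why the correction doubles "Balance", assuming balance is peak floating-point rate divided by sustained memory bandwidth in words per second (flops per word). The figure of 2 flops per cycle per core and the use of the Triad result as the sustained bandwidth are assumptions not stated in this post, and the helper names are hypothetical.

```python
# Hypothetical helpers illustrating STREAM-style machine balance.
# Assumptions (not from the post): 2 flops/cycle per core, and the Triad
# rate used as the sustained memory bandwidth.

def peak_mflops(cores: int, mhz: float, flops_per_cycle: float = 2.0) -> float:
    """Peak floating-point rate in MFLOPS."""
    return cores * mhz * flops_per_cycle


def machine_balance(cores: int, mhz: float, bandwidth_mb_s: float,
                    flops_per_cycle: float = 2.0,
                    bytes_per_word: int = 8) -> float:
    """Peak MFLOPS divided by sustained bandwidth in MWords/s (flops per word)."""
    mwords_per_s = bandwidth_mb_s / bytes_per_word
    return peak_mflops(cores, mhz, flops_per_cycle) / mwords_per_s


triad_mb_s = 76097.8  # 72-chip Triad result reported in this post

old = machine_balance(72, 1050.0, triad_mb_s)   # original count: 72 "processors"
new = machine_balance(144, 1050.0, triad_mb_s)  # corrected count: 144 cores

print(f"balance, 72 cpus:  {old:.1f} flops/word")
print(f"balance, 144 cpus: {new:.1f} flops/word")
print(f"ratio: {new / old:.2f}")
```

Whatever value is assumed for flops per cycle, it cancels in the ratio, so the factor of two depends only on the processor count.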

Note that the Cray X1 employs a similar redefinition of the term "processor". In Cray's case, however, the compiler and operating system can operate in a mode in which what most of us would call a group of four processors acts as a single entity from the user's point of view. Cray calls this "MSP" (Multi-Streaming Processor) mode, as opposed to the more traditional "SSP" (Single-Streaming Processor) mode, which is also supported. I will probably clean that up one of these days as well.

Dr. McCalpin,

Sun Microsystems, Inc. would like to submit the following results:

Platform:
  Sun Fire E25K
  72 UltraSPARC s400 1050 MHz processors (UltraSPARC IV family)
  288.00 GB memory
  Solaris 9 04/04
  Sun Studio 9 EA2 compilers

Results:
  ncpus      Copy     Scale       Add     Triad
  -----   -------   -------   -------   -------
     72   68184.1   67462.7   75402.7   76097.8
     64   60412.3   59495.2   66334.7   66506.4
     48   44285.8   43229.0   48548.7   48806.3
     36   32221.8   31501.6   35419.3   35792.9
     32   28074.0   27606.2   31248.8   31306.0
     24   21156.6   20601.5   23097.2   23064.0
     12   10729.7   10353.5   11653.0   11631.1
  (all rates in MB/s)

  (Details below)


Suggested "Machine ID" would be something like Sun_Fire_E25K-1050

Thank you for your time,

Brian

Brian Whitney                          Sun Microsystems, Inc.
Strategic Applications Engineering     3295 NW 211th Terrace
office: (503) 617-3304                 Hillsboro, OR 97124
fax:    (503) 617-3363                 e-mail: Brian.Whitney@sun.com

========================================================================
72 UltraSPARC s400 1050MHz processors

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 905969664
 Offset = 0
 The total memory requirement is 20736 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function    Rate (MB/s)   Avg time   Min time   Max time
Copy:        68184.0912     0.2129     0.2126     0.2138
Scale:       67462.6949     0.2154     0.2149     0.2167
Add:         75402.6881     0.3157     0.2884     0.4348
Triad:       76097.8050     0.2990     0.2857     0.3298
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
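The Rate column follows from the array size and the best (minimum) time. The sketch below is a hypothetical reconstruction, not the STREAM source: it assumes 8-byte words, rates in units of 10^6 bytes/s, and the memory requirement in units of 2^20 bytes. It recomputes the 72-chip (144-thread) figures from the printed minimum times, which are rounded to four decimals, so agreement is only to within a few hundredths of a percent.

```python
# Hypothetical sketch: derive STREAM rates and memory size from the output.
# Assumes 8-byte words, MB = 10**6 bytes for rates, MB = 2**20 bytes for memory.

N = 905_969_664   # "Array size" from the 72-chip run
WORD = 8          # bytes per DOUBLE PRECISION word

# Arrays touched per element: Copy/Scale read one and write one,
# Add/Triad read two and write one.
ARRAYS_MOVED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Min times as printed in the output above (rounded to 4 decimals).
MIN_TIME_S = {"Copy": 0.2126, "Scale": 0.2149, "Add": 0.2884, "Triad": 0.2857}


def rate_mb_s(kernel: str, n: int, min_time_s: float) -> float:
    """Bytes moved by one pass of the kernel divided by the best time, in MB/s."""
    bytes_moved = ARRAYS_MOVED[kernel] * WORD * n
    return bytes_moved / min_time_s / 1e6


total_mib = 3 * WORD * N / 2**20  # three arrays a, b, c
print(f"total memory: {total_mib:.0f} MB")
for k, t in MIN_TIME_S.items():
    print(f"{k:6s} {rate_mb_s(k, N, t):10.1f} MB/s")
```

The small differences from the reported rates come from the rounding of the printed minimum times, not from a different byte count.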
========================================================================
64 UltraSPARC s400 1050MHz processors

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 905969664
 Offset = 0
 The total memory requirement is 20736 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function    Rate (MB/s)   Avg time   Min time   Max time
Copy:        60412.3186     0.2407     0.2399     0.2422
Scale:       59495.1747     0.2445     0.2436     0.2466
Add:         66334.7132     0.3282     0.3278     0.3293
Triad:       66506.3955     0.3275     0.3269     0.3277
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
========================================================================
48 UltraSPARC s400 1050MHz processors

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 603979776
 Offset = 0
 The total memory requirement is 13824 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function    Rate (MB/s)   Avg time   Min time   Max time
Copy:        44285.8917     0.2197     0.2182     0.2214
Scale:       43229.0231     0.2246     0.2235     0.2268
Add:         48548.6532     0.3015     0.2986     0.3179
Triad:       48806.2619     0.2988     0.2970     0.3017
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
========================================================================
36 UltraSPARC s400 1050MHz processors

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 452984832
 Offset = 0
 The total memory requirement is 10368 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function    Rate (MB/s)   Avg time   Min time   Max time
Copy:        32221.8627     0.2285     0.2249     0.2332
Scale:       31501.6015     0.2329     0.2301     0.2394
Add:         35419.2982     0.3090     0.3069     0.3113
Triad:       35792.9348     0.3077     0.3037     0.3110
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
========================================================================
32 UltraSPARC s400 1050MHz processors

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 452984832
 Offset = 0
 The total memory requirement is 10368 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function    Rate (MB/s)   Avg time   Min time   Max time
Copy:        28074.0113     0.2615     0.2582     0.2652
Scale:       27606.1864     0.2667     0.2625     0.2774
Add:         31248.7853     0.3514     0.3479     0.3558
Triad:       31306.0028     0.3508     0.3473     0.3535
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
========================================================================
24 UltraSPARC s400 1050MHz processors

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 301989888
 Offset = 0
 The total memory requirement is 6912 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function    Rate (MB/s)   Avg time   Min time   Max time
Copy:        21156.6389     0.2338     0.2284     0.2377
Scale:       20601.5085     0.2364     0.2345     0.2393
Add:         23097.1901     0.3160     0.3138     0.3181
Triad:       23064.0349     0.3180     0.3142     0.3241
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
========================================================================
12 UltraSPARC s400 1050MHz processors

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 150994944
 Offset = 0
 The total memory requirement is 3456 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function    Rate (MB/s)   Avg time   Min time   Max time
Copy:        10729.7483     0.2329     0.2252     0.2454
Scale:       10353.4623     0.2374     0.2333     0.2438
Add:         11653.0178     0.3174     0.3110     0.3236
Triad:       11631.1440     0.3173     0.3116     0.3255
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
========================================================================
Received on Fri Feb 20 10:14:01 2004
