Stream results for HP Integrity Superdome with the sx2000 chipset, and
1.6GHz/24MB Dual Core Intel(R) Itanium(R) 2 processors:

16 cells, 64 processors/128 cores, 512GB of memory (512x1GB DIMMs):
----------------------------------------------------------
Function   Rate (MB/s)    Avg time    Min time    Max time
Copy:      154504.1630      0.0814      0.0795      0.0896
Scale:     152999.3089      0.0813      0.0803      0.0842
Add:       169468.0928      0.1093      0.1088      0.1117
Triad:     170832.6902      0.1112      0.1079      0.1276

8 cells, 32 processors/64 cores, 256GB of memory (256x1GB DIMMs):
----------------------------------------------------------
Function   Rate (MB/s)    Avg time    Min time    Max time
Copy:       77274.3200      0.2271      0.1590      0.4369
Scale:      76845.2576      0.1608      0.1599      0.1631
Add:        85409.2677      0.3567      0.2158      0.8011
Triad:      85792.1817      0.3145      0.2148      0.5301

4 cells, 16 processors/32 cores, 128GB of memory (128x1GB DIMMs):
----------------------------------------------------------
Function   Rate (MB/s)    Avg time    Min time    Max time
Copy:       38614.6964      0.3187      0.3182      0.3195
Scale:      38494.8036      0.3199      0.3192      0.3208
Add:        42838.3551      0.4306      0.4303      0.4310
Triad:      42921.9548      0.4298      0.4294      0.4307

Note that with these processors the front-side buses run at 533 MT/s.
The system was configured with half of the memory in each cell assigned to
cell-local memory.
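
To put the local-memory setting in context, here is a small Python sketch
that estimates how much of each cell's local memory the three STREAM arrays
would occupy. It uses the array size reported in the run output in this
message (768000680 elements of 8 bytes). It assumes that memory is spread
evenly over the cells, that the arrays are split evenly across cells, and
that "GB" means 2^30 bytes; none of these is stated in the message, and the
function names are made up for the example.

```python
N = 768000680        # array length from the modified PARAMETER statement
BYTES_PER_WORD = 8   # "Assuming 8 bytes per DOUBLE PRECISION word"
ARRAYS = 3           # the benchmark arrays a, b and c

total_bytes = ARRAYS * BYTES_PER_WORD * N
total_mb = total_bytes / 2**20  # compare with the 17578 MB in the output

# cells -> installed memory in GB, from the three configurations listed
installed_gb = {16: 512, 8: 256, 4: 128}


def local_gb_per_cell(cells):
    """Cell-local memory per cell: half of that cell's share (assumption:
    memory is distributed evenly over the cells)."""
    return installed_gb[cells] / cells / 2


def arrays_gb_per_cell(cells):
    """Share of the arrays per cell (assumption: even split across cells)."""
    return total_bytes / 2**30 / cells


if __name__ == "__main__":
    print(f"total array footprint: {total_mb:.0f} MB")
    for cells in sorted(installed_gb):
        print(f"{cells:2d} cells: {arrays_gb_per_cell(cells):5.2f} GB of arrays "
              f"per cell, {local_gb_per_cell(cells):.0f} GB local memory per cell")
```

Under those assumptions the arrays need between roughly 1 GB (16 cells) and
4.3 GB (4 cells) per cell, well below the 16 GB of cell-local memory
available in each configuration.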
The runs used the v5.6 f90 version of the stream benchmark, with the
following changes:
97c97
< PARAMETER (n=2000000,offset=0,ndim=n+offset,ntimes=10)
---
> PARAMETER (n=768000680,offset=0,ndim=n+offset,ntimes=10)
101c101
< INTEGER j,k,nbpw,quantum
---
> INTEGER*8 j,k,nbpw,quantum
106c106
< INTEGER bytes(4)
---
> INTEGER*8 bytes(4)
124c124
< * COMMON a,b,c
---
> COMMON a,b,c
245c245
< 9020 FORMAT (1x,a,i4,a)
---
> 9020 FORMAT (1x,a,i6,a)
247c247
< 9040 FORMAT ('Function',5x,'Rate (MB/s) Avg time Min time Max time'
---
> 9040 FORMAT ('Function',5x,'Rate (MB/s) Avg time Min time Max time'
249c249
< 9050 FORMAT (a,4 (f10.4,2x))
---
> 9050 FORMAT (a,f12.4,2x,3 (f10.4,2x))

compiled as follows:

f90 -o stream_d.omp +Ofaster +DSitanium2 +DD64 +extend_source +autodbl4 +noppu -Wl,+pd,1M +Oopenmp stream.f mysecond.o

and run with the "mpsched -T FILL" command to distribute threads across
locality domains. By default each thread allocated memory from the
local memory in each cell.

Here are the outputs for each configuration:

16 cells, 64 processors, 512 GB of memory (512x1GB DIMMs):
-------------------------------------------------------------
mpsched -T FILL stream_d.omp
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
----------------------------------------------
 STREAM Version $Revision: 5.6 $
----------------------------------------------
 Array size = 768000680
 Offset = 0
 The total memory requirement is 17578 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
----------------------------------------------
 Number of Threads = 128
----------------------------------------------
 Printing one line per active thread....
 [the line above appears 128 times, once per thread]
----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function   Rate (MB/s)    Avg time    Min time    Max time
Copy:      154504.1630      0.0814      0.0795      0.0896
Scale:     152999.3089      0.0813      0.0803      0.0842
Add:       169468.0928      0.1093      0.1088      0.1117
Triad:     170832.6902      0.1112      0.1079      0.1276
----------------------------------------------------
 Solution Validates!
----------------------------------------------------

8 cells, 32 processors, 256 GB of memory (256x1GB DIMMs):
-------------------------------------------------------------
mpsched -T FILL stream_d.omp
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
----------------------------------------------
 STREAM Version $Revision: 5.6 $
----------------------------------------------
 Array size = 768000680
 Offset = 0
 The total memory requirement is 17578 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
----------------------------------------------
 Number of Threads = 64
----------------------------------------------
 Printing one line per active thread....
 [the line above appears 64 times, once per thread]
----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function   Rate (MB/s)    Avg time    Min time    Max time
Copy:       77274.3200      0.2271      0.1590      0.4369
Scale:      76845.2576      0.1608      0.1599      0.1631
Add:        85409.2677      0.3567      0.2158      0.8011
Triad:      85792.1817      0.3145      0.2148      0.5301
----------------------------------------------------
 Solution Validates!
----------------------------------------------------

4 cells, 16 processors, 128 GB of memory (128x1GB DIMMs):
-------------------------------------------------------------
mpsched -T FILL stream_d.omp
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
----------------------------------------------
 STREAM Version $Revision: 5.6 $
----------------------------------------------
 Array size = 768000680
 Offset = 0
 The total memory requirement is 17578 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
----------------------------------------------
 Number of Threads = 32
----------------------------------------------
 Printing one line per active thread....
 [the line above appears 32 times, once per thread]
----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function   Rate (MB/s)    Avg time    Min time    Max time
Copy:       38614.6964      0.3187      0.3182      0.3195
Scale:      38494.8036      0.3199      0.3192      0.3208
Add:        42838.3551      0.4306      0.4303      0.4310
Triad:      42921.9548      0.4298      0.4294      0.4307
----------------------------------------------------
 Solution Validates!
----------------------------------------------------

Received on Fri Jul 14 09:44:47 2006
This archive was generated by hypermail 2.1.8 : Fri Jul 14 2006 - 10:14:19 CST