Stream results for HP 9000 Superdome with 1000MHz dual processor PA-8800
modules with 32MB external cache (shared by each processor pair):
System                                          Copy   Scale     Add   Triad   (MB/s)
HP 9000 Superdome PA-8800  8 cells, 64 cpus    14662   14927   15727   15839
HP 9000 Superdome PA-8800 16 cells, 64 cpus    29028   27113   30289   30560
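
As a cross-check on the table, the sketch below recomputes the 8-cell rates from the array size and the "Min time" column printed in the run output later in this message. It assumes the usual STREAM accounting: 8-byte words, two arrays touched per element for Copy and Scale, three for Add and Triad, rates in units of 10^6 bytes/s, and the memory requirement in units of 2^20 bytes. The function and variable names are illustrative only and are not taken from stream_d.f.

```python
# Assumed STREAM accounting: 8-byte words; Copy and Scale touch 2 arrays
# per element, Add and Triad touch 3.
BYTES_PER_WORD = 8
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

N = 576002248  # array size reported in the run output


def stream_rate_mb_s(kernel, min_time_s, n=N):
    """Bytes moved divided by the best time, in MB/s (1 MB = 10**6 bytes)."""
    return ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * n / min_time_s / 1e6


def memory_requirement_mb(n=N):
    """Three arrays of 8-byte words, in units of 2**20 bytes, truncated."""
    return 3 * BYTES_PER_WORD * n // 2**20


# "Min time" values as printed for the 8-cell run (rounded to 4 decimals,
# so the recomputed rates match the reported ones only approximately).
min_times_8_cells = {"Copy": 0.6286, "Scale": 0.6174, "Add": 0.8790, "Triad": 0.8728}
reported_8_cells = {"Copy": 14662.0459, "Scale": 14927.3408,
                    "Add": 15727.4002, "Triad": 15839.2678}

if __name__ == "__main__":
    print("memory requirement (MB):", memory_requirement_mb())
    for kernel, t in min_times_8_cells.items():
        print(kernel, round(stream_rate_mb_s(kernel, t), 1), reported_8_cells[kernel])
```

The recomputed rates agree with the reported ones to within the rounding of the printed times. The byte counts involved (16 * 576002248 is about 9.2e9) exceed the 32-bit integer range, which is consistent with the INTEGER*8 change in the diff.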
The 16 cell configuration with 2 PA-8800 modules per cell
(half populated) is orderable and fully supported by HP.
The system was running HP-UX 11i TCOE (December 2003), with
all memory interleaved across the cells.
The f90 version of the stream benchmark was compiled auto-parallel, with
the following changes (mysecond.c is a C routine that calls gettimeofday):
63c63
< PARAMETER (n=2000000,offset=0,ndim=n+offset,ntimes=10)
---
> PARAMETER (n=576002248,offset=0,ndim=n+offset,ntimes=10)
72c72
< INTEGER bytes(4)
---
> INTEGER*8 bytes(4)
90c90
< * COMMON a,b,c
---
> COMMON a,b,c
200c200
< 9020 FORMAT (1x,a,i4,a)
---
> 9020 FORMAT (1x,a,i8,a)

rm -f *.o stream_d.mp stream_d.uni stream_c.mp stream_c.uni
cc +DD64 +O3 -c mysecond.c
f90 -o stream_d.mp +Ofaster -Wl,+pd,1M +DD64 +Oautopar +Onoopenmp +autodbl4 +extend_source +noppu stream_d.f mysecond.o

output for 8 cells, 64 processors, 128GB (256x512MB DIMMs):
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 576002248
Offset = 0
The total memory requirement is 13183 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity/precision appears to be 2 microseconds
----------------------------------------------------
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:          14662.0459     0.6509     0.6286     0.6634
Scale:         14927.3408     0.6364     0.6174     0.6789
Add:           15727.4002     0.9066     0.8790     0.9297
Triad:         15839.2678     0.8995     0.8728     0.9112
----------------------------------------------------
Solution Validates!
----------------------------------------------------

output for 16 cells, 64 processors, 256GB (512x512MB DIMMs):
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 576002248
Offset = 0
The total memory requirement is 13183 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:          29027.5284     0.3187     0.3175     0.3198
Scale:         27112.9345     0.3414     0.3399     0.3435
Add:           30289.4702     0.4571     0.4564     0.4576
Triad:         30559.9206     0.5002     0.4524     0.8271
----------------------------------------------------
Solution Validates!
----------------------------------------------------

Received on Tue Feb 10 16:22:08 2004