Please find below STREAM results for a 1024-processor Altix 4700 system.
System details:
SGI Altix 4700 (SSI)
1024 Intel Itanium2 processors (1.6 GHz / 6 MB L3 cache)
Bandwidth configuration -- 1 processor per NUMA node
4 TB main memory
SUSE Linux Enterprise Server 10 + SGI ProPack 5
Please note that this is not a "depopulated" system: Altix 4700
supports one or two single- or dual-core Itanium2 CPUs per memory node
as standard configurations.
Run details:
Intel compiler 9.1 build 20060523
Compilation flags: -O3 -i8 -openmp -extend_source
Standard STREAM source code, modified to handle formatting requirements
of large arrays.
The dplace tool was used to pin threads to cpus.
OMP_NUM_THREADS = 1022
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 164828160000
Offset = 4351
The total memory requirement is 3772617 MB
You are running each test 20 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:       3661962.6484     0.7214     0.7202     0.7233
Scale:      3677481.5544     0.7197     0.7171     0.7222
Add:        4385584.9635     0.9052     0.9020     0.9121
Triad:      4350165.8327     0.9123     0.9094     0.9193
----------------------------------------------------
Solution Validates!
----------------------------------------------------
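For anyone who wants to sanity-check the figures above, here is a minimal Python sketch that recomputes the memory footprint and the reported rates from the array size and the minimum times. It is not part of the STREAM source. The helper names are made up for illustration, and it assumes the usual STREAM accounting: three arrays of 8-byte words, a footprint reported in units of 2^20 bytes, rates in units of 10^6 bytes/s, two words moved per element for Copy and Scale, and three for Add and Triad. Because the minimum times are printed to only four decimal places, the recomputed rates should agree with the reported ones only to about one part in 10^4.

```python
# Figures copied from the STREAM output above.
ARRAY_SIZE = 164_828_160_000   # elements per array
BYTES_PER_WORD = 8             # DOUBLE PRECISION
NUM_THREADS = 1022             # OMP_NUM_THREADS

# Assumed STREAM accounting: words read plus words written per element.
WORDS_MOVED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# kernel -> (reported rate in MB/s, reported minimum time in seconds)
REPORTED = {
    "Copy":  (3661962.6484, 0.7202),
    "Scale": (3677481.5544, 0.7171),
    "Add":   (4385584.9635, 0.9020),
    "Triad": (4350165.8327, 0.9094),
}


def total_memory_mb(n, arrays=3):
    """Footprint of the arrays in units of 2**20 bytes (assumed convention)."""
    return arrays * BYTES_PER_WORD * n / 2**20


def rate_mb_per_s(kernel, n, seconds):
    """Bandwidth in units of 10**6 bytes/s for one pass of the given kernel."""
    return WORDS_MOVED[kernel] * BYTES_PER_WORD * n / seconds / 1e6


def per_thread_rate(kernel):
    """Reported aggregate rate divided evenly over the OpenMP threads."""
    return REPORTED[kernel][0] / NUM_THREADS


if __name__ == "__main__":
    print(f"memory: {total_memory_mb(ARRAY_SIZE):.0f} MB")
    for kernel, (reported, min_time) in REPORTED.items():
        recomputed = rate_mb_per_s(kernel, ARRAY_SIZE, min_time)
        print(f"{kernel:6s} reported {reported:14.4f}  recomputed {recomputed:14.4f}")
    print(f"Triad per thread: {per_thread_rate('Triad'):.1f} MB/s")
```

Dividing the aggregate rate by the thread count gives only an average per thread; it says nothing about how evenly the bandwidth was actually spread across nodes.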
Regards,
John
--
John Baron jbaron@sgi.com
SGI Performance Engineering 651-683-3544
Received on Mon Jul 10 17:57:25 2006