From: Keith Josephson (KJosephson@ioncomputer.com)
Date: Mon Jul 07 2003 - 17:16:39 CDT
Dr. McCalpin,
Below are STREAM results for the ION I2X2 dual-processor system based on the new Intel "Madison" Itanium 2 processors at 1.4GHz. I ran the single-processor version once, and the OpenMP version with one thread and with two threads. The output includes a description of the hardware, the OS, and the compiler used, along with the optimization switches given on the compiler invocation.
If the information is complete enough, we would ask that you post the results. If not, let us know what we should do to make the results postable.
Thanks for your help,
Keith Josephson
--------------------------------
Keith Josephson
ION Computer Systems® Inc.
30 Oser Avenue
Hauppauge, NY 11788
(631) 630-0600 Ext.222
kjosephson@ioncomputer.com
http://server.ioncomputer.com/
--------------------------------
i2x4:/home/keith/stream # # ION Computer Systems, Inc. - http://itanium.ioncomputer.com
i2x4:/home/keith/stream # #
i2x4:/home/keith/stream # # ION I2X2 configured with:
i2x4:/home/keith/stream # # (2) 1.4GHz Intel Itanium 2 processors with 4MB L3 cache each
i2x4:/home/keith/stream # # 8GB DDR Memory, 8x1GB
i2x4:/home/keith/stream # # (1) 36GB Ultra160 10,000 rpm SCSI disk
i2x4:/home/keith/stream # # SuSE SLES 8 (powered by UnitedLinux 1.0) (ia64) Linux 2.4.19-SMP
i2x4:/home/keith/stream # #
i2x4:/home/keith/stream # # Intel(R) C++ Itanium(R) Compiler for Itanium(R)-based applications
i2x4:/home/keith/stream # # Version 7.1, Build 20030605
i2x4:/home/keith/stream # # Copyright (C) 1985-2003 Intel Corporation. All rights reserved.
i2x4:/home/keith/stream # #
i2x4:/home/keith/stream # #######################################################################
i2x4:/home/keith/stream # ecc -O3 -ip -openmp stream_d.c second_wall.c -o stream_d
i2x4:/home/keith/stream # ./stream_d
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 20000000, Offset = 0
Total memory required = 457.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 78391 microseconds.
(= 78391 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           2978.5719     0.1075     0.1074     0.1081
Scale:          3004.0450     0.1066     0.1065     0.1066
Add:            3419.7778     0.1404     0.1404     0.1406
Triad:          3425.6607     0.1402     0.1401     0.1403
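
For readers checking the numbers, the reported rates can be approximately reconstructed from the array size and the minimum times. The sketch below assumes the usual STREAM accounting: Copy and Scale touch two arrays per iteration, Add and Triad touch three, the rate uses 1 MB = 10^6 bytes, and the "Total memory required" line divides by 1048576. The names `ARRAYS_TOUCHED`, `total_memory_mb`, and `rate_mb_per_s` are illustrative and do not come from the benchmark source.

```python
# Values taken from the single-processor run above.
N = 20_000_000          # "Array size = 20000000"
BYTES_PER_WORD = 8      # "8 bytes per DOUBLE PRECISION word"

# Assumed STREAM accounting: number of arrays read or written per iteration.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column (seconds), as printed (rounded to 4 decimals).
MIN_TIME = {"Copy": 0.1074, "Scale": 0.1065, "Add": 0.1404, "Triad": 0.1401}


def total_memory_mb(n=N, word=BYTES_PER_WORD):
    """Three arrays of n words each, expressed in units of 1048576 bytes."""
    return 3 * n * word / 1048576.0


def rate_mb_per_s(kernel, min_time, n=N, word=BYTES_PER_WORD):
    """Bytes moved by one pass of the kernel divided by its best time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * word * n
    return 1.0e-6 * bytes_moved / min_time


if __name__ == "__main__":
    print(f"Total memory required = {total_memory_mb():.1f} MB")
    for kernel, t in MIN_TIME.items():
        print(f"{kernel + ':':8s}{rate_mb_per_s(kernel, t):12.1f} MB/s")
```

Because the printed minimum times are rounded to four decimals, the recomputed rates agree with the Rate column only to within a small fraction of a percent.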
i2x4:/home/keith/stream #
i2x4:/home/keith/stream # ecc -O3 -ip -openmp stream_d_omp.c second_wall.c -o stream_d_omp
stream_d_omp.c(102) : (col. 1) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.
stream_d_omp.c(119) : (col. 1) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.
stream_d_omp.c(143) : (col. 1) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.
stream_d_omp.c(149) : (col. 1) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.
stream_d_omp.c(155) : (col. 1) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.
stream_d_omp.c(161) : (col. 1) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.
i2x4:/home/keith/stream # export OMP_NUM_THREADS=1
i2x4:/home/keith/stream # ./stream_d_omp
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 20000000, Offset = 0
Total memory required = 457.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 80029 microseconds.
(= 80029 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           2882.4139     0.1111     0.1110     0.1117
Scale:          2879.2264     0.1112     0.1111     0.1112
Add:            3380.3053     0.1420     0.1420     0.1421
Triad:          3386.0531     0.1418     0.1418     0.1418
i2x4:/home/keith/stream # export OMP_NUM_THREADS=2
i2x4:/home/keith/stream # ./stream_d_omp
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 20000000, Offset = 0
Total memory required = 457.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 73630 microseconds.
(= 73630 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           3120.2142     0.1027     0.1026     0.1040
Scale:          3112.6596     0.1028     0.1028     0.1029
Add:            3523.5303     0.1363     0.1362     0.1363
Triad:          3534.8454     0.1359     0.1358     0.1360
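
To compare the OpenMP runs, one simple measure is the ratio of the two-thread rate to the one-thread rate for each kernel. The sketch below only restates the Rate (MB/s) figures printed above; the names `OMP_1_THREAD`, `OMP_2_THREADS`, and `speedup` are illustrative, and the script draws no conclusion about why the ratios come out as they do.

```python
# Rate (MB/s) figures copied from the two stream_d_omp runs above.
OMP_1_THREAD = {"Copy": 2882.4139, "Scale": 2879.2264,
                "Add": 3380.3053, "Triad": 3386.0531}
OMP_2_THREADS = {"Copy": 3120.2142, "Scale": 3112.6596,
                 "Add": 3523.5303, "Triad": 3534.8454}


def speedup(one_thread, two_threads):
    """Per-kernel ratio of two-thread bandwidth to one-thread bandwidth."""
    return {k: two_threads[k] / one_thread[k] for k in one_thread}


if __name__ == "__main__":
    for kernel, ratio in speedup(OMP_1_THREAD, OMP_2_THREADS).items():
        # A ratio of 2.0 would mean perfect scaling across both processors.
        print(f"{kernel + ':':8s}{ratio:6.3f}x")
```

On these figures the second thread adds roughly 4 to 8 percent of bandwidth, depending on the kernel.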