From: Mathew COLGROVE (mathew.colgrove@st.com)
Date: Tue Aug 10 2004 - 18:01:54 CDT
Hi John,
Enclosed is an updated result for your STREAM web site. We have
determined that, when using the Portland Group (PGI) compilers, the
optimal flag set for STREAM is "-O2 -Mvect=sse -Mnontemporal". When
building the OpenMP version, add the "-mp" flag. We hope this
information helps users of the STREAM benchmark achieve the
highest possible results!
System:
Model Name: AMD Opteron(tm) Processor 248
CPU MHz: 2200
Motherboard: ASUS SK8N
Cache Size: 1024 KB
Memory: 4x512MB, DDR400, PC3200, Corsair, CL2
Operating System: SuSE 9.0
Kernel: 2.4.21-102-default
Compiler:
The Portland Group (PGI) pgcc Release 5.2-1
Output:
% pgcc -O2 -Mvect=sse -Mnontemporal -V second_wall.c stream_d.c -o stream
pgcc 5.2-1
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2004, STMicroelectronics, Inc. All Rights Reserved.
second_wall.c:
PGC/x86-64 Linux/x86-64 5.2-1
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2004, STMicroelectronics, Inc. All Rights Reserved.
stream_d.c:
PGC/x86-64 Linux/x86-64 5.2-1
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2004, STMicroelectronics, Inc. All Rights Reserved.
% stream
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 100 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 16489 microseconds.
(= 16489 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         4302.8156      0.0075       0.0074       0.0078
Scale:        4251.4326      0.0076       0.0075       0.0077
Add:          4496.9085      0.0107       0.0107       0.0109
Triad:        4457.6785      0.0108       0.0108       0.0110
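A reader who wants to sanity-check the table above can recompute the timings from the reported rates. The sketch below is not part of the original submission. It assumes the usual STREAM accounting: Copy and Scale move two arrays per iteration, Add and Triad move three, rates use 1 MB = 10^6 bytes, and the "Total memory required" line uses 1 MB = 2^20 bytes. The function names are illustrative only.

```python
# Sanity check of the STREAM figures quoted above.
# Assumptions (not stated in the message itself): standard STREAM byte
# counting, MB = 1e6 bytes for rates, MB = 2**20 bytes for total memory.

BYTES_PER_WORD = 8        # "8 bytes per DOUBLE PRECISION word"
ARRAY_SIZE = 2000000      # "Array size = 2000000"

# Number of arrays read or written per element by each kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Rates (MB/s) and minimum times (s) as printed in the table.
REPORTED_RATE = {"Copy": 4302.8156, "Scale": 4251.4326,
                 "Add": 4496.9085, "Triad": 4457.6785}
REPORTED_MIN_TIME = {"Copy": 0.0074, "Scale": 0.0075,
                     "Add": 0.0107, "Triad": 0.0108}


def implied_min_time(kernel):
    """Best time in seconds implied by the reported rate for one kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * ARRAY_SIZE
    return bytes_moved / (REPORTED_RATE[kernel] * 1.0e6)


def total_memory_mb():
    """Memory for the three arrays, in units of 2**20 bytes."""
    return 3 * BYTES_PER_WORD * ARRAY_SIZE / 2 ** 20


if __name__ == "__main__":
    print("Total memory: %.1f MB" % total_memory_mb())
    for kernel in ("Copy", "Scale", "Add", "Triad"):
        print("%-6s implied min time %.4f s, reported %.4f s"
              % (kernel, implied_min_time(kernel), REPORTED_MIN_TIME[kernel]))
```

Under these assumptions the implied times agree with the printed "Min time" column to the four decimal places shown, and the memory figure rounds to the 45.8 MB reported in the output.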
Sincerely,
Mathew Colgrove
QA Engineer
The Portland Group
--
----------------------------------------------------------------------
Mathew Colgrove - Quality Assurance
Advanced Compilers and Tools
AST Portland Lab, STMicroelectronics
mathew.colgrove@st.com
(503) 682-2806 (voice)
(503) 682-2637 (FAX)