Re: STREAM on Origin 128pe

From: Yoshio Ashizawa (ashizawa@crj.cray.com)
Date: Mon Jun 30 1997 - 04:23:54 CDT


Hi John,
I had time to rerun STREAM on flurry.
Only 126 PEs are available today, so I cannot get a 128-PE number.
I recompiled the source with the -Ofast flag this time and ran it again.

John McCalpin wrote:
>
> Good morning!
>
> These STREAM results look pretty close to what I expect. I am
> a bit surprised that you were able to do this well without one of
> the following flags:
> -TARG:processor=r10000
> -r10000
> -Ofast=ip27
>
> These flags definitely turn on prefetching in the cpu, and I thought
> that they were required to do so. If you have time, trying with the
> -r10000 flag might help.
>
Here is a copy of the console output.

flurry 31% ./stream
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 80000000
 Offset = 0
 The total memory requirement is 1831 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 6 microseconds
 The tests below will each take a time on the order
 of 69352 microseconds
    (= 11559 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:        21857.5624     0.0940     0.0586     0.2056
Scale:       23351.7065     0.0725     0.0548     0.1105
Add:         24459.5172     0.0887     0.0785     0.1430
Triad:       22913.5965     0.1236     0.0838     0.1625
 Sum of a is = 9.2264062500933173E+19
 Sum of b is = 1.8452812500184785E+19
 Sum of c is = 2.4603750000403853E+19

The -Ofast flag increases the Scale, Add and Triad numbers.
The HGC benchmark evaluates only the Copy number,
so I will use the last run for this benchmark.
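
As a cross-check on the table above, the reported rates can be approximately reproduced from the array size and the Min time column. The sketch below is not part of STREAM. It assumes the usual counting of 2 arrays touched for Copy and Scale and 3 for Add and Triad, 1 MB = 10^6 bytes for the rates, and 1 MB = 2^20 bytes for the memory requirement; the last two conventions are inferred from the printed numbers. The function names are made up for illustration.

```python
# Sketch: relate STREAM's reported rates to array size and best (minimum) time.
# Values are taken from the console output above; the helper names are hypothetical.
BYTES_PER_WORD = 8          # "Assuming 8 bytes per DOUBLE PRECISION word"
ARRAY_SIZE = 80_000_000     # "Array size = 80000000"

# Arrays read or written per element by each kernel (assumed counting).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_mb_s(kernel, min_time_s, n=ARRAY_SIZE):
    """Bandwidth in MB/s (1 MB = 1e6 bytes) from the best time of one kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * n
    return bytes_moved / min_time_s / 1.0e6


def total_memory_mb(n=ARRAY_SIZE, arrays=3):
    """Memory for the three arrays a, b, c in MB (1 MB = 2**20 bytes)."""
    return arrays * BYTES_PER_WORD * n / 2**20


# Min times as printed (rounded to four decimals in the output).
min_times = {"Copy": 0.0586, "Scale": 0.0548, "Add": 0.0785, "Triad": 0.0838}
for kernel, t in min_times.items():
    print(f"{kernel:6s} {stream_rate_mb_s(kernel, t):12.1f} MB/s")
print(f"memory {total_memory_mb():.0f} MB")
```

Because the Min time column is printed to only four decimals, the recomputed rates agree with the reported ones to within about 0.1%, not exactly.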

> Concerning the core dump problem, you have probably already had to
> do an "unlimit stacksize" to run this -- it is absolutely required
> for large sizes. There is a bug in some versions of IRIX 6.5 that
> prevents you from growing the stack beyond 1 GB -- this should be
> fixed in patch 2121. To see if this is installed, run
> versions | grep -i patch | grep 2121
>
> The results are looking quite good already!
> john
>
I checked whether the patch is already installed.
It is not.
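
For reference, here is a rough way to compare the stack limit with what this array size needs. It is only a sketch: it assumes the three arrays are placed on the stack (which is what the "unlimit stacksize" advice suggests) and uses Python's standard resource module to read the limit, not anything IRIX-specific. The helper name is hypothetical.

```python
# Sketch: does the current soft stack limit cover the three STREAM arrays?
# Assumes the arrays live on the stack; stack_limit_allows is a hypothetical helper.
import resource

ARRAY_SIZE = 80_000_000
BYTES_PER_WORD = 8
REQUIRED_BYTES = 3 * BYTES_PER_WORD * ARRAY_SIZE   # arrays a, b and c


def stack_limit_allows(required_bytes, soft_limit):
    """True if the soft stack limit is unlimited or at least required_bytes."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= required_bytes


soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print("soft stack limit:", "unlimited" if soft == resource.RLIM_INFINITY else soft)
print("large enough for the arrays:", stack_limit_allows(REQUIRED_BYTES, soft))
```

With a stack capped at 1 GB, as in the bug described above, the check fails for this array size, since the three arrays need about 1.92e9 bytes.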
Thank you and best regards,

        ashizawa

-- 

------------------------------------------------------------------
Yoshio Ashizawa
Scalable Systems Technology Center, Nihon SiliconGraphics Cray K.K.
E-mail: ashizawa@crj.cray.com
PHONE : +81-3-3239-5101   FAX: +81-3-3239-1323


