The STREAM2 Home Page

Introduction

STREAM2 is an attempt to extend the functionality of the STREAM benchmark in two important ways:

STREAM2 measures sustained bandwidth at all levels of the cache hierarchy, and

STREAM2 more clearly exposes the performance differences between reads and writes.

STREAM2 is based on the same ideas as STREAM, but uses a different set of vector kernels:

FILL: similar to bzero(), but fills with a constant instead of zero

COPY: similar to bcopy(), and the same as STREAM Copy

DAXPY: similar to STREAM Triad, but overwrites one of the input vectors instead of writing results to a third vector

SUM: sum reduction on a single vector -- reads only, no writes

Kernel	Code	Bytes/iter read	Bytes/iter written	FLOPS/iter
FILL	a(i) = q	0 (+8)	8	0
COPY	a(i) = b(i)	8 (+8)	8	0
DAXPY	a(i) = a(i) + q*b(i)	16	8	2
SUM	sum = sum + a(i)	8	0	1

Table 1: Characteristics of the STREAM2 kernels. The value in parentheses in the "Bytes/iter read" column indicates the number of additional bytes read per iteration on machines with a "write allocate" cache policy.
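For readers who want to see the loops in Table 1 outside of Fortran, here is a minimal C sketch of the four kernels. It is an illustration only, not a translation of the STREAM2 source: the function names (stream2_fill and so on), the use of size_t for lengths, and the assert() precondition checks are assumptions made for this example. The arrays are doubles, which matches the 8 bytes per element shown in the table.

```c
#include <assert.h>
#include <stddef.h>

/* FILL: a(i) = q. Writes 8 bytes per iteration; reads nothing,
   except on write-allocate caches, which first read the target line. */
void stream2_fill(double *a, double q, size_t n)
{
    assert(a != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] = q;
}

/* COPY: a(i) = b(i). Reads 8 bytes and writes 8 bytes per iteration. */
void stream2_copy(double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] = b[i];
}

/* DAXPY: a(i) = a(i) + q*b(i). Reads 16 bytes, writes 8 bytes,
   and performs 2 floating-point operations per iteration. */
void stream2_daxpy(double *a, const double *b, double q, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] + q * b[i];
}

/* SUM: sum = sum + a(i). Reads 8 bytes, writes nothing,
   and performs 1 floating-point operation per iteration. */
double stream2_sum(const double *a, size_t n)
{
    assert(a != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum = sum + a[i];
    return sum;
}
```

In a real measurement the result of stream2_sum would need to be used (printed or stored) so that an optimizing compiler does not remove the loop.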

Source Code

The STREAM2 source code is provided in Fortran77 -- you are welcome to translate it to C, but I have not gotten around to it yet. The control flow is a bit more complex than in STREAM because each kernel is run for multiple iterations at each of many different vector lengths.

The main feature is that the same amount of work is done for each vector length, so the shorter vector lengths are iterated many times and the longer vector lengths fewer times.
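One way to picture the constant-work rule is to fix a total element budget and divide it by the vector length. The C sketch below does that. The function name stream2_reps, the total_elems parameter, and the choice to run at least one repetition are all assumptions for illustration; the actual constants and formula in the Fortran source are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of times to repeat a kernel on a vector
   of length n so that roughly total_elems elements are processed in all.
   Short vectors get many repetitions, long vectors get few. */
size_t stream2_reps(size_t total_elems, size_t n)
{
    assert(n > 0);
    size_t reps = total_elems / n;
    return reps > 0 ? reps : 1;   /* always run the kernel at least once */
}
```

With a budget of 1,000,000 elements, for example, a vector of length 10 would be repeated 100,000 times and a vector of length 1,000,000 would be run once, so both cases process the same number of elements.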

Sample Results

Here are some sample results from machines in my house and office. The machines listed are described in Table 2.

Machine	CPU	MHz	L1 Data Cache	L2 Data Cache	Peak L2 Cache Bandwidth	Bus Width @ Speed	Peak Memory Bandwidth
IBM RS/6000-397	POWER2-SC	160	128 kB @ 160 MHz	none	N/A	256 bits @ 80 MHz	2560 MB/s
Upgraded Mac clone	PowerPC G3	367.5	32 kB @ 367.5 MHz	512 kB @ 183.75 MHz	2940 MB/s	64 bits @ 52.5 MHz	420 MB/s
PowerComputing PowerCurve 601/120	PowerPC 601	120	64 kB (I+D) @ 120 MHz	256 kB @ 40 MHz	320 MB/s	64 bits @ 40 MHz	320 MB/s
Mac Quadra 650	Motorola 68040	33	8 kB @ 33 MHz	none	N/A	32 bits @ 33 MHz	132 MB/s
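The peak memory bandwidth figures in the table are consistent with multiplying the bus width in bytes by the bus clock in MHz, at one transfer per clock. That relationship is inferred from the numbers rather than stated on this page, so the C sketch below should be read as an assumption; the function name peak_mb_per_sec is made up for this example.

```c
#include <assert.h>

/* Hypothetical helper: peak bandwidth in MB/s for a bus that moves
   bus_bits bits once per clock at clock_mhz MHz. Assumes one transfer
   per clock and 1 MB = 10^6 bytes. */
double peak_mb_per_sec(unsigned bus_bits, double clock_mhz)
{
    assert(bus_bits % 8 == 0);          /* whole bytes only */
    return (bus_bits / 8) * clock_mhz;  /* bytes per clock times MHz */
}
```

For the four machines listed, this gives 2560, 420, 320, and 132 MB/s from the bus width and speed entries, matching the last column. The peak L2 cache bandwidth column is not recomputed here because the table does not list the L2 bus widths.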

[Figures: sample results for the FILL, COPY, DAXPY, and SUM kernels]