Contents
Changelog:
- 28 October 2020: adjust compatibility note to reflect
valgrind
not being on all the department machines - 28 October 2020: fix comment in template file stating that there were 6.4 million accesses to state there are 64 million accesses and correct indices in description of how code works to match default SKIP value
- 2 November 2020: add explicit note under “Your Task” about being able to modify the skeleton code as much as wanted.
Your Task
Compatability note: This assignment requires a copy of valgrind, which is available on the department
portal
machines. If you are using the department NX (virtual desktop) machines, it is installed on some of them but not all of them: you can check by runningvalgrind
and seeing if it saysvalgrind: command not found...
. If so, you can do the assignment by logging into theportal
machines withssh portal
in a terminal from your NX instance. Otherwise, you can do the assignment directly on your NX virtual desktop.
-
Download the skeleton code we provide in this archive [last updated 2020-10-28 (to correct 6.4 million to 64 million in a comment in the template and correct indices in example of how array is setup to match default settings)]
-
Modify or replace the supplied programs
prog1.c
throughprog3.c
so that, in the simulation under valgrind cachegrind:(The skeleton code we supply has a structure that should be useful and comments suggesting particular modifications that might be useful, but you can perform whatever modifications you think would be helpful, including completely replacing the programs.)
-
For
prog1.c
: Write a program that achieves a <10% miss rate with 32KB, 2-way data cache with 64B blocks but a >90% miss rate with a 16KB, 2-way data cache with 64B blocks.Hint: a program that would perform best with between 16KB and 32KB of data cached would be likely to achieve this.
-
For
prog2.c
: Write a program that achieves a <10% miss rate with 32KB, 4-way data cache with 64B blocks but a >90% miss rate with a 32KB, 2-way data cache with 64B blocks. Submit your program asprog2.c
Hint: you want to create conflict misses by accessing values that are spaced apart in a particular way. Try working out what you would need to do with a very small cache first.
-
For
prog3.c
: Write a program that achieves a <60% miss rate with 32KB, 2-way data cache with 128B blocks but a >90% miss rate with a 32KB, 2-way data cache with 64B blocks. Submit your program asprog3.c
Hint: take advantage of spatial locality
-
-
Choose any one of the programs
prog4.c
andprog5.c
to modify or replace so that:-
For
prog4.c
: Write a program that achieves a <10% miss rate with 32KB, 2-way data cache with 64B blocks but a >90% miss rate with a 32KB, 2-way data cache with 128B blocks. Submit your program asprog4.c
Hint: take advantage of the smaller block size allowing the cache to store more blocks. You will probably need a less regular access pattern than the previous parts (changing MAX, SKIP, ITERS won’t be enough).
-
For
prog5.c
: Write a program that achieves a <40% miss rate with 32KB, 2-way data cache with 128B blocks but a >50% miss rate with a 32KB, 2-way data cache with 64B blocks. Submit your program asprog5.c
-
-
Test your programs using the instructions below.
-
Use
make submit
to create a .tar file, and upload that file to kytos.
Valgrind cachegrind
valgrind
, which is installed on the department machines, includes a cache simulator
that can simulate a variety of data cache configurations and report the results.
For example, running something like:
valgrind --D1=8192,2,64 --I1=8192,2,64 --tool=cachegrind ls
runs ls
simulating a 8KB, 2-way data cache with 64-byte blocks and a 8KB, 2-way instruction
cache with 64-byte blocks and reports the results of the simulation in the output,
which might look like:
==518718== Cachegrind, a cache and branch-prediction profiler
==518718== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al.
==518718== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==518718== Command: ls
==518718==
--518718-- warning: L3 cache found, using its data for the LL simulation.
check_miss_rate.py part1a part1c prog1.c prog3 prog4.c prog5.c
Makefile part1b part1d prog2.c prog3.c prog5 prog-template.c
==518718==
==518718== I refs: 515,059
==518718== I1 misses: 1,961
==518718== LLi misses: 1,825
==518718== I1 miss rate: 0.38%
==518718== LLi miss rate: 0.35%
==518718==
==518718== D refs: 174,210 (127,904 rd + 46,306 wr)
==518718== D1 misses: 5,769 ( 4,585 rd + 1,184 wr)
==518718== LLd misses: 4,125 ( 3,049 rd + 1,076 wr)
==518718== D1 miss rate: 3.3% ( 3.6% + 2.6% )
==518718== LLd miss rate: 2.4% ( 2.4% + 2.3% )
==518718==
==518718== LL refs: 7,730 ( 6,546 rd + 1,184 wr)
==518718== LL misses: 5,950 ( 4,874 rd + 1,076 wr)
==518718== LL miss rate: 0.9% ( 0.8% + 2.3% )
Lines starting with ==NUMBER==
are the output of valgrind
’s cachegrind tool, the number
indicates the process ID (in case multiple programs are run in parallel). Cachegrind simulates
a two-level cache hierarchy (a first-level instruction cache and first-level data cache
which used a single last-level cache). The fields of the cachegrind output are:
I refs
: instruction cache references (accesses)I1 misses
: first-level instruction cache missesLLi misses
: last-level cache misses for instructionsI1 miss rate
: first-level instruction cache miss rateLLi misses
: last-level cache miss rate, only counting instruction accessesD refs
: data cache referencesD1 misses
: first-level data cache misses; also split up into reads (rd
) and writes (wr
)LLd misses
: last-level cache misses for data; also split up into reads (rd
) and writes(wr
)D1 miss rate
: first-level data cache miss rate; also split up into read miss rate and write miss rateLLd miss rate
: last-level cache miss rate, only counting data accessesLL refs
: last-level cache missesLL misses
: last-level cache miss countLL miss rate
: last-level cache miss rate
The fields fields we will be concerned about are the ones for the first-level data cache (D refs
, D1 misses
, D1 miss rate`).
Supplied files
Below, we will ask you to write programs to achieve certain cache miss rates in cachegrind’s simulation.
To do this we supply:
-
Makefile — scripts for compiling and running your programs and for creating an archive for submission
-
prog1.c, prog2.c, prog3.c, prog4.c, prog5.c — templates for the programs you need to produce below. The template program will achieve a high cache miss rate.
Template programs
The supplied template program which sets up a global array, and then accesses that global array in a loop. When accessing the array, the program reads an element of array to determine what element to access next. (You can think of the array as a big cyclic linked list.)
The way in which it accesses this global array can be changed by changing how the array is initialized; our supplied code performs a regular pattern explained in a comment in the template file. You can adjust this pattern in by changing some settings in the program.
If you would rather write your own code entirely to achieve these cache miss rates, you are welcome to do so, but I believe the supplied template code would make this task easier.
Compiling and Testing
Testing individual programs
You can compile your programs using a command like:
make prog1
Also, the Makefile provides testing commands like:
make run-prog1
which will compile (if needed) and run a program under valgrind cachegrind with the cache settings in the tasks above.
Testing everything
We have a standalone test:
make test-all
which runs all the programs and parses the output to determine whether you’ve achieved the miss rate targets. To run this test on department machines, you will need to run
module load python
first.