Changelog:
- 14 Sep 2020 7:05 PM: fix syntax error (use of
rrmovq
instead ofmrmovq
) insum.ys
. - 14 Sep 2020 10PM: in description of benchmark data, clarify that instructions that are never executed are not counted
- 15 Sep 2020: rename
instr-reports.tar.gz
toinstr-reports.tar
to reflect that it’s not compressed - 15 Sep 2020 7PM: in part 2, be explicit that we only use power-of-two register counts, so we don’t have to consider complex register number encoding schemes
- 15 Sep 2020 7:10PM correct
sum.ys
to use registers that actually make sense; we will accept object files based on either version. - 15 Sep 2020: state what columns in the benchmark outputs since it can be tricky to see where the column labels are in the output files; also reformat description of the two types of instruction counts earlier to make it clearer
- 15 Sep 2020: clarify “on a value in memory versus in register” to “on a value primarily kept in memory versus primarily kept in a register”
- 16 Sep 2020: remove table row from part 2 that we did not ask to be copied to the answer sheet to avoid confusion.
Your task
- For all of the tasks below, answer the corresponding question on the answer sheet. Remember that for the lab (but not the corresponding homework) you may collaborate with other students; if you do please, note this in the comments on the answer sheet.
Linking
-
(answer sheet part 1) Using the supplied object file format description
- translate these Y86 files sum-main.ys and sum.ys into “object files”, and copy-and-paste it to the answer sheet. (we originally linked a different, less useful version of sum.ys; you can submit an object file based on either version.)
You may (and we would strongly recommend) use the
yas
(Y86 assembler tool) distributed with HCLRS to help you. See the instructions below.
ISA tradeoffs
-
(answer sheet part 2) Consider a fixed-instruction-length ISA. Complete the table below describing options for instruction layout and register count:
instruction size (bits) opcode size (bits) maximum register count (2 operands) maximum register count (3 operands) 8 3 4 2 16 4 64 16 16 5 ~ ~ 16 8 ~ ~ 32 10 ~ ~ (You may assume we only use power-of-two register counts.)
Record your answers on the answer sheet.
-
We have supplied some statistics from instructions in some benchmark programs described below, which will be used for couple of the below questions and some questions on the corresponding homework.
-
(answer sheet part 3) For three of the benchmarks, we provide statistics about them compiled with 8 registers available and 16 registers available. Answer the following questions about these statistics:
-
(subpart a) Some of the programs have fewer total instructions executed when fewer registers are available. This is related to x86-64 allowing most instructions to compute on values in memory (unlike Y86-64, where they must be moved into a register first). Give an example of how a compiler would generate code with fewer instructions when computing on a value primarily kept in memory versus primarily kept in a register.
[Hint: consider what the compiler has to do when it’s done computing on a value in a register.]
-
(subpart b) Suppose we are comparing two processors, one with 8 registers with a 2.2 GHz clock rate (2.2 billion clock cycles per second) where instructions that only access registers take 1 cycle and instructions that access memory take an average of 3 cycles and another with 16 registers, a 2GHz clock rate and the same cycles per instruction.
For the benchmark programs, how much slower or faster would the benchmark programs be on the 16-register procesor based on our benchmark results? Describe your calculation.
-
using YAS
Our textbook authors have written a Y86 assembler, which we distribute with HCLRS. To use it:
-
First download hclrs.tar.
-
Extract hclrs.tar. You can do this from the command line using
tar xvf hclrs.tar
, which will create anhclrs
directory. -
In the
hclrs
directory, runmake
. -
In a text editor, create an example Y86 assembly program
simple.ys
:start: irmovq $100, %rax irmovq $1, %rbx irmovq $3, %rcx irmovq $4, %rdx loop: subq %rbx, %rax addq %rdx, %rcx jg loop halt
-
Run
tools/yas simple.ys
to producesimple.yo
:0x000: | start: 0x000: 30f06400000000000000 | irmovq $100, %rax 0x00a: 30f30100000000000000 | irmovq $1, %rbx 0x014: 30f10300000000000000 | irmovq $3, %rcx 0x01e: 30f20400000000000000 | irmovq $4, %rdx 0x028: | loop: 0x028: 6130 | subq %rbx, %rax 0x02a: 6021 | addq %rdx, %rcx 0x02c: 762800000000000000 | jg loop 0x035: 00 | halt
-
This file includes the machine code for the assembly program on the right-hand side. For example, reading the line
0x00a: 30f30100000000000000
tells us that the byte with index 0xa is0x30
, the byte with index0xb
is0xf3
, with0xc
is 0x01`, and so on.Note that this format is deliberately very similar to what our object file format expects, so you can copy-and-paste most of the instructions.
-
yas
only handles complete assembly programs, so if you try to assemble a file which references a label from elsewhere it will fail. To deal with this, I would recommend replacing the label with one that’s actually present in the assembly file soyas
will work, and then manually determining where that label’s address was placed.
benchmark statistics
We have built a tool that captures on 64-bit x86 Linux programs some statistics about the instructions in that program. We both check how many times the instruction is present in the program’s executable and its libraries and how many times the instruction is actually issued. For example, given the program:
main: movq $0, %rax start_loop: addq $1, %rax cmp $20, %rax jle start_loop ret
we would say that the cmp
instruction is in the program 1 time and is issued or executed 20 times (the number
of iterations the loop is executed).
Our tool identifies each instruction fits into categories like “register-to-register unconditional mov”, and outputs both counts:
- the number of times those instructions appear in the program and its libraries (assuming it was executed once); and
- the number of times the instruction was issued during a particular execution.
(I believe these statistics are mostly correct, but given the complexities of obtaining them and interpreting x86-64 instructions, it’s likely there are some inaccuracies.)
Our tool is based on processing a list of executed instructions, so it does not count instructions that are never executed, even if they are present in the executable/library.
You can download our tool here [last updated 2020-09-14];
it has only been tested on x86-64 Linux
and requires valgrind and objdump
(from GNU binutils) to be installed.
(I believe both of these are present on department machines like portal.cs.virginia.edu.)
However, you do not have to run this tool to complete this lab; instead, we will supply several outputs you will use for later questions:
- from a matrix multiply of size 1024x1024
- from a 1024-variable linear solve
- from a program counting the number of solutions to the N-queens problem for N=13
- from the chess engine LazyGull analyzing a particular chess position
- from GCC compiling the N-queens program with optimizations enabled
You can download these in this archive [last updated 2020-09-15 (to correct file format issue)]. The outputs each contain a table, whose columns are labeled in the second line of the output and are:
- category (description of type of instruction)
- issued (number of times instructions of that type executed)
- % of issued (number of times instructions of that type executed divided total number of times instructions exectued times 100)
- in program (number of instructions of that type)
- % of in program (number of instructions of that type divided by total number of instructions times 100)
(In addition, we supply the source code for the custom programs involved.)
For the first three programs, we also supply statistics from them compiled in a way where only 16 registers
(instead of 8) is available (for both general purpose and floating point/vector registers),
by telling GCC that the other registers are reserved using the -ffixed-REG
compiler option.
Note that some of the categories of statistics overlap; for example, we report both “instructions that write memory” and “non-mov/cmov/push instructions that write memory”.