1 Part 1: Linking
2 Part 2: Complex Addressing Modes
- 2.1 Subpart a
- 2.2 Subpart b
3 Part 3
- 3.1 Subpart a
- 3.2 Subpart b
4 Part 4
- 4.1 three- versus two-operand instructions
- 4.2 Subpart a

Changelog:

14 Sep 2020 approx 10PM: be explicit in Part 3b that we don’t expect insturctions that are never executed to be counted, since we don’t tell you about them in our benchmark data
17 Sep 2020: correct error in answer sheet where we asked for “times faster” instead of “times more instructions for Part 2b to actually match what we ask here. We’ll also accept answers that based on how times fewer instructions are executed for those who used the original interpretation.
18 Sep 2020: for part 3a and 3b, provide a clearer description of what the difference between the two ways we want instructions counted; pointing out that the way we want them counted for 3a is the same as the “issued” column in the benchmark data and the way for 3b is the same as the “in program” column.
18 Sep 2020: for part 2b, clarify that you do not need to consider instructions like movq 0x100000, %rax as using complex addressing that needs to be translated
18 Sep 2020: in part 3a, clarify the assumptions are about signed numbers, since this is what makes it possible to use the benchmark data most conveniently
21 Sep 2020: in part 3a, clarify that 16-bit constants should not need additional instructions with note that we’ll accept answers that don’t assume this
22 Sep 2020: in part 4a, correct typo of “necessary” for “unnecessary”
22 Sep 2020: on answer sheet, correct questions corresponding to Part 2b not making any sense (due to copy-and-paste issues)
23 Sep 2020: in part 4, add note re mulq needing to be imulq

Record your answers on the answer sheet.

Part 1: Linking

Using the supplied object file/executable format description (same as in the lab):

translate this following object files into an executable file

Copy-and-paste your executable into the answer sheet.

Part 2: Complex Addressing Modes

x86-64 allows for the complex addressing modes where a memory location is specified using two registers as in (%rax, %rbx), including the most complicated form of something like 1234(%rax, %rbx, 4). Although this makes instruction encoding more complicated, perhaps a x86-64 processor can obtain a benefit by including hardware specialized for performing the two-register+scale address computation.

Subpart a

Using only the 0x1234(%rax)-style of addressing memory (like supported in Y86-64), write out assembly code equivalent to

      movq 0x1234(%rax, %rbx, 8), %rcx

using not more than 5 instructions. Your assembly code can use x86-64 instructions that don’t exist in Y86-64, like imul or shl.

Subpart b

For this problem, use the instruction information from benchmarks originally supplied in the lab.

If the compiler naively translated these instructions to avoid the complex addressing (using only the offset(base register) addressing supported by Y86-64), then about how many times more instructions would the processor execute? Record your answers (for each benchmark program) on the answer sheet), along with an explanation of your method. (Since you do not have precise information about the programs, your estimate does not have to be exact. Since we are asking for naive translations, you do not have to come up with the best possible assembly translation. You do not need to handle converting instructions like movq 0x100000, %rax (no index register or scale) to the offset(base register) style addressing, as we don’t consider that a complex addressing mode (but we will not deduct credit if you attempted this).)

Part 3

Fixed-length encodings simplify instruction sets, but usually at the cost of making programs larger. In a fixed-length encoding, instructions that take fewer operands must take up the same amount of space as an instruction that uses many operands.

Based on the Y86-instruction set one might imagine that this means that an equivalent fixed-length instruction set would have 10-byte instructions. While this is certainly a possibility, typically fixed-length instruction sets compromise by limiting the size of constants in instructions. For example, the RISC V 64-bit ISA supports an encoding where all instructions are 32-bits, even though registers are 64-bits. So rather than being able to do something like:

      irmovq $0x12345678, %r10

to load the 32-bit value 0x12345678, one must use two instructions. If Y86-64 was designed more like RISC V, this might look like:

          lui $0x1234, %r10 /* load upper immediate (constant) --- move into top -bits of destination */
iaddq $0x5678, %r10 /* add immediate (constant) --- with 16-bit immediate */

where the encoding of lui and iaddq wouldn’t have room for much more than a 16-bit immediate after fitting in the opcode and register number into the 32-bit instructions.

To put a 64-bit value like 0x1234567890ABCDEF in a register in this design, one would need a sequence of several instructions like:

          /* "load upper immediate" */
lui $0x1234, %r10
/* "immediate add" */
iaddq $0x5678, %r10
/* "logical shift left by immediate (constant)" */
isll $16, %r10
iaddq $0x90AB, %r10
isll $16, %r10
iaddq $0xCDEF, %r10

    
  

Or, alternately, a compiler might prefer to generate a rmmovq instruction rather than attempt to use the equivalent of irmovq.

Subpart a

For this problem, use the instruction information from benchmarks originally supplied in the lab.

Since our benchmark programs were run on X86-64, instructions supporting 32- and 64-bit constants were readily available to the compiler. Let’s consider how the compiler would perform if instructions using larger constants required a sequence to load the larger constant value. To get a rough estimate of the effect of a fixed-length instruction set, lets suppose the compiler would need:

2 instructions to construct a 32-bit signed value
4 instructions to construct a 48-bit signed value
6 instructions to construct a 64-bit signed value

[Note (added 18 Sep): originally we did not specify that these are signed values (just said “32-bit value”, etc.), but we think you would have trouble using the benchmark data we provide with assuming this.]

[Added 21 Sep:] Assume that using 16-bit signed values requires no additional instructions (we originally neglected to specify this explicitly, so we’ll also accept answers that don’t use this assumption).

Most likely, the compiler would also need to use some extra registers. To keep things simpler, let’s assume that effect is negligible. Based on these assumptions, about how many times more instructions would be executed in the benchmark programs? (When counting instructions executed for this part, if an instruction is executed 10 times, count it 10 times. This is what the “issued” column in our benchmark data does.)

Subpart b

Under the same assumptions as subpart a, about how much larger would the benchmark programs be in terms of number of instructions stored in the program’s executable or libraries that were executed at least once? (When counting instructions stored and executed at least once for this part, count each instruction once. This is what the “in program” column in our benchmark data does.)

(Since variable-length instruction sets typically have shorter instructions, this is probably an underestimate of increase in file size.)

Part 4

three- versus two-operand instructions

Making a typical instruction have three operands, so that assembly to compute C = A + B is like:

      add A, B, C

rather than like:

          mov B, C
add A, C

should allow many mov instructions to be avoided. x86-64, of course, requires the second form. However, compilers can usually find ways of allocating registers to avoid copying the value. For example:

if B is not used after C = A + B, the compiler can select to store C in the register previously being used for B and avoid a mov instruction; or
if the original code is doing an operation A = A + B instead of C = A + B, the compiler can store the result in A by doing add B, A (taking advantage of the add being communicative).

One might worry that these cases are rare and so having only two operands will have a very large cost, but looking at what compilers do in practice often seems to show that it is not.

Subpart a

Consider the following x86-64-style assembly code but with three register instead of two register instructions (“source, source, destination” instead of “source, source and destination”):

              mulq    %rdx, %r9, %rax
    mulq    %rbx, %r10, %rcx
    addq    %rax, %r12, %r11
    addq    %r11, %rcx, %r12
    mulq    %rdx, %r13, %r11
    mulq    %r10, %rcx, %r13

    
  

(This snippet is based on an excerpt of a matrix multiply routine compiled for an architecture with three-operand arithmetic instructions.)

Naively translated into two-register assembly would result in code like:

              movq    %r9, %rax
    mulq    %rdx, %rax
    movq    %r10, %rcx
    mulq    %rbx, %rcx
    movq    %r12, %r11
    addq    %rax, %r11
    movq    %rcx, %r12
    addq    %r11, %r12
    movq    %r13, %r11
    mulq    %rdx, %r11
    movq    %rcx, %r13
    mulq    %r10, %r13

    
  

([Added 23 Sep:] assume mulq is a two-argument multiply instruction that works like add; due to me getting confused, imul would be the actual x86-64 instruction.)

But some of these movq instructions are unnecessary. Rewrite this two-register assembly as assembly that omits at least two mov instructions but results in the same values in registers.

Contents

Part 1: Linking

Part 2: Complex Addressing Modes

Subpart a

Subpart b

Part 3

Subpart a

Subpart b

Part 4

three- versus two-operand instructions

Subpart a