F2022 CS 3330 quizzes KEY

Question 1 (6 pt; mean 5.46)

In lecture, we discussed the idea of a processor being connected to memory and I/O devices via common bus. Suppose we are executing the instruction movq (%rbx), %rax in this model. The processor will send messages over the bus in order to ___. Select all that apply.

88%
⊤
request a copy of the machine code for this instruction.
6%
request a copy of the value of %rbx
93%
⊤
request a copy of the value at whatever address is contained in %rbx
1%
request a copy of the value at whatever address is contained in %rax
2%
modify the value in at whatever address is contained in %rbx
27%
modify the value in %rax

part of this instruction, but not something done by communicating over the bus

Regrade request

Consider the following C code:

int x;
int *p;
x = 1077;
p = &x;

Suppose this is executed on a little-endian system where ints are 4 bytes and:

the value of p (after execution) is 0x50000, and
the variable p is stored in memory at address 0x60000

Identify the value of the byte in memory at each of the following addresses. If not enough information is given, instead write unknown and explain briefly in the comment.

Question 2 (2 pt; mean 1.63) (see above)

0x50000

Answer:

Key: 0x35 or 53

1077 = 0x435 is composed of bytes 0x00 (most sig), 0x00, 0x04, 0x35 (least sig); The least significant byte goes in the base address which is 0x50000, because that's what a pointer to x contains.

Regrade request

Question 3 (2 pt; mean 1.57) (see above)

0x50002

Answer:

Key: 0

0x50002 contains the third least significant byte of 1077, which is 0

Regrade request

Question 4 (2 pt; mean 1.77) (see above)

0x50004

Answer:

Key: unknown

no information about what, if anything, is stored at this address since it's just after the int 'x' in memory. (We know it's not p, since it's stored at address 0x60000.)

Regrade request

Question 5 (5 pt; mean 3.97)

Consider the following C snippet:

c = str[x+2];

where c is declared as a char, str is declared as a char * (pointer to char; and based on the context of the snippet a pointer to the beginning of an array of char), and x is declared as a long.

Suppose str is stored in the register %rax, x in %rdi, c in the register %bl, and %r8 through %r15 can be used for temporary values. Which (if any) of the following are correct translations to assembly? Select all that apply.

94%
⊤
movq $2, %r8
addq %rdi, %r8
movb (%rax, %r8, 1), %bl

r8 <- 2; r8 <- x + 2; c <- *(str + (x+2) * 1) = str[x+2]
66%
⊤
movq %rdi, %r8
addq $2, %r8
movb (%r8, %rax, 1), %bl

r8 <- x; r8 <- x + 2; c <- *((x+2) + str * 1) = str[x+2]
6%
addq $2, (%rax, %rdi)
mov (%rax, %rdi, 1), %bl

first instruction is str[x] <- str[x] + 2
2%
movb 1(%rax, %rdi, 2), %bl

c <- str[x * 2 + 1]
58%
movq $0, %r8
addq 2(%rax, %rdi, 1), %r8
movb (%r8), %bl

r8 <- str[x + 2]; c <- *r8 = *str[x+2]

Regrade request

quiz for week 2

Question 1 (4 pt; mean 3.59)

Consider the following C snippet:

unsigned long x;
unsigned long *p;
...
*p = x * 2;

If:

x is stored in %rax,
p is stored in %r8, and
%r9 is a register available to store temporary values

then which of the following x86-64 assembly snippets are equivalent to the C statement above? Select all that apply.

16%
movq %rax, (%r8)
leaq (%r8,2), %r8
14%
leaq (%rax, 2), %r8
movq (%r8), %r8
1%
leaq (%r8, 2), %r9
addq %rax, (%r9)
90%
⊤
leaq (%rax,%rax), %r9
movq %r9, (%r8)

Regrade request

Consider the following assembly snippet:

start:
    addq %r8, %r9
    cmpq %r9, %r10
    jle start
end:

Question 2 (4 pt; mean 2.18) (see above)

Suppose that while this assembly snippet executes, the computations performed by the addq and cmpq instructions do not result in integer overflow (or underflow) if we interpret their operands and results as signed integers. Then, after the snippet finishes executing, what will the values of the condition codes SF and ZF be?

52%
⊤
SF = 0 and ZF = 0

to get to end, jle needs to not jump back to start; when jle does jump to start the last arith result was <= 0 [ignoring overflow]; so when it does not, the last result was > 0 [ignoring overflow]; since the last result != 0, ZF = 0; since last result >= 0, SF = 0.
3%

SF = 1 and ZF = 0
0%

SF = 0 and ZF = 1
41%

it depends on the initial values of the registers %r8, %r9, and %r10
2%

there is not enough information provided (explain in comments)

Regrade request

Question 3 (4 pt; mean 3) (see above)

Assuming that the variables r8, r9, and r10 represent the values in the registers %r8, %r9, and %r10 respectively, which is equivalent C code to the above assembly snippet?

2%

do { r9 += r8; } while (r10 < r9);
68%
⊤
do { r9 += r8; } while (r10 <= r9);
15%
½
do { r9 += r8; } while (r10 >= r9);
8%

do { r9 += r8; } while (r10 > r9);
1%

while (r10 < r9) { r9 += r8; }
3%

while (r10 <= r9) { r9 += r8; }
2%

while (r10 >= r9) { r9 += r8; }
0%

while (r10 > r9) { r9 += r8; }
1%

none of the above

Regrade request

Suppose we have C code in two different files. One file, upperstr.c contains the following:

void upperstr(char *str) {
    for (int i = 0; str[i] != '\0'; i += 1) {
        if (str[i] >= 'a' && str[i] <= 'z') {
            str[i] += ('A'-'a');
        }
    }
}

And another file, printbig.c contains the following:

#include <stdio.h> /* for puts() */

void printbig(char *str) {
    upperstr(str);
    puts(str);
}

An executable is made from code in these files and others. As part of that process an object file upperstr.o is produced from upperstr.c and an object file printbig.o is produced from printbig.c.

Recall from lecture that object files generally contain:

machine code
data (such as string constants and the initial values of global variables)
a table of relocations that identify values within the machine code or data that must be filled in by the linker
a symbol table that identifies the names of labels (that were declared in the original assembly file) and the locations in the machine code or data they represent. (Often these labels correspond to functions or global variables.)

Question 4 (4 pt; mean 3.83) (see above)

Ignoring information only present for debugging, we would expect upperstr.o to contain ____. Select all that apply.

95%
⊤
an entry in its symbol table for upperstr
8%
a relocation specifying a location that needs to contain a pointer to upperstr
1%
an entry in its symbol table for printbig
3%
a relocation specifying a location that needs to contain a pointer to printbig

Regrade request

Question 5 (4 pt; mean 3.71) (see above)

Ignoring information only present for debugging, we would expect printbig.o to contain ____. Select all that apply.

16%
an entry in its symbol table for upperstr

dropped this option; the way we described the symbol table in lecture and above, no; but Linux (ELF) symbol tables have a way to represent uesd labels that we don't have an address for. (These symbol table entrie can't be used to fix machine code, but help the linker tell what symbols to look for.)
93%
⊤
a relocation specifying a location that needs to contain a pointer to upperstr
93%
⊤
an entry in its symbol table for printbig
9%
a relocation specifying a location that needs to contain a pointer to printbig

Regrade request

quiz for week 3

Question 1 (4 pt; mean 1.49)

Consider the expression

(((x & 0xFF0F) >> 2) & 0xFF0) == (((x & 0xFF0) >> 2) & 0xFF0F)

where x is a 32-bit unsigned integer. The set of values x for which the above expression is true is the same as the set of values for which the expression (x & Y) == 0 is true for some constant Y. What is the value of Y? Write your answer as a hexadecimal number.

Answer:

Key: 0x3330

If x is the bit pattern abcd efgh ijkl mnop (letting letters represent variable values for each bit), then the right hand side is:
(x & 0xFF0F) = abcd efgh 0000 mnop
(x & 0xFF0F) >> 2 = 00ab cdef gh00 00mn
((x & 0xFF0F) >> 2) & 0xFF0 = 0000 cdef gh00 0000

and the left-hand side:
(x & 0xFF0) = 0000 efgh ijkl 0000
((x & 0xFF0) >> 2) = 0000 00ef ghij kl00
((x & 0xFF0) >> 2) & 0xFF0F = 0000 00ef 0000 kl00

So for the two results to be the same the bit pattern 0000 cdef gh00 0000 needs to be the same as 0000 00ef 0000 kl00. This is only true if bits cdghkl are all 0s, so we need a bit mask of 0011 0011 0011 0000.

Regrade request

Consider the following incomplete C code (where ___s represent omitted constants):

const unsigned long A = ___;
const unsigned long B = ___;
unsigned long swapAndFlip(unsigned long x) {
    unsigned long low = x & 0xFF;
    unsigned long high = x >> 56;
    return (((low ^ A) << 56) | high | (x & B);
}

Suppose we want to make this function take a 64-bit integer, and return a new 64-bit integer, such that:

the most significant byte of the new integer is equal to the least significant byte of the original integer with the least and most significant bit of that byte flipped
the least significant byte of the new integer is equal to the most significant byte of the original integer
the rest of the integer is the same as the original integer.

Question 2 (4 pt; mean 3.37) (see above)

What should the value of the constant A be? Write your answer as a hexadecimal integer.

Answer:

Key: 0x81

Regrade request

Question 3 (4 pt; mean 3.25) (see above)

What should the value of the constant B be? Write your answer as a hexadecimal integer.

Answer:

Key: 0x00FF FFFF FFFF FF00

Regrade request

For each of the following pairs of instruction set design choices, identify which is more consistent with the reduced instruction set computer (RISC) design philosophy

Question 4 (2 pt; mean 1.28) (see above)

11%

adding an instruction that finds the length of a string given a starting memory location

violates idea of separate instructions to access memory, requires a loop in one instruction
58%
⊤
adding an instruction that finds the index of the first zero byte in an 8-byte value stored in a register

This seems like a loop, but since it's a fixed, constant number of checks, it's not really one. Also this exposes an operation that could be done efficiently in hardware, and avoids accessing memory multiple times.
30%

both are about as consistent with the RISC design philosophy

Regrade request

Question 5 (2 pt; mean 1.24) (see above)

10%

adding a special "increment register by 1" instruction which needs fewer bytes of machine code than an equivalent "add constant to register" instruction
62%
⊤
providing a "add constant to register" instruction whose machine code is always the same length, no matter how large the constant is
26%

both are about as consistent with the RISC design philosophy

Regrade request

quiz for week 4

For the following questions, consider this Y86-64 machine code. (The machine code is written as a series of bytes in hexadecimal, written in order from lowest to highest memory address.)

30 f8 08 00 00 00 00 00 00 08 60 9a 50 ba 00 00 00 00 00 00 00 00

Question 1 (2 pt; mean 1.96) (see above)

What is the mnemonic for the first instruction in the above machine code snippet?

Regrade request

Question 2 (4 pt; mean 3.7) (see above)

The first instruction in the above assembly snippet has a 64-bit integer constant in it. What is it? Write your answer as a hexadecimal number.

Answer:

Key: 0x0800000000000008

Regrade request

Question 3 (4 pt; mean 3.85) (see above)

What is the mnemonic for the second instruction in the above machine code snippet?

Regrade request

Consider the following HCLRS snippet:

register xY {
    foo : 64 = 0x0;
    bar : 64 = 0x5;
}
x_foo = Y_foo + 1;
x_bar = x_foo + Y_bar;
pc = x_foo;

Question 4 (4 pt; mean 3.18) (see above)

Since the above snippet sets pc, it will read from the instruction memory, setting i10bytes based on the contents of memory (even though the snippet does not use that value). Before the first rising edge of the clock signal, what bytes of memory will be used to construct the value of i10bytes? Identify the bytes by their memory addresses. Select all that apply

Regrade request

Question 5 (4 pt; mean 3.6) (see above)

Initially, the value of the bar register is 0x5. After two rising edges of the clock signal, it will be ___. Write your answer as a hexadecimal number.

Answer:

Key: 0x8

before first rising edge: x_foo = 0 + 1, x_bar = 1 + 5; after 1st: x_foo = 1 (previous x_foo) + 1; x_bar = 2 + 6 (previous x_bar) = 8; after 2nd: Y_foo = previous x_foo

Regrade request

quiz for week 5

Question 1 (6 pt; mean 5.53)

Suppose we start with the mov-to-register CPU we described in lecture which implements the Y86 instructions irmovq, mrmovq, and rrmovq. This processor design had MUXes (represented in HCLRS by case expressions) that chose between several inputs based primarily on what the current instruction was. When adding a new instruction, for some of these MUXes, we can use an existing input and modify when it is selected to include the new instruction. For others, we may need to add a new input to this MUX for the new instruction (or some other logic that achieves this effect). In a few cases, we may even need to add a new a MUX (or other logic that achieves the same effect).

Suppose we added the addq instruction to the mov-to-register CPU. For which of the following parts of the circuit would we need to add a new MUX or a new MUX input (or equivalent logic) rather than modifying the conditions for selecting existing MUX inputs? Select all that apply.

14%
controlling the input to the PC (program counter) register

already have needed cases
89%
⊤
controlling one or more of the ALU's inputs
11%
controlling the memory write enable input (mem_writebit in HCLRS)

already have needed cases
19%
(accepted any answer)
controlling one or more of the register file's "destination register index" inputs (reg_dstE and reg_dstM in HCLRS)

I expected the solution of using the existing reg_dstE input (without changes, so this option would be false), but there is an option to instead use reg_dstM (with a new MUX) and avoid a MUX on reg_inputM (by having it directly connected to the ALU.)
8%
controlling the memory read enable input (mem_readbit in HCLRS)

already have needed cases
67%
⊤ (accepted any answer)
controlling one or more of the register file's "register value to write" inputs (reg_inputE and reg_inputM in HCLRS)

I expected the solution of using modifying the reg_inputE input MUX (to add the option for the value to come from the ALU) so this option would be true. But one could also connect reg_inputM to the ALU directly (which is not an additional MUX) and control whether it is used with reg_dstM

Regrade request

Question 2 (5 pt; mean 3.58)

In Y86, the popq instruction is encoded using 2 bytes. Suppose we made a variant of Y86 that instead encoded popq in one byte, where the high-order four bits of the first byte would be 0xB (like popq is in normal Y86) and the low-order four bits of the first byte would be the register index corresponding to the operand to popq.

Supporting this new encoding would likely ____. Select all that apply.

3%
require a register file with more read ports (ability to read more values at once)
67%
require changes to the logic controlling the register source index inputs to the register file (e.g. reg_srcA and reg_srcB in HCLRS)
68%
⊤
require changes to the logic controlling the register destination index inputs to the register file (e.g. reg_dstE and reg_dstM in HCLRS)
25%
require changes to the logic controlling the register value inputs to the register file (e.g. reg_inputE and reg_inputM in HCLRS)
17%
require changes to the logic controlling the ALU inputs

Regrade request

In the single-cycle processor design discussed in lecture, executing the code

pushq %rax
popq %rbx

involves the following operations:

1. reading the machine code for the pushq instruction
1. reading the value of %rax
1. writing a (non-machine-code) value to memory
1. writing a value to %rsp
1. reading the machine code for the popq instruction
1. reading a (non-machine-code) value from memory
1. writing a value to %rbx
1. writing a value to %rsp

Question 3 (7 pt; mean 5.45) (see above)

Which of the above operations could occur simulatenously or in any order in the single-cycle processor design? Select all that apply

29%
1 and 2
20%
2 and 3

need to read machine code to know to read %rax versus other registers
90%
⊤
3 and 4
15%
4 and 5

not a pipelined processor, so one instruction runs at a time; it's true that the write to %rsp taking effect is triggered by the rising edge of the clock, and the machine code read starting is triggered by the PC changing on the rising edge of the clock. But they aren't really simulatenous --- one finishes as the other starts.
29%
5 and 6

need to read machine code to know use stack pointer to compute memory location to read from
42%
6 and 8
88%
⊤
7 and 8

Regrade request

Question 4 (4 pt; mean 2.22)

Suppose we added the instruction

immovq CONSTANT, DISPLACEMENT(rA)

to the single-cycle Y86-64 processor design we discussed in lecture. This instruction would store the 64-bit value CONSTANT in memory at an address computed by adding the value of the register rA to the 64-bit constant DISPLACEMENT.

While this instruction is executing, the value input to the data memory (called mem_input in HCLRS) will be equal to ______.

12%

one of the outputs of the register file
55%
⊤
part of the output of the instruction memory
4%

the input to the instruction memory
27%

the output of the ALU (arithmetic logic unit)

Regrade request

quiz for week 6

Suppose we are building a two-stage pipelined Y86-64 processor from the following components:

a register file that requires 100 ps for reads and 100 ps for writes
a program counter increment circuit that requires 20 ps to compute the incremented program counter (from the icode and previous program counter)
an ALU that requires 200 ps; and
a data memory that requires 120 ps for reads and 150 ps for writes
an instruction memory that requires 120 ps for reads
pipeline registers and a program counter register that have some small register delay

(and that other components use negligible time to make the following questions simpler.)

Most simply, each our two stages could perform the work of one or more stages from the five-stage pipelined design we discussed in lecture.

Question 1 (4 pt; mean 2.25) (see above)

If the first stage of the processor performs the work for Fetch and the second stage performs the work of Decode+Execute+Memory+Writeback, then the cycle time would be around _____ ps plus pipeline and/or program counter register delays.

Answer:

Key: 520

slowest path: reg read + ALU + mem read + reg write

(3/4ths credit for reg read + ALU + mem write + reg write, but we never use the output of memory writes to do a register write)

Regrade request

Question 2 (4 pt; mean 3.89) (see above)

To minimize the cycle time, these two stages should perform the work of _______ (for the first stage) and _______ (for the second stage).

1%

Fetch / Decode+Execute+Memory+Writeback

520 + reg delay cycle time; first stage: 140 ps = (120 ps + 20 ps) / second stage: 520 ps = (100 ps + 200 ps + max(150 ps, 120 ps + 100 ps))
15%
⊤
Fetch+Decode / Execute+Memory+Writeback

420 + reg delay cycle time; 220 ps = (120 ps + max(20 ps, 100 ps)) / 420 ps = (200 ps + max(120 ps + 100 ps, 150 ps)); meant to change these numbers so there wouldn't be two correct answers but negleceted to properly account for data memory read/write split.
82%
⊤
Fetch+Decode+Execute / Memory+Writeback

420 + reg delay cycle time; 420 ps = (120 ps + max(20 ps, 100 ps + 200 ps)) / max(120 + 100, 150) = 220
1%

Fetch+Decode+Execute+Memory / Writeback

570 + reg delay cycle time; 570 ps = (120 ps + max(20 ps, 100 ps + 200 ps + 150 ps)) / 100 ps

Regrade request

Question 3 (4 pt; mean 2.86)

Suppose a single-cycle processor design has a cycle time of 3000 ps. If this design is modified into an 8-stage pipelined processor (using the same storage components and adding pipeline registers) and this results in a performance improvement, then the new cycle time would be greater than _____ ps and less than _____ ps. Choose the tightest bounds you can infer from the information in the question. Write your answer as two numbers of picoseconds seperated by a comma. (For example, if you think the answer is "greater than 100 ps and less than 200 ps" write "100,200".)

Answer:

Key: 375,3000

with 8 stages, ideally each stage takes around 3000 ps/8 = 375 ps; in the worst case, we'd have one stage that took almost 3000 ps plus we'd have some additional delay from added pipeline registers/etc. — so the result could be worse than the original single-cycle processor. We are told in the question that there's a performance improvement, so we can at least say that the new performance is better than 3000 ps.

Regrade request

Question 4 (4 pt; mean 3.15)

Suppose we construct a pipelined processor by modifying the single-cycle processor design we saw in lecture and that our pipeline processor has two stages:

Fetch and Decode; and
Execute and Memory and Writeback

With these stages, the pipeline registers should contain (at least when certain instructions are being run) ____. Select all that apply.

80%
⊤
the opcode (icode) field from the instruction or equivalent information
60%
⊤
one or more register indices from the instruction
16%
the result of an ALU operation
90%
⊤
the values of registers read from the register file

Regrade request

quiz for week 7

For the following questions, consider the following assembly code:

addq %r8, %rcx                 
mrmovq 8(%rcx), %rax           
subq %rcx, %rax

Question 1 (4 pt; mean 3.51) (see above)

Suppose the above assembly code was executed on a six-stage pipelined processor with the following pipeline stages:

Fetch
Decode
Execute
Memory 1
Memory 2
Writeback

The result of reading the data memory is not available until near the end of the Memory 2 stage. The processor uses a combination of stalling and forwarding to resolve data hazards. It supports all forwarding paths that would not result in a large increase in cycle time.

If the addq instruction completes its fetch stage during cycle 1, then the subq instruction will complete its writeback stage during cycle ___.

Answer:

Key: 10

                      1  2  3  4  5  6

addq %r8, %rcx        F  D  E  M1 M2 W  7

mrmovq 8(%rcx), %rax     F  D  E  M1 M2 W 8  9  10

subq %rcx, %rax             F  D* D* D  E M1 M2 W

Regrade request

Question 2 (4 pt; mean 3.17) (see above)

Suppose the above assembly code was executed on seven-stage pipelined processor, where the pipeline stages are:

Fetch
Decode
Execute 1
Execute 2
Execute 3
Memory
Writeback

In this processor, the result of an ALU operation (addition or subtraction) is not available until near the end of the Execute 3 stage, but the inputs to the operation must be available near the beginning of the Execute 1 stage. This processor uses a combination of stalling and forwarding to resolve hazards. It supports all forwarding paths that would not result in a large increase in cycle time.

If the addq instruction completes its fetch stage during cycle 1, then the subq instruction will complete its writeback stage during cycle ____.

Answer:

Key: 14

                    1  2  3  4  5  6  7

addq %r8, %rcx      F  D  E1 E2 E3 M  W  8  9  10

mrmovq 8(%rcx), %rax   F  D* D* D  E1 E2 E3 M  W  11 12 13 14

subq %rcx, %rax           F* F* F  D* D* D* D  E1 E2 E3 M  W

Regrade request

Question 3 (6 pt; mean 5.47)

Consider the following assembly code:

addq %r8, %r9
mrmovq 16(%r8), %r9
subq %r9, %r10
xorq %r9, %r11
rrmovq %r9, %r12

When the above instruction is executed on a five-stage pipelined processor which uses forwarding to resolve hazards in a way that minimizes stalling like we discussed in lecture, which of the following forwarding operations must occur? Select all that apply

2%
of the value of %r8 from addq to mrmovq
17%
of the value of %r9 from addq to mrmovq

overwritten only
4%
of the value of %r9 from addq to subq

more recent version from mrmovq
98%
⊤
of the value of %r9 from mrmovq to subq
88%
⊤
of the value of %r9 from mrmovq to xorq
16%
of the value of %r9 from mrmovq to rrmovq

rrmovq can read normally instead

Regrade request

quiz for week 8

Consider a 7-stage pipelined processor with the following stages:

Fetch
Decode 1
Decode 2
Execute 1
Execute 2
Memory
Writeback

Assume these stages do the same thing as in the 5-stage processor we discussed in lecture, except that the decode and execute stages are split into two stages. For the split decode stage, the inputs to the original decode stage (like register indices) need to be available near the beginning of the decode 1 stage and outputs of the original decode stage (like register values) are only available near the end of the decode 2 stage. Similar is true for the execute 1 and execute 2 stages.

Assume the processor implements whatever forwarding is possible without dramatically increasing cycle time.

Question 1 (4 pt; mean 3.24) (see above)

Suppose the 7-stage processor is used to execute the following:

     addq %rax, %rcx       
     jle foo             
     rmmovq 8(%rax), %rax
     subq %rcx, %rdx
     halt
foo:
     xorq %r8, %r9
     xorq %r10, %r11
     xorq %r12, %r13
     xorq %r8, %r9
     xorq %r10, %r11
     xorq %r12, %r13

The processor implements jle using branch prediction that predicts the branch as always taken. The actual outcome of the branch prediction is computed during the conditional jump's execute 2 stage (which will use the condition codes updated by the previous instruction's execute 2 stage) and can be used to determine what instruction to fetch in the next cycle (but not earlier).

When the assembly above is executed, the jle is not taken, so some xorq instructions are fetched and then squashed. If the addq instruction is fetched during cycle 1, then the rmmovq instruction will be fetched during cycle ___.

Answer:

Key: 7

Regrade request

Question 2 (4 pt; mean 3.01)

Consider the five-stage pipelined processor design discussed in lecture but with a different branch prediction strategy: instead of predicting conditional jumps as always taken, predict them as taken if they target an address that is less than the jump instruction's and as not taken if they target an address than is greater than the jump instruction's. (This strategy is called "forward not-taken, backwards taken").

In this processor, incorrectly predicted branches require 2 extra cycles, instructions that use a value computed in memory by the previous instruction in the ALU require 1 extra cycle, and returns require 3 extra cycles. (And no other cases require extra cycles due to hazards.)

Suppose the instructions run in a program are:

10% conditional jumps, of which
- 40% are taken and jump to an address that is less than the jump instruction's
- 20% are taken and jump to an address that is greater than the jump instruction's
- 10% are not taken and jump to an address that is less than the jump instruction's
- 30% are not taken and jump to an address that is greater than the jump instruction's
10% instructions that use (for an ALU operation) a value loaded from memory by the previous instruction; and
no ret instructions

About how many average cycles per instruction would the program experience on the processor described above?

Answer:

Key: 1.16

1 + 10% * (20% + 10%) * 2 + 10% * 1 = 1.16

Regrade request

Consider the following C++ snippet:

for (int i = 0; i < 10000; ++i) {
    A[i] += B[i];
    A[i] *= C[i];
}
std::cout << "last A is " << A[9999] << std::endl;

Question 3 (2 pt; mean 1.69) (see above)

The accesses to elements of A will have _____ temporal locality than the accesses to i.

85%
⊤
less
11%

about the same (or there's not enough information)
4%

more

Regrade request

Question 4 (2 pt; mean 1.88) (see above)

Assuming about the same number of bytes of machine code are required for each, the accesses to the machine code for the += and *= statements in the loop will have _____ spatial locality than the machine code for the std::cout statement.

5%

less
40%
⊤
about the same (or there's not enough information)

I think this is the better answer, but understand thinking the future access to nearby is more likely in the other case.
54%
⊤
more

Regrade request

Question 5 (2 pt; mean 1.86) (see above)

Assuming about the same number of bytes of machine code are required for each, the accesses to the machine code for the += and *= statements in the loop will have _____ temporal locality than the machine code for the std::cout statement.

2%

less
3%

about the same (or there's not enough information)
93%
⊤
more

Regrade request

Question 6 (5 pt; mean 4.7)

Suppose the contents of a two-entry direct-mapped cache is as follows:

index (binary)	valid	tag (binary)	data bytes (hexadecimal)
00	1	001	`1F 3F 5F 7F`
01	1	110	`2E 3E 4E 5E`
10	1	001	`33 44 55 8F`
11	1	010	`9E AE BF 01`

(In the table above, data bytes are listed with the lowest address's value first (left-most).)

The byte of memory at address 0x11 (binary: 10001) is stored in this cache and has the value 0x3F.

Bytes from which of the following memory addresses are also stored in the cache shown above? (Note that addresses may be written without all leading 0s.) Select all that apply.

Regrade request

quiz for week 9

Consider a two-way set associative cache with the following contents:

set index	valid	tag	data bytes	valid	tag	data bytes
0	1	0xA20	`1F 3F 2A 93 27 00 45 8A`	1	0xC20	`33 44 55 92 91 90 0A 0C`
1	0	—	—	1	0xC20	`0F 1C 11 12 13 14 15 16`
2	1	0xC20	`20 21 2F 2A 40 41 55 98`	1	0x332	`95 33 12 59 43 3A 3C 3D`
3	0	—	—	0	—	—

In the representation above, the data bytes are listed in hexadecimal, starting with the byte with the lowest memory address.

Question 1 (4 pt; mean 3.1) (see above)

The data byte 1C in the second way of the set with index 1 was retrieved from memory at address _____. Write your answer as a hexadecimal number.

Answer:

Key: 0x18409

concatenate: tag (0xC20 = 1100 0010 0000), index (01), offset (001) = 1 1000 0100 0000 1001 = 0x18409
([update 26 Oct:] original key had the wrong intermediate values, now fixed)

Regrade request

Question 2 (4 pt; mean 3.86) (see above)

Give an example of an address that, if accessed, would result in a value being replaced in the cache above. Write your answer as a hexadecimal number.

Answer:

Key: has to be in set 0 or set 2 and not have tag 0xA20 or 0xC20 (for set 0); or tag 0xC20 or 0x332

has to be in set 0 or set 2 and not have tag 0xA20 or 0xC20 (for set 0); or tag 0xC20 or 0x332 (for set 2)

Regrade request

Question 3 (4 pt; mean 2.3)

Suppose a program accesses a cache in the following pattern:

it reads a 4 byte array A, one byte at a time, from beginning to end; then
it overwrites the same array A, one byte at a time, from end to beginning; then
it repeats the above steps 9 more times; then
it reads a different 4-byte array B, one byte at a time, from beginning to end

Assume the above accesses use a cache that is not also used for other accesses and that the cache is initially empty.

If the cache used for the accesses above is a 4-byte, direct-mapped cache with a write-through, write-no-allocate policy and 2 byte blocks, how many accesses (reads or writes) to the next level of cache (or main memory if there is not next level) will be performed when the above cache accesses happen?

Answer:

Key: 44 if arrays start at addresses that are multiples of 2, which I meant to specify when writing the question, but neglected to. This made the question a bunch more complicated than I intended. If arrays were not "aligned", then 45 accesesses (only B not aligned); 63 accesses (only A not aligned); or 64 accesses (A + B not aligned)

2 reads of A + 40 writes of A + 2 reads of B (if arrays at addreses that are multipes of 2)
additional read of B if address of B is not multiple of 2 (read block contianing byte before array + first byte; then read block of second and third; then read block of fourth byte + byte after array)
if address of A is not multiple of 2, additoinal read for first pass (to read block contining last byte of array + byte after array), plus 2 additional reads for each of 9 following passes (replace block containing last byte with block containing byte before array + first byte; then replace that block with block contiaing last byte again) or 1 + 18 = 19 additional reads

Regrade request

Consider a 4MB (4194304 byte) cache, 4-way cache with 64-byte blocks, a first-in, first-out replacement policy, and a write-back, write-allocate policy.

Suppose the cache is initially empty (all valid bits and dirty bits set to 0), then the following accesses happen:

write to 0x4444000
write to 0x4444008
read from 0x4444000
read from 0x4444800
read from 0x4443320
write to 0x4445000
read from 0x8445000

Question 4 (4 pt; mean 2.6) (see above)

Assuming 64-bit memory addresses, how many bits does the cache have (in total, across the whole cache) available to store tags?

Answer:

Key: 2883584

4194304 / (64 * 4) = 2^14 sets; (64 - 14 index bits - 6 offset bits) * (4194304/64 blocks)

Regrade request

Question 5 (3 pt; mean 2.33) (see above)

After the accesses above, how many dirty bits will be set to 1?

Answer:

Key: 2

Regrade request

Question 6 (3 pt; mean 2.3) (see above)

After the accesses above, how many valid bits will be set to 1?

Answer:

Key: 5

Regrade request

quiz for week 10

Question 2 (4 pt; mean 3.11)

Many caches store tags, valid bits, and other metadata separately from the data. One result of this design is that misses can be detected more quickly, after completing an access to likely faster "tag store".

Suppose a cache using this design:

experiences a 90% hit rate on a benchmark
has a 5 cycle hit time,
takes 2 cycles to detect a hit or miss,
once a miss is detected, starts an access with the next level of the memory hierarchy, which will take 100 cycles

Without the optimization to detect hits or misses early, this cache would have an average memory access time of 5 + 10% * 100 = 15 cycles. What would the average memory access time with this optimization be?

Answer:

Key: 14.7

5 + 10% * (100 - 3 [cycles saved by starting next level faster])

Regrade request

Question 3 (4 pt; mean 2.83)

Suppose we have a 2-way, 2-set data cache with 8 byte blocks and an LRU replacement policy and run the following C code:

int array[12];
... /* code omitted */
int sum = 0;
for (int j = 0; j < 2; j += 1) {
    for (int i = 0; i < 4; i += 1) {
        if (sum > array[i * 3 + j]) {
            sum += array[i * 3 + j];
        }
    }
}

Assume that:

the compiler does not reorder operations in the loop;
only accesses to array in the loops use the data cache;
the data cache is initially empty when the loop starts;
the address of array[0] is a multiple of 1024

How many data cache misses will occur when the loops above run?

[Note that unlike some of the examples in lecture, the number of misses for each of the two sets of the cache may be different.]

Answer:

Key: 6

access pattern is array[0], array[3], array[6], array[9], array[1], array[4], array[7], array[10]

Each cache block has two array elements. Let's say 0+1, 4+5, 8+9 map to set 0; and 2+3, 6+7, 10+11 map to set 1.

access pattern + cache contents identifying array elements by their indices:
0 miss: set 0 {0+1,--}; set 1 {--,--}
3 miss: set 0 {0+1,--}; set 1 {2+3,--}
6 miss: set 1 {0+1,--}; set 1 {2+3,6+7}
9 miss: set 0 {0+1,8+9}; set 1 {2+3,6+7}
1 hit
4 miss: set 0 {0+1,4+5} (8+9 was least recently used in this set since we just accessed array[1]); set 1 {2+3,6+7}
7 hit
10 miss

Regrade request

Consider the following C snippets:

/* version A */
for (int i = 0; i < N; i += 1) {
    for (int j = 0; j < i; j += 1) {
        A[i * N + j] = D[j * N + i] + B[i] * C[j];
    }
}

/* version B */
for (int j = 0; j < N; j += 1) {
    for (int i = 0; i < j; i += 1) {
        A[i * N + j] += D[j * N + i] + B[i] * C[j];
    }
}

Question 4 (4 pt; mean 3.7) (see above)

Which version has better spatial locality for accesses to A?

92%
⊤
version A
3%

version B
4%

they are about the same

Regrade request

Question 5 (4 pt; mean 3.55) (see above)

If the cache can store four array elements in each cache block (for each of the arrays A, B, C, or D) and the cache is too small to store N elements of any of the arrays, approximately how many cache misses would we expect to occur in version A for accesses to the array B only? (N^2 below represents N squared.)

Regrade request

Question 6 (4 pt; mean 1.81) (see above)

If the cache can store four array elements in each cache block (for each of the arrays A, B, C, or D) and the cache is too small to store N elements of any of the arrays, approximately how many cache misses would we expect to occur in version A for accesses to the array D only? (N^2 below represents N squared.)

Regrade request

quiz for week 11

Question 1 (4 pt; mean 3.96)

Consider the following C code:

for (int i = 0; i < N; i += 1) {
    for (int j = 0; j < i; j += 1) {
        A[i] += B[i * N + j] * C[j * N + i];
    }
}

Consider the following attempts at optimizing the above C code:

/* version A */
for (int ii = 0; ii < N; ii += 2) {
    for (int i = 0; i < ii + 2 && i < N; i += 1) {
        for (int jj = 0; jj < N; jj += 2) {
            for (int j = 0; j < jj + 2 && j < i; j += 1) {
                A[i] += B[i * N + j] * C[j * N + i];
            }
        }
    }
}

/* version B */
for (int ii = 0; ii < N; ii += 2) {
    for (int jj = 0; jj < ii + 2; jj += 2) {
        for (int i = 0; i < ii + 2 && i < N; i += 1) {
            for (int j = 0; j < jj + 2 && j < i; j += 1) {
                A[i] += B[i * N + j] * C[j * N + i];
            }
        }
    }
}

/* version C */
for (int i = 0; i < N; i += 1) {
    int j = 0;
    for (; j + 1 < i; j += 2) {
        A[i] += B[i * N + j] * C[j * N + i];
        A[i] += B[i * N + j+1] * C[(j+1) * N + i];
    }
    if (j < i)
        A[i] += B[i * N + j] * C[j * N + i];
}

/* version D */
for (int j = 0; j < N - 1; j += 1) {
    int i;
    for (i = 0; i + 1 <= j ; i += 2) {
        A[i] += B[i * N + j] * C[j * N + i];
        A[i] += B[(i+1) * N + j] * C[j * N + i+1];
    }
    if (i <= j)
        A[i] += B[i * N + j] * C[j * N + i];
}

Assuming N is very large (relative to the cache size), which of the above attempts at such optimizations are most likely to be successful at reducing the number of data cache misses?

11%
⊤
version A
49%
⊤
version B
30%
⊤
version C
9%
⊤
version D

Accepted all answers due to error in this question; meant to write version A and version B with i = ii and j = jj instead of i = 0 and j = 0 (and A[i+1] instead of A[i] in version D); if so than B would be the correct answer.

With this correction, versions A and C would hvae the same access pattern as the original, while version D would hurt the locality because the accesses to A would have worse temporal locality. version B would improve the temporal locality in B without hurting the temporal/spatial locality in the other arrays much.

Regrade request

Question 2 (4 pt; mean 3.74)

Consider the folowing assembly code:

xorq %rbx, %r8
andq %rbx, %r9
subq %r8, %rbx
addq %rbx, %r9

after register renaming, the physical register that andq reads to obtain the value of %rbx will be the same as ____. Select all that apply.

95%
⊤
the physical register that xorq reads %rbx from
90%
⊤
the physical register that subq reads %rbx from
7%
the physical register that subq writes to
5%
the physical register that addq reads %rbx from

Regrade request

Consider the following assembly code:

    xorq    %rax, %rax
outer_loop:
    movq    %rax, %r8           /* A */
    movq    %rax, %rcx          /* B */
    imul    $8192, %r8          /* C */
    addq    %rdi, %r8           /* D */
inner_loop:
    movq    (%rdx,%rcx,8), %r9  /* E */
    movq    (%rsi,%rax,8), %r10 /* F */
    addq    %r10, %r9           /* G */
    movq    %r9, (%r8,%rcx,8)   /* H */
    incq    %rcx                /* I */
    cmpq    $1024, %rcx         /* J */
    jne     inner_loop          /* K */
    incq    %rax                /* L */
    cmpq    $1024, %rax         /* M */
    jne     outer_loop

In this code there are two nested loops. The outer one which uses %rax as its index variable, and the inner one uses %rcx as its index varaiable.

Question 3 (4 pt; mean 2.08) (see above)

In an out-of-order processor with branch prediction and several execution units (such as ALUs and data cache read/write ports) of each type, multiple instances of addq instruction labelled G (from different iterations of the inner loop) can be performed _____.

(Assume this out of order processor can perform loads and stores out-of-order.)

45%

only one at a time
51%
⊤
in parallel
2%

there is not enough information to tell (explain in comments)

Regrade request

Question 4 (4 pt; mean 3.27) (see above)

In order to unroll the inner loop in the code above, one would most likely duplicate some of the instructions above (possibly with changes to where they expect their operands, etc.) and/or convert some of them to a form that performs the same work that multiple copies of the instruction would perform (such as converting an incq %rax into an addq $2, %rax).

In order to unroll the inner loop, which instructions would be so duplicated or transformed? (Instructions are identified by the letters in comments above.)

Regrade request

Question 5 (3 pt; mean 0.89) (see above)

After loop unrolling is performed, the addq instruction labeled G would ______ from a multiple accumulators transformation.

70%

benefit
29%
⊤
not benefit

no sequence of additions

Regrade request

quiz for week 12

Question 1 (4 pt; mean 2.3)

Consider the following assembly function 'satSum':

satSum:
    movq %rdi, %rax
    addq %rsi, %rax
    cmpq %rsi, %rax
    jb retMax
    cmpq %rdi, %rax
    jb retMax
    ret
retMax:
    movq $-1, %rax
    ret

Inlining this function would most likely allow an optimizing compiler to avoid including an instruction very similar to ___. Select all that apply.

16%
⊤
the first movq

probably compiler can modify code below (and around function call) to use %rdi (or original register containing first argument) instead of %rax
33%
the first jb

yes, still need to do this comparison somehow
38%
⊤ (accepted any answer)
the first ret

after inlining, I think the best thing for the compiler to do would be to place the code around the call to satSum will appear below instead of needing a return. This would assume that we place the retMax code elsewhere and replace its ret with a jmp back. An alternative might be replacing the first ret with a jmp to the end of the snippet, and removing the second ret --- in the latter strategy there would be jmp that is arguably very similar to the ret.
16%
the second movq

Regrade request

For the following questions, consider the following C snippet:

for (int i = 0; i < N; i += 1) {
    for (int j = 0; j < N; j += 1) {
        A[i] += (A[i] - B[j]) * C[j];
    }
}

where A, B, and C are declared as int*s and N is declared as an int.

Question 2 (4 pt; mean 1.53) (see above)

On the snippet above, which of the following optimizations cannot be performed without adding extra checks due to aliasing? Select all that apply.

25%
(accepted any answer)
unrolling the inner most loop

I was not as careful at specifying the conditions for N for this option as I should have been.
No problem assuming the compiler can provide N is not pointed to by A, B, C. If no pointers were taken to N, this would be straightforward.
37%
⊤
loading and storing A[i] at most once per iteration of the outermost loop (rather than N times)

updating A[i] could update B[j], C[j]
24%
⊤
swapping the i and j loops to increase temporal locality in B and C

updating A[i] could update B[j], C[j] and changing order in which B[j], C[j] read relative to when A[i] written

Example, suppose N = 2, A = {1,1}, B = {0,0}, A = C
/* for (i) for (j) order */
/* i = 0, j = 0 */ A[0] = 1 + (1 - 0) * 1 = 2
/* i = 0, j = 1 */ A[0] = 2 + (2 - 0) * 1 = 4
/* i = 0, j = 0 */ A[1] = 1 + (1 - 0) * 4 = 5
/* i = 1, j = 1 */ A[1] = 5 + (5 - 0) * 5 = 25
/* versus */
/* i = 0, j = 0 */ A[0] = 1 + (1 - 0) * 1 = 2
/* i = 1, j = 0 */ A[1] = 1 + (1 - 0) * 2 = 3
/* i = 0, j = 1 */ A[0] = 2 + (2 - 0) * 3 = 8
/* i = 1, j = 1 */ A[1] = 3 + (3 - 0) * 3 = 9
33%
⊤
cache blocking to improve cache locality

same type of problem as before

Regrade request

Question 3 (4 pt; mean 2.39) (see above)

On the snippet above, which of the following optimizations would be likely to result in substantially worse performance if N was very small (but still varied between executions of the code above)? Select all that apply.

41%
⊤
unrolling the inner most loop

checks for handling non-multiples of unrolling count is likely to be significant for small N
16%
loading and storing A[i] at most once per iteration of the outermost loop (rather than N times)

still helpful
11%
swapping the i and j loops to increase temporal locality in B and C

basically no effect
25%
⊤
cache blocking to improve cache locality

extra code to handle more complicated indexing

Regrade request

Question 4 (4 pt; mean 2.74) (see above)

On the snippet above, which of the following optimizations would be likely to significantly increase the amount of machine code used to store the above loops? Select all that apply.

49%
⊤
unrolling the innermost loop

more copies of loop body
17%
loading and storing A[i] at most once per iteration of the outermost loop (rather than N times)

probably same number of instructions. Perhaps one or two more mov instructions (though in exchange for simplifying machine code for add in the loop), depending on details.
5%
swapping the i and j loops to increase temporal locality in B and C

probably same number of instructions
47%
⊤
cache blocking to improve cache locality

more code to handle loop indexing logic + extra loops

Regrade request

quiz for week 14

Question 1 (4 pt; mean 2.82)

The vector instructions we discussed in lectures supported loading a vector of values from (or storing a vector of values to) a single location in memory. In addition to these types of vector load and store instructions, some vector instruction sets include support for an instruction that takes a vector of memory locations and fills a vector by loading an element from each of those memory locations (or stores a vector by storing an element to each of those locations). These more featureful vector load and store instructions are typically substantially slower than those that load from (or store to) a single location.

When vectorizing which of the following snippets would these special vector load and store instructions be useful? Select all that apply.

34%
for (int i = 0; i < 8192; i += 1) { A[i] += B[i-1] + B[i] + B[i+1]; }
55%
for (int i = 0; i < 8192; i += 1) { for (int j = 0; j < 8192; j += 1) { A[i*8192+j] = B[i*8192 + j] * C[j]; } }
70%
⊤
for (int i = 0; i < 8192; i += 1) { B[i] = A[T[i]]; }
25%
⊤ (accepted any answer)
for (int i = 0; i < 8192; i *= 3) { A[i] *= B[i]; }

meant to write i = 1 so this wouldn't be an infinite loop

Regrade request

Question 2 (6 pt; mean 5.09)

Consider a single-core system with an operating system that uses time multiplexing (with context switching) to share the processor between multiple processes.

Initially, process A is running the following assembly snippet:

movq $100, %rax
movq $200, %rcx
addq %rcx, %rax

After the second movq and before the addq instruction can complete, the operating system switches to process B. After process B starts running, ____. Select all that apply.

13%
the value 200 is stored in one or more of the processor's registers
92%
⊤
the value 200 is stored somewhere in memory managed by the operating system
34%
the value 200 is stored on the stack of process A
12%
the value 300 is stored in one or more of the processor's registers
17%
the value 300 is stored somewhere in memory managed by the operating system
8%
the value 300 is stored on the stack of process A

Regrade request

Suppose a process attempts to execute the following, whose machine code is located in memory at address 0x200000:

movq $0x100000, %r8
movq (%r8), %r9
addq %r9, %r10

Suppose the memory address 0x100000 is not accessible, so accessing it causes a crash in the second movq instruction.

[editing note: originally the above sentence had some minor typos: it erroneously duplicated "causes a crash" and had the wrong number of 0s in 0x100000]

Question 3 (4 pt; mean 2.86) (see above)

An exception will occur as part of the execution of the above. When it does part or parts of the exception table will be used. These part(s) will contain ____. Select all that apply.

17%
the address 0x100000
51%
the address of the instruction that triggered the exception
83%
⊤
the address of an instruction containing operating system code
30%
the value of %rax (the register generally used for return values)

Regrade request

Suppose an operating system with a single core is running two processes.

Initially process A is active, and it attempts to reads from a file. The data from the file is not yet available because it has not been read from the disk. While it is being read from the disk, process B runs, performing a part of a long computation. Then process A finishes reading from the file.

Question 4 (4 pt; mean 1.86) (see above)

During this procedure, there is a context switch from process A to process B. This context switch most likely occurs ____.

40%

while the exception handler for an interrupt triggered by the disk device is running
12%

while the exception handler for a page fault or protection fault is running
46%
⊤
while the exception handler for a system call is running
2%

while process A is executing a function in the standard library in user mode

Regrade request

Question 5 (4 pt; mean 2.01) (see above)

Suppose process B stops running and process A is restarted almost immediately after the data is retrieved from the file. For this to happen, most likely ___.

5%

process B was running code that checked whether process A could start running again
49%
⊤
the disk device triggered an exception that started the operating system running when the data was ready
45%

the processor still had process A's previous register values stored in a backup set of registers and switched to them when it received a signal from the disk device that the disk read was complete

Regrade request

quiz for week 15

Consider a system with 2048-byte (2 to 11th power bytes) pages, and the following page table:

virtual page #	valid	physical page #
0x0	0	—
0x1	1	0x34
0x2	1	0x35
0x3	1	0x36
0x4	1	0x12
0x5	1	0x10
0x6	1	0x06
0x7	1	0x04

(Since this system has 2048 byte pages, page offsets are 11 bits.)

Question 1 (4 pt; mean 2.94) (see above)

If a program running on this system accesses address 0x1432, then what physical address will be accessed? Write your answer as a hexadecimal number. If an exception will occur instead, write "fault".

Answer:

Key: 0x1ac32

Regrade request

Suppose a system has the following virtual memory configuration:

one-level page tables, stored in memory
4096-byte pages (and therefore 12-bit page offsets),
4-byte page table entries, which have:
- a valid bit as their least significant bit,
- a user-mode-accessible bit as their second least significant bit
- a writable bit as their third least significant bit
- an executable bit as their fourth least significant bit, and
- a 20-bit physical page number as their most significant 20 bits
a page table base pointer set to physical byte address 0x400000 (which has physical page number 0x400)
- (This means that the page table entry for virtual page 0 is at address 0x400000 and the page table entry for virtual page 1 is at address 0x400004 and so on.)
28-bit virtual addresses
32-bit physical addresses

Question 2 (4 pt; mean 2.73) (see above)

When a program accesses a virtual address, the processor reads a page table entry from physical address 0x402000 and then reads a value from physical address 0x201234. What virtual address did the program access? Write your answer as a hexadecimal number. If not enough information is provided, write "unknown" and explain in the comment.

Answer:

Key: 0x800234

Regrade request

Question 3 (3 pt; mean 2.41) (see above)

Suppose a program accesses virtual address 0x1007. The processor reads a page table entry which, when interpreted as 4-byte integer, is 0x00099003 as part of trying to resolve this access. Which of the following is true about this virtual memory access? Select all that apply.

88%
⊤
if the program is trying to write to this address, it will result in an exception
68%
⊤
if the program is trying to jump to code at this address, it will result in an exception
16%
if the program is running in user-mode and trying to read from this address, it will result in an exception

Regrade request

Suppose a system has the following virtual memory configuration:

three-level page tables, stored in memory
4096-byte pages (so page offsets are 12 bits)
8-byte page table entries
page tables with 512 (2 to the 9th power) entries at each level (so each page table at each level is 4096 bytes, and the parts of the virtual page number used for lookups at each level are 9 bits)
a page table base pointer set to physical byte address 0x400000 (which has physical page number 0x400)
39-bit virtual addresses
36-bit physical addresses

Question 4 (4 pt; mean 1.58) (see above)

While executing a move instruction that copies from address 0x123456789A into a register, the processor looks up a first-level page table entry at address 0x400240. That page table entry has its valid bit and user-mode-accessible bit set, and a physical page number of 0x456. From what physical address will the processor try to read the second-level page table entry for this address from?

Remember to account for page table entries being larger than 1 byte.

Write your answer as a hexadecimal number. If not enough information is provided, write "unknown" and explain in the comments. If an exception will occur instead, write "unknown" and explain in the comments

Answer:

Key: 0x456d10

Regrade request