Question 1: Suppose a program has three tasks to perform:
Initially, the program performs each of these tasks one at a time. Which of the following changes will result in the fastest runtime? (Assume doing two tasks in parallel does not make either of the parallel tasks slower.)
Question 2: Which of the following are true about object files?
Select all that apply
Question 3: If the bytes 0x12
, followed by 0x34
, followed by 0x56
, followed
by 0x78
are interpreted as a 4-byte little endian integer, what
value will they have?
Question 4: After the statements char array[3] = {10, 12, 14}; char *ptr = array;
, which of the following C expressions
are true?
Select all that apply
Question 1: After int array[3] = {10, 6, 2}; int *ptr = array;
, which of the following C
expressions are true:
Select all that apply
Question 2: After the declaration:
typedef struct foo {
int x;
} bar;
which of the following C statements declare a instance of the struct?
Select all that apply
Question 3: What is the value of %rax
after running the following? (You may
assume this assembly does not segfault or otherwise crash.)
movq %rsp, %rax
pushq %rax
subq %rsp, %rax
leaq (%rax, %rax, 2), %rax
Question 4: What is the value of %rax
after running the following?
movq $10, %rax
movq $20, %rbx
subq %rbx, %rax
jg skip
addq $10, %rax
skip:
Question 1: Which of the following usually true about RISC-like instruction sets
compared to CISC-like instruction sets?
Select all that apply
Question 2: Consider the following C code:
while (a + b > 0) {
b += foo();
}
If a
stored in %r12
and b
is stored in %rbx
, which
of the following are valid compilations of this code? (Note
that both %r12
and %rbx
are callee-saved registers.)
Select all that apply
Question 3: Which of the following are valid Y86-64 assembly?
Select all that apply
Question 4: Which of the following are attributes of an instruction set?
Select all that apply
Question 1: Which of the following are true about combinatorial circuits?
Select all that apply
Question 2: In the registers described in section 4.2.5 of our textbook, if a register is outputting the value a, and its input changes from a to b, the output will change:
Question 3: In the register file described in section 4.2.5 of our textbook, changes to the read register ID input will cause changes to the corresponding register value being outputted by the register file:
Question 4: According to the description of conceptual "stages" described in our textbook for a Y86-64 processor in section
4.3.1, which of the following instructions may need to perform work in the "execute" stage?
Select all that apply
Question 5: Our textbook's description of the conceptual "write back" stage for the Y86-64 processor notes that the
processor may write "up to two results to the register file". Which instructions require two
values to be written to the register file?
Select all that apply
Question 1 (0 points): If x
and y
are positive unsigned ints, then which of the following C expressions
are always true?
Question 2: If x
is a signed short on our lab machines (two bytes, two's complement), then which
of the following are possible results of x >> 3
?
Select all that apply
Question 3: Given a signed integer x
, which of the following C expressions results in
x
with the first and third least significant bits cleared?
Select all that apply
For these questions, consider the processor "state" components we discussed in lecture --- the register file, ALU, instruction memory, data memory and PC register --- and the single-cycle processor designs we discussed based on connecting these with MUXes.
Question 4: (see above) Given the processor components we discussed, a processor executing the addq
instruction
would update the value stored in the result register in the register file ________
when it updates the value stored in the program counter register.
Question 5: (see above) If we were building a processor to implement the Y86 jmp
, addq
and mrmovq
instructions only,
which of the following inputs would need to be controlled by a MUX or similar logic
depended on the instructions icode
?
Select all that apply
Question 1: In HCL (our flavor or the one described in the book), for which of the following values of x and y is the result 42?
[ x > y : 41 ; x > y + 2 : 42 ; x + y < 0 : 43 ; 1 : x + y ]
Select all that apply
Question 2: Suppose you are debugging a processor you wrote in HCLRS and discover that %rax
is being written
by an instruction that should not modify any register.
Which of the following are likely causes of this bug?
Select all that apply
Question 3: In the processor design in the book, some instructions require a constant that is not
part of the instruction's machine code to be fed to the ALU. These include:
Select all that apply
Question 4: During the execution of the ret
instruction in the processor design discussed in the book, the input to the program counter register should
be equal to:
Question 5: During the execution of a popq %rax
instruction in the processor design discussed in
the book, which operations occur significantly before the value of %rsp
stored in the register file
changes?
Select all that apply
Question 1: The textbook's processor design performs an addition by 0 in the ALU for irmovq
instruction to produce the value for the register file to write. What
is an alternative to this design?
Question 2: Our textbook divides its single-cycle processor design into six stages. What is true about
these stages?
Select all that apply
Question 3: During execution of popq
instruction, the address that is passed to the data memory
is equal to one of the outputs of
Question 4: Given the following HCLRS code
register xY {
a : 64 = 0;
b : 64 = 0;
}
x_a = Y_a + Y_b;
x_b = Y_b + 1;
During the initial clock cycle Y_a
and Y_b
are 0. What is the value of
Y_a
after three rising clock edges?
Question 1: A circuit that can be easily broken into multiple, approximately parts that take approximately equal amounts of time is a ________ candidate for pipelining than a circuit that can be easily broken into one complex, slow part and many fast, simple parts.
Question 2: Adding pipelining to a processor will usually ________ the amount of time it takes to execute a single function of several hundred instructions.
Question 3: Adding pipelining to a processor will usually ________ the amount of time it takes to execute a single instruction.
Question 4: Which of the following usually increases as the number of pipeline stages
in a procesor design increases?
Select all that apply
Question 1: Consider the following Y86 assembly snippet running on the PIPE processsor described in our textbook:
andq %rax, %rbx
subq %rcx, %rdx
rrmovq %r8, %r9
xorq %r10, %r11
When the andq
instruction is in its memory stage, then the
rrmovq
instruction is in the ______ stage.
Question 2: Consider the following Y86 assembly snippet running on the PIPE processor described in our textbook:
andq %rax, %rbx
call foo
xorq %r10, %r11
halt
foo:
subq %rcx, %rdx
ret
When the xorq
instruction is in its fetch stage, then the
subq
instruction is in the _____ stage.
Question 3: Assume we are executing a program on the PIPE processor design described
in the textbook.
If one instruction in this program writes to the register %rax
and a later instruction
in the program reads that value from %rax
, then
Select all that apply
Question 4: Which of the following statements are true about forwarding, stalling, and branch prediction are true?
Select all that apply
Question 5: When branch prediction predicts the wrong instruction,
Select all that apply
Question 1 (2 points): Which of the following assembly snippets exercises a data hazard in the PIPE processor design described in our textbook?
Select all that apply
Question 2: If we want to squash the instruction currently in the execute stage, what should we do in HCL?
Question 3: Consider the following assembly code:
mrmovq 0(%rbx), %rcx
jne foo
addq %rcx, %rdx
...
foo:
On the PIPE processor described in the textbook and in lecture
(that uses forwarding and branch prediction), if the jne
is not taken (control goes to addq
),
the addq
instruction will be fetched when the
mrmovq
instruction is in the ____ stage.
Consider splitting up the "decode" and "writeback" stages of our PIPE processor in order to accommodate a slower, internally pipelined register file. This register file accepts register numbers to read every cycle, and outputs a result near the end of the following clock cycle. It also accepts register numbers and value pairs to write every cycle, but doesn't complete the write until the end of the following clock cycle.
In the resulting processor, the pipeline stages will be:
For one instruction to read a value another wrote from the register file, the Writeback 2 stage of the writing instruction needs to complete by the time the reading instruction starts its Decode 1 stage.
Question 4: (see above) Suppose this modified processor is executing the following assembly snippet:
irmovq $0x1234, %rax
mrmovq 4(%rax), %rbx
subq %rbx, %rcx
If this processor resolves hazards in this assembly with stalling only, how many cycles of stalling will be required?
Question 5: (see above) Suppose this modified processor is executing the following assembly snippet:
irmovq $0x1234, %rax
mrmovq 4(%rax), %rbx
subq %rbx, %rcx
If this processor resolves hazards in this assembly with stalling and forwarding, how many cycles of stalling will be required?
Question 1: Which of the following are examples of temporal locality?
Select all that apply
Question 2: Which of the following are examples of spatial locality?
Select all that apply
Question 3: If a cache line is storing the values in memory around address X, then the tag in this cache line contains
Question 4: In order to identify whether a memory access to the byte at address X hits in a cache, it is necessary to access:
Select all that apply
In section 6.4.7, our textbook talks about miss rate, hit time and miss penalty. Together these determine the effective performance of a cache.
Question 5: (see above) Which of the following are likely to be improved by switching from a direct-mapped
to set-associative cache of the same size?
Select all that apply
Question 6: (see above) Which of the following are likely to be improved by
decreasing the size of the cache (but leaving the block size and associativity the same)?
Select all that apply
Consider a 1.5 MB 3-way set associative cache with 64B cache blocks, an LRU replacement policy, and a write-allocate, write-back policy.
Question 1: (see above) Values for which of the following addresses would be stored
as part of the same cache block as 0x12345678
in this cache?
Select all that apply
Question 2: (see above) Which of the following addresses have the same tag in this
cache as 0x12345678
in this cache?
Select all that apply
Question 3: (see above) How many sets are in this cache?
Consider a program that reads a byte at each of the following addresses in the following order:
Question 4: (see above) How many cache misses will a 32B direct-mapped cache with 16B blocks experience on this access pattern, assuming the cache is initially empty?
Question 5 (0 points): (see above) How many cache misses will a 4-way set associative 32B cache with 2B blocks experience on this access pattern, assuming the cache is initially empty?
Question 6: Which of the following cache design changes increase the amount of metadata (non-data storage) required
for a cache?
Select all that apply
Question 1: In section 5.1, the textbook discusses how memory aliasing can prevent compilers from performing
some optimizations. Which of the following optimizations might be prohibited
due to the possiblity of memory aliasing? Assume that pa
and pb
and declared as
unsigned long *
s, and x
and y
are declared as a unsigned long
s.
[Dropped two options because the textbook says "two pointers point to the same place" rather than a more general description of aliasing. Also, the compiler can observe that the two droppped options would be an infinite loop if there were aliasing and still optimize them. (C allows compilers to remove infinite loops without side effects.)]
[I should have included setting y
or *pb
to 0 on the options below, too, though I think it
was clear enough for most for the remaining options.]
Select all that apply
Question 2: Which of the following optimizations are likely to increase the size of an executable?
Select all that apply
Question 3: Suppose I try to optimize code by using function inlining, but discover that the resulting
code is slower. Which of the following are likely causes of this?
Select all that apply
Question 4: Section 5.7.2 and FIgure 5.12 indicates that that the Intel Haswell microarchitecture has
a fully pipelined functional unit that can issue one multiplication per cycle, but with a
3 cycle latency. Given this, how many cycles does it take to use this
functional unit to compute (x * y) * z
? (Give the total latency of the computation.
Assume no extra time is required to provide inputs to or read outputs from the functional unit.)
Consider the following three pieces of C code:
/* version 1 */
for (int i = 0; i < N; ++i)
for (int j = 0; j < N; ++j)
A[i * N + j] = B[j * N + i];
/* version 2 */
for (int j = 0; j < N; ++j)
for (int i = 0; i < N; ++i)
A[i * N + j] = B[j * N + i];
/* version 3 */
for (int ii = 0; ii < N; ii += 64)
for (int j = 0; j < N; ++j)
for (int i = 0; i < ii + 64; ++i)
A[i * N + j] = B[j * N + i];
[Version 3 was meant to start the innermost loop from ii
. Given that, version 2
would have better spatial locality in B. Without that, it's ambiguous whether
version 2 has better spatial locality in B in than version 3. Also, given that version 3 has better temporal locality in A
.]
Assume N
is a large multiple of 64.
Question 1: (see above) In which version do accesses to A have the best temporal locality?
Question 2: (see above) In which version do accesses to A have the best spatial locality?
Question 3: (see above) In which version do accesses to B have the best spatial locality?
Consider the following C code:
long array[1024 * 1024];
for (int x = 0; x < 10; ++x)
for (int i = 0; i < 1024 * 1024; i += 1024)
array[i] += 1;
Assume that all variables but array
are placed into registers
and that array is placed in memory such that the address of
array[0]
is a multiple of the cache block size (i.e. all of
its block offset bits are 0).
Also, assume that long
s are 8 bytes and
all caches are initially empty.
[The numbers in parenthesis were meant to correspond to the numbers before, but due to some silly mathematical errors, one did not.]
Question 4: (see above) How many data cache misses will this C code experience on a 64KB (64 * 1024 byte) direct-mapped cache with 64B cache blocks?
Question 5 (0 points): (see above) [Dropped, because I needed to specify the replacement policy for you to answer this question.]
How many data cache misses will this C code experience on a 1.5MB (1536 * 1024 byte) 3-way set associative cache with 64B cache blocks?
Question 6: Which of the following optimizations are often difficult for compilers to do because of
aliasing?
Select all that apply
Question 7: Which of the following optimizations reduce the number of instructions a program executes?
Select all that apply
Question 1: Using multiple accumulators increases performance by
Select all that apply
Question 2: Which of the following assembly snippets should be faster on a modern processor (like that described in section 5.7.1-2 of the book)?
In section 8.1.2, our book classifies hardware exceptions into four categories:
For this question, indicate what kind of exception each scenario is, based on the book's terminology. None of the below are aborts, so we have omitted that answer.
Question 3: (see above) If a process requests the operating system read some data from a file into the process's memory, this will trigger what kind of exception?
Question 4: (see above) If a process accesses memory that is not valid, this will trigger what kind of exception?
Question 5: Each process has:
Select all that apply
Question 6: Exception handlers (of the kind discussed in our textbook) are:
Mutlquestion Consider the following code:
u = a * b
v = c * d
w = u + v
x = a + b
y = x + c
z = y * d
Question 1: (see above) If this is run on a processor with the following functional units:
Then, how long should the computations take? (Assume there is no penalty for forwarding the results of a functional unit to another calculation.)
Question 2: Which of the following are helpful properties for a function to have if a programmer or
compiler is going to have it take advantage of vector instructions?
Select all that apply
Question 3: If process A performs a system call and the exception handler for the system call performs a context switch to process B, then the context of process A is stored where?
Question 4: The exception table contains:
Question 5: Which of the following are likely to run in kernel mode?
Select all that apply
Question 6: Which of the following are part of a process's context?
Select all that apply
Question 1: Which of the following are ways in which Linux's signals are
similar to exceptions?
Select all that apply
Question 2: Some Linux signals correspond to hardware exceptions. When the corresponding hardware exception occurs and a process previously registered a signal handler for the signal,
Question 3: If values are part of the same virtual page, then, if the values are loaded in memory, the values are
Question 4: When virtual memory is used for caching, what entity is responsible for loading data in response to a cache miss?
Question 5: A page table entry contains
Select all that apply
Question 1: A typical operating system may disable interrupts
Question 2: Signal handlers
Select all that apply
Consider a processor which has:
Question 3: (see above) How large is a physical page number on this processor?
Question 4: (see above) Which of the following virtual addresses are part of the same virtual page as
0x12345678
on this processor?
Select all that apply
Question 5: Suppose a processor has 4096-byte pages and 4 byte page table entries,
and its page table base register is set to physical byte address 0x1234000
.
Assume the processor uses the scheme we discussed
in lecture to store its page tables as a contigous array of page table entries in memory.
At what physical address will the processor look for
the page table entry for the virtual page number 3?
Question 1: When a page fault occurs, and the operating system loads the virtual page that caused the fault from disk and updates the page table accordingly (the scenario depicted in Figure 9.13(b)), it will
Question 2: In a multi-level page table like described in section 9.6.3, the page table entries for the first level contain
Question 3: Multi-level page tables are most efficient at representing
Question 4: The Core i7 address translation described by the textbook uses a four-level page table. When the processor successfully fetches an one-byte instruction using this page table, that can require up to ___ memory accesses.
Question 1: The TLB index is computed from
Question 2: When the processor tries to access a memory address X and a TLB miss happens, in the design described in the book, the MMU
Question 3: In the design described in the example in section 9.6.4 of the book,
when performing a memory access the result of the TLB access is needed to
Select all that apply
Consider a system with:
Question 1: (see above) When the processor is accessing the virtual address 0x00CAFECAFE
, what is the index
of the TLB set being accessed? (The index of the first TLB set is 0
.)
Question 2: (see above) If the page table base register contains the byte address 0x100000
, then where will
the processor look for the first-level page table entry for the virtual address
0x0012345678
Question 3: (see above) Suppose a process has the following virtual page numbers allocated in its page tables:
0x10
0xFF
0x5000
0x5001
All other pages are invalid. How many pages of page tables does this process need on this processor?
Question 4: (see above) Suppose the system wants to have an L1 cache organized based on physical
addresses (so, e.g., a given physical address always maps to the same set),
but lookup the appropriate set of the L1 cache before it
completes the TLB lookup.
Which of the following L1 data cache designs would allow this?
Select all that apply