Question 1: Suppose a program has three tasks to perform:
Initially, my program performs each of these tasks one at a time. Which of the following change will result in the fastest runtime? (Assume doing two tasks in parallel does not make either of the parallel tasks slower.)
Question 2: If the bytes 0x12
, followed by 0x34
, followed by 0x56
, followed
by 0x78
are interpreted as a 4-byte little endian integer, what
value will they have?
Question 3 (0 points): If a program compiles as C with a C compiler and is strictly standards-conformant C, it will ___ compile as C++ with a standards-conformant C++ compiler.
Skim chapter 1, and review figures 3.2, 3.3, and 3.28 and answer the following questions.
Question 1: When we compile a C program with multiple source files, we can produce object (.o
on Linux) files,
which we combine into the final program. What is the process of combining these files into an executable
called?
For these questions, look at figure 1.13 and consider the following program:
static int y = 42;
int main(void) {
int x;
scanf("%d", &x);
printf("Input was %d\n", x);
printf("Input + 42 is %d\n", x + y);
}
Question 2: (see above) If the variable x
is placed in memory, what region of memory would it most likely
be in?
Question 3: (see above) If the variable y
is placed in memory, what region of memory would it most likely
be in?
Question 4: On 64-bit x86, if a function void foo(int w, int x, int y, int z)
is called using foo(1, 2, 3, 4)
,
then where is the value 4
placed?
Question 5: On 64-bit x86, if %rax
contains 0x1000
and %rdx
contains 0x1
, then, in AT&T syntax, what
does 0x10(%rax,%rdx,4)
represent?
Quiz on second week's material.
Consider the C program main.c
#include <stdio.h>
void sayHello(void) {
printf("Hello, World!\n");
}
int main(void) {
sayHello();
}
Suppose it is turned into an executable main
using the following steps:
gcc -O -S main.c -o main.s
gcc -c main.s -o main.o
gcc main.o -o main.exe
Question 1: (see above) Which of the following files could contain the memory address of the function sayHello
?
Select all that apply
Question 2: (see above) Which of the following files contains the string "Hello, World!"?
Select all that apply
Question 3: Given the following code, which of the following are valid variable declarations?
typedef struct bar {
int x;
} foo;
Select all that apply
Question 4: What does the following code output?
int x = 0;
int y = 1;
foo:
while (y < 5) {
y += 2;
x += 1;
if (y == 6)
goto bar;
}
y -= 5;
goto foo;
bar:
printf("%d\n", x);
Consider the following variable declarations and initializations:
int foo[32];
int* bar = foo;
Question 5: (see above) Which of the following pairs of statements are equivalent?
Select all that apply
Skim for 2150 topics you do not remember in chapter 2. Skim sections 3.6.7, 3.7, and figures 3.1, 3.2, and 3.3.
The textbook presents the following two implementations of factorial
fact_do
and fact_while
in Figures 3.19 and 3.20:
fact_do: fact_while:
movl $1, %eax movl $1, %eax
L2: jmp L5
imulq %rdi, %rax L6:
subq $1, %rdi imulq %rdi, %rax
cmpq $1, %rdi subq $1, %rdi
jg L2 L5:
rep; ret cmpq $1, %rdi
jg L6
rep; ret
(rep; ret
is an alternate form of ret
, which is equivalent for
the purposes of this course.)
Question 1: (see above) If the function fact_do
above is invoked with the argument 4, how many times
is the cmpq
instruction executed?
Question 2: (see above) If the function fact_while
above is invoked with the argument 4, how many times
is the cmpq
instruction executed?
Question 3: Figure 3.35 in the text presents a recursive implementation of factorial:
rfact:
pushq %rbx
movq %rdi, %rbx
movl $1, %eax
cmpq $1, %rdi
jle .L35
leaq -1(%rdi), %rdi
call rfact
imulq %rbx, %rax
.L35:
popq %rbx
ret
Calling this function with an argument of 0
requires 16 bytes of stack space,
including the space for the return address used by the final ret
. How much stack
space does calling it with an argument of 3
require?
Question 4: Which bit sequence is the 6-bit 2's complement representation of the decimal number -16?
Quiz on three week's material.
Question 1: Consider the following code:
unsigned int v; // input
int f; // output
f = v && !(v & (v - 1));
What does the output f
tell us about the input v
?
Question 2: Consider the following code:
int x; // input 1
int y; // input 2
int r; // output
r = y ^ ((x ^ y) & -(x < y));
What does r
represent here? Hint: -1
is represented in binary by all 1
's, and a < b
returns 0 or 1.
Question 3: What is the value of the address computed by (%rdx,%rcx,4)
, if %rdx is 0xf000
and %rcx is 0x0100
?
Question 4: Code segment 1:
push %ebx //store ebx value to stack
subq $10, %ebx
pop %ebx //restore ebx value from stack
Code segment 2:
push %ebx //store ebx value to stack
cmpq $10, %ebx
pop %ebx //restore ebx value from stack
What is the difference between code segment 1 and 2?
Read sections 4.1 through 4.3
Consider the instruction pushq %rax
on Y86-64.
Question 1: (see above) How many program registers (e.g. %rax
, %rbx
, etc.) are written by this instruction?
Question 2: (see above) How many program registers (e.g. %rax
, %rbx
, etc.) are read by this instruction?
Question 3: In order for a Y86-64 processor
to determine the length of the currently executing instruction in memory, what information
might it need?
Select all that apply
Question 4: Which of the following operations that can be
done in one instruction on X86-64
can NOT be done in one instruction on Y86-64?
Select all that apply
Question 5: Consider a register (of the kind our textbook describes)
which takes as input value I
and a clock signal, and which outputs a value O
.
Which of the following statements is true?
Quiz on three four's material.
Question 1: Each of the following statements about Y86-64 are true. Which
of them is an attribute that makes Y86-64 more RISC-like?
Select all that apply
Question 2: What Y86-64 assembly is equivalent to cmovne %rax, %rbx
?
Select all that apply
Question 3: What determines the length of one cycle in the single cycle microarchitecture?
Select all that apply
Question 4: After the decode
stage what do we not know about the instruction?
Select all that apply
Read sections 4.3.2 and 4.3.4. Review sections 4.2.2-4 and read the HCL2D document, sections 2 and 3.
Question 1: Consider the instruction subq rA, rB
. This instruction performs R[rB] <— R[rB] sub R[rA]
. The instruction passes through the five stages (Fetch, Decode, Execute, Memory, Writeback). Which (if any) stages are not involved in handling this instruction?
Question 2: Consider the instruction rmmovq D(rB), rA
. This instruction performs EA <— D + R[rB]; MEM[EA] <— R[rA]
. The instruction passes through the five stages (Fetch, Decode, Execute, Memory, Writeback). In which stage does the hardware calculate the effective address EA (ValE in the book)?
Question 3: Consider the instruction mrmovq D(rB), rA
. This instruction performs EA <— D + R[rB]; R[rA] <— MEM[EA]
. The instruction passes through the five stages (Fetch, Decode, Execute, Memory, Writeback). In which stage does the hardware retrieve the value stored in register rB?
Question 4 (0 points): What is the result of the following HCL (or HCL2D) expression if x
is 10 and y
is 5?
[
y in {4,6} : x + y,
y <= 5 : x - y,
1 : 0,
]
Quiz on week five's material.
Question 1: In the single-cycle processor design we discussed in class, one
option was for the memory address input to the data memory
to come directly from the single ALU. For which of
the following memory-accessing
instructions was this not done because the ALU was used
for something else?
Select all that apply
Question 2: In the single-cycle processor design we discussed in class,
some instructions do not use the memory stage. What is true
about these instructions?
Select all that apply
Question 3: In the single-cycle processor design we discussed in class,
from which sources could the input of the program counter register come?
Select all that apply
Question 4: Which of the following are illegal in HCL2D?
Select all that apply
No quizzes for week 6; good luck on the exam!
Read sections 4.4 through 4.5.4
Question 1: Adding pipelining to a previously unpipelined processor ______ the latency of instructions.
Question 2: Adding pipelining to a previously unpipelined processor ______ the throughput of instructions
Question 3: In the pipelined version of the Y86-64 processor, which values are passed in
pipeline registers between the fetch and decode stage?
Select all that apply
Quiz on week seven's material.
Question 1: Consider the pipelined Y86-64 implementation we discussed in class. Suppose the register delay is 10 ps and the length of the critical path through memories and combinatorial logic in each stage is:
What is the minimum clock cycle time this processor could have and still operate correctly?
Question 2: Consider the following Y86-64 assembly snippet:
mrmovq 4(%r11), %r10
addq %rax, %rbx
subq %rcx, %rdx
andq %rsi, %rdi
xorq %rsp, %rbp
irmovq $10, %r9
In our pipelined implementation,
when the subq
instruction is in its decode stage, what stage is
the mrmovq
instruction in?
Question 3: For the pipelined Y86-64 design we discussed in class, for which of the following instruction
sequences is there a data hazard?
Select all that apply
Question 4: For the pipelined Y86-64 design we discussed in class, for which of the following instruction
sequences is there a control hazard?
Select all that apply
Read sections 4.5.5, 4.5.8, 4.5.10, 5.7-5.7.2
Question 1: Using our textbook's notion of "stall" and "bubble" signals, if a pipeline register's input is receiving a stall signal of 1 and a bubble signal of 0, then its output after the next rising edge of the clock will be:
Question 2: Our textbook talks about both data dependencies and data hazards.
Which are differences between these?
Select all that apply
Question 3: We can perform forwarding of %rax
to implement
irmovq $15, %rax
addq %rbx, %rax
without any stalling. Suppose the addq
instruction receives the forwarded
value for %rax
from irmovq
in its decode stage. Where can it retrieve
(i.e. forward) this value from?
Question 4: Figure 5.12 indicates that the integer multiplication functional units on the Intel "Haswell" microarchitecture have an issue time of 1 cycle and a latency of 3 cycles. How long does it take one of these functional units to perform two integer multiplications from when the first multiplcation is started until the last finishes?
For each of the following sequences of instructions, indicate how many cycles of pipeline stalls occur in the pipelined Y86-64 implementation we discussed in class.
Question 1: (see above) :
popq %rax
addq %rax, %rbx
popq %rax
rmmovq %rax, 8(%rbx)
Question 2: (see above) : (accepted both 0 and 3 --- depending on whether you counted stalls "during" ret
)
xorq %rax, %rax
je foo // always taken
foo: subq %rbx, %rax
ret
Question 3: Consider a pipelined Y86-64 processor with six stages, resulting from splitting the memory stage into two parts. That is, the processor has the following stages:
Suppose the results of any data memory load or store is not available until near the end of the second
memory stage. On this processor, even after implementing all possible forwarding, which of the following
instruction sequences would require a pipeline stall to resolve hazards:
Select all that apply
Question 4: IBM 360/91 had out-of-order instruction execution and completion but no support for precise exceptions. Suppose you were writing a floating point program in this machine. Which statement is true?
Select all that apply
skim section 6.1.1 and read 6.2-6.3
Question 1: Which component in the memory hierarchy is the fastest?
Question 2: Which component in the memory hierarchy is non-volatile (can retain data without power)?
Question 3: Suppose the memory access pattern of your program looks like this: a[1], a[2], a[3], a[7], a[8], a[9], a[13], a[14], a[15]
. What kind of locality correctly describes this access pattern?
Question 4: Suppose the memory access pattern of your program looks like this: a[1], b[1], c[1], a[1], b[1], c[1]
. What kind of locality correctly describes this access pattern?
Question 1: Which of the following are true about programs in the data-flow model?
Select all that apply
Question 2: On an out-of-order processor, which of the following can happen while
the processor is completing a very long load (mrmovq
) instruction?
Select all that apply
Question 3: Suppose a level-1 cache has an access time of 2 nanoseconds and a 95% hit rate, and the corresponding level-2 cache has a preceived access time of 10 nanoseconds. What is the preceived access time of the level 1 cache?
Question 4: Assume that you are designing a cache for low-power sensor nodes. It measures the environment temperature every minute and sends the average temperature from the last 24 hours to the base station. This sensor requires the caches to be extremely power-efficient. Which design decision would you make?
Section 6.5
Suppose your program sums up all elements of an array. Each element of the array takes 4 bytes of space. Your cache block size is 64B. Initially your cache is empty and your first access v[0]
misses the cache.
int sum = 0;
for(int i = 0; i < N; i += 1)
sum += v[i];
Question 1: (see above) Which of the following statements are true?
Select all that apply
Question 2: (see above) What kind of locality this access pattern is exploiting in the cache?
Question 3: (see above) If N
is set to be 100, how many misses this code would see while accessing the array v
?
Question 4: (see above) If we operate on every even element in the array, the code becomes:
int sum = 0;
for(int i = 0; i < N; i += 2)
sum += v[i];
The cache parameters remain the same. Which of the following statements are true?
Select all that apply
Consider a 1.5MB 3-way set-associative cache with 64 byte cache blocks which uses a true LRU (least recently used) replacement policy.
Question 1: (see above) How many sets does this cache have?
Question 2: (see above) The byte at address 0x654321 will be stored in the same cache block as the byte at
Select all that apply
Question 3: (see above) The byte at address 0x654321 will be stored in the same cache set as the byte at
Select all that apply
Question 4: Which of the following techniques are likely to reduce the number
of conflict cache misses programs experience, assuming
the cache size remains fixed?
Select all that apply
Question 5: Switching a cache from a write-through to write-back policy is likely
to have which of the following effects on the cache?
Select all that apply
Section 5.1-2, 5.4-6, 5.8-11, skim 5.14
Question 1: Consider the following two functions:
void sumArray1(int *pSum, int *array, int n) {
for (int i = 0; i < n; ++i)
*pSum += array[i];
}
void sumArray2(int *pSum, int *array, int n) {
int temp = 0;
for (int i = 0; i < n; ++i)
temp += array[i];
*pSum += temp;
}
Suppose an optimizing compiler generates much slower code
for sumArray1
than sumArray2
. What is a likely cause
of this?
Select all that apply
Question 2: Which of the following optimizations the
textbook discusses are likely to substantially increase
machine code size?
Select all that apply
Question 3 (0 points): (Question dropped.) The textbook discusses how a loop like:
for (int i = 0; i < N; i += 2) {
acc = acc * (a[i] * b[i]);
acc = acc * (a[i+1] * b[i+1]);
}
is usually faster than one like:
for (int i = 0; i < N; i += 2) {
acc = (acc * a[i]) * b[i];
acc = (acc * a[i+1]) * b[i+1];
}
(Assume the compiler obeys the explicitly specified order of operations.) Which of the following are different about how a modern out-of-order processor will execute the two loops?
The first answer was meant to say acc * (a[i] * b[i])
and acc * (a[i+1] + b[i+1])
to
make the question unambiguous. This is why the question was dropped.
However, it's still the case the processor can execute a[i+1] * b[i+1]
from the next iteration
of the loop in parallel with acc * (a[i+1] * b[i+1])
from the previous.
Select all that apply
Question 4: Which of the following transformation our textbook discusses likely to decrease the number of instructions a program executes? (This should have been a select-all, but since it wasn't, either of the correct answers gets full credit.)
Question 1: Some of the optimizations we talked about compilers do not automatically
perform aggressively because they can make code much slower when done
excessively. Which are examples of these?
Select all that apply
Question 2: Which of these are a way to eliminate aliasing so a compiler can perform more optimizations?
Question 3: Using mulitple accumulators improves performance by
Question 4: In the example in lecture, trying to use too many accumulators resulted in lower performance than using fewer (with the same amount of loop performance). This was most likely because of
Section 8.1-8.3
Question 1: Suppose that your program reads a file from the disk using the fread
function. The library you are using implements this fread
function using a system call referred to as read
. How would you characterize the exception caused by read
?
Question 2: Which of the following events will initiate an interrupt?
Select all that apply
Question 3 (0 points): Which of the following events will initiate an abort?
Select all that apply
Question 4: Suppose that your program writes to a read-only page. Which of the following exceptions will occur in this case?
Question 1: Each of the following is either synchronous or asynchronous. Which are synchronous?
Select all that apply
Question 2: We have seen this code in our lecture.
void fork5()
{
printf("L0\t”);
if (fork() == 0) {
printf("L1\t”);
if (fork() == 0) {
printf("L2\t”);
}
}
printf("Bye\t”);
}
Which of the following are infeasible?
Select all that apply
Question 3: Which of the following are true about signal handlers?
Select all that apply
Question 4: Which of the following are true about signals?
Select all that apply
Section 9.3-9.6, skim 9.7
Question 1: Which of the following statements are false?
Select all that apply
Question 2: Virtual memory can be thought of as a mechanism that caches
virtual pages in the physical memory. A virtual page can be stored at any
physical page frame. Which of the following statement is true?
Question 3: If we need 32 bits to represent a virtual address and our page size is 4KB, how many page table entries (PTE) do we need?
Question 4: A page table is a map of <vpn, ppn>
, where vpn is a virtual page number and ppn is a physical page number. The process of generating the appropriate ppn
for a vpn
is referred to as:
Question 1: The PTBR contains the base address of the page table. Which of the following statements are false?
Select all that apply
Question 2: A single-level page table has only one table, where a multi-level page table has multiple tables organized in a hierarchical manner. Which of the following statements are true?
Select all that apply
Question 3: Which of the following can have a virtually-indexed-physically-tagged L1 cache?
Select all that apply
Question 4: Which of the following statements are true about the translation lookaside buffer (TLB)?
Select all that apply