All quizzes for Spring 2017

Quiz 00

Question 1: Suppose a program has three tasks to perform:

  • Task A, which takes 10 seconds;
  • Task B, which takes 5 seconds; and
  • Task C, which takes 3 seconds

Initially, my program performs each of these tasks one at a time. Which of the following change will result in the fastest runtime? (Assume doing two tasks in parallel does not make either of the parallel tasks slower.)

  1. perform task A, B, and C in parallel;

  2. speedup B and C by a factor of 5;

  3. speedup A and B by a factor of 4;

  4. speedup A by a factor of 2 and run it in parallel with B and C;

Question 2: If the bytes 0x12, followed by 0x34, followed by 0x56, followed by 0x78 are interpreted as a 4-byte little endian integer, what value will they have?

  1. 0x12345678

  2. 0x21436587

  3. 0x87654321

  4. 0x78563412

  5. none of the above

Question 3 (0 points): If a program compiles as C with a C compiler and is strictly standards-conformant C, it will ___ compile as C++ with a standards-conformant C++ compiler.

  1. always

  2. never

  3. sometimes

Quiz 01

Skim chapter 1, and review figures 3.2, 3.3, and 3.28 and answer the following questions.

Question 1: When we compile a C program with multiple source files, we can produce object (.o on Linux) files, which we combine into the final program. What is the process of combining these files into an executable called?

  1. assembling

  2. archiving

  3. linking

  4. compiling

  5. relocating

For these questions, look at figure 1.13 and consider the following program:

static int y = 42;
int main(void) {
    int x;
    scanf("%d", &x);
    printf("Input was %d\n", x);
    printf("Input + 42 is  %d\n", x + y);
}

Question 2: (see above) If the variable x is placed in memory, what region of memory would it most likely be in?

  1. kernel virtual memory

  2. user stack

  3. memory-mapped region for shared libraries

  4. run-time heap

  5. read/write data

  6. read-only code and data

Question 3: (see above) If the variable y is placed in memory, what region of memory would it most likely be in?

  1. kernel virtual memory

  2. user stack

  3. memory-mapped region for shared libraries

  4. run-time heap

  5. read/write data

  6. read-only code and data

Question 4: On 64-bit x86, if a function void foo(int w, int x, int y, int z) is called using foo(1, 2, 3, 4), then where is the value 4 placed?

  1. on the stack

  2. %ecx

  3. %rax

  4. %esi

  5. on the heap

Question 5: On 64-bit x86, if %rax contains 0x1000 and %rdx contains 0x1, then, in AT&T syntax, what does 0x10(%rax,%rdx,4) represent?

  1. the value in memory at address 0x1014

  2. the value in memory at address 0x1011

  3. the value in memory at address 0x4011

  4. the value in memory at address 0x4014

  5. the value 0x1015

  6. the value 0x4011

Quiz 02

Quiz on second week's material.

Consider the C program main.c

#include <stdio.h>
void sayHello(void) {
    printf("Hello, World!\n");
}
int main(void) {
    sayHello();
}

Suppose it is turned into an executable main using the following steps:

gcc -O -S main.c -o main.s
gcc -c main.s -o main.o
gcc main.o -o main.exe

Question 1: (see above) Which of the following files could contain the memory address of the function sayHello?
Select all that apply

  1. main.c

  2. main.s

  3. main.o

  4. main.exe

Question 2: (see above) Which of the following files contains the string "Hello, World!"?
Select all that apply

  1. main.c

  2. main.s

  3. main.o

  4. main.exe

Question 3: Given the following code, which of the following are valid variable declarations?

typedef struct bar {
    int x;
} foo;    

Select all that apply

  1. foo *a;

  2. bar* b;

  3. struct foo *c;

  4. struct *foo d;

  5. struct bar *e;

Question 4: What does the following code output?

int x = 0;
int y = 1;

foo:
while (y < 5) {
    y += 2;
    x += 1;
    if (y == 6)
        goto bar;
}
y -= 5;
goto foo;

bar:
printf("%d\n", x);
  1. 1

  2. 2

  3. 3

  4. 4

  5. 5

  6. 6

  7. none of the above

Consider the following variable declarations and initializations:

int foo[32];
int* bar = foo;

Question 5: (see above) Which of the following pairs of statements are equivalent?
Select all that apply

  1. foo + 1, bar + 1

  2. sizeof(foo[0]), sizeof(bar[0])

  3. &foo[1], &bar[1]

  4. sizeof(foo), sizeof(bar)

  5. sizeof(&foo[1]), sizeof(bar)

  6. foo[1], bar[1]

  7. *foo, *bar

Quiz 03

Skim for 2150 topics you do not remember in chapter 2. Skim sections 3.6.7, 3.7, and figures 3.1, 3.2, and 3.3.

The textbook presents the following two implementations of factorial fact_do and fact_while in Figures 3.19 and 3.20:

fact_do:                    fact_while:
    movl $1, %eax               movl $1, %eax
L2:                             jmp L5
    imulq %rdi, %rax        L6:
    subq $1, %rdi               imulq %rdi, %rax
    cmpq $1, %rdi               subq $1, %rdi
    jg  L2                  L5:
    rep; ret                    cmpq $1, %rdi
                                jg L6
                                rep; ret

(rep; ret is an alternate form of ret, which is equivalent for the purposes of this course.)

Question 1: (see above) If the function fact_do above is invoked with the argument 4, how many times is the cmpq instruction executed?

  1. 2

  2. 3

  3. 4

  4. 5

  5. 6

Question 2: (see above) If the function fact_while above is invoked with the argument 4, how many times is the cmpq instruction executed?

  1. 2

  2. 3

  3. 4

  4. 5

  5. 6

Question 3: Figure 3.35 in the text presents a recursive implementation of factorial:

rfact:
    pushq %rbx
    movq %rdi, %rbx
    movl $1, %eax
    cmpq $1, %rdi
    jle .L35
    leaq -1(%rdi), %rdi
    call rfact
    imulq %rbx, %rax
.L35:
    popq %rbx
    ret

Calling this function with an argument of 0 requires 16 bytes of stack space, including the space for the return address used by the final ret. How much stack space does calling it with an argument of 3 require?

  1. 16 bytes

  2. 32 bytes

  3. 40 bytes

  4. 48 bytes

  5. 64 bytes

Question 4: Which bit sequence is the 6-bit 2's complement representation of the decimal number -16?

  1. 110001

  2. 110000

  3. 10001

  4. 101111

  5. 001111

Quiz 04

Quiz on three week's material.

Question 1: Consider the following code:

unsigned int v; // input
int f;         // output
f = v && !(v & (v - 1));

What does the output f tell us about the input v?

  1. f is true if v is a negative number

  2. f is true if v is a power of two

  3. f is the sign extended version of v

  4. f is true if any bit in v is set

Question 2: Consider the following code:

int x;  // input 1
int y;  // input 2
int r;  // output 
r = y ^ ((x ^ y) & -(x < y)); 

What does r represent here? Hint: -1 is represented in binary by all 1's, and a < b returns 0 or 1.

  1. minimum of x and y

  2. maximum of x and y

  3. conditionally setting bits in x

  4. summation of x and y

Question 3: What is the value of the address computed by (%rdx,%rcx,4), if %rdx is 0xf000 and %rcx is 0x0100?

  1. 0xf400

  2. 0xf100

  3. 0xf500

  4. 0x4f00

Question 4: Code segment 1:

push   %ebx            //store ebx value to stack
subq   $10, %ebx
pop    %ebx            //restore ebx value from stack

Code segment 2:

push   %ebx            //store ebx value to stack
cmpq   $10, %ebx
pop    %ebx            //restore ebx value from stack

What is the difference between code segment 1 and 2?

  1. 1 may change ZF, but 2 will not

  2. 1 may change SF, but 2 will not

  3. 2 may change ZF, but 1 will not

  4. 2 may change SF, but 1 will not

  5. none of the above

Quiz 05

Read sections 4.1 through 4.3

Consider the instruction pushq %rax on Y86-64.

Question 1: (see above) How many program registers (e.g. %rax, %rbx, etc.) are written by this instruction?

  1. 0

  2. 1

  3. 2

  4. 3

  5. 4

Question 2: (see above) How many program registers (e.g. %rax, %rbx, etc.) are read by this instruction?

  1. 0

  2. 1

  3. 2

  4. 3

  5. 4

Question 3: In order for a Y86-64 processor to determine the length of the currently executing instruction in memory, what information might it need?
Select all that apply

  1. the first byte of the instruction

  2. the immediate or displacement value in that instruction

  3. the register numbers rA, rB in that instruction

  4. the current value of the condition codes

  5. the current value of the status code Stat

Question 4: Which of the following operations that can be done in one instruction on X86-64 can NOT be done in one instruction on Y86-64?
Select all that apply

  1. accessing a memory location computed by adding a register value and a constant offset

  2. accessing a memory location computed by adding a register value and another register value

  3. adding a constant value to a register

  4. multiplying a register value by another register value

  5. storing an 8-byte value in memory

Question 5: Consider a register (of the kind our textbook describes) which takes as input value I and a clock signal, and which outputs a value O. Which of the following statements is true?

  1. Each time the input I changes, the output O changes.

  2. As long as the clock signal is high, each time the input I changes, the output O will change.

  3. As long as the clock signal is low, each time the input I changes, the output O will change.

  4. If I changes and, later, the clock signal changes from low to high, then the output O will change.

  5. If C changes from low to high and, later, I changes, then the output O will change.

Quiz 06

Quiz on three four's material.

Question 1: Each of the following statements about Y86-64 are true. Which of them is an attribute that makes Y86-64 more RISC-like?
Select all that apply

  1. the OPq instructions operate only on registers

  2. there are few ways to specify instruction operands in Y86-64

  3. simpler instructions can take up less space than more complicated ones

  4. Y86-64 has only 18 instructions (counting conditional moves and conditional jumps each as one instruction)

  5. Y86-64 has instructions like pop which modify two registers, allowing for shorter programs

Question 2: What Y86-64 assembly is equivalent to cmovne %rax, %rbx?
Select all that apply

  1. rrmovq %rax, %rbx

  2.     je after
        rrmovq %rax, %rbx
    after:
    
  3.     jne after
        rrmovq %rax, %rbx
    after:
    
  4. cmove %rbx, %rax

  5.     cmovl %rax, %rbx
        cmovg %rax, %rbx
    

Question 3: What determines the length of one cycle in the single cycle microarchitecture?
Select all that apply

  1. the slowest stage (e.g., fetch, decode)

  2. the slowest instruction (e.g., multiplication)

  3. the number of registers

  4. the number of available instructions in the ISA

Question 4: After the decode stage what do we not know about the instruction?
Select all that apply

  1. instruction type

  2. source register

  3. output of the ALU

  4. destination register

  5. memory address

Quiz 07

Read sections 4.3.2 and 4.3.4. Review sections 4.2.2-4 and read the HCL2D document, sections 2 and 3.

Question 1: Consider the instruction subq rA, rB. This instruction performs R[rB] <— R[rB] sub R[rA]. The instruction passes through the five stages (Fetch, Decode, Execute, Memory, Writeback). Which (if any) stages are not involved in handling this instruction?

  1. Fetch

  2. Decode

  3. Execute

  4. Memory

  5. Writeback

Question 2: Consider the instruction rmmovq D(rB), rA. This instruction performs EA <— D + R[rB]; MEM[EA] <— R[rA]. The instruction passes through the five stages (Fetch, Decode, Execute, Memory, Writeback). In which stage does the hardware calculate the effective address EA (ValE in the book)?

  1. Fetch

  2. Decode

  3. Execute

  4. Memory

  5. Writeback

Question 3: Consider the instruction mrmovq D(rB), rA. This instruction performs EA <— D + R[rB]; R[rA] <— MEM[EA]. The instruction passes through the five stages (Fetch, Decode, Execute, Memory, Writeback). In which stage does the hardware retrieve the value stored in register rB?

  1. Fetch

  2. Decode

  3. Execute

  4. Memory

  5. Writeback

Question 4 (0 points): What is the result of the following HCL (or HCL2D) expression if x is 10 and y is 5?

[
    y in {4,6} : x + y,
    y <= 5 : x - y,
    1 : 0,
]
  1. 15

  2. 10

  3. 5

  4. 0

  5. none of the above

Quiz 08

Quiz on week five's material.

Question 1: In the single-cycle processor design we discussed in class, one option was for the memory address input to the data memory to come directly from the single ALU. For which of the following memory-accessing instructions was this not done because the ALU was used for something else?
Select all that apply

  1. mrmovq

  2. ret

  3. pushq

  4. rmmovq

Question 2: In the single-cycle processor design we discussed in class, some instructions do not use the memory stage. What is true about these instructions?
Select all that apply

  1. these instructions change the PC more quickly than other instructions;

  2. when these instructions execute, the address input (mem_addr) to the data memory is 0;

  3. when these instructions execute, in an HCL2D implementation, the mem_readbit control signal will be 0;

  4. these instructions all write a value to the register file;

Question 3: In the single-cycle processor design we discussed in class, from which sources could the input of the program counter register come?
Select all that apply

  1. a register file output

  2. the data memory output

  3. the output of an adder which takes the current PC as input

  4. the instruction memory output

  5. the output of the ALU that is also used for data memory address computations

Question 4: Which of the following are illegal in HCL2D?
Select all that apply

  1. reg_srcA = pc;

  2. reg_inputE = pc;

  3. wire a : 8, b : 8; a = b; b = 4;

  4. wire a : 64; a = i10bytes[2..8];

  5. reg_srcA = 3;

Quiz 09

No quizzes for week 6; good luck on the exam!

Quiz 10

Read sections 4.4 through 4.5.4

Question 1: Adding pipelining to a previously unpipelined processor ______ the latency of instructions.

  1. increases

  2. decreases

  3. can increase or decrease

  4. does not change

Question 2: Adding pipelining to a previously unpipelined processor ______ the throughput of instructions

  1. increases

  2. decreases

  3. can increase or decrease

  4. does not change

Question 3: In the pipelined version of the Y86-64 processor, which values are passed in pipeline registers between the fetch and decode stage?
Select all that apply

  1. the rA and rB fields from the instruction (if the instruction has them)

  2. the immediate or displacement value from the instruction (if the instruction has one)

  3. the result of the instruction's ALU operation (if the instruction has one)

Quiz 11

Quiz on week seven's material.

Question 1: Consider the pipelined Y86-64 implementation we discussed in class. Suppose the register delay is 10 ps and the length of the critical path through memories and combinatorial logic in each stage is:

  • 100 ps for fetch;
  • 75 ps for decode;
  • 80 ps for execute
  • 100 ps for memory
  • 60 ps for writeback

What is the minimum clock cycle time this processor could have and still operate correctly?

  1. 100 ps

  2. 110 ps

  3. 415 ps

  4. 465 ps

  5. 455 ps

  6. 60 ps

  7. 70 ps

Question 2: Consider the following Y86-64 assembly snippet:

mrmovq 4(%r11), %r10
addq %rax, %rbx
subq %rcx, %rdx
andq %rsi, %rdi 
xorq %rsp, %rbp
irmovq $10, %r9

In our pipelined implementation, when the subq instruction is in its decode stage, what stage is the mrmovq instruction in?

  1. fetch

  2. decode

  3. execute

  4. memory

  5. writeback

  6. none of the above

Question 3: For the pipelined Y86-64 design we discussed in class, for which of the following instruction sequences is there a data hazard?
Select all that apply

  1. addq %rax, %rbx; addq %rbx, %rcx

  2. addq %rax, %rbx; addq %rax, %rcx

  3. addq %rax, %rbx; addq %rax, %rbx

Question 4: For the pipelined Y86-64 design we discussed in class, for which of the following instruction sequences is there a control hazard?
Select all that apply

  1. irmovq $0, %rax; mrmovq 0(%r10), %r11; ret

  2. irmovq $0, %rax; addq %rbx, %rbx; jle foo

  3. irmovq $0, %rax; addq %rbx, %rbx; call foo

Quiz 12

Read sections 4.5.5, 4.5.8, 4.5.10, 5.7-5.7.2

Question 1: Using our textbook's notion of "stall" and "bubble" signals, if a pipeline register's input is receiving a stall signal of 1 and a bubble signal of 0, then its output after the next rising edge of the clock will be:

  1. the same as the current input of the register

  2. the same as the current output of the register

  3. the nop value

Question 2: Our textbook talks about both data dependencies and data hazards. Which are differences between these?
Select all that apply

  1. what is a data hazard depends on the hardware, but what is a data dependency depends only on the ISA

  2. two instructions can have a data dependency without creating a data hazard

Question 3: We can perform forwarding of %rax to implement

irmovq $15, %rax
addq %rbx, %rax

without any stalling. Suppose the addq instruction receives the forwarded value for %rax from irmovq in its decode stage. Where can it retrieve (i.e. forward) this value from?

  1. the pipeline registers between fetch and decode

  2. the pipeline registers between decode and execute

  3. the pipeline registers between execute and memory

  4. the pipeline regsiters between memory and writeback

Question 4: Figure 5.12 indicates that the integer multiplication functional units on the Intel "Haswell" microarchitecture have an issue time of 1 cycle and a latency of 3 cycles. How long does it take one of these functional units to perform two integer multiplications from when the first multiplcation is started until the last finishes?

  1. 3 cycles

  2. 4 cycles

  3. 5 cycles

  4. 6 cycles

  5. 7 cycles

Quiz 13

For each of the following sequences of instructions, indicate how many cycles of pipeline stalls occur in the pipelined Y86-64 implementation we discussed in class.

Question 1: (see above) :

popq %rax
addq %rax, %rbx
popq %rax
rmmovq %rax, 8(%rbx)
  1. 0

  2. 1

  3. 2

  4. 3

  5. 4

  6. none of the above

Question 2: (see above) : (accepted both 0 and 3 --- depending on whether you counted stalls "during" ret)

     xorq %rax, %rax
     je foo // always taken
foo: subq %rbx, %rax
     ret
  1. 0

  2. 1

  3. 2

  4. 3

  5. 4

  6. none of the above

Question 3: Consider a pipelined Y86-64 processor with six stages, resulting from splitting the memory stage into two parts. That is, the processor has the following stages:

  • Fetch
  • Decode
  • Execute
  • Memory Part 1
  • Memory Part 2
  • Writeback

Suppose the results of any data memory load or store is not available until near the end of the second memory stage. On this processor, even after implementing all possible forwarding, which of the following instruction sequences would require a pipeline stall to resolve hazards:
Select all that apply

  1. addq %rcx, %rax; rmmovq %rbx, 8(%rax); addq %rcx, %rax

  2. mrmovq 8(%rax), %rbx; subq %rcx, %rax; addq %rbx, %rax

  3. pushq %rax; addq %rsp, %rbx; popq %rbx

Question 4: IBM 360/91 had out-of-order instruction execution and completion but no support for precise exceptions. Suppose you were writing a floating point program in this machine. Which statement is true?
Select all that apply

  1. Each instruction in your program executed in one cycle

  2. Debugging was hard

  3. Instructions got executed in-order

Quiz 14

skim section 6.1.1 and read 6.2-6.3

Question 1: Which component in the memory hierarchy is the fastest?

  1. Register

  2. Cache

  3. Memory

  4. Storage

Question 2: Which component in the memory hierarchy is non-volatile (can retain data without power)?

  1. SRAM

  2. DRAM

  3. Disk

  4. None of the above

Question 3: Suppose the memory access pattern of your program looks like this: a[1], a[2], a[3], a[7], a[8], a[9], a[13], a[14], a[15]. What kind of locality correctly describes this access pattern?

  1. Spatial locality

  2. Temporal locality

  3. No locality

  4. None of the above

Question 4: Suppose the memory access pattern of your program looks like this: a[1], b[1], c[1], a[1], b[1], c[1]. What kind of locality correctly describes this access pattern?

  1. Spatial locality

  2. Temporal locality

  3. No locality

  4. None of the above

Quiz 15

Question 1: Which of the following are true about programs in the data-flow model?
Select all that apply

  1. Operations in the program appear to execute in the order in which they were written.

  2. Independent operations can execute as soon as their inputs are available.

  3. Multiple operations with the same input can execute in parallel.

Question 2: On an out-of-order processor, which of the following can happen while the processor is completing a very long load (mrmovq) instruction?
Select all that apply

  1. Independent instructions from later in the program can be fetched.

  2. Dependent instructions from later in the program can be fetched.

  3. Some instructions from later in the program can be executed.

  4. Dependent instructions from later in the program can be executed.

  5. Some instructions from later in the program can be committed.

Question 3: Suppose a level-1 cache has an access time of 2 nanoseconds and a 95% hit rate, and the corresponding level-2 cache has a preceived access time of 10 nanoseconds. What is the preceived access time of the level 1 cache?

  1. 2 nanoseconds

  2. 2.05 nanoseconds

  3. 2.4 nanoseconds

  4. 2.5 nanoseconds

  5. 9.6 nanoseconds

  6. 11.5 nanoseconds

  7. none of the above

Question 4: Assume that you are designing a cache for low-power sensor nodes. It measures the environment temperature every minute and sends the average temperature from the last 24 hours to the base station. This sensor requires the caches to be extremely power-efficient. Which design decision would you make?

  1. Serial look up for tag store and data store in the cache

  2. A huge L1 cache

  3. No cache, directly access memory

Quiz 16

Section 6.5

Suppose your program sums up all elements of an array. Each element of the array takes 4 bytes of space. Your cache block size is 64B. Initially your cache is empty and your first access v[0] misses the cache.

int sum = 0;
for(int i = 0; i < N; i += 1)
    sum += v[i];

Question 1: (see above) Which of the following statements are true?
Select all that apply

  1. v[1] will hit the cache

  2. v[1] to v[15] will hit the cache

  3. v[16] will miss the cache

  4. v[0], v[16], v[32] will miss the cache

Question 2: (see above) What kind of locality this access pattern is exploiting in the cache?

  1. Spatial locality

  2. Temporal locality

  3. No locality

  4. None of the above

Question 3: (see above) If N is set to be 100, how many misses this code would see while accessing the array v?

  1. 0

  2. 6

  3. 7

  4. 10

Question 4: (see above) If we operate on every even element in the array, the code becomes:

int sum = 0;
for(int i = 0; i < N; i += 2)
    sum += v[i];

The cache parameters remain the same. Which of the following statements are true?
Select all that apply

  1. v[0] will hit the cache

  2. v[2] will hit the cache

  3. v[16] will miss the cache

  4. v[0], v[16], v[32] will miss the cache

Quiz 17

Consider a 1.5MB 3-way set-associative cache with 64 byte cache blocks which uses a true LRU (least recently used) replacement policy.

Question 1: (see above) How many sets does this cache have?

  1. 1048576 (2 to the 20th)

  2. 16384 (2 to the 14th)

  3. 8192 (2 to the 13th)

  4. 4096 (2 to the 12th)

  5. 2048 (2 to the 11th)

  6. none of the above

Question 2: (see above) The byte at address 0x654321 will be stored in the same cache block as the byte at
Select all that apply

  1. 0x654300

  2. 0x654350

  3. 0x600021

  4. 0x054320

Question 3: (see above) The byte at address 0x654321 will be stored in the same cache set as the byte at
Select all that apply

  1. 0x654300

  2. 0x654350

  3. 0x600021

  4. 0x054320

Question 4: Which of the following techniques are likely to reduce the number of conflict cache misses programs experience, assuming the cache size remains fixed?
Select all that apply

  1. increased cache associativity

  2. better choices of cache replacement policy

  3. increased cache block size

Question 5: Switching a cache from a write-through to write-back policy is likely to have which of the following effects on the cache?
Select all that apply

  1. the cache will take better of advantage of locality in writes

  2. the cache will write to memory more often

  3. cache replacement will be simpler

Quiz 18

Quiz 19

Section 5.1-2, 5.4-6, 5.8-11, skim 5.14

Question 1: Consider the following two functions:

void sumArray1(int *pSum, int *array, int n) {
    for (int i = 0; i < n; ++i)
        *pSum += array[i];
}

void sumArray2(int *pSum, int *array, int n) {
    int temp = 0;
    for (int i = 0; i < n; ++i)
        temp += array[i];
    *pSum += temp;
}

Suppose an optimizing compiler generates much slower code for sumArray1 than sumArray2. What is a likely cause of this?
Select all that apply

  1. pSum might alias array

  2. pSum might alias n

  3. sumArray1 is too large to be inlined

  4. sumArray1 has side-effects

  5. the loop in sumArray1 can't be unrolled because of *pSum

Question 2: Which of the following optimizations the textbook discusses are likely to substantially increase machine code size?
Select all that apply

  1. loop unrolling

  2. eliminating loop inefficiencies by moving computations outside of loops

  3. function inlining

  4. using multiple accumulators

  5. eliminating unneeded memory references

Question 3 (0 points): (Question dropped.) The textbook discusses how a loop like:

for (int i = 0; i < N; i += 2) {
    acc = acc * (a[i] * b[i]);
    acc = acc * (a[i+1] * b[i+1]);
}

is usually faster than one like:

for (int i = 0; i < N; i += 2) {
    acc = (acc * a[i]) * b[i];
    acc = (acc * a[i+1]) * b[i+1];
}

(Assume the compiler obeys the explicitly specified order of operations.) Which of the following are different about how a modern out-of-order processor will execute the two loops?

The first answer was meant to say acc * (a[i] * b[i]) and acc * (a[i+1] + b[i+1]) to make the question unambiguous. This is why the question was dropped. However, it's still the case the processor can execute a[i+1] * b[i+1] from the next iteration of the loop in parallel with acc * (a[i+1] * b[i+1]) from the previous.
Select all that apply

  1. the processor can execute a[i+1] * b[i+1] and acc * (a[i+1] * b[i+1]) in parallel in the first loop, but not the second

  2. the processor can load a[i] earlier in the first loop than the second loop

  3. the processor can more easily predict branches in the second loop

  4. the processor can load a[i] and b[i] in parallel with the first loop, but not the second

Question 4: Which of the following transformation our textbook discusses likely to decrease the number of instructions a program executes? (This should have been a select-all, but since it wasn't, either of the correct answers gets full credit.)

  1. cache blocking

  2. loop unrolling

  3. function inlining

  4. using multiple accumulators

Quiz 20

Question 1: Some of the optimizations we talked about compilers do not automatically perform aggressively because they can make code much slower when done excessively. Which are examples of these?
Select all that apply

  1. loop unrolling

  2. removing redundant operations from loops

  3. function inlining

  4. replacing multiplication in a loop with addition

Question 2: Which of these are a way to eliminate aliasing so a compiler can perform more optimizations?

  1. place values accessed through pointers temporarily in a local variable

  2. iterate through arrays with pointer arithmetic instead of using array subscripts

  3. place array indices in a local variable rather than computing it each time

  4. transform your loops to use cache blocking

Question 3: Using mulitple accumulators improves performance by

  1. avoiding redundant computations or memory accesses

  2. performing more operations in parallel

  3. eliminating loop overheads

  4. simplifying the leftover work that must be done after an unrolled loop

  5. none of the above

Question 4: In the example in lecture, trying to use too many accumulators resulted in lower performance than using fewer (with the same amount of loop performance). This was most likely because of

  1. register spilling

  2. more instruction cache misses

  3. poorer spatial locality

  4. all of the above

Quiz 21

Section 8.1-8.3

Question 1: Suppose that your program reads a file from the disk using the fread function. The library you are using implements this fread function using a system call referred to as read. How would you characterize the exception caused by read?

  1. Interrupt

  2. Trap

  3. Fault

  4. Abort

Question 2: Which of the following events will initiate an interrupt?
Select all that apply

  1. A mouse click

  2. Typing on the key board

  3. A divide by zero operation

  4. A completed data transfer from the disk to memory

Question 3 (0 points): Which of the following events will initiate an abort?
Select all that apply

  1. A mouse click

  2. Data transfer complete from the disk to memory

  3. A divide by zero operation

  4. A completed data transfer from the disk to memory

Question 4: Suppose that your program writes to a read-only page. Which of the following exceptions will occur in this case?

  1. Interrupt

  2. Trap

  3. Fault

  4. Abort

Quiz 22

Question 1: Each of the following is either synchronous or asynchronous. Which are synchronous?
Select all that apply

  1. traps

  2. faults

  3. interrupts

  4. signals

Question 2: We have seen this code in our lecture.

void fork5()
{
        printf("L0\t”);
        if (fork() == 0) {
            printf("L1\t”);
            if (fork() == 0) {
                printf("L2\t”);
        }
        }
        printf("Bye\t”);
}

Which of the following are infeasible?
Select all that apply

  1. L0 Bye L1 L2 Bye Bye

  2. L0 Bye L1 Bye Bye L2

  3. L0 L1 Bye Bye L2 Bye

  4. L0 Bye Bye L1 L2 Bye

Question 3: Which of the following are true about signal handlers?
Select all that apply

  1. they run in user mode

  2. can access the global variables

  3. they can be executed in response to an external event

  4. they can be interrupted by other handlers

Question 4: Which of the following are true about signals?
Select all that apply

  1. sent from kernel to a user process

  2. it is possible to queue multiple pending signals of the same type

  3. can be used to communicate between two user processes via kernel

  4. received by the user process as soon as they are send

Quiz 23

Section 9.3-9.6, skim 9.7

Question 1: Which of the following statements are false?
Select all that apply

  1. a process can always access data from the address space of some other process

  2. a process can never access data from its own address space

  3. all processes share the same address space

  4. all processes can access kernel address space

Question 2: Virtual memory can be thought of as a mechanism that caches virtual pages in the physical memory. A virtual page can be stored at any physical page frame. Which of the following statement is true?

  1. virtual memory is direct mapped

  2. virtual memory is set associative

  3. virtual memory is fully associative

Question 3: If we need 32 bits to represent a virtual address and our page size is 4KB, how many page table entries (PTE) do we need?

  1. 2^10

  2. 2^12

  3. 2^20

  4. 2^22

Question 4: A page table is a map of <vpn, ppn>, where vpn is a virtual page number and ppn is a physical page number. The process of generating the appropriate ppn for a vpn is referred to as:

  1. address protection

  2. address translation

  3. address swapping

  4. address counting

Quiz 24

Question 1: The PTBR contains the base address of the page table. Which of the following statements are false?
Select all that apply

  1. It contains a virtual address

  2. It contains the same address for each process

  3. It is updated on a contest switch

  4. It is the CR3 register for x86 machines

Question 2: A single-level page table has only one table, where a multi-level page table has multiple tables organized in a hierarchical manner. Which of the following statements are true?
Select all that apply

  1. For a single-level page table, the amount of space it requires varies based on how many pages are in use

  2. For a multi-level page table, the amount of space it requires varies based on how many pages are in use

  3. If your program uses 100% of the virtual address space (even the parts normally not used), having a multi-level page table requires less space than a single-level page table

  4. All of the above

Question 3: Which of the following can have a virtually-indexed-physically-tagged L1 cache?
Select all that apply

  1. L1 32 KB, 8-way set associative, page size 4K

  2. L1 64 KB, 8-way set associative, page size 4K

  3. L1 64 KB, 2-way set associative, page size 4K

  4. L1 64 KB, 2-way set associative, page size 8K

Question 4: Which of the following statements are true about the translation lookaside buffer (TLB)?
Select all that apply

  1. Small set-associative hardware cache in MMU

  2. Maps virtual page numbers to physical page numbers

  3. Contains complete page table entries for some pages

  4. None of the above