Answer the following questions about this week's lecture material.
In lecture, we discussed the idea of a processor that was connected to memory and I/O devices using a shared memory bus. Which of the following statements are true about how the processor using this memory bus would work? Select all that apply.
On a little-endian system,
if the value 0x1234ABC
was stored starting at address 0x101
as a 4-byte integer,
then afterwards the byte stored at address 0x103
would have what value?
(Assume, as we described in lecture, memory is organized similar to an array of bytes
where each byte has its own address.)
Write your
answer as a hexadecimal number.
If not enough information is supplied, write unknown
and explain in the comment.
stored as 0xBC (in 0x101), 0x4A (in 0x102), 0x23 (in 0x103) and 0x01 (in 0x104)
For the following questions, consider the following AT&T syntax x86-64 assembly instruction:
movq (%rax, %rbx, 4), %rcx
And assume that, just before this instruction executes, registers have values as follows:
%rax | 0x1000 |
%rbx | 0x8000 |
%rcx | 0x79 |
%rsp | 0xF00000 |
When the above instruction is executed, it will use a memory address. What memory address will it use? Write your answer as a hexadecimal number.
0x1000 + 0x8000 * 4
When the above instruction executes successfully, it will ____.
Consider the following assembly snippet:
addq %rbx, %rbx
addq %rbx, %rbx
addq %rax, %rbx
movq (%rbx), %rbx
Which of the following assembly snippets would result the same values in %rax
and %rbx
as the above assembly snippet? Select all that apply.
addq %rbx, %rbx --> rbx = initial rbx * 2; addq %rbx, %rbx --> rbx = initial rbx * 4; addq %rax, %rbx --> rbx = initial rbx * 4 + rax; movq (%rbx), %rbx --> rbx = memory[initial rbx * 4 + rax]
Answer the following questions about the lecture material for week 2.
For each of the following assembly snippets, identify which are possible values of the ZF and SF condition codes.
cmpq $0, %rax
jl skip
imulq $-1, %rax
skip:
testq %rax, %rax
I should have been more explicit that I only cared about the value when the snippet finishes, but I think/hope that's the only sensible interpretation. The condition code's final value is determined by the testq
insturction. (How the cmpq or imulq do or do not set condition codes does not matter.) If rax is initially positive, itu becomes negative when the testq instruction runs; if rax is initially negative, it remains negative (the jl skips the imulq); if rax is initially zero, it remains zero.
movq $0, %rax
cmpq %rax, %rax
leaq 0x12345678(,%rax,4), %rax
the last leaq won't change the flags
Consider the following C snippet:
while (rax >= 1 && rax < 10) {
rax = rax - r8;
}
and the following assembly snippets:
/* snippet A */
loop:
cmpq $0, %rax
jle done
cmpq $10, %rax
jge done
subq %r8, %rax
jmp loop
done:
/* snippet B */
jmp start
loop:
subq %r8, %rax
start:
cmpq $1, %rax
jl done
cmpq $10, %rax
jl start
done:
/* snippet C */
loop:
cmpq $1, %rax
cmpq $10, %rax
jl done
jge done
subq %r8, %rax
jmp loop
done:
/* snippet D */
loop:
cmpq $1, %rax
jl done
cmpq $11, %rax
jg done
subq %r8, %rax
jl loop
done:
Assuming the variable rax
is stored in the register %rax
and the variable r8
is
stored in the register %r8
, which snippets are correct assembly translations of the loop?
Select all that apply.
For the following questions, consider static linking using object files as discussed in lecture.
Suppose one converts the following two assembly files into an executable (where ...
represents
some code that is not shown, and .asciz
is an assembly directive to generate a nul-terminated
ASCII string, and .global LABEL
is a directive that causes the assembler to ensure
that a specified label LABEL
can be referenced from other files):
Assembly file main.s
:
.text
.global main
main:
mov $0, %r11
mov $10, %r10
loop:
mov $percent_d, %rdi
mov %r11, %rsi
call printf
incq %r11
decq %r10
jge loop
mov $0, %rax
ret
.data
.global percent_d
percent_d:
.asciz "%d"
Assembly file printf.s
:
.text
.global printf
printf:
movb (%rdi), %al
...
...
ret
.data
...
The object file generated from main.s
will include metadata (outside of any machine code) specifying ____. Select all that apply.
In the object files generated from the assembly files above,
a symbol table entry for printf
would most likely ____. Select all that apply.
If x
and y
are 64-bit signed integers between -100000000000 and 100000000000
(on a C implementation that uses two's complement
and implements signed right shift using arithmetic shift and where other constants are 64-bits), then which
of the following C expressions are always true? Select all that apply.
Consider the following C function:
long foo(long x) {
return (x >> 40) + x;
}
Assume this runs on a system where:
If overflow does not occur,
what is an argument x
that will result in foo
returning the most negative number possible?
(Overflow includes adding two negative numbers
and getting a positive number or adding two positives to get a negative.
It does not including right shift producing a rounded result for division by a power of two)
Write your answer as a hexadecimal number.
Originally, I made an off-by-one-error in my calculations (by neglecting to consider that x
needs to be close enough to the minimum long to cause shifting by 40 to give -1<<23, not -1<<22), this has now been corrected and the key adjusted accordingly. Sorry for the confusion.
result is floor(x / pow(2,40)) + x, which seems to minimized by -pow(2,63) (the lowest possible return value, since the return value must be a 64-bit two's complement integer), but if x is -pow(2, 63), this will wrap around (since (any negative value) + (-pow(2, 63)) is out of range for a 64-bit two's complement integer. Let's suppose x is -pow(2, 63) + K where K < pow(2, 40) (so it vanishes from the floor(x / pow(2, 40) term). Then x >> 40 will be -(1 << (63-40)) == -(1 << 23). Then, if we let K be (1<<23) to cancel this out the overall result -(1 << 23) - pow(2,63) + (1<<23) = -pow(2, 63), which is the minimum possible value.
Note that since this x
achieves a return value which is the smallest possible 64-bit two's complement integer, so no lower result is possible, even if overflow is allowed to occur. The prohibition on overflow was meant to exclude values of x
which were very close to pow(2,63)-1; it would have been better phrased "the most negative possible provided that overflow does not occur". If you assumed that we could somehow represent out-of-range values for the result of foo
, then the true result of -pow(2,63) >> 40 + -pow(2,63) would be more negative.
Which of the following are generally properties of an instruction set architecture (ISA) rather than properties of a microarchitecture? Select all that apply.
Which of the following decisions are more typical of a reduced instruction set architecture (RISC) design philosophy than a complex instruction set architecture (CISC) design philosoophy? Select all that apply.
Answer the following questions about the lecture material for this week.
Which of the following valid x86-64 instructions are also valid Y86-64 instructions (as they would be written in AT&T syntax)? Select all that apply.
Consider the following Y86-64 machine code (represented as a sequence of hexadecimal bytes, with the byte at the lowest address written left-most):
40 ab 12 60 12 60 12 60 12 60 61 ab 50 ab 78 56 00 00 00 00 00 00
When decoded, the first two instructions are, in order:
Consider the following Y86-64 machine code for mrmovq SOMETHING(%r9), %r8
where SOMETHING
is
a constant:
50 89 00 ef 00 00 00 d0 bc 0a
What is the value of SOMETHING
? Write your answer as a hexadecimal number.
Consider the registers we discussed in lecture. Suppose the register is currently outputting the value 17 and the input to the register is 20 and the clock signal is high. Then, the input changes to 21; then, the clock signals falls; then, the input changes to 22; then, the clock signal rises; then, the input changes to 23. Then, what will the value of the register's output be?
Consider the following HCLRS code snippet (where ...
represents some omitted code):
register iO {
a : 64 = ...;
b : 64 = ...;
}
i_b = O_a + i_a;
i_a = O_a + O_b + 1;
If during cycle 1, O_a
is 3
and O_b
is 4
, then the value of O_a
during
cycle 3 will be ____. Write your answer as a base-10 number. If not enough information is
provided write unknown
.
order of assignments doesn't matter. cycle 1: i_a = 3 + 4 + 1 = 8, i_b = 3 + 8 = 11; cycle 2: i_a = 8 + 11 + 1 = 20 (becomes O_a during cycle 3), i_b = 8 + 20 = 28 (not relevant);
Consider the single-cycle processor design we discussed in lecture (built using the components we discussed in lecture that executes every instruction in one cycle) executing the instruction:
mrmovq 10(%rax), %rbx
While the address 10(%rax)
is being computed for the above instruction,
the program counter register will output ____.
A value from the data memory is read for this instruction ____.
A value is written to the register file for this instruction ___.
Consider the single-cycle processor design we discussed in lecture (built using the components we discussed in lecture).
Suppose one wanted to add support for a new rrradd REG1, REG2, REG3
instruction which would take three
register operands and compute the sum of the first two registers and store the result in the third register.
For example if %rax
contained 3
and %rbx
contained 4
, then running rrradd %rax, %rbx, %rcx
would
write 7
to %rcx.
Which of the following is true about adding this instruction to a Y86 and the single-cycle processor design? Select all that apply.
Consider the single-cycle processor design we discussed in lecture (built using the components we discussed in lecture).
Suppose one wanted to add support for a new push2q REG1, REG2
instruction which would be equivalent
to pushq REG1
and pushq REG2
but be completed by one instruction (and in one cycle).
The layout of the machine code for the above instruction would most likely be most similar to the encoding for:
As part of adding this instruction, one would expect the data memory to be modified to ____. Select all that apply.
Answer the following questions about the lecture material for this week.
Consider a seven-stage pipelined processor with a 100 ps cycle time. Suppose this processor is executing instructions A, B, C, D, E, ... in order. (Assume no pipeline stalls, etc. are involved.)
Instruction D will finish its first stage _____ ps after instruction A finishes its first stage.
Instruction B will finish its seventh stage ____ ps after instruction A finishes its first stage.
Suppose we are constructing a three-stage pipelined processor, where the computations or data storage operations performed by each stage take:
and we are using pipeline registers with a 20 ps register delay.
What would the minimum cycle time for this processor be in ps?
Consider building a pipeline processor similar to our textbook's design Suppose the times required for components are as follows:
Suppose instead of constructing a 5-stage processor, like our textbook proposes and we discussed in lecture, one were to construct a 4-stage processor by merging two of the stages we would use in the textbook's 5-stage design. Based on the timings above, what two stages would make the most sense (in terms of expected performance) to merge?
Suppose a 5-stage pipelined processor achives a throughput of 1000 instructions/microsecond. If one converted this into a single-cycle processor, one would expect a throughput ____.
For the following questions, consider a four-stage pipelined processor built using the design we discussed in lecture but with the following stages:
(That is, the memory and writeback stages are combined.)
Suppose this processor uses forwarding to resolve data hazards. If forwarding alone is insufficient, the processor combines forwarding with a minimum amount of stalling.
Which of the following assembly snippets would exercise a data hazard if executed on the processor described above? Select all that apply.
On the processor described above,
the addq
instruction in the following assembly snippet would
finish ___ cycles after the subq
instruction finishes?
subq %rcx, %rax
addq %rdx, %rax
On the processor described above,
the subq
instruction in the following assembly snippet would
finish ___ cycles after the mrmovq
instruction finishes?
mrmovq (%rcx), %rax
subq %rcx, %rax
Consider the following assembly snippet:
addq %r8, %r9
subq %r9, %r10
irmovq $42, %r9
rmmovq %r9, 0(%r10)
rmmovq %r8, 8(%r10)
When this is executed on the processor described above, one would expect which of the following forwarding operations to occur? (It's possible that not all of the neceessary forwarding operations are included among the options below.) Select all that apply.
For the following questions, consider the following assembly snippet:
pushq %rax
addq %rcx, %rdx
jle foo
xorq %rdx, %rdx
andq %r9, %r10
subq %r11, %r12
nop
foo: rrmovq %rdx, %r8
irmovq $10, %r13
rmmovq %r14, 0(%r15)
For the questions below, consider a six-stage processor, similar in design to the one discussed in lecture, with the following stages:
In this processor, the result of arithmetic (such as performed by the addq
or xorq
instructions)
or of evaluating the condition codes to determine if a conditional jump is taken is not available
until near the end of the execute part 2 stage. In particular, a conditional jump that was not predicted
correctly will not be able to fetch the instruction that follows the jump instruction until the jump instruction
is in the memory stage.
Assume the processor uses forwarding and, where necessary, stalling to resolve data hazards. Control hazards are resolved as described in each question below.
Suppose the six-stage processor described above
predicted all jumps as taken and the jle
instruction
above was actually not taken. Then
the xorq
instruction would retrieve the value of %rdx
in its decode stage
Suppose the six-stage processor predicted all jumps as taken
and the jle
instruction
above was actually not taken. Then
the xorq
instruction would complete its decode stage while a ____ instruction
completed its writeback stage.
Answer the following questions about the lecture material for this week.
Suppose a five-stage processor was executing during cycle N:
and that the processor wants to stall instruction C and squash instructions B and A, so during the following cycle N+1:
If this processor were built using the bubble_X
and stall_X
signals we discussed in
lecture, then as part of achieving this effect, the processor should set ___.
Select all that apply.
Suppose we have a four-stage pipelined processor with the following stages:
Suppose, unlike the processor we've discussed in lecture, this processor uses branch prediction which predicts are conditional jumps as not taken. When a conditional jump is mispredicted, the processor detects the misprediction during the corresponding jump instruction's execute stage, and retrieves the correct instruction in the following cycle.
Suppose the processor is executing the following assembly snippet:
xorq %rax, %rax
jle foo
andq %rcx, %rdx
pushq %rdx
foo: addq %r8, %r9
subq %r9, %r10
xorq %r10, %r11
When the above assembly is run, the pushq
instruction is fetched as a result of the misprediction.
If this processor were built using the bubble_X
and stall_X
signals described in lecture,
the pushq
instruction will be squashed _____.
If this processor were built using the bubble_X
and stall_X
signals described in lecture,
the andq
instruction will be squashed _____.
For each of the pairs below, select which one is expected to exhibit more temporal locality.
Pair one.
I should have been specific that option A was meant to refer to the array accesses and not the combination of array accesses (not good temporal locality) and machine code accesses (good temporal locality), but I think B is the better answer regardless.
Pair two.
Identify which of the following is expected to exhibit more spatial locality.
Answer the following questions about the lecture material for this week.
Suppose a direct-mapped cache with a write-allocate, write-back policy has the following contents:
set index | valid | dirty | tag | value (as list of bytes in hexadecimal, lowest address (offset) first) |
0 | 1 | 0 | 0x000 | 00 22 33 44 55 FF AA CC |
1 | 0 | - | - | - |
2 | 1 | 0 | 0x002 | 11 22 33 44 55 66 77 88 |
3 | 1 | 1 | 0x002 | 99 AA BB CC 9A BC DE 00 |
4 | 1 | 0 | 0x003 | 23 34 56 00 33 44 55 88 |
5 | 1 | 0 | 0x003 | 00 00 00 11 22 33 44 55 |
6 | 1 | 0 | 0x002 | 00 33 44 55 99 11 33 55 |
7 | 0 | - | - | - |
8 | 0 | - | - | - |
9 | 1 | 1 | 0x002 | 56 78 45 67 34 23 12 01 |
10 | 1 | 1 | 0x002 | FF FF FF 99 FF FF FF 88 |
11 | 1 | 1 | 0x002 | FF FF FF 77 FF FF FF 66 |
12 | 0 | - | - | - |
13 | 1 | 0 | 0x000 | AA BB CC DD 00 00 FF 00 |
14 | 1 | 0 | 0x000 | 00 00 00 01 00 00 00 02 |
15 | 1 | 0 | 0x001 | 03 00 00 00 04 00 00 00 |
When this cache is used to access the address 0x123
, what tag would be used for that access?
Write your answer as a hexadecimal number.
When reading a 2-byte, little-endian value from an address with tag 0x3, set index 3, and (block) offset 0x3,
the result will be? Write your answer as a hexadecimal number. If this access would be a cache
miss, write miss
.
tag mismatch
Using the above cache, to read an address with which of the following attributes would replace a dirty value (and therefore require writing out the dirty value to memory or the next level of cache)? Select all that apply.
Suppose a 2-way set-associativate cache with an LRU (least recently used) replacement policy and a write-no-allocate, write-through policy has the following contents:
way 0 | way 1 | ||||||
set index | valid | tag | value (as list of bytes in hexadecimal, lowest address (offset) first) | valid | tag | value | LRU way |
0 | 1 | 0x01 | 01 23 | 1 | 0x02 | 34 56 | 0 |
1 | 1 | 0x01 | 01 23 | 1 | 0x02 | 34 56 | 1 |
Suppose one reads a byte from address 0x13
using this cache, which is a cache miss. As a result of
the miss, a block of the cache will be replaced. Which one?
100[1][1] --> set index 1, tag 100 (binary) = 4
Suppose a program writes a 2-byte value to address 0x00
on a system using this cache as its data cache.
Which of the following
statements is true about how the write will be performed?
Suppose a direct-mapped cache with a write-allocate, write-back policy has the following contents:
set index | valid | dirty | tag | value (as list of bytes in hexadecimal, lowest address (offset) first) |
0 | 1 | 0 | 0x000 | 00 22 33 44 55 FF AA CC |
1 | 0 | - | - | - |
2 | 1 | 0 | 0x002 | 11 22 33 44 55 66 77 88 |
3 | 1 | 1 | 0x002 | 99 AA BB CC 9A BC DE 00 |
4 | 1 | 0 | 0x003 | 23 34 56 00 33 44 55 88 |
5 | 1 | 0 | 0x003 | 00 00 00 11 22 33 44 55 |
6 | 1 | 0 | 0x002 | 00 33 44 55 99 11 33 55 |
7 | 0 | - | - | - |
8 | 0 | - | - | - |
9 | 1 | 1 | 0x002 | 56 78 45 67 34 23 12 01 |
10 | 1 | 1 | 0x002 | FF FF FF 99 FF FF FF 88 |
11 | 1 | 1 | 0x002 | FF FF FF 77 FF FF FF 66 |
12 | 0 | - | - | - |
13 | 1 | 0 | 0x000 | AA BB CC DD 00 00 FF 00 |
14 | 1 | 0 | 0x000 | 00 00 00 01 00 00 00 02 |
15 | 1 | 0 | 0x001 | 03 00 00 00 04 00 00 00 |
Suppose the one-byte value 77
(written as hexadecimal)
is written using this cache using an address which has a (set) index
of 4
, a tag of 0x4
, and a (block) offset of 0x0
. Which of the following is true about the
effects of this accesses? Select all that apply.
Consider a system with two-level cache hierarchy where:
Suppose a workload is measured to have
What percentage of first-level cache accesses require a main memory access? Write your answer to the nearest whole number of percent.
Assume that when there is a miss in a cache, the cache uses its hit time to determine that it is a miss, and then starts the access to the next level of the memory hierarchy. For example, this would mean that an access that misses in the first-level cache and hits in the second level would take 12 cycles to complete.
Given this, what is the average memory access time of the program, in cycles, rounded to the nearest tenth of a cycle?
The following question ask about what data cache misses occur for a particular cache and a particular piece of C code. For those questions assume:
array
use the data cache specifiedarray[0]
is a multiple of 2 to the 20th power (so array[0] is at the beginning of a cache block and has set index 0 in almost any cache)...
are irrelevant to the questionarray
are not reordered and does not omit any accesses to arrayConsider the following C code:
unsigned char array[280];
...
for (int i = 0; i < 10; ++i) {
for (int j = 0; j < 280; ++j) {
array[j] += 1;
}
}
With a direct-mapped 256 byte cache with 64 byte blocks and a write-allocate, write-back policy, how many cache misses will occur? (Note that unlike the examples in lecture, these accesses are unevenly distributed across the sets of the cache.)
10 misses each for 0-63 and 256-320; 3 misses for 64-128, 128-192, 192-256
Consider the following C code:
unsigned char array[2500];
...
for (int i = 0; i < 2500; i += 1000) {
array[i] += 1;
}
for (int i = 0; i < 2500; i += 990) {
array[i] += 1;
}
How many cache misses would occur with a 1KB (1024 byte) fully-associative cache with 64 byte cache blocks and an LRU replacement policy and a write-allocate, write-back policy?
first loop: 0 (miss loading 0-63); 1000 (miss loading 960-1023); 2000 (miss loading 1984-2047); second loop: 0 (hit); 990 (hit); 1980 (miss loading 1920-1983);
For the following questions, consider the following C code:
int A[1024 * 1024], B[1024 * 1024], C[1024 * 1024];
void foo1() {
for (int i = 0; i < 1024; ++i) {
for (int j = 0; j < 1024; ++j) {
A[i + j] = B[j * 4] * C[i * 1024 + j];
}
}
}
void foo2() {
for (int i_outer = 0; i_outer < 1024; i_outer += 16) {
for (int j = 0; j < 1024; ++j) {
for (int i = i_outer; i < i_outer + 16; ++i) {
A[i + j] = B[j * 4] * C[i * 1024 + j];
}
}
}
}
The temporal locality in A
is better for ____.
The temporal locality in B
is better for ____.
The spatial locality in C
is better for ____.
The temporal locality in C
is better for ____.
For the following questions, consider the following assembly code:
movq $100, %rax /* 1 */
begin_loop:
movq (%r8, %rax, 8), %r10 /* 2 */
movq (%r9, %rax, 8), %r11 /* 3 */
subq %r11, %r10 /* 4 */
jg not_negative /* 5 */
neg %r10 /* 6 */
not_negatve:
addq %r10, %r13 /* 7 */
subq $1, %rax /* 8 */
jg begin_loop /* 9 */
(The neg
(negate) instruction is equivalent to multiplying a value by negative 1.)
If one wanted to optimize the above loop by unrolling, the unrolled version of the loop would need to include an extra copy of (perhaps with changes to offsets, etc.) instructions labelled ____.
meant to write not_negative:
on label
If one wanted to optimize the above loop to use multiple accumulators, the best candidate of an accumulator that should be split into two accumulators is ___.
Consider the following assembly snippet:
addq %r8, %r9
subq %r8, %r10
imulq %r9, %r10
andq %r9, %r8
xorq %r11, %r12
Consider an out-of-order processor that uses register renaming to help handle hazards, In this scheme, the renaming step might convert the above instructions into three-operand versions with physical registers (filling in the numbered blanks shown below):
addq ___ (1), ___ (2) -> ___ (3)
subq ___ (4), ___ (5) -> ___ (6)
imulq ___ (7), ___ (8) -> ___ (9)
andq ___ (10), ___ (11) -> ___ (12)
xorq ___ (13), ___ (14) -> ___ (15)
In this renamed version, which of the following pairs of blanks would refer to the same physical register? Select all that apply.
Suppose an out-of-order processor has two arithmetic execution units, each capable of performing any arithmetic operation, and they can perform:
What is the minimum number of cycles the arithmetic for the above assembly take on this processor? Do not include time needed to rename instructions, perform forwarding, register reading and writeback, etc.
forgot to mention the actual time for xor, but if it's the same as and (which certainly seems most likely): addq/subq in parallel (1 cycle); imulq (5 cyles) in parallel andq + xorq (2 cycles)
Answer the following quesitons about the lecture material for this week.
For the following question, consider the following C code:
void foo(int N, int *A, int *B, int *C) {
for (int i = 0; i < N; ++i) {
for (int j = 0; j < N; ++j) {
C[i] += A[j*N+i] + B[i+j];
}
}
}
Since A
might point to part of the same array as B
or C
, the
compiler cannot easily perform which of the following optimizations
(without adding some check for whether A overlaps with B or C)?
Select all that apply.
Which of the following statements are true about the effects of unrolling the inner loop in the code above? Select all that apply.
Consider the following C functions:
int foo1(int *x, int *y, int *z) {
for (int i = 0; i < 16; ++i) {
z[i] = x[i] + y[i];
z[i+16] = x[i] - y[i];
}
}
int foo2(int *x, int *y, int *z) {
for (int i = 0; i < 16; ++i) {
z[i] = x[i] + y[i];
}
for (int i = 0; i < 16; ++i) {
z[i+16] = x[i] - y[i];
}
}
A compiler cannot generate the same code for these functions because of aliasing.
Which of the following would be sufficient conditions to gaurentee that
foo1
and foo2
would have equivalent results? Select all that apply.
(Assume that pointer comparisons compare the addresses of pointers and produce
consistent usables results even if the two pointers point to parts of different objects.)
dropped because wasn't marked as select-all-that-apply in quiz system
Consider the following C snippets:
/* snippet A */
for (int i = 0; i < N; ++i)
A[i] = B[i+1] + B[i] + B[i-1];
/* snippet B */
for (int i = 0; i < N; ++i)
A[i] = B[A[i]];
Using the vector instructions and intrinsics we discussed in lecture, which of these snippets is most simple to optimize using vector instructions?
Consider the following C snippets:
/* snippet A */
for (int i = 0; i < N; ++i)
A[i] = A[i] + A[i-1];
/* snippet B */
for (int i = 0; i < N; ++i)
A[i] = A[i] + A[i+1];
Using the vector instructions and intrinsics we discussed in lecture, which of these snippets is most simple to optimize using vector instructions?
After converting a function to use vector instructions that worked on vectors of 8 values, one would expect which of the following to be true about the new function compared to the original? Select all that apply.
Answer the following questions about the lecture material for this week.
When two processes are sharing a single-core processor, usually ____. Select all that apply.
When a program attempts to read from a memory location that the program is not permitted to access, the processor will usually ____.
I had intended this question to primarily be about what the hardware does versus what the software does. A and C are things that would be done because the operating system has code to do that (and different operating system make different choices about what to do); B is something the hardware would almost always do.
When an instruction triggers an exception, the processor should ___. Select all that apply.
Typically, when a program reads input from the keyboard ____. Select all that apply.
Consider a system with 12-bit virtual addresses, 11-bit physical addresses, and 256-byte pages. On this system, page offsets are therefore 8 bits, virtual page numbers are 4 bits and physical page numbers are 3 bits.
Suppose a process's page table is as follows:
virtual page # | valid? | physical page # |
0x0 | 0 | |
0x1 | 1 | 0x2 |
0x2 | 1 | 0x4 |
0x3 | 0 | |
0x4 | 0 | |
0x5 | 0 | |
0x6 | 0 | |
0x7 | 1 | 0x3 |
0x8 | 1 | 0x5 |
0x9 | 1 | 0x6 |
0xa | 0 | |
0xb | 0 | |
0xc | 0 | |
0xd | 0 | |
0xe | 0 | |
0xf | 1 | 0x0 |
If the process attempts to read from address 0x742
, what physical address will it read from?
Write your answer as a hexadecimal number. If an exception will happen instead, write fault
.
If the process attempts to write to address 0x187
, what physical address will it write to?
Write your answer as a hexadecimal number. If an exception will happen instead, write fault
.
Answer the following questions about the lecture material for this week.
Consider a system with 12-bit virtual addresses, 11-bit physical addresses, and 256-byte pages.
Suppose a process's page table is as follows:
virtual page # | valid? | write allowed? | user mode allowed? | physical page # |
0x0 | 0 | |||
0x1 | 1 | 1 | 1 | 0x2 |
0x2 | 1 | 1 | 1 | 0x4 |
0x3 | 0 | |||
0x4 | 0 | |||
0x5 | 0 | |||
0x6 | 0 | |||
0x7 | 1 | 1 | 1 | 0x3 |
0x8 | 1 | 0 | 1 | 0x5 |
0x9 | 1 | 0 | 1 | 0x6 |
0xa | 0 | |||
0xb | 0 | |||
0xc | 0 | |||
0xd | 0 | |||
0xe | 0 | |||
0xf | 1 | 1 | 0 | 0x0 |
Suppose the system using the page table shown above requires that exception handlers be located at a virtual address while processes are running. Based on the page table above, give an example of a virtual address at which the OS might have placed an exception handler. Write your answer as a hexadecimal number.
Suppose a system has 30-bit virtual addresses, 24-bit physical addresses, and 1024-byte pages. If this system uses a page table stored in memory as an array with 4 byte page table entries, how large is this page table in bytes? (Write your answer as a base-10 number.)
Consider a system with:
0x88000
, which is a physical byte addressWhat physical address will the processor access to look up
a page table entry for virtual address 0x12345
?
(Remember to account for each page table entry being 4 bytes.)
Write your answer as a hexadecimal number. If not enough information is given, write unknown
.
Suppose the page table entry retrieved for virtual address 0x12345
has the value 0x88800001
(when interpreted as a 4-byte integer).
What physical address will contain the value stored at virtual address 0x12345
?
If an exception would occur instead, write fault
.
Answer the following questions about the lecture material for week 15.
In lecture, we mentioned a scheme where the operating system can allocate memory on demand. The following questions are about how that scheme might be implemented.
In order to configure a memory region to be allocated on demand, the operating system will set page table entries ____.
When an access is made that requires memory allocation, the allocation will most likely occur
Consider a system that uses a two-level page table structure with 8192-byte pages and 1024 page table entries per table (at each level), where each page table entry is 8 bytes.
Suppose the processor is running with a page table base pointer containing physical (byte)
address 0x44440000
. When looking up the physical address for virtual address 0x12345678
, the processor
determines that the relevant first-level page table entry (that is, the one in the
table pointed to by the page table base pointer) was marked valid and
contained the physical page number 0x1111
,
and the relevant second-level page table entry was marked valid and contained the physical page number
0x4321
.
During the first-level of the lookup described above, from what physical address did the processor read the first-level page table entry? Write your answer as a hexadecimal number.
0x44440000 + (0x12345678 >> 13 >> 10) * 8
(13 to remove the page offset, 10 to remove the second-level's part of the virtual page number, times 8 for the page table entry size)
During the lookup described above, what was the final physical address
the processor determined? (That is, what physical address would the program read from when it read from
virtual address 0x12345678
.)
Write your answer as a hexadecimal number.
(0x4321 << 13) | (0x12345678 & 0x1FFF)
--- (0x4321 << 13)
to shift the physical page number to its location in the address, then or in the lower 13 bits to copy the page offset from the original virtual address
Suppose a system has:
If on this system, the processor looks up virtual address 0x52345
, and:
0x800000
; thenx50000
0x5a345
After the processor performs the lookup described above, it will store information in the TLB at what set index?
After the processor performs the lookup described above, it will store what tag in the corresponding TLB entry?