Answer the following question about the lecture material for this week.
Suppose we have a program which is compiled from C code, but some of the C code is very repetitive. Rather than including a copy of the repetitive C code with the program's source code, the program's authors want to include a Python script that generates that repetitive code, some hand-written C files, and a Makefile to build them all.
Suppose the Python script is called generate_lookup.py and, when run, it writes the generated C code to its standard output. The Makefile writes the output of the Python script to lookup.c. The hand-written C files are called lookup.h and main.c.
Which of the following rules would be reasonable to include in the corresponding Makefile? ((tab) represents a tab character.) Select all that apply.
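As an illustration (a sketch, not an answer key: the program name `program` and the use of `python3` are assumptions not given in the question), a Makefile for the setup described might contain rules like the following, using the question's `(tab)` notation for tab characters:

```make
lookup.c: generate_lookup.py
(tab)python3 generate_lookup.py > lookup.c

main.o: main.c lookup.h
(tab)cc -c main.c

lookup.o: lookup.c lookup.h
(tab)cc -c lookup.c

program: main.o lookup.o
(tab)cc -o program main.o lookup.o
```

Note that lookup.c depends on the script that generates it, so editing generate_lookup.py causes make to regenerate lookup.c and rebuild everything that uses it.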
Suppose a single-core system is running three single-threaded processes A, B, and C.
The following events happen in the following order:
Which of the above steps are likely to involve making one or more system calls?
Write the numbers of the corresponding steps above; for example, if 1 and 5 involve system calls, write "15".
didn't mean to ask about 3 or 6 since we did not cover signals yet; also accepted 16 or 1
A non-system-call exception must occur during which of the steps above?
Write the numbers of the corresponding steps above; for example, if 1 and 5 must involve a non-system-call exception, write "15".
didn't mean to ask about 3 or 6 since we did not cover signals yet; also accepted 456 or 3456 or 345
Consider the following C code snippet running on a Linux system:
char buffer[1024];
int x, y;
fgets(buffer, sizeof buffer, stdin);
x = atoi(buffer);
fgets(buffer, sizeof buffer, stdin);
y = atoi(buffer);
printf("%d / %d = %d\n", x, y, x / y);
Suppose x / y results in a division by 0. This will cause an exception to occur. While the handler for that exception is running, most likely ____. Select all that apply.
Suppose the program above is run so that it takes input from the keyboard and shows output on a screen. When no division by zero occurs, which of the following operations occur in kernel mode when the code above runs? Select all that apply.
Answer the following question about the lecture material for this week.
Consider the following C code snippet running on a Linux system:
char buffer[1024];
int x, y;
fgets(buffer, sizeof buffer, stdin);
x = atoi(buffer);
fgets(buffer, sizeof buffer, stdin);
y = atoi(buffer);
printf("%d / %d = %d\n", x, y, x / y);
On x86-64 Linux, dividing by 0 results in a process being sent the SIGFPE signal ("Floating point exception", yes, even for integer division). Suppose a signal handler for that signal was registered before the code above ran and divided by 0. Which of the following would be true about how the signal handler is run? Select all that apply.
Consider the following signal handler for SIGINT (triggered by Ctrl+C) that requires Ctrl+C to be pressed twice before actually exiting:
int sigint_counter = 0;
void handle_sigint(int signum) {
if (sigint_counter == 0) {
const char *message = "Press Ctrl+C again to exit!\n";
write(1, message, strlen(message));
sigint_counter = 1;
} else {
_exit(1);
}
}
(The _exit function is like exit except that it does not run functions registered with atexit() or flush pending output to stdout and other stdio streams.)
The signal handler is installed using the following code:
struct sigaction sa;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART;
sa.sa_handler = &handle_sigint;
sigaction(SIGINT, &sa, NULL);
Which of the following is true about the above signal handler when running on x86-64 Linux? Select all that apply.
POSIX access control lists like those described in lecture would be suitable for enforcing which of the following policies on a system shared between students and an instructor? Select all that apply.
In lecture, we mentioned the much more restricted permissions systems that Unix supports more commonly than ACLs. In this system, files have an owner user ID, a group ID, and 9 permission bits (read/write/execute for that user, that group, and others). This permission bit system is much less capable than ACLs. Which of the following policies can be achieved using these permission bits? (Ignore that the special administrator user root/superuser would have more access in all of these cases.) Select all that apply.
Consider the following page table:
virtual page # | valid? | physical page # |
---|---|---|
0 | 0 | --- |
1 | 1 | 0x2 |
2 | 1 | 0x3 |
3 | 1 | 0x9 |
4 | 0 | --- |
5 | 1 | 0x5 |
6 | 1 | 0x1 |
7 | 1 | 0x6 |
Suppose this page table is used on a system with 4096 byte pages. On this system, virtual addresses are divided into a 3-bit page number and a 12-bit page offset and physical addresses are divided into a 4-bit page number and a 12-bit page offset.
If a program reads from virtual address 0x1234, what physical address will it read from?
Write your answer as a hexadecimal number. If an exception would occur instead, write "exception".
Answer the following questions about the lecture material for this week.
Consider a system with a virtual memory system with:
A program tries to read a value from virtual address 0x12. The page table base register is set to physical byte address 0x100, so the processor looks up a page table entry from physical address ___. Write your answer as a hexadecimal number. Remember to take into account that page table entries are two bytes.
Suppose the value of the page table entry (from the previous question) is 0x4307 (when interpreted as a 2-byte integer). The program will end up accessing physical address _____. Write your answer as a hexadecimal number.
Consider a system with a virtual memory system with:
If a program can access 1000 distinct pages of memory (without exceptions occurring), then its page tables must take up at least ____ bytes. Remember to include both its first-level page tables and its second-level page tables.
When a program tries to access a value from memory, the processor:
- starts with a page table base register containing 0x120000, then
- reads a first-level page table entry from physical address 0x120008, and that page table entry is valid and contains physical page number 0x123, then
- reads a second-level page table entry from physical address 0x123040, and that page table entry is valid and contains physical page number 0x6, then
- accesses physical address 0x6010
What was the virtual address the program tried to access? Remember to take into account the page table entry size. Write your answer as a hexadecimal number. If not enough information is supplied, write "unknown" and explain in comments.
0x120008 has page offset 0x8 --> index 1 --> VPN part 1 is 1
0x123040 has page offset 0x040 --> index 0x40 / 8 = 8 --> VPN part 2 is 0x8
0x6010 has page offset 0x10 --> page offset of VA is 0x10
combining these we get (0x1 << (12+9)) + (0x8 << 12) + 0x10
Suppose a system uses 4096-byte pages, so addresses have 12-bit page offsets.
Assume the system running this program is as lazy as possible in setting up page table entries -- it waits until a program accesses a virtual page to set up the page table entry for that page, following the allocate/load-on-demand strategy we discussed in lecture. (Perhaps a better strategy would be somewhere in between this lazy strategy and the strategy of loading everything in advance.)
The program starts, with all its pages initially unallocated (marked invalid in its page table). It then runs a function, whose machine code is loaded at addresses 0x2040-0x2072, which writes 3 8-byte values to the stack at addresses 0xFFF8, 0xFFF0, and 0xFFE8.
As a result of the above operations, how many page faults will occur? (If not enough information is given write unknown and explain in the comments.)
one for 0x2000-0x2FFF, one for 0xF000-0xFFFF
Answer the following questions about the lecture material for this week.
Suppose a system has:
Suppose when looking up the virtual address 0x8001 on this system, the first-level page table entry the processor looks up has the value 0x1231. What is the (physical) address of the first byte of the second-level page table this processor will use?
Consider a network which provides the best-effort "mailbox" model guarantees we discussed in lecture.
Suppose we try to send data from machine A to machine B that doesn't fit in a single message, sending it in parts using the following broken protocol:
Suppose we send the data "ABCD" by splitting it into the four messages, "A", "B", "C", and "D". If the network does not corrupt, delay, or reorder messages, but does lose some messages, which of the following is it possible for machine B to think it was sent when it tries to reconstruct the data? Select all that apply.
My intention in writing this question was that "machine B receives all four parts" meant "machine B receives what it thinks are all four parts". Under this interpretation A and C are possible.
But some interpreted it as machine B having a way to verify it received all parts (despite not using this knowledge to reconstruct the message). Under this interpretation A and C are not possible.
If the network loses every fourth message a machine sends, then how many messages will machine A end up sending?
I meant to write "how many messages will machine A end up sending".
Assuming "received all four parts" just means that machine B gets four messages: machine A will send A B C D and D will be lost. Then machine A will send A B C D again. Machine B will acknowledge receiving "ABCA". Machine A has sent 8 messages at this point and machine B has sent 1 (so 9 messages sent in total).
If "received all four parts" includes machine B having a way to know whether it actually received four distinct messages, then it will never receive part D (it's lost on every resend), so an arbitrarily large number of messages will be sent. (This scenario does not seem consistent with the incorrect message reconstruction from the previous question.)
In the layers we discuss, a message sent on the link layer will have a destination address for the link layer in its frame. In Ethernet, this destination address is a MAC address. This destination address will identify where the message should go for the purpose of the link layer's functionality.
Suppose a program sends a frame containing a message that is going to be sent to a faraway machine. In order to get to that machine, the data will first be received by a router on the local network, then that router will send it to another router, which will send it to another router, which will send it to a final router. That final router will then send the data to the final machine.
When the message is initially sent (before it is forwarded by the first router), what best describes what the link-layer destination address that accompanies it will contain?
Consider the scenario of the previous question.
In addition to the destination address for the link layer, the message will also contain a destination address for the network layer in its packets. For example, in IP version 4, this would be an IP version 4 address (like 128.148.63.2).
When the message is initially sent (before it is forwarded by the first router), what best describes what the network-layer destination address that accompanies it will contain?
Answer the following questions about the lecture material for this week.
Consider the following code where socket_fd is the file descriptor for a UDP socket (that has already been configured, including having its default destination address set with connect):
write(socket_fd, "FIRST", strlen("FIRST"));
write(socket_fd, "SECOND", strlen("SECOND"));
write(socket_fd, "THIRD", strlen("THIRD"));
Assuming the writes above do not fail, when code running on the other end runs:
char buffer1[100] = "initial value";
char buffer2[100] = "initial value";
read(other_socket_fd, buffer1, sizeof buffer1);
read(other_socket_fd, buffer2, sizeof buffer2);
Assuming message data is not corrupted on the network (despite the lack of end-to-end checksums), which of the following are possible outcomes in buffer1 and buffer2? Select all that apply.
Suppose www.foo.com's DNS server, when queried, returns a record containing the IP address for www.foo.com specifying a time-to-live of 10000 seconds.
We mentioned in lecture that an ISP's DNS server could obtain this record by first querying one of the "root" DNS servers. This root server would most likely ____.
www.foo.com is accessed many times over the course of 5 hours by each of 1000 users. These 1000 users are split evenly across 10 different ISPs. Each of these ISPs runs their own DNS server for their users that caches the returned records for as long as possible. About how many queries for the IP address of www.foo.com will www.foo.com's DNS server get?
1 query per ISP near the beginning of the 5 hours, then another 10000 seconds (approx. 2.8 hours) later, which is still within the 5 hours
Suppose A and B each prepare to perform secure communications with each other as follows:
After this preparation is complete, we would expect A to have ____. Select all that apply.
Consider the following (broken) scheme for allowing users to run a command on a remote server:
The system administrator sets up MAC keys for every user of the system. These keys are distributed securely to those users when their account is setup and are used in addition to passwords.
When a user asks to login to the server remotely:
Which of the following are scenarios this protocol prevents? Select all that apply.
Answer the following questions about the lecture material for this week.
In lecture, we discussed certificates used to verify website identities for TLS.
As typically used in TLS, the web server will have a certificate, which ___. Select all that apply.
In lecture, we mentioned key agreement protocols where two parties A and B would send each other key shares derived from secret values, then use the other's key share and their secret value to construct a shared secret value (known by A and B, but not anyone else).
Suppose A and B use key agreement as follows:
If other steps are not taken, this design would allow an attacker with sufficient access to the network to ___. Select all that apply.
Consider two strategies for eliminating duplicates from an array of values:
Which of the two approaches would one expect to exhibit better spatial locality?
In approach 1, at least when there are many duplicates to eliminate, the steps where the values are looked up in the hashtable would likely have significant ____. Select all that apply.
In approach 1, a hash function that better distributes values in the hashtable and better avoids hash collisions would likely ____. Select all that apply.
Suppose an 8-set direct-mapped cache with 4B blocks has the following contents:
set index | tag (written in binary) | valid | data bytes (in hexadecimal; lowest address first) |
---|---|---|---|
0 | — | 0 | — |
1 | 001101 | 1 | 02 FF 41 02 |
2 | 001101 | 1 | 01 11 23 45 |
3 | 011000 | 1 | 33 55 77 9A |
4 | 000000 | 1 | 00 01 02 03 |
5 | 000001 | 1 | F0 80 90 01 |
6 | — | 0 | — |
7 | — | 0 | — |
With the above cache, accessing a two-byte integer from an address with tag 001101 (written in binary), set index 4, and offset 2 will result in a cache miss. As a result of the cache miss, what bytes currently stored in the cache will be replaced?
Answer the following questions about the lecture material for this week.
Consider the following C code:
unsigned char array[12];
...
a = array[0]; // (1)
b = array[0]; // (2)
sum += a * b;
a = array[4]; // (3)
b = array[1]; // (4)
sum += a * b;
a = array[8]; // (5)
b = array[2]; // (6)
For the following questions assume that:
- array starts at an address that is a multiple of 4096 (2 to the 12th power)
- only the accesses to array use the data cache
With a direct-mapped 4-byte data cache with 2-byte blocks, the accesses labeled (1), (3), (4), (5) and (6) above will all be cache misses. The access labeled (1) is a compulsory miss, also known as a cold miss. Which other misses are also compulsory or cold misses? Select all that apply.
Consider a 2-way (fully associative) 8-byte data cache with 4-byte blocks and a least-recently-used replacement policy. The array access labeled (1) above will be a cache miss and the access labeled (2) will be a cache hit. Which other accesses will be cache hits? Select all that apply.
Consider a 2-way (fully associative) 8-byte cache with 4-byte blocks and a first-in, first-out replacement policy. The array access labeled (1) above will be a cache miss and the access labeled (2) will be a cache hit. Which other accesses will be cache hits? Select all that apply.
(In a first-in, first-out replacement policy, the block replaced longest ago ("first-in") in a set is the next one replaced.)
Consider the following C snippet:
unsigned char array1[4096];
unsigned char array2[4096];
...
for (int i = 0; i < 2048; ++i) {
array1[i] = array2[i + 2048] * array2[i];
}
For the following questions, assume:
- array1 and array2 are both located at addresses that are multiples of 32768.

If the data cache has a write-through policy, during the execution of the for loop, how many times will the cache read from or write to the next level of cache (or memory if there is no next level of cache) in order to read or write values in array1?
assuming that the compiler doesn't do anything that would reduce the number of writes to array1
If the data cache has a write-back policy, during the execution of the for loop, how many times will the cache read from or write to the next level of cache (or memory if there is no next level of cache) in order to store values in array1?
should have specified whether write-allocate or write-no-allocate since it dramatically affects the answer; we accepted both answers; also assuming the compiler doesn't do anything to reduce the number of writes to array1; gave half-credit for 0 (forgetting about reads, but knowing no writes) or 64 (reads for array2 only)
Consider a system with:
Suppose this system runs an instruction like:
movq 0x12345, %rax
which loads 8 bytes from memory address 0x12345 into %rax.
Suppose while this instruction runs, all related TLB and cache accesses are hits.
As part of running this instruction the system will ____. Select all that apply.
Accessing the virtual address 0x12345 will use the same TLB set as accessing which of the following virtual addresses? Select all that apply.
The amount of storage required to implement the data TLB is ____.
Consider the following C code that uses the POSIX API:
printf("1"); fflush(stdout);
pid_t pid1 = fork();
pid_t pid2 = fork();
if (pid1 > 0 && pid2 > 0) {
printf("2"); fflush(stdout);
waitpid(pid1, NULL, 0);
waitpid(pid2, NULL, 0);
printf("3"); fflush(stdout);
}
printf("4"); fflush(stdout);
exit(0);
Which of the following are possible outputs of the above? Select all that apply.
Consider the following C code that uses the POSIX API:
int pipe_fds[2];
pipe(pipe_fds);
int out_fd = open("foo.txt", O_WRONLY | O_CREAT, 0666);
dup2(pipe_fds[1], out_fd);
dup2(pipe_fds[0], STDIN_FILENO);
Which (if any) of the following are true after this code snippet runs, assuming pipe, open, and dup2 do not fail? Select all that apply.
Answer the following questions about the lecture material for this week.
Consider the following C code:
void *thread_func(void *arg) {
int *p;
p = (int*) arg;
*p = 1; /* <----------------------------------- LINE A */
return (void*) (p + 2);
}
void foo() {
int array[4] = {0, 0, 0, 0};
pthread_t thread_ids[2];
for (int i = 0; i < 2; i += 1) {
int *q = &array[i];
pthread_create(&thread_ids[i], NULL, thread_func, (void*) q);
}
for (int i = 0; i < 2; i += 1) {
int *r = NULL;
pthread_join(thread_ids[i], (void**) &r);
*r = 2; /* <----------------------------------- LINE B */
}
printf("%d %d %d %d\n", array[0], array[1], array[2], array[3]);
}
When the function foo() runs, the line labeled LINE A will most likely modify a value stored _____.
When the function foo() runs, the line labeled LINE B will most likely modify values stored _____.
Consider the following C code:
struct info {
int *array;
int start;
int end;
};
void *thread_func(void *arg) {
struct info *info;
info = (struct info*) arg;
for (int i = info->start; i < info->end; i += 1) {
info->array[i] *= 2;
}
return NULL;
}
void double_all(int *array, int size) {
pthread_t threads[2];
for (int i = 0; i < 2; i += 1) {
struct info cur_info;
cur_info.array = array;
cur_info.start = i * (size / 2);
cur_info.end = (i + 1) * (size / 2);
pthread_create(&threads[i], NULL, thread_func, &cur_info);
}
for (int i = 0; i < 2; i += 1) {
pthread_join(&threads[i], NULL);
}
}
double_all is intended to multiply every element of its input array by 2, where the array is specified using a pointer to its first element and its size. The implementation tries to use two threads to take advantage of multiple cores to do this, computing a range of the array (specified by start and end) for each thread to work on. But the above code is buggy. Which of the following are bugs in the code? Select all that apply.
Consider the following C code:
struct dynamic_array {
int *data;
size_t size;
pthread_mutex_t lock;
};
void append_to_array(struct dynamic_array *dest, struct dynamic_array *to_add) {
pthread_mutex_lock(&dest->lock);
pthread_mutex_lock(&to_add->lock);
int *new_data;
size_t new_size = (dest->size + to_add->size) * sizeof(int);
/* BEFORE REALLOC */
new_data = realloc(dest->data, new_size);
/* was incorrectly new_data = realloc(dest, new_size) in quiz as released */
if (!new_data) { abort(); /* out of memory */ }
dest->data = new_data;
/* AFTER REALLOC */
memcpy(dest->data + dest->size, to_add->data, to_add->size * sizeof(int));
/* was incorrectly dest->new_data + dest->size in quiz as released */
dest->size += to_add->size;
pthread_mutex_unlock(&to_add->lock);
pthread_mutex_unlock(&dest->lock);
}
For this question, assume that all code follows the convention that dynamic_array structs are only read or modified while holding the lock in the struct. (That is, only after locking the lock and without unlocking the lock until after the read/modification is done.)
In the append_to_array function, the lock for to_add is kept locked even while the realloc operation is run, even though that operation does not involve to_add. An idea one might have to take advantage of this is to temporarily release the lock on to_add by inserting a pthread_mutex_unlock(&to_add->lock) in place of the comment /* BEFORE REALLOC */ and inserting pthread_mutex_lock(&to_add->lock) in place of the comment /* AFTER REALLOC */.
It might seem that this would allow more threads to work in parallel. However, this change would allow for some incorrect behavior. Which of the following problems could we expect as a result of the attempt to temporarily release the to_add lock described above? Select all that apply.
Answer the following questions about the lecture material for this week.
Consider the following C code:
struct dynamic_array {
int *data;
size_t size;
pthread_mutex_t lock;
};
void append_to_array(struct dynamic_array *dest, struct dynamic_array *to_add) {
pthread_mutex_lock(&dest->lock);
pthread_mutex_lock(&to_add->lock);
int *new_data;
size_t new_size = (dest->size + to_add->size) * sizeof(int);
new_data = realloc(dest->data, new_size);
if (!new_data) { abort(); /* out of memory */ }
dest->data = new_data;
memcpy(dest->data + dest->size, to_add->data, to_add->size * sizeof(int));
dest->size += to_add->size;
pthread_mutex_unlock(&to_add->lock);
pthread_mutex_unlock(&dest->lock);
}
It is possible for two simultaneous calls to append_to_array to hang indefinitely. Which, if any, of the following are examples of sets of simultaneous calls where this could happen? (Assume A, B, C, D, etc. represent distinct dynamic_array struct pointers.) Select all that apply.
Which of the following changes to the above code would prevent the deadlock? Select all that apply.
Suppose we want to implement a variant of the producer-consumer pattern where producing threads can produce two types of values (type A and type B), and consumer threads can either wait for a value of type A OR they can wait for both a value of type A and a value of type B. When the threads wait, they should wait without consuming a lot of processor time.
To implement this, we'll use a Queue implementation that provides queue_enqueue, queue_dequeue, and queue_size operations. The following questions ask about the following incomplete implementation. Blanks (formatted like _____________) represent omitted code:
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
Queue a_values;
pthread_cond_t a_available = PTHREAD_COND_INITIALIZER;
Queue b_values;
pthread_cond_t both_available = PTHREAD_COND_INITIALIZER;
void EnqueueA(Item item) {
pthread_mutex_lock(&lock);
queue_enqueue(&a_values, item);
_________________________________ /* BLANK 1 */
pthread_mutex_unlock(&lock);
}
void EnqueueB(Item item) {
pthread_mutex_lock(&lock);
queue_enqueue(&b_values, item);
pthread_cond_signal(&both_available);
pthread_mutex_unlock(&lock);
}
Item DequeueA() {
pthread_mutex_lock(&lock);
while (queue_size(a_values) == 0) {
pthread_cond_wait(&a_available, &lock);
}
Item result = queue_dequeue(&a_values); /* was b_values when quiz originally released */
pthread_mutex_unlock(&lock);
return result;
}
void DequeueBoth(Item *a_item_ptr, Item *b_item_ptr) {
pthread_mutex_lock(&lock);
while (_______________________________) { /* BLANK 2 */
pthread_cond_wait(_________________ /* BLANK 3 */, &lock);
}
*a_item_ptr = queue_dequeue(&a_values);
*b_item_ptr = queue_dequeue(&b_values);
pthread_mutex_unlock(&lock);
}
Which of the following would be most appropriate to place in the blank labeled "BLANK 1"?
Which of the following would be most appropriate to place in the blank labeled "BLANK 2"?
Suppose we have a system where three threads are running in parallel and we want a function to send a value from thread 0 to thread 1, and another value from thread 1 to thread 2, and another value from thread 2 to thread 0. To do this, each thread calls a function like:
other_threads_value = SendAndReceiveValue(thread_id, my_value);
where my_value is the current thread's value and thread_id is 0, 1, or 2, depending on which thread is making the call.
Threads can call SendAndReceiveValue multiple times; each call should receive the value passed to the corresponding call made by the sending thread.
Consider the following implementation of this function using semaphores:
int values[3];
sem_t value_ready[3];
sem_t value_empty[3];
int SendAndReceiveValue(int thread_id, int value) {
int other_thread_id = (thread_id + 1) % 3;
sem_wait(&value_empty[other_thread_id]);
values[other_thread_id] = value;
sem_post(&value_ready[other_thread_id]);
sem_wait(&value_ready[thread_id]);
int received_value = values[thread_id];
sem_post(&value_empty[thread_id]);
return received_value;
}
What should the initial value of the semaphore value_ready[0] be? (By "value", we mean the non-negative integer stored by the semaphore.)

What should the initial value of the semaphore value_empty[0] be? (By "value", we mean the non-negative integer stored by the semaphore.)
Answer the following questions about the lecture for this week.
Suppose we build a pipelined processor with the five-stage design we discussed in lecture and, when run alone:
If the above processor is running an assembly program that includes the instruction addq %r8, %r9 (and there are no relevant data hazards or control hazards), then this instruction would start its writeback stage approximately ___ ps after it starts its fetch stage.
500 * 4 = 2000
For this question, consider a processor uses pipelining with 6 pipeline stages and a cycle time of 600 picoseconds (.000 000 6 milliseconds).
Suppose we execute a program with 100000000 (100 million) instructions on this processor and the program experiences no data or control hazards. How long, in milliseconds, rounded to the nearest millisecond, would the program take to execute on the pipelined processor?
meant to always use milliseconds
Suppose we modified this processor to have 5 pipeline stages by combining its third and fourth pipeline stages. Which of the following would likely be true about the modified processor? Select all that apply.
Suppose when running a benchmark program on a pipelined processor we discover that:
The pipelined processor has a 1000 ps cycle time.
On average, when running the benchmark program, the processor will complete an instruction approximately every ____ ps.
(1 + .05 * 2 + .10 * 1) * 1000 = 1200
Consider a nine-stage pipelined processor with the following pipeline stages:
This processor uses forwarding combined with (when necessary) stalling to resolve data hazards.
When running the assembly:
addq %r8, %r9
xorq %r8, %r11
subq %r9, %r10
on this nine-stage processor, the subq instruction will complete ____ cycles after the addq instruction completes.
(Assume that any instructions the processor is running that are not included in the snippet above do not access %r8, %r9, %r10, or %r11.)
Answer the following questions about the lecture material for this week.
For the following question, consider a six-stage pipelined processor with the following pipeline stages:
For the two execute stages:
Suppose the processor is running the following assembly snippet:
movq (%r8), %r9
subq $128, %r9
jle foo
addq %r9, %r10
and the jle is mispredicted.
During the execution of the above assembly snippet, the addq instruction will finish ___ cycles after the movq instruction.
the subq will stall for two cycles + two cycle delay for the jle + 2 instructions in between + 1 for the addq to complete; also accepted 8 under the interpretation that the jle would need to stall (in decode or execute 1) to get the condition code values from the subq
Consider the following assembly snippet:
movq $4, %rax // rax <- 4
start_loop:
movq %r8, %r9 // r9 <- r8
andq $3, %r9 // r9 <- r9 & 3
cmpq $0, %r9 // set condition codes using r9 - 0
jne skip // if (r9 != 0) goto skip
addq $1, %r8 // r8 <- r8 + 1 ***
skip:
subq $1, %rax // rax <- rax - 1
cmp $0, %rax // set condition codes using rax - 0
jne start_loop // if (rax != 0) goto start_loop
end_loop:
For the questions below, assume %r8 has an initial value of 4 every time the snippet above is run. In this case:
- the loop (marked by the start_loop and end_loop labels) will execute 4 times
- during the first time the loop is executed, the addq $1, %r8 instruction will be executed
- during the other times the loop is executed, the addq $1, %r8 instruction will not be executed
If the processor running this snippet uses the forwards-not-taken, backwards-taken branch prediction strategy discussed in lecture, then the conditional jumps (jne skip and jne start_loop) will be predicted correctly ___% of the time. Round to the nearest percent.
jne skip predicted as not taken; is not taken 1/4 times; jne start_loop predicted as taken, is taken 3/4 times
Suppose the processor running this snippet predicts each conditional jump as taken if it was taken the last time it was run, and predicts it as not taken if it was not taken the previous time it was run.
Assume the processor just ran the snippet and then runs the snippet a second time (that is, there is some loop surrounding the snippet). During the second execution, the conditional jumps in the snippet will be predicted correctly ____% of the time. Round to the nearest percent.
since jne skip pattern is NTTT, predictions will be TNTT (first prediction from last iteration of previous execution); correct 50% of time; since jne start_loop pattern is TTTN, predictions will be NTTT; correct 50% of time
Consider the following assembly snippet:
mov (%rax), %r9
add %r9, %r10
mov (%rbx), %r9
add %r9, %r11
mov (%rcx), %r9
add %r9, %r12
xor %r10, %r11
Consider an out-of-order processor design like we discussed in lecture. Assume that it can perform multiple arithmetic operations at a time and can perform multiple loads at a time. (Its data cache is capable of doing multiple reads at a time, even if they are started in the same cycle.)
On that processor, which of the following is true about how this code could be executed? Select all that apply.
Consider an out-of-order processor design similar to what we discussed in lecture, with execution units supporting running several arithmetic instructions at a time.
Suppose after register renaming, the end of the instruction queue for the processor contains:
add %x10, %x19 -> %x20
sub %x20, %x21 -> %x22
xor %x18, %x22 -> %x24
imul %x18, %x18 -> %x25
where %x... represents a physical register name. Assume that there are also other instructions in the instruction queue, not shown above.
Which of the following are true about these instructions and how they could be run on this processor? Select all that apply.
Answer the following questions about the lecture for this week.
Consider the following assembly snippet:
addq %rcx, %rax
subq %rcx, %r9
xorq %rax, %r8
movq (%r8), %r10
imulq %r10, %r9
imulq %rax, %rax
On an out-of-order processor, the xorq instruction could do its computation at the same time as ____ is doing its computation. Select all that apply.
Consider the following assembly snippet:
cmpq %rax, %rbx
jle other
addq %r12, %rdx
movq (%rdx), %r8
subq %r8, %r9
jmp after
other:
addq %r13, %rcx
movq (%rcx), %r8
addq %r8, %r9
after:
Suppose this executes on an out-of-order processor, and the jle other is predicted as taken, but is actually not taken. Then, ____. Select all that apply.
Consider the following C code:
int IsPowerOfTwo(unsigned long x) {
while (x > 1) {
if ((x & 1) == 1) {
return 0;
}
x = x >> 1;
}
return 1;
}
Assume this is translated to assembly in a manner that does not omit any bitwise or arithmetic operations.
Suppose IsPowerOfTwo(a) takes substantially longer to execute than IsPowerOfTwo(b) for some non-zero a and non-zero b. (Note that IsPowerOfTwo's return value is not specified.) Based on the time difference between IsPowerOfTwo(a) and IsPowerOfTwo(b), what is most likely true about a and b? Select all that apply.
meant to write "does not omit" any bitwise or arithmetic operations instead, and question does not make much sense with that statement; but I'm guessing it did not affect answers much
Consider the following C snippet:
t = lookupTable[x];
where lookupTable is an array of 16384 chars located in memory at address 0x20CD00.
Suppose using the PRIME+PROBE strategy discussed in lecture, we determine that the access to lookupTable modifies set 23 of a 65536-byte 4-way set-associative cache with 256-byte blocks. What is a possible value of x? To simplify this problem, assume all addresses are physical.
0x20CD00 starts at cache index 13 offset 0, to get to index 23 offset 0 we need to advance by 10*256 bytes.
Consider the following C code:
if (x < 1024) {
char y = array1[x];
if (y < 64) {
foo(array2[y*2048]);
}
}
where array1 and array2 are arrays of chars.
If an attacker can control x, they can use the Spectre vulnerability to read from memory using this code. Suppose array1 is located at address 0x1000000, and array2 is located at address 0x2000000. To simplify the problem, assume all addresses are physical. The attacker will need to set x to a value related to the memory location that they want to read from. If they want to get information about the int at memory address 0x4000000, they should set x to ____. Write your answer as a hexadecimal.