CS3130 Spring 2023 quizzes

Question 1 (5 points)

Suppose we have a program which is compiled from C code, but some of the C code is very repetitive. Rather than having the program's source code include a copy of the repetitive C code with the program, the program's authors want to include a Python script that generated that repetitive code, some hand-written C files, and a Makefile to build them all.

Suppose the Python script is called generate_lookup.py and when run it outputs the generated C code to its output. The Makefile writes the output of the Python script to lookup.c. The hand-written C files are called lookup.h and main.c.

Which of the following rules would be reasonable to include in the corresponding Makefile? ((tab) represents a tab character.) Select all that apply.

lookup.c: lookup.h main.c
(tab)$(CC) $(CFLAGS) -o lookup.c -c lookup.h
lookup.c: generate_lookup.py
(tab)python generate_lookup.py >lookup.c
(tab)$(CC) $(CFLAGS) -o lookup.o lookup.c
lookup.c: generate_lookup.py
(tab)python generate_lookup.py >lookup.c
lookup.o: lookup.c lookup.h
(tab)$(CC) $(CFLAGS) -o lookup.o -c lookup.c
lookup.o: lookup.c lookup.h
(tab)python generate_lookup.py >lookup.c
(tab)$(CC) $(CFLAGS) -o lookup.o -c lookup.c

Suppose a single-core system is running three single-threaded processes A, B, and C.

The following events happen in the following order:

Process A asks to read from the keyboard. No input is available initially.
Process B does some computation.
Process B sends a signal to process C.
A key is pressed on the keyboard, causing process A to run.
Process A accesses an invalid memory location and is terminated as a result.
Process C's signal handler runs and prints a message to the screen.

Question 2 (4 points) (see above)

Which of the above steps are likely to involve making one or more system calls?

Write the numbers of the corresponding steps above; for example, if 1 and 5 involve system calls, write "15".

Answer:

Question 3 (4 points) (see above)

A non-system-call exception must occur during which of the steps above?

Write the numbers of the corresponding steps above; for example, if 1 and 5 must involve non-system-call exception, write "15".

Answer:

Consider the following C code snippet running on a Linux system:

char buffer[1024];
int x, y;
fgets(buffer, sizeof buffer, stdin);
x = atoi(buffer);
fgets(buffer, sizeof buffer, stdin);
y = atoi(buffer);
printf("%d / %d = %d\n", x, y, x / y);

Question 4 (3 points) (see above)

Suppose x / y results in a division by 0. This will cause an exception to occur. While the handler for that exception is running, most likely ____. Select all that apply.

local variables from the printf function that don't fit in registers will be stored on the stack (of the process that was running the snippet above)
the local variable buffer will be stored on the stack (of the process that was running the snippet above)
the processor will be in kernel mode

Question 5 (5 points) (see above)

Suppose the program above is run so that it takes input from the keyboard and shows output on a screen. When no division by zero occurs, which of the following operations occur in kernel mode when the code above runs? Select all that apply.

adjusting the stack pointer to make room for buffer
reading input from the keyboard as part of the implementation of fgets
converting buffer from a string to an integer in atoi
the conversion of x from an integer to a string to print out
returning from the printf function

quiz for week 3

Consider the following C code snippet running on a Linux system:

char buffer[1024];
int x, y;
fgets(buffer, sizeof buffer, stdin);
x = atoi(buffer);
fgets(buffer, sizeof buffer, stdin);
y = atoi(buffer);
printf("%d / %d = %d\n", x, y, x / y);

Question 1 (4 points) (see above)

On x86-64 Linux, dividing by 0 results in a process being sent the SIGFPE (Floating point exception — yes, even for integer division) exception. Suppose a signal handler for that signal was registered before the code above ran and divided by 0. Which of the following would be true about how the signal handler is run? Select all that apply.

the kernel would change the exception table to point to the signal handler
the signal handler would run in kernel mode
the value of x would be on the stack or in a register while the signal handler runs
if the signal handler exited the process, then printf would output something before the process terminated

Question 2 (3 points)

Consider the following a signal handler for SIGINT (triggered by Ctrl+C) that requires Ctrl+C to be pressed twice before actually exiting:

int sigint_counter = 0;
void handle_sigint(int signum) {
    if (sigint_counter == 0) {
        const char *message = "Press Ctrl+C again to exit!\n";
        write(1, message, strlen(message));
        sigint_counter = 1;
    } else {
        _exit(1);
    }
}

(The _exit function is like exit except that it does not run functions registered with atexit() or flush pending output to stdout and other stdio streams.) The signal handler is installed using the following code:

struct sigaction sa;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART;
sa.sa_handler = &handle_sigint;
sigaction(SIGINT, &sa, NULL);

Which of the following is true about the above signal handler when running on x86-64 Linux? Select all that apply.

if the write system call takes a long time and the signal handler interrupted a long computation, that computation could proceed in the background while the write is happening
when the signal handler is run the first time, the variable sigint_counter is stored on the stack
if code using %rdi is running and is interrupted to run the signal handler and then resumed, the value of %rdi after the code is resumed will be the same as before it was interrupted

Question 4 (3 points)

In lecture, we mentioned the much more restricted permissions systems that Unix supports more commonly than ACLs. In this system, files have an owner user ID, a group ID, and 9 permission bits (read/write/execute for that user, that group, and others). This permission bit system is much less capable than ACLs. Which of the following policies can be achieved using these permission bits? (Ignore that the special administrator user root/superuser would have more access in all of these cases.) Select all that apply.

allowing one user to read a file but not change its permission bits or write it, and allow others to write the file
allowing one user to read or write a file or change the permission bits on a file and allowing all other users to only read the file
allowing one group to read a file and another group to write a file

Consider the following page table:

virtual page #	valid?	physical page #
0	0	---
1	1	0x2
2	1	0x3
3	1	0x9
4	0	---
5	1	0x5
6	1	0x1
7	1	0x6

Suppose this page table is used on a system with 4096 byte pages. On this system, virtual addresses are divided into a 3-bit page number and a 12-bit page offset and physical addresses are divided into a 4-bit page number and a 12-bit page offset.

Question 5 (4 points) (see above)

If a program reads from virtual address 0x1234, what physical address will it read from?

Write your answer as a hexadecimal number. If an exception would occur instead, write "exception".

Answer:

quiz for week 4

Consider a system with a virtual memory system with:

a single-level page table
16-byte pages and therefore 4-bit page offsets
8-bit virtual addresses, divided up into a 4-bit virtual page number and 4-bit page offset
12-bit physical addresses, divided up into a 8-bit physical page number and 4-bit page offset
page table entries stored as 2-byte integers where:
- the most significant 8 bits are the physical page number
- the least significant bit is the valid bit
- the second least significant bit is the writeable bit
- the third least significant bit is the user-mode accessible bit
- the remaining bits are unused

Question 1 (3 points) (see above)

A program tries to read a value from virtual address 0x12. The page table base register is set to physical byte address 0x100, so the processor looks up a page table entry from physical address ___. Write your answer as a hexadecimal number. Remember to take into account that page table entries are two bytes.

Answer:

Question 2 (3 points) (see above)

Suppose the value of the page table entry (from the previous question) is 0x4307 (when interpreted as an 2-byte integer). The program will end up accessing physical address _____. Write your answer as a hexadecimal number.

Answer:

Consider a system with a virtual memory system with:

a 2-level page table table
4096-byte pages and therefore 12-bit page offsets
8-byte page table entries
4096-byte page tables at each level,
- each of which starts at the beginning of a physical page, and
- each of which contains 512 page table entries
36-bit physical addresses
30-bit virtual addreses, divided up into a 18-bit virtual page number (which is divided into two equal parts for the page lookup) and a 12-bit page offset

Question 3 (4 points) (see above)

If a program can access 1000 distinct pages of memory (without exceptions occuring), then its page tables must take up at least ____ bytes. Remember to include both its first-level page tables and its second-level page tables.

Answer:

Question 4 (4 points) (see above)

When a program tries to access a value from memory, the processor:

reads its page table base register and gets the physical address 0x120000, then
reads a page table entry from physical address 0x120008 and that page table entry is valid and contains physical page number 0x123, then
reads a page table entry from physical address 0x123040 and that page table entry is valid and contains physical page number 0x6, then
reads the value from physical address 0x6010

What was the virtual address the program tried to access? Remember to take into account the page table entry size. Write your answer as a hexadecimal number. If not enough information is supplied, write "unknown" and explain in comments.

Answer:

Suppose a system uses 4096-byte pages, so addresses have 12-bit page offsets.

Assume the system running this program is as lazy as possible in setting up page table entries -- it waits until a program accesses a virtual page to setup the page table entry for that page, following the allocate/load-on-demand strategy we discussed in lecture. (Perhaps a better strategy would be somewhere in between this lazy strategy and the strategy of loading everything in advance.)

The program starts, with all its pages initially unallocated (marked invalid in its page table). It then runs a function, whose machine code is loaded at addresses 0x2040-0x2072, which writes 3 8-byte values to the stack at addresses 0xFFF8, 0xFFF0, and 0xFFE8.

Question 5 (4 points) (see above)

As a result of the above operations, how many page faults will occur? (If not enough information is given write unknown and explain in the comments.)

Answer:

quiz for week 5

Suppose a system has:

four-level page tables
16-bit virtual addresses
16-byte pages, and therefore 4-bit page offsets
page tables at each level which contain 8 2-byte page table entries
the 2-byte page table entries contain a 1-bit valid bit in the least significant bit, a 12-bit physical page number in the most significant bits, and the rest is unused

Question 1 (4 points) (see above)

Suppose when looking up the virtual address 0x8001 on this system, the first-level page table entry the processor looks up has the value 0x1231. What is the (physical) address of the first byte of the second-level page table this processor will use?

Answer:

Consider a network which provides the best-effort "mailbox" model gaurentees we discussed in lecture.

Suppose we try to send data from machine A to machine B that doesn't fit in a single message in parts using the following broken protocol:

machine A splits the data into four parts, and sends each of the parts in a separate message; then
once machine B receives all four parts, it sends an acknowledgment that all the messages were received
if machine A doesn't get an acknowledgment after a certain amount of time, it sends all four parts again, repeating indefinitely
machine B takes the last four message it gets before sending an acknowledgment and combines them (in order it received them) to reconstruct the data. If machine B doesn't get all four messages, it waits indefinitely

Question 2 (4 points) (see above)

Suppose we send the data "ABCD" by splitting it into the four messages, "A", "B", "C", and "D". If the network does not corrupt, delay, or reorder messages, but does lose some messages, which of the following is it possible for machine B to think it was sent when it tries to reconstruct the data? Select all that apply.

AAAA
ABC
ABCA
DABC

Question 3 (4 points) (see above)

If the network loses every fourth message a machine sends, then how many messages will machine end up sending?

Question 4 (4 points) (see above)

In the layers we discuss, a message sent on the link layer will have a destination address for the link layer in its frame. In Ethernet, this destination address is a MAC address. This destination address will identify where the message should go for the purpose of the link layer's functionality.

Suppose a program sends a frame containing message that is going to be sent to a faraway machine. In order to get to that machine, the data will first be received by a router on the local network, then that router will send it to another router, which will send it to another router, which will send it a final router. That final router will then send the data to the final machine.

When the message is initially sent (before it is forwarded by the first router), what best describes what the link-layer destination address that accompanies it will contain?

it will contain an address for the machine that originally sent the message
it will contain an address for the first router that will accept the message
it will contain the address for the final router that will accept the message
it will contain an address for the final destination machine
it will identify which program of several on the destination machine will be using the data the frame contains

Question 5 (4 points) (see above)

Consider the scenario of the previous question.

In addition to the destination address for the link layer, the message will also contain a destination address for the network layer. The network layer also has an destination address in its packets. For example, in IP version 4, this would be an IP version 4 address (like 128.148.63.2).

When the message is initially sent (before it is forwarded by the first router), what best describes what the network-layer destination address that accompanies it will contain?

it will contain an address for the machine that originally sent the message
it will contain an address for the first router that will accept the message
it will contain the address for the final router that will accept the message
it will contain an address for the final destination machine
it will identify which program of several on the destination machine will be using the data the frame contains

quiz for week 6

Question 1 (4 points)

Consider the following code where socket_fd is the file descriptor for a UDP socket (that has already been configured, including having its default destination address set with connect):

write(socket_fd, "FIRST", strlen("FIRST"));
write(socket_fd, "SECOND", strlen("SECOND"));
write(socket_fd, "THIRD", strlen("THIRD"));

Assuming the writes above do not fail, when code running on the other end runs:

char buffer1[100] = "initial value";
char buffer2[100] = "initial value";
read(other_socket_fd, buffer1, sizeof buffer1);
read(other_socket_fd, buffer2, sizeof buffer2);

Assuming messsage data is not corrupted on the network (despite the lack of end-to-end checksums), then, which of the following are possible outcomes in buffer1 and buffer2? Select all that apply.

buffer1 could contain FIRST and buffer2 could contain SECOND
buffer1 could contain FIRST and buffer2 could contain SECONDTHIRD
buffer1 could contain FIRST and the second read will hang indefinitely
buffer1 could contain SECOND and buffer2 will contain FIRST

Suppose www.foo.com's DNS server when queried returns a record containing the IP address for www.foo.com specifying a time-to-live of 10000 seconds.

Question 2 (4 points) (see above)

We mentioned in lecture that an ISP's DNS server could obtain this record by first querying one of the "root" DNS servers. This root server would most likely ____.

resend the query to www.foo.com's DNS server and then return the result it received to the ISP's DNS server
return the result directly to the ISP's DNS server (that is, it will have a copy of this record already)
return information about a DNS server for .com domain names to the ISP's DNS server
return information about www.foo.com's DNS server, possibly after querying one of the DNS servers for the .com domain names to obtain it

Question 3 (4 points) (see above)

www.foo.com is accessed many times over the course of 5 hours by each of 1000 users. These 1000 users a split evenly across 10 different ISPs.

Each of these ISPs runs their own DNS server for their users that caches the returned records for as long as possible. About how many queries for the IP address of www.foo.com will www.foo.com's DNS server get?

Answer:

Question 4 (4 points)

Suppose A and B each prepare to perform secure communications with each other as follows:

they each generate a keypair for asymmetric encryption and a separate keypair for digital signatures. They will use these keypairs both for their communications with each other and their communications with third-parties; then
A and B securely share some information

After this preparation is complete, we would expect A to have ____. Select all that apply.

A's private signing and/or encryption key
B's private signing and/or encryption key
B's public signing and/or encryption key
a signature generated using B's private signing key over a message containing B's public signing key
a common secret value that A and B share

Question 5 (4 points)

Consider the following (broken) scheme for allowing users to run a command on a remote server:

The system administrator sets up MAC keys for every user of the system. These keys are distributed securely to those users when their account is setup and are used in addition to passwords.

When a user asks to login to the server remotely:

the server sends the user a unique, one-time code N
the user generates a MAC tag of a message containing the code N and their password and sends the tag, along with the command they want to run to the server
the server recomputes the tag using its own copy of the key and password and compares it to what the user sent; if it matches, the server runs a command specified by the user

Which of the following are scenarios this protocol prevents? Select all that apply.

an eavesdropper with only passive (read-only) access to the network reading the commands the user is running
someone with enough control over the network replacing the command specified by the user with their own command
after a user runs a command, an eavesdropper on the network using data from messages they intercepted to run another command using that user's access
an eavesdropper with only passive (read-only) access to the network getting useful information about the user's password

quiz for week 7

In lecture, we mentioned key agreement protocols where two parties A and B would send each other key shares derived from secret values, then use the other's key share and their secret value to construct a shared secret value (known by A and B, but not anyone else).

Question 2 (4 points) (see above)

Suppose A and B use key agreement as follows:

first, they send each other key shares over an insecure network,
then, use their shared secret value to construct a key for symmetric encryption and a key for message authentication codes.

If other steps are not taken, this design would allow an attacker with sufficient access to the network to ___. Select all that apply.

using the key shares sent by A and B as well as their own secret value and key share, compute their own copy of the symmetric encryption key A and B are using
make A and B be unable to exchange key shares despite being able to send other (insecure) messages to each other on the network
replace A's key shares with their own and take the place of B in the communication and make it appear to A that the network is not responding
replace A's key shares with their own and take the place of B for A's communications, and simulatenously replace B's key shares with their own and take the place of A for B's communications

Consider two strategies for eliminating duplicates from an array of values:

constructing a hashtable from the list of values, discarding any values found that were already in the hashtable
sorting the array of values, then removing consecutive items that have the same value

Question 3 (2 points) (see above)

Which of the two approaches would one expect to exhibit better spatial locality?

approach 1
approach 2
their spatial locality should be the same

Question 4 (4 points) (see above)

In approach 1, at least when there are many duplicates to eliminate, the steps where the values are looked up in the hashtable would likely have significant ____. Select all that apply.

temporal locality in the accesses to the hashtable
spatial locality in the accesses to the hashtable
temporal locality in the access to the input array
spatial locality in the accesses to the input array

Question 5 (3 points) (see above)

In approach 1, a hash function that better distributes values in the hashtable and better avoids hash collisions would likely ____. Select all that apply.

improve the spatial locality of accesses to the hashtable
improve the temporal locality of accesses to the hashtable
reduce the total number of cache accesses

Suppose a 8-set direct mapped cache with 4B blocks has the following contents:

set index	tag (written in binary)	valid	data bytes (in hexadecimal; lowest address first)
0	—	0	—
1	001101	1	02 FF 41 02
2	001101	1	01 11 23 45
3	011000	1	33 55 77 9A
4	000000	1	00 01 02 03
5	000001	1	F0 80 90 01
6	—	0	—
7	—	0	—

Question 6 (4 points) (see above)

With the above cache accessing a two-byte integer from an address with tag 001101 (written in binary), set index 4, and offset 2 will result in a cache miss. As a result of the cache miss, what bytes currently stored in the cache will be replaced?

quiz for week 9

Consider the following C code:

unsigned char array[12];
...
a = array[0]; // (1)
b = array[0]; // (2)
sum += a * b;
a = array[4]; // (3)
b = array[1]; // (4)
sum += a * b;
a = array[8]; // (5)
b = array[2]; // (6)

For the following questions assume that:

array starts at an address that is a multiple of 4096 (2 to the 12th power)
only accesses to array use the data cache
the compiler does not reorder or optimize the code to reduce the number of memory accesses for array
the data cache is initially empty (all blocks invalid) when the code labeled (1) starts.

Question 1 (4 points) (see above)

With a direct-mapped 4-byte data cache with 2-byte blocks, the accesses labeled (1), (3), (4), (5) and (6) above will all be cache misses. The access labeled (1) is a compulsory miss also known as a cold miss. Which other misses are also compulsory or cold misses? Select all that apply.

Question 2 (4 points) (see above)

Consider 2-way (fully associative) 8-byte data cache with 4-byte blocks and an least-recently-used replacement policy. The array access labeled (1) above will be cache miss and the access labeled (2) will be a cache hit. Which other accesses will be cache hits? Select all that apply.

(3)
(4)
(5)
(6)

Question 3 (4 points) (see above)

Consider 2-way (fully associative) 8-byte cache with 4-byte blocks and first-in, first-out replacement policy The array access labeled (1) above will be cache miss and the access labeled (2) will be a cache hit. Which other accesses will be cache hits? Select all that apply.

(In a first-in, first-out replacement policy, the block replaced longest ago ("first-in") in a set is the next one replaced.)

(3)
(4)
(5)
(6)

Consider the following C snippet:

unsigned char array1[4096];
unsigned char array2[4096];
...
for (int i = 0; i < 2048; ++i) {
    array1[i] = array2[i + 2048] * array2[i];
}

For the following questions, assume:

the snippet uses a 32768-byte 4-way set-associative cache with 64-byte blocks and an LRU replacement policy;
the data cache is initially empty when the for loop starts;
array1 and array2 are both located at addresses that are multiples of 32768.

Question 4 (4 points) (see above)

If the data cache has a write-through policy, during the execution of for loop, how many times will the cache read from or write to the next level of cache (or memory if there is no next level of cache) in order to read or write values in array1?

Answer:

Question 5 (4 points) (see above)

If the data cache has a write-back policy, during the execution of the for loop, how many times will the cache read from or write to the next level of cache (or memory if there is no next level of cache) in order to store values in array1?

Answer:

quiz for week 10

Consider a system with:

4096-byte pages
50-bit physical addresses
48-bit virtual addresses
4-level page tables where
- page tables at each level hold 512 8-byte entries
a 2-way 4-entry data TLB with a random replacement policy
a 2-way 4-entry instruction TLB with a random replacement policy
a 16384-byte 4-way set associative data cache with 64-byte blocks that uses physical addresses
a 16384-byte 4-way set associative instruction cache with 64-byte blocks that uses physical addresses

Question 1 (4 points) (see above)

Suppose this system runs an instruction like:

movq 0x12345, %rax

which loads 8 bytes from 0x12345 into %rax. Suppose while this instruction runs, all related TLB and cache accesses are hits. As part of running this instruction the system will ____. Select all that apply.

access the data cache at an address with a page number obtained from the TLB
access the data TLB more than once
access the instruction TLB at least once
use the page offset from 0x12345 as part of a data TLB access

Question 2 (4 points) (see above)

Accessing the virtual address 0x12345 will use the same TLB set as accessing which of the following virtual addresses? Select all that apply.

0x22446
0x23456
0xFF345
0xFF12345

Question 3 (4 points) (see above)

The amount of storage required to implement the data TLB is ____.

less than 128 bits
greater than or equal to 128 bits and less than 1024 bits
greater than or equal to 1024 bits and less than 8192 bits
greater than or equal to 8192 bits and less than 65536 bits
greater than or equal to 65536 bits and less than 524288 bits
greater than or equal to 524288 bits

Question 4 (4 points)

Consider the following C code that uses the POSIX API:

printf("1"); fflush(stdout);
pid_t pid1 = fork();
pid_t pid2 = fork();
if (pid1 > 0 && pid2 > 0) {
    printf("2"); fflush(stdout);
    waitpid(pid1, NULL, 0);
    waitpid(pid2, NULL, 0);
    printf("3"); fflush(stdout);
}
printf("4"); fflush(stdout);
exit(0);

Which of the following are possible outputs of the above? Select all that apply.

1234444
1244434
1244344
1444234

Question 5 (4 points)

Consider the following C code that uses the POSIX API:

int pipe_fds[2];
pipe(pipe_fds);
int out_fd = open("foo.txt", O_WRONLY | O_CREAT);
dup2(pipe_fds[1], out_fd);
dup2(pipe_fds[0], STDIN_FILENO);

Which (if any) of the following are true after this code snippet runs, assuming pipe, open, and dup2 do not fail? Select all that apply.

the file foo.txt will be open
writing to stdout (such as with printf) will write to a pipe
data written to out_fd could be later read from stdin
data written to pipe_fds[1] could be later read from out_fd

quiz for week 11

Consider the following C code:

void *thread_func(void *arg) {
    int *p;
    p = (int*) arg;
    *p = 1;         /* <----------------------------------- LINE A */
    return (void*) (p + 2);
}

void foo() {
    int array[4] = {0, 0, 0, 0};
    pthread_t thread_ids[2];
    for (int i = 0; i < 2; i += 1) {
        int *q = &array[i];
        pthread_create(&thread_ids[i], NULL, thread_func, (void*) q);
    }
    for (int i = 0; i < 2; i += 1) {
        int *r = NULL;
        pthread_join(&thread_ids[i], (void**) &r);
        *r = 2;     /* <----------------------------------- LINE B */
    }
    printf("%d %d %d %d\n", array[0], array[1], array[2], array[3]);
}

Question 1 (3 points) (see above)

When the function foo() runs, the line labeled LINE A will most likely modify a value stored _____.

on the stack of the thread that initially called foo()
on the stack of one of the threads created by calls to pthread_create() in foo()
on the heap
in the space used for global variables
in an unallocated or invalid location --- it is likely to segfault or similar

Question 2 (3 points) (see above)

When the function foo() runs, the line labeled LINE B will most likely modify values stored _____.

on the stack of the thread that initially called foo()
on the stack of one or more of the threads created by calls to pthread_create() in foo()
on the heap
in the space used for global variables
in an unallocated or invalid location --- it is likely to segfault or similar

Question 3 (6 points)

Consider the following C code:

struct info {
    int *array;
    int start;
    int end;
};

void *thread_func(void *arg) {       
    struct info *info;
    info = (struct info*) arg;
    for (int i = info->start; i < info->end; i += 1) {
        info->array[i] *= 2;
    }
    return NULL;
}

void double_all(int *array, int size) {
    pthread_t threads[2];
    for (int i = 0; i < 2; i += 1) {
        struct info cur_info;
        cur_info.array = array;
        cur_info.start = i * (size / 2):
        cur_info.end = (i + 1) * (size / 2);
        pthread_create(&threads[i], NULL, thread_func, &cur_info);
    }
    for (int i = 0; i < 2; i += 1) {
        pthread_join(&threads[i], NULL);
    }
}

double_all is intended to multiple every element of its input array by 2, where the array is specified using a pointer to its first element and its size. The implementation tries to use two threads to take advantage of multiple cores to do this, computing a range of the array (specified by start and end) for each thread to work on. But the above code is buggy. Which of the following are bugs in the code? Select all that apply.

The array passed to double_all is copied to a local variable and not copied back to the original array, so code calling the above function won't see changes to the array.
thread_func's may access a value which has gone out of scope when it uses the info pointer
the start and end indicies computed by double_all may not cover all of its elements (for certain sizes)
the start and end indicies computed by double_all may exceed the bounds of the array (for certain sizes)
the two threads spawned by double_all cannot run simultaneously since they are started in separate iterations of a loop
double_all might return while one or more threads is still doubling values

Consider the following C code:

struct dynamic_array {
    int *data;
    size_t size;
    pthread_mutex_t lock;
};

void append_to_array(struct dynamic_array *dest, struct dynamic_array *to_add) {
    pthread_mutex_lock(&dest->lock);
    pthread_mutex_lock(&to_add->lock);
    int *new_data;
    size_t new_size = (dest->size + to_add->size) * sizeof(int);

    /* BEFORE REALLOC */
    new_data = realloc(dest->data, new_size);
        /* was incorrectly new_data = realloc(dest, new_size)  in quiz as released */

    if (!new_data) { abort(); /* out of memory */ }
    dest->data = new_data;
    /* AFTER REALLOC */

    memcpy(dest->data + dest->size, to_add->data, to_add->size * sizeof(int));
        /* was incorrectly dest->new_data + dest->size  in quiz as released */
    dest->size += to_add->size;
    pthread_mutex_unlock(&to_add->lock);
    pthread_mutex_unlock(&dest->lock);
}

Question 4 (4 points) (see above)

For this question, assume that all code follows the convention that dynamic_array structs are only read or modified while holding the lock in the struct. (That is, only after a locking the lock and without unlocking the lock until after the read/modification is done.)

In the append_to_array function, the lock for to_add is kept locked even while the realloc operation is run, even though that operation does not involve to_add. An idea one might have to take advantage of this is to temporarily release the lock on to_add by inserting a pthread_mutex_unlock(&to_add->lock) in place of the comment /* BEFORE REALLOC */ and inserting pthread_mutex_lock(&to_add->lock) in place of the coment /* AFTER REALLOC */. It might seem that this would allow more threads to work in parallel.

However, this change would allow for some incorrect behavior. Which of the following problems could we expect as a result of the attempt to temporarily release the to_add lock described above? Select all that apply.

if to_add's size is increased at the same time as the append_to_array call is made, the memcpy in append_to_array could exceed the bounds of dest->data
if to_add's data is reallocated at the same time as the append_to_array call is made, the memcpy in append_to_array could attempt to copy from the old allocation
if append_to_array(dest, A) and append_to_array(dest, B) calls are made at the same time (for non-empty A, B), then dest could include elements from B but not include elements from A after both calls complete

quiz for week 12

Consider the following C code:

struct dynamic_array {
    int *data;
    size_t size;
    pthread_mutex_t lock;
};

void append_to_array(struct dynamic_array *dest, struct dynamic_array *to_add) {
    pthread_mutex_lock(&dest->lock);
    pthread_mutex_lock(&to_add->lock);
    int *new_data;
    size_t new_size = (dest->size + to_add->size) * sizeof(int);
    new_data = realloc(dest->data, new_size);
    if (!new_data) { abort(); /* out of memory */ }
    dest->data = new_data;
    memcpy(dest->new_data + dest->size, to_add->data, to_add->size * sizeof(int));
    dest->size += to_add->size;
    pthread_mutex_unlock(&to_add->lock);
    pthread_mutex_unlock(&dest->lock);
}

Question 1 (4 points) (see above)

It is possible for two simulatenous calls to append_to_array to hang indefinitely. Which, if any, of the following are examples of sets of simulatenous calls where this could happen? (Assume A, B, C, D, etc. represent distinct dynamic_array struct pointers.) Select all that apply.

append_to_array(A, A) and append_to_array(A, B)
append_to_array(A, B) and append_to_array(B, A)
append_to_array(B, A) and append_to_array(C, A)
append_to_array(A, B) and append_to_array(A, C) and append_to_array(B, C)

Question 2 (4 points) (see above)

Which of the following changes to the above code would prevent the deadlock? Select all that apply.

adding a global lock variable, and locking this lock at the beginning of append_to_array and unlocking it just before the end of append_to_array
adding a global lock variable, and locking this lock just before the pthread_mutex_lock calls in append_to_array and unlocking it just after the pthread_mutex_lock calls in append_to_array
comparing the addreses of dest and to_add and always locking the one of the two with the higher address, then the one of the two with the lower address at the beginning of append_to_array
instead of having one lock for each dynamic_array struct, have a seperate lock for each dynamic_array's data and for each dynamic_array's size, and have append_to_array acquire each of these locks independently before working

Suppose we want to implement a variant of the producer-consumer pattern where producing threads can produce two types of values (type A and type B), and consumer threads can either wait for a value of a particular type A OR they can wait for a value of both type A and type B. When the threads wait, they should wait without consuming a lot of processor time.

To implement this, we'll use a Queue implementation that provides queue_enqueue, queue_dequeue, and queue_size operations.

The following quesitons ask about the following incompete implementation. Blanks (formatted like _____________) represent omitted code:

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

Queue a_values;
pthread_cond_t a_available = PTHREAD_COND_INITIALIZER;
Queue b_values;
pthread_cond_t both_available = PTHREAD_COND_INITIALIZER;

void EnqueueA(Item item) {
    pthread_mutex_lock(&lock);
    queue_enqueue(&a_values, item);
    _________________________________ /* BLANK 1 */
    pthread_mutex_unlock(&lock);
}

void EnqueueB(Item item) {
    pthread_mutex_lock(&lock);
    queue_enqueue(&b_values, item);
    pthread_cond_signal(&both_available);
    pthread_mutex_unlock(&lock);
}

Item DequeueA() {
    pthread_mutex_lock(&lock);
    while (queue_size(a_values) == 0) {
        pthread_cond_wait(&a_available, &lock);
    }
    Item result = queue_dequeue(&a_values);      /* was b_values when quiz originally released */
    pthread_mutex_unlock(&lock);
}

void DequeueBoth(Item *a_item_ptr, Item *b_item_ptr) {
    pthread_mutex_lock(&lock);

    while (_______________________________) { /* BLANK 2 */

        pthread_cond_wait(_________________ /* BLANK 3 */, &lock);
    }
    *a_item_ptr = queue_dequeue(&a_values);
    *b_item_ptr = queue_dequeue(&b_values);
    pthread_mutex_unlock(&lock);
}

Question 3 (4 points) (see above)

Which of the following would be most appropriate to place in the blank labeled "BLANK 1"?

pthread_cond_wait(&a_available, &lock);
pthread_cond_wait(&a_available, &lock); pthread_cond_wait(&both_available, &lock);
a_available += 1;
a_available += 1; if (queue_size(b_values) > 0) both_available += 1;
pthread_cond_signal(&a_available);
pthread_cond_signal(&a_available); pthread_cond_signal(&both_available);
pthread_cond_signal(&a_available); if (both_available != 0) {pthread_cond_signal(&both_available);}
while (queue_size(b_values) == 0) pthread_cond_wait(&both_available, &lock);
none of these would work (explain in comments)

Question 4 (4 points) (see above)

Which of the following would be most appropriate to place in the blank labeled "BLANK 2"?

queue_size(a_values) == 0 && queue_size(b_values) == 0
queue_size(a_values) == 0 || queue_size(b_values) == 0
queue_size(a_values) > 0 && queue_size(b_values) > 0
queue_size(a_values) > 0 || queue_size(b_values) > 0
queue_size(a_values) != queue_size(b_values)
none of these would work (explain in comments)

Suppose we have a system where three threads are running in parallel and we want a function to send a value from thread 0 to thread 1, and another value from thread 1 to thread 2, and another value from thread 2 to thread 0. To do this, each thread calls a function like:

other_threads_value = SendAndReceiveValue(thread_id, my_value);

where my_value is the current thread's value and thread_id is 0, 1, or 2, depending on which thread is making the call.

Functions can call SendAndReceiveValue multiple times; each time they call it they should receive values from the following call to SendAndReceiveValue.

Consider the following implementation of this function using semaphores:

int values[3];
sem_t value_ready[3];
sem_t value_empty[3];

int SendAndReceiveValue(int thread_id, int value) {
    int other_thread_id = (thread_id + 1) % 3;
    sem_wait(&value_empty[other_thread_id]);
    values[other_thread_id] = value;
    sem_post(&value_ready[other_thread_id]);
    sem_wait(&value_ready[thread_id]);
    int received_value = values[thread_id];
    sem_post(&value_empty[thread_id]);
    return received_value;
}

Question 5 (2 points) (see above)

What should the initial value of the semaphore value_ready[0] be? (By "value", we mean the non-negative integer stored by the semaphore.)

Answer:

Question 6 (2 points) (see above)

What should the initial value of the semaphore value_empty[0] be? (By "value", we mean the non-negative integer stored by the semaphore.)

Answer:

quiz for week 13

For this question, consider a processor uses pipelining with 6 pipeline stages and a cycle time of 600 picoseconds (.000 000 6 milliseconds).

Question 2 (4 points) (see above)

Suppose we execute a program with 100000000 (100 million) instructions on this processor and the program experiences no data or control hazards. How long in microseconds, rounded to the nearest millisecond, would the program take to execute on the pipelined processor?

Answer:

Question 3 (4 points) (see above)

Suppose we modified this processor to have 5 pipeline stages by combining its third and fourth pipeline stages. Which of the following would likely be true about the modified processor? Select all that apply.

it would have a higher instruction throughput than the original processor
it would have a higher instruction latency than the original processor
its cycle time would be greater than 1200 ps
it would need to stall less often than the original processor

Consider a nine-stage pipelined processor with the following pipeline stages:

a first fetch stage, during which a read from the instruction cache is started and the next program counter value is computed or predicted
a second fetch stage, during which the read from the instruction cache is completed
a decode stage, during which register values are read
a first execute stage, at the beginning of which the operands for arithemtic operations need to be available
a second execute stage
a third execute stage, near the end of which the results of any arithmetic operations are available (and not earlier)
a first memory stage, during which a data cache access is started (if necessary for the instruction)
a second memory stage, during which a data cache access is completed
a writeback stage, during which values are written to a registers

This processor uses forwarding combined with (when necessary) stalling to resolve data hazards.

Question 5 (4 points) (see above)

When running the assembly:

addq %r8, %r9
xorq %r8, %r11
subq %r9, %r10

on this nine-stage processor, the subq instruction will complete ____ cycles after the addq instruction completes.

(Assume that any instructions the processor is running not included in the snippet above:

do not use %r8, %r9, %r10, or %r11; and
do not require stalling

)

Answer:

quiz for week 14

For the following question, consider a six-stage pipelined processor with the following pipeline stages:

fetch
decode
execute 1
execute 2
memory
writeback

For the two execute stages:

the results of any conditional jump are computed during the execute 1 stage, allowing the correct instruction to be fetched during the execute 2 stage if it was mispredicted
arithmetic instructions require their operands to be available near the beginning of the execute 1 stage
arithmetic instructions have their results available near the end of the execute 2 stage

Suppose the processor is running the following assembly snippet:

    movq (%r8), %r9  
    subq $128, %r9   
    jle foo          
    addq %r9, %r10

and the jle is mispredicted.

Question 1 (4 points) (see above)

During the execution of the above assembly snippet, the addq instruction will finish ___ cycles after the movq instruction.

Answer:

Consider the following assembly snippet:

    movq $4, %rax       // rax <- 4
start_loop:
    movq %r8, %r9       // r9 <- r8
    andq $3, %r9        // r9 <- r9 & 3
    cmpq $0, %r9        // set condition codes using r9 - 0
    jne skip            // if (r9 != 0) goto skip
    addq $1, %r8        // r8 <- r8 + 1     ***
skip:
    subq $1, %rax       // rax <- rax - 1
    cmp $0, %rax        // set condition codes using rax - 0
    jne start_loop      // if (rax != 0) goto start_loop
end_loop:

For the questions below, assume %r8 has an initial value of 4 every time the snippet above is run, in this case:

the loop (marked by the start_loop and end_loop labels) will execute 4 times.
during the first time the loop is executed, the addq $1, %r8 instruction will be executed
during the other times the loop is executed, the addq $1, %r8 instruction will not be executed

Question 2 (4 points) (see above)

If the processor running this snippet uses the forwards-not-taken, backwards-taken branch prediction strategy discussed in lecture, then the conditional jumps (jne skip and jne start_loop) will be predicted correctly ___% of the time. Round to the nearest percent.

Answer:

Question 3 (4 points) (see above)

Suppose the processor running this snippet predicts each conditonal jump as taken if it was taken the last time it was run, and predicts it as not taken if it was not taken the previous time it was run.

Assume the processor just ran the snippet and then runs the snippet a second time (that is, there is some loop surrounding the snippet). During the second execution, the conditional jumps in the snippet will be predicted correctly ____% of the time. Round to the nearest percent.

Answer:

Question 4 (4 points)

Consider the following assembly snippet:

mov (%rax), %r9
add %r9, %r10
mov (%rbx), %r9
add %r9, %r11
mov (%rcx), %r9
add %r9, %r12
xor %r10, %r11

Consider an out-of-order processor design like we discussed in lecture. Assume that can perform multiple arithmetic operations at a time and can perform multiple loads at a time. (Its data cache is capable of doing multiple reads at a time, even if they are started in the same cycle.)

On that processor, which of the following is true about how this code could be executed? Select all that apply.

the value of %r9 will be read from the same physical register for each of the add instructions
if one of the movs is slow due to a cache miss, then it will delay any following mov because they all use %r9
the xor will read the value of %r10 from the same physical register that add %r9, %r10 writes to
the value of %r11 will be written to the same physical register in the second add as in the xor

Question 5 (4 points)

Consider an out-of-order processor design similar to what we discussed in lecture, with execution units supporting running several arithmetic instructions at a time.

Suppose after register renaming, the end of the instruction queue for the processor contains:

add  %x10, %x19 -> %x20
sub  %x20, %x21 -> %x22
xor  %x18, %x22 -> %x24
imul %x18, %x18 -> %x25

where %x... represents a physical register name. Assume that there are also other instructions in the instruction queue, not shown above.

What of the following are true about these instructions and how they could be run on this processor? Select all that apply.

the imul and xor could perform their arithmetic at the same time (depending on the speed of the surrounding instructions)
the add and xor could perform their arithmetic at the same time (depending on the speed of the surrounding instructions)
in the original assembly code, the sub and the xor used the same (architectural) destination register
the add will only be executed when there are no other instructions in the instruction queue that use %x19

quiz for week 15

Question 1 (5 points)

Consider the following assembly snippet:

addq %rcx, %rax
subq %rcx, %r9
xorq %rax, %r8
movq (%r8), %r10
imulq %r10, %r9
imulq %rax, %rax

On an out-of-order processor, the xorq instruction could do its computation at the same time as ____ is doing is computation. Select all that apply.

Question 2 (4 points)

Consider the following assembly snippet:

    cmpq %rax, %rbx
    jle other
    addq %r12, %rdx
    movq (%rdx), %r8
    subq %r8, %r9
    jmp after
other:
    addq %r13, %rcx
    movq (%rcx), %r8
    addq %r8, %r9
after:

Suppose this executes on an out-of-order processor, and the jle other is predicted as taken, but is actually not taken.

Then, ____. Select all that apply.

the cache may be accessed using an address obtained from %rcx
the cache may be accessed using an address obtained from %rdx
the cache might never be accessed using an address obtained from %rcx
the cache might never be accessed using an address obtained from %rdx

Question 3 (4 points)

Consider the following C code:

int IsPowerOfTwo(unsigned long x) {
    while (x > 1) {
        if ((x & 1) == 1) {
            return 0;
        }
        x = x >> 1;
    }
    return 1;
}

Assume this is translated to assembly in a manner that does omit any bitwise or arithmetic operations.

Suppose IsPowerOfTwo(a) takes substantially longer to execute than IsPowerOfTwo(b) for some non-zero a and non-zero b. (Note that IsPowerOfTwo's return value is not specified.)

Based on the time difference between IsPowerOfTwo(a) and IsPowerOfTwo(b), what is most likely true about a and b? Select all that apply.

the difference between a and its closest power of two is greater than the difference between b and its closest power of two
a is a larger number than b
more of the least significant bits of a are zero than the least significant bits of b
a is a power of two and b is not

Question 4 (4 points)

Consider the following C snippet:

t = lookupTable[x];

where lookupTable is an array of 16384 chars located in memory at address 0x20CD00.

Suppose using the PRIME+PROBE strategy discused in lecture, we determine that the access to lookupTable modifies set 23 of a 65536-byte 4-way set-associative cache with 256-byte blocks.

What is a possible value of x?

To simplify this problem, assume all addresses are physical.

Answer:

Consider the following C code:

if (x < 1024) {
    char y = array1[x];
    if (y < 64) {
        foo(array2[y*2048]);
    }
}

where array1 and array2 are arrays of chars.

If an attacker can control x, they can use the Spectre vulnerability to read from memory using this code.

Suppose array1 is located at address 0x1000000, array2 is located at address 0x2000000. To simplfiy the problem, assume all addresses are physical.

Question 5 (4 points) (see above)

The attacker will need to set x to a value related to the memory location that they want to read from. If they want to get information about the int at memory address 0x4000000, they should set x to ____. Write your answer as a hexadecimal.

Answer:

CS3130 Spring 2023 quizzes

quiz for week 2

quiz for week 3

quiz for week 4

quiz for week 5

quiz for week 6

quiz for week 7

quiz for week 9

quiz for week 10

quiz for week 11

quiz for week 12

quiz for week 13

quiz for week 14

quiz for week 15