S2024 quizzes

Question 1 (4 points)

Suppose we create a library with a function foo:

int foo(int x);

This function is declared in a header file bar.h and the library implementation is linked into a file libbar.so.

If we modify the function foo to have the prototype

int foo(double x);

and update bar.h and libbar.so accordingly, we find that executables that use the library and used to work now crash (even though there are no relevant bugs in the library itself).

Which of the following would either fix this issue or have avoided this issue in the first place? Select all that apply.

if the library always used .a files instead of .so files
if one relinked the executable program from its existing .o files (and libraries)
if one recompiled the executable from its .c files, and relinked it
if one modified the bar.h header file to include both prototypes and then recompiled and relinked the library

Question 2 (4 points)

Consider the following erroneous Makefile.

all: executable

foo.c: foo.h
    gcc -Wall -c foo.c

main.c: foo.h
    gcc -Wall -c main.c

executable: foo.c main.c
    gcc -Wall -o executable foo.c main.c

clean:
    rm --force main.o foo.o executable

(Assume the commands in each rule are prefixed by tab characters.)

Which of the following is true about this erroneous Makefile? Select all that apply.

running make all will never build foo.o, regardless of the modification times or existence of other files
running make all once could compile main.c multiple times
updating foo.h, then running make all might not update executable
running make clean will remove foo.c instead of foo.o

Suppose a program displays its logo when it starts. Rather than store that logo in a separate data file, the program includes it in its executable. To do this, multiple steps are used:

First a program converts the logo file from SVG format in the file logo.svg into BMP format in the file logo.bmp:
```
svg2png logo.svg logo.bmp
```
Then, the BMP file logo.bmp is converted to a file logo.h file that defines an array of unsigned chars logo using the xxd utility:
```
xxd -i logo.bmp logo.h
```
The file logo.h is included by main.c, along with the utility header file graphics.h and the C standard library headers <stdlib.h> and <stdio.h>.

Consider the following incomplete Makefile that implements this scheme, including building the final executalbe program. In this incomplete Makefile, lines or partial lines of code that are not yet complete are replaced by ### BLANK X ### where X is some number:

all: program

### BLANK 1 ###
    svg2png logo.svg logo.bmp

### BLANK 2 ###
    xxd -i logo.bmp logo.h

main.o: ### BLANK 3 ###

graphics.o: graphics.h

%.o: %.c
    clang -c $^

program: main.o graphics.o
    clang -o program main.o graphics.o

(Assume the commands in each rule are prefixed by tab characters.)

Question 3 (4 points) (see above)

What would be appropriate to put in place of ### BLANK 2 ###?

Answer:

Question 4 (4 points) (see above)

What would be appropriate to put in place of ### BLANK 3 ###?

Answer:

Question 5 (4 points)

Suppose the file "first.txt" has the following access control list:

user:A:rw-
user:B:r--
group:C:rw-
group:D:r--
other::---

and the file "second.txt" has the following access control list:

user:B:rw-
group:C:rw-
other::---

(The other::... lines provide default permissions for cases where other entries of the access control list do not apply.)

It's possible for a running program to be able to read and write "second.txt" but not to be able to both read and write "first.txt". Which of the following is a sufficient condition for this to be true? Select all that apply.

the program is running as user A
the program is running as user B
the program is running as user A and groups C and E
the program is running as user E and group C

Question 6 (4 points)

Suppose we wanted to allow users to collaborate on a shared system like the department machines (portal, etc.)

In particular, one student A wants to collect comments from their collaborator students B and C and then reply to those comments. Students B and C should not be able to see each other's comments or the replies intended for the other student. But no one else should be able to read those comments.

Suppose to implement this scheme, we have separate files for B's comments and the corresponding replies and for C's comments and the corresponding replies.

Suppose we want to set the permissions on these files to restrict access appropriately, but instead of being able to use full access control lists (where we can specify read/write/execute access individually for arbitrary numbers of users and groups), we have to use more limited chmod-style permissions (where we can only specify read/write/execute access for a single user and for a single group).

We would need to create some groups of users to enable this. (When we say a "group contains users", we mean that everytime one of those users runs a program it runs as part of that group.) What sets of new groups could enable this? Select all that apply.

a group containing just student A
a group containing just students A, B, and C (and no other users)
a group containing just student B and a group containing just student C
a group containing just students B and C

quiz for week 3

Suppose a single-core Unix-like system is running three processes, one for a text editor, one for a compiler, and one for simulation. The following sequence of events happens:

initially, the compiler is running on the processor to generate assembly in memory;
then, the compiler starts waiting for more source code to be read from disk;
while the compiler is waiting, the simulation runs and does computations in memory, including using many functions in <math.h>
then, a keypress occurs and as a result, the text editor immediately runs to process the keypress and update the screen, and then ask for more input
then, after the keypress is handled, the simulation resumes running, continuing to do computations in memory
then, the compiler's read from disk finishes, and the compiler resumes generating assembly
the compiler opens and write a file back to disk to store the assembly it computed and starts waiting for this to complete
while the compiler is waiting for the data to finish being written by the disk, the simulation resumes running, performing computations in memory

Question 1 (4 points) (see above)

Sometimes a system call's completion will be triggered by a non-system call exception (although system calls are always started with the system call itself -- a system call exception).

During what steps in the above scenario is this likely to happen?

Write the number of steps above; for example, if it would happen in steps 4 and 7, write "47"

Answer:

Question 2 (4 points) (see above)

Some of the steps above are likely not to involve any system calls exceptions occuring. (That is steps in which no system call exception handler starts. The steps may or may not involve a system call handler that was started in a previous step resuming or continuing running.) Which ones?

Write the number of steps above; for example, if the answer is steps 4 and 7, write "47"

Answer:

Question 3 (4 points) (see above)

Some of the steps above may require multiple exceptions to occur. Identify them.

Write the number of steps above; for example, if it would happen in steps 4 and 7, write "47"

Answer:

On Linux, the sched_yield system call requests that the OS to context switch to another thread/process (if there is one to switch to). (sched_yield does not do anything else.)

For the following questions, suppose we are running a single-core Linux system with only two processes and that sched_yield calls and processes exiting are the only reason context switches occur, and that each call to sched_yield causes a context switch if another process can be run.

Suppose the first process is running the program:

int main() {
    int x = 1;
    printf("%d", x); fflush(stdout);
    x = x + 1;
    sched_yield();
    printf("%d", x); fflush(stdout);
    x = x + 1;
    sched_yield();
    printf("%d", x); fflush(stdout);
    exit(0);
}

and the second process is running the program:

int main() {
    printf("A"); fflush(stdout);
    sched_yield();
    printf("B"); fflush(stdout);
    sched_yield();
    printf("C"); fflush(stdout);
    exit(0);
}

(Assume all appropriate header files are #included.)

Suppose we start both of these programs at the same time, but arrange for the first process to initially have access to the processor.

Question 4 (4 points) (see above)

If both programs output to the same terminal, what is a possible output one could see on that terminal?

Answer:

Question 5 (4 points) (see above)

Running printf and/or fflush may require performing system calls. If so, ___. Select all that apply.

any immediately following sched_yield invocation will not require entering kernel mode
the assembly code for main that calls printf and/or fflush will use a different call instruction than is normally used for function calls
some of the instructions in the assembly code for main() will end up running in kernel mode
the system could enter kernel mode, run code in kernel mode, and then exit kernel mode without performing a process context switch

Question 6 (4 points) (see above)

Suppose the assembly for first process's code stores the value x in the callee-saved register %r15 and does not use the stack to store it.

Then, while the second process is outputting B, the value of x is most likely ____.

stored in a register on the processor
stored in memory managed by the OS
stored in the first process's exectuable file
stored on the second process's stack
not stored anywhere, because it will be recomputed when the first process resumes running
not stored anywhere, because the first process will not run anymore
none of the above (explain in comments)

quiz for week 4

For the following questions, suppose we run two processes. The first process runs with PID 48 and has a signal handler setup for SIGUSR1 with the following code:

int global = 0;
void handle_usr1_program1(int signum) {
    global += 1;
    int temp = global * 2 + global;
    printf("%d", temp); fflush(stdout);
    kill(54, SIGUSR1);
    printf("A"); fflush(stdout);
}

and the second process runs with PID 54 and has a signal handler setup for SIGUSR1 with the following code:

int global = 0;
void handle_usr1_program2(int signum) {
    global += 1;
    printf("%d", global); fflush(stdout);
}

Assume the code outside signal handlers in these processes does not do anything relevant to these questions (other than not exiting or exiting as described below) and that the signal handlers are setup using sigaction with an sa_flags value of SA_RESTART.

Question 1 (4 points) (see above)

If both processes are running and outputting to the same terminal, no code that would change global has run in any process, and we send SIGUSR1 to the first process, what is a possible output?

Answer:

Question 2 (4 points) (see above)

handle_usr1_program2 can run as a result of handle_usr1_program1's call to kill. When handle_usr1_program2 runs this way, ____. Select all that apply.

incrementing global will be performed by making a system call
the return address for handle_usr1_program2 will point to the machine code for handle_usr1_program1
the value of global will be stored in the second process's memory but outside its stack
the value of temp from handle_usr1_program1 will be stored on the function's stack or in a register on the processor

Question 3 (4 points) (see above)

Suppose initially both processes are running, but then the second process terminates and a message about it exiting is printed by the shell that was running it. If we wait a while after this happens, and then send SIGUSR1 to the first process, which of the following may happen depending on the details of other running programs? Select all that apply.

the kill function call in handle_usr1_program1 may return a non-zero value to indicate an error
if other processes have been started, the kill function call in handle_usr1_program1 might cause one of them to terminate
if there are no other processes with SIGUSR1 handlers, handle_usr1_program1 may recursively call itself because of its call to kill() causing a stack overflow
the kill function call in handle_usr1_program1 may result in the first process having a segmentation fault

Consider the following program that uses the POSIX API:

pid_t orig_pid, other_pid;
void handle_signal(int signum) {
    if (signum == SIGUSR1) {
        printf("%d:SIGUSR1 ", (int) other_pid); fflush(stdout);
        if (other_pid != 0)
            kill(other_pid, SIGUSR2);
        else
            kill(orig_pid, SIGUSR2);
    } else {
        printf("%d:SIGUSR2 ", (int) other_pid); fflush(stdout);
        exit(0);
    }
}

int main() {
    struct sigaction sa;
    sa.sa_handler = &handle_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);
    sigaction(SIGUSR2, &sa, NULL);
    orig_pid = getpid();
    other_pid = fork();
    if (other_pid == 0) {
        /* child process */
        kill(orig_pid, SIGUSR1);
        while (1) pause();
    } else {
        /* parent process */
        kill(other_pid, SIGUSR1);
        while (1) pause();
    }
}

Assume all relevant header files are #included, and that the call to fork does not fail.

For the questions below, assume that when this program runs that a signal handler is never started while a signal handler is already running in the same process.

The pause() function called by the code above waits until the current process receives a signal.

Question 4 (5 points) (see above)

Under the assumption above, which are possible outputs of this program? Select all that apply.

0:SIGUSR1 0:SIGUSR2 43:SIGUSR1 43:SIGUSR2
0:SIGUSR1 32:SIGUSR2 32:SIGUSR1 0:SIGUSR2
0:SIGUSR1 43:SIGUSR1 44:SIGUSR2 0:SIGUSR2
0:SIGUSR1 50:SIGUSR2 0:SIGUSR2
0:SIGUSR1 50:SIGUSR1 50:SIGUSR2 0:SIGUSR2

Consider the following program that uses the POSIX API:

int global = 0;
int main() {
    pid_t pid1, pid2;
    int status;
    pid1 = fork();
    if (pid1 == 0) {
        pid2 = fork();
        if (pid2 == 0) {
            global += 1;
        } else {
            global += 2;
            waitpid(pid2, &status, NULL);
        }
    } else {
        waitpid(pid1, &status, NULL); 
        pid2 = fork();
        if (pid2 == 0) {
            global += 4;
        } else {
            global += 8;
            waitpid(pid2, &status, NULL);
        }
    }
    printf("%d\n", global);
    return 0;
}

Assume all relevant header files are included and fork and waitpid do not fail.

Question 5 (4 points) (see above)

When we run this program, its output will include "4". This will be outputted by ____.

(For the purpose of this question, a child process of some process X is a process that was created directly by process X calling fork().)

the original process that was run
a child process of the original process that was run
a child process of a child process of the original process that was run
a child process of a child process of a child process of the original process that was run
a process that varies depending on timing
none of the above (explain in comments)

Question 6 (4 points) (see above)

If we run this program once, what is the maximum number of simulatenously running processes that could result?

Answer:

quiz for week 5

Suppose an analyst wants to run an analysis on a large list of data files. They have a program they wrote called ./analyze, which reads a file from stdin and appends to a summary file.

Doing this by hand, they'd run commands like:

./analyze < first-file
./analyze < second-file
./analyze < third-file
...

They find that typing all these commands is tedious and fails to take advantage of their system having four cores, which could speed up the analysis by analyzing multiple files at the same time. So, they write a function to analyze a large list of files:

void analyze_all(int num_files, const char **filenames) {
    pid_t *pids = malloc(num_files * sizeof(pid_t));  // was erroneously not pointer on quiz as released
    for (int i = 0; i < num_files; i += 1) {
        pids[i] = fork();
        if (pids[i] < 0) {
            handle_error();
        } else if (pids[i] == 0) {
            int fd = open(filenames[i], O_RDONLY);
            if (fd < 0)
                handle_error();
            if (dup2(fd, STDIN_FILENO) < 0)
                handle_error();
            close(fd);
            const char *argv[] = {"./analyze", NULL};
            if (execv("./analyze", argv) < 0)
                handle_error();
        }
        /* LOCATION 1 */
    }
    /* LOCATION 2 */
}

This seems to function, but they discover that there are two problems:

when the function returns, the analysis is still going on since it does not wait for the analyze programs to finish
the function runs so many instances of the analyze program at the same time that it overloads their machine

The following questions ask about modifying the code to fix these issues. Throughout the questions, assume that the handle_error() function is never reached.

Question 1 (4 points) (see above)

Suppose to solve the problem of the function not waiting for the analyze programs to finish, they add the code waitpid(pids[i], NULL, 0) at the line labeled "LOCATION 1". This will solve the problem of not waiting for the analyze programs to finish, but create other problems. Which ones? Select all that apply.

they will not take advantage of multiple cores anymore
the system will be more overloaded will be worse because of the extra code in analyze_all
multiple instances of ./analyze may read from the same file
the child processes will attempt to waitpid for themselves and hang, so analyze_all will not terminate

Question 2 (4 points) (see above)

As a revised solution, they place code at "LOCATION 1" as follows:

if (i >= 4) {
    waitpid(pids[i-3], NULL, 0);
}

To complete this solution, what code would be best to place at LOCATION 2?

for (int i = 0; i < 4 && i < num_files; i += 1) waitpid(pids[i], NULL, 0);
for (int i = 0; i < 4 && i < num_files; i += 1) { pids[i] = fork(); if (pids[i] > 0) { waitpid(pids[i], NULL, 0); } }
for (int i = num_files - 1; i >= num_files - 4 && i > 0; i -= 1) waitpid(pids[i], NULL, 0);
for (int i = num_files - 1; i >= num_files - 4 && i > 0; i -= 1) { pids[i] = fork(); if (pids[i] > 0) { waitpid(pids[i], NULL, 0); } }
if (num_files > 0) waitpid(pids[num_files - 1], NULL, 0);
none of the above (explain in comments)

Question 3 (4 points) (see above)

After using the solution of the previous question, the analyst finds that sometimes only one instance of analyze is running, failing to take advantage of all four cores available on their system. This seems to happen even when the list of files is very long and the loop in analyze_all is near the beginning of the list of files. What is most likely happening?

some of the analyze processes finish quickly but are not waited for until well after they finish
since the analyze processes all append to the same file, one cannot start until all previous ones finish
since the code in analyze_all does not redirect the analyze process's stdout, one cannot start until all previous ones finish
because there's only one call to open in the loop and one fd variable, only one input file can be opened at a time

Consider a system with a virtual memory system with:

13 bit virtual addresses
16 bit physical addresses
1024 byte pages

and the following page table:

virtual page number	valid	physical page number	read allowed?	write allocated?	execute allocated?	user mode access allowed?
0	0
1	1	0x3	1	0	1	1
2	1	0x4	1	0	1	1
3	1	0x1	1	1	0	1
4	1	0x7	1	1	0	1
5	0
6	1	0x0	1	1	0	1
7	1	0x2	1	1	1	0

Question 4 (4 points) (see above)

What is the maximum value a physical page number could have on this system?

Answer:

Question 5 (4 points) (see above)

Consider executing the C snippet

*p += 1;

on this processor in a program running in user mode where p is a pointer to unsigned char. Assume the compiler does not optimize away the memory accesses. This will succeed without exceptions occuring if p points to a value in virtual page ____. Select all that apply.

Question 6 (4 points) (see above)

The virtual address 0x1123 on this system corresponds to what physical address? Write your answer as a hexadecimal number. If there is no corresponding physical address so accessing virtual address 0x1123 would cause an exception (such as a page fault) write "fault".

Answer:

quiz for week 6

Suppose a system with 256 byte pages is running a process with the following memory layout (all address virtual, all ranges inclusive):

0x100-0x2ff (2 virtual pages): code and constants (read-only and executable)
0x300-0x4ff (2 virtual pages): global variables (read/write)
0x500-0x6ff (2 virtual pages): heap (read/write)
0xe00-0xfff (2 virtual pages): stack (read/write)
(and all other virtual pages are not accessible to the process)

Question 1 (4 points) (see above)

Suppose the system uses allocate-on-demand to allocate memory for its heap and stack. In this scheme, the page table initially has no valid entries for the program's heap or stack, and it is updated as the program accesses memory (in response to the exceptions triggered by the memory accesses to invalid pages). Assume that when updating the page table this way, the operating system only allocates as much memory as needed to make the access that triggered the exception work.

If the process runs code that:

writes to a global array at address 0x310 through 0x33f (inclusive)
writes to an array on the heap address 0x5f0 through 0x61f (inclusive)
reads the array on the heap at addresses 0x5f0 through 0x61f, computes the sum, and writes to a global variable at address 0x380

then, after that code completes, which of the following will be true? Select all that apply.

the process reading from some part of the "global variable" region could trigger an exception
the process writing to some part of the "heap" region could trigger an exception
at least two physical pages will be allocated to store the process's global variable region
at least two physical pages will be allocated to store the process's heap region

Question 2 (5 points) (see above)

Suppose:

initially all accessible virtual pages have a corresponding physical page (so they can be accessed without an exception occurring), and
the system uses copy on write and does the minimum copying possible

and the process fork()s and then:

the child process runs code at addresses 0x1f0 through 0x210 that:
- copies a global variable at address 0x310 to the stack at address 0xe40
- copies a global variable at address 0x340 to the stack at address 0xe41
- copies a global variable at address 0x410 to the stack at address 0xe42
the parent process runs code at addresses 0x220 through 0x240 that:
- modifies a value on the stack at address 0xe08

and no other relevant memory accesses occur.

After these events occur, which of the following is true? Select all that apply.

if the child process reads from 0xe08, then an exception will occur
if the parent process writes to 0x500, then an exception will occur
the parent and child process will use same physical pages to store their code and constants
there will be at least 4 physical pages allocated to store the stacks of the two processes
there will be at least 4 physical pages allocated to store the code and constants of the two processes

Question 3 (4 points) (see above)

In lecture, we mentioned the Unix mmap() call, which allows a program to access the contents of a file as if it is part of its own memory.

Suppose a program opens and mmap's a file as follows:

int fd = open("file.dat", O_RDWR);
char *p;
p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

This makes it so p can be used like a 4096 element array of chars that is stored in the file.

If we run two instances of the program at the same time and from the same directory, and open and mmap do not fail, then _____. Select all that apply.

p will have the same value in both processes
running p[0] will access the same physical address in both processes
running munmap(p, 4096) in one process would make the other process unable to use its p as an array
writing to the file descriptor fd (from the code above) could change the contents of p[0]

Consider a system which uses single-level page tables with:

65536 (2 to the 16) byte pages (and therefore 16-bit page offsets)
32-bit virtual addresses
40-bit physical addresses
8 byte page table entries, which contain, starting with the most significant bits when interpreted as a 64-bit integer:
- 24 unused bits (which must be 0)
- a 24 bit physical page number
- a 11 unused bits (which must be 0)
- a user-mode-accessible bit
- a readable bit
- a writeable bit
- an executable bit
- a valid bit
a page table base register, which contains the address of the first byte of the page table (so if the page table base register contains 0x10000, the first page table entry is at physical address 0x10000 and the second at 0x10008 and so on.)

Question 4 (4 points) (see above)

While executing the instruction

movq 0x123450, %rax

which reads an 8-byte integer from 0x123450 and stores it in the register %rax, the processor accesses the page table to perform the read.

If the page table base register contains 0xA0000, then what physical address will the processor read a page table entry from? Write your answer as a hexadecimal number. If not enough information is provided, write "unknown" an explain briefly in the comments.

Answer:

Question 5 (4 points) (see above)

Consider the access described in the previous question.

Suppose the mov instruction runs in user mode, and after reading the page table entry from the address that is the answer to the previous question, the processor then reads an integer from physical address 0x903450 and copies the integer into %rax.

In order for this to happen, what is a possible value of that page table entry?

Write your answer as a hexadecimal number (interpreting the page table entry as a 64-bit integer).

Answer:

Consider a system which uses three-level page tables with:

65536 (2 to the 16) byte pages (and therefore 16-bit page offsets)
55-bit virtual addresses
40-bit physical addresses
65536 byte page tables (at each level), each containing 8192 entries, where each entry is 8 bytes in size
a page table base register, which contains the address of the first byte of the first-level page table (so if the page table base register contains 0x10000, the first page table entry in the first-level page table is at physical address 0x10000 and the second entry at 0x10008 and so on.)

Question 6 (4 points) (see above)

Suppose when the processor is translating the virtual address 0x00123456789ABC, the following sequence of events happens:

based on a page table base register of value of 0x10000000, the processor accesses a first-level page table entry at address 0x10000020
based on the value retrieved from the first-level page table entry, the processor discovers that the relevant second-level page table is in physical page 0x54, which starts at byte address 0x540000.
after looking up the appropriate entry in the second-level page table, the processor discovers that the relevant third-level page table is in physical page 0x60, which starts at byte address 0x600000.
in the third-level page table, the processor reads a page table entry from address 0x60b3c0 and discovers that the final translation of 0x00123456789ABC should be 0x9009ABC

From what physical address did the processor access a second-level page table entry in this lookup? Write your answer as a hexadecimal number. If not enough information is provided, write "unknown" and explain what additional information is needed in the comments.

Answer:

quiz for week 7

Consider the following C snippet that computes two values sum and squared_sum from an array of doubles A with a constant number N elements.

double sum = 0.0;
for (int i = 0; i < N; ++i) {
    sum += A[i];
}
double squared_sum = 0.0;
for (int i = 0; i < N; ++i) {
    squared_sum += A[i] * A[i];
}

Question 1 (4 points) (see above)

Which of the following changes would improve the temporal and/or spatial locality of the snippet's accesses to A? Select all that apply.

adjusting each for loop to iterate through the array in reverse order (from N-1 to 0 instead of 0 to N-1)
the array was replaced with a singly-linked list
computing both sum and squared_sum in a single for loop instead of two separate ones
halving the value of N

For the following questions, consider a direct-mapped data cache with 16 byte cache blocks.

Question 2 (4 points) (see above)

If the addresses 0x401 and 0x542 map to two different cache sets, then what is the minimum number of sets the cache could have?

Assume that the cache must have a power of two number of sets.

Answer:

Question 3 (4 points) (see above)

Suppose the cache has 4 sets (and therefore 2 set index bits) and is initially empty; and a processor accesses the cache as follows:

reads 2 bytes from an address with tag 0x43, set index 2, offset 4
reads 2 bytes from an address with tag 0x43, set index 2, offset 6
reads 2 bytes from an address with tag 0x43, set index 3, offset 0

When the cache performs these accesses, how many bytes will it read from memory (or the next level of cache)?

Answer:

Question 4 (4 points) (see above)

Suppose the cache has 256 sets (and therefore 8 set index bits) and addresses passed to the cache are 32 bits in size. Including space for valid bits, tags, and the actual data stored in the cache, how many bits of storage does the cache contain?

(Remember to account for bytes taking up 8 bits.)

Answer:

Consider a two-way set associative cache whose contents are as follows. Data is written as as sequence of bytes, starting with the lowest address (lowest offset), with each byte written in hexadecimal. The "LRU way" bit indicates which way is least recently used in each set.

	way 0			way 1
index (written in binary)	valid	tag (written in binary)	data	valid	tag (written in binary)	data	LRU way
000	1	00000	12 13 14 15	1	00100	A0 B1 C2 D2	0
001	0	—	—	0	—	—	1
010	1	00100	E1 F2 A3 B4	1	00101	D4 D5 D6 D7	1
011	1	01000	00 11 22 33	1	01010	01 44 66 88	0
100	0	—	—	0	—	—	0
101	1	11111	09 19 29 39	0	—	—	1
110	0	—	—	0	—	—	0
111	0	—	—	0	—	—	0

Question 5 (4 points) (see above)

The value 0xB1 has index 0, tag 4, and offset 1 (index, tag, and offset all written in decimal). What is its address? Write your answer as a binary number.

Answer:

Question 6 (4 points) (see above)

When reading a byte from the address with tag 00000 (binary), index 011 (binary), and offset 3 (decimal), the cache will ___. Select all that apply.

replace data in a cache block that was previously marked valid
return a byte with value 0x15
change at least one "LRU way" bit
change at least one valid bit

quiz for week 9

Consider a direct-mapped data cache with

16 byte cache blocks,
256 sets
a write-allocate policy and a write-back policy

Suppose the cache is initially empty and a program using this cache does the following sequence of accesses, in this order:

writing 4 bytes to an address with set index 4, block offset 8, tag 0x1234
reading 4 bytes from an address with set index 3, block offset 0, tag 0x1234
writing 4 bytes to an address with set index 3, block offset 4, tag 0x1234
writing 4 bytes to an address with set index 4, block offset 0, tag 0x2000

In the following questions ask about what happens during the accesses above. Do not include anything that will happen while processing later accesses, even if they are affected by the accesses above.

Question 1 (3 points) (see above)

How many bytes will be written to the next level of cache or main memory during the accesses above?

Answer:

Question 2 (3 points) (see above)

How many bytes will be read from the next level of cache or main memory during the accesses above?

Answer:

Consider a direct-mapped data cache with

16 byte cache blocks,
256 sets

on a system where virtual memory is not in use.

Question 3 (4 points) (see above)

Suppose a program uses this cache to access an array of 8192 four-byte integers which is located at address 0x1000000.

If a program reads from array[100], then a cache block containing array[100] will be stored in the cache. If the program then accesses array[X] for some X, then the cache block containing array[100] will be evicted from the cache. Give a possible value of X that is within the bounds of the array.

Answer:

Suppose a system has:

4096 byte pages;
two-level page tables with 1024 entries in page tables at each level;
a 2-way data TLB with 16 total entries; and
a 8KB (8192 byte) data cache with 256-byte blocks which uses only physical addresses. (This cache has 8 block offset bits and 5 set index bits.)

Question 4 (3 points) (see above)

When translating the virtual address 0xABCDEF, the system will check the TLB set with index _____.

Answer:

Question 5 (4 points) (see above)

Suppose a program running on this system has an 8192 byte array of bytes. Depending on how the program's page tables are setup, sometimes index 4096 of the array and index 0 of the array map to the same data cache set, so they cannot be stored simulatenously in the cache, even though they are both present in memory. Which of the following must be true for this situation to happen? Select all that apply.

the page numbers of the physical pages assigned to the array are not consecutive
the page table entries for virtual pages containing the array are stored across two different second-level page tables
the starting address of the array is in the middle of a page (not a multiple of 4096)
two or more of the virtual pages used by the array have the same tag in the TLB

Consider the following code that uses multiple threads using the POSIX ("pthreads") API.

int x, y;
void *thread_function(void *argument) {
    int *p = (int*) argument;
    *p = *p + x;
    return &y;
}

int main() {
    pthread_t thread;
    int z;
    x = 10; y = 20;
    if (0 != pthread_create(&thread, NULL, thread_function, &z))
        handle_error();
    printf("here\n");
    void *result;
    if (0 != pthread_join(thread, &result))
        handle_error();
    printf("%d %d %d %p\n", x, y, z, result);
    return 0;
}

Question 6 (3 points) (see above)

The *p = *p + x line modifies ____.

a value on the stack of the thread created by pthread_create
a value on the stack of the thread that was running main
a value stored in a register
a value stored in memory outside of any stacks
something that depends on how the code is compiled (explain briefly in comments)
none of the above (explain briefly in comments)

Question 7 (4 points) (see above)

The message here may be output ___. Select all that apply.

before thread_function starts running
after thread_function starts running but before it returns
after thread_function returns
after all resources created by the pthread_create call in main() are freed

quiz for week 10

Consider the following C code that uses the POSIX API and implements a binary search tree search function that can be called from multiple threads at the same time:

struct tree_node {
    int value;
    struct tree_node *left;
    struct tree_node *right;
    pthread_mutex_t lock;
};

bool Find(struct tree_node *node, int value) {
    bool result = false;
    if (node != NULL) {
        struct tree_node *next = NULL;
        pthread_mutex_lock(&node->lock);
        if (value == node->value) {
            result = true;
        } else {
            if (value < node->value) {
                next = node->left;
            } else {
                next = node->right;
            }
        }
        pthread_mutex_unlock(&node->lock);
        if (next != NULL) {
            result = Find(next, value);
        }
    }
    return result;
}

This Find function would be called by passing the root node of the tree as node and some integer value to search for.

Question 1 (4 points) (see above)

Suppose we run multiple instances of the above Find function from multiple threads at the same time. Depending on the tree and the arugments to Find, the threads will be able to make more or less effective use of mutliple cores.

Which of the following changes would increase how effectively the threads use multiple cores? Select all that apply.

changing from a tree with many nodes to one with just one node
changing the tree so most non-leaf nodes in the tree only have one child instead of two
distributing the value arguments to Find so they are split between more evenly between the left and right sides of the tree

Question 2 (4 points) (see above)

The locking in the Find routine should enable writing routines that manipulate the tree in parallel with calls to Find. Consider the following code that attempts to manipulate the tree:

pthread_mutex_lock(&root->lock);
struct tree_node *left;
left = root->left;
pthread_mutex_unlock(&root->lock);
pthread_mutex_lock(&left->lock);
struct tree_node *left_left;
left_left = left->left;
left->value = left_left->value;     /* Line A */
left->left = left_left->left;       /* Line B */
left->right = left_left->right;
free(left_left);
pthread_mutex_unlock(&left->lock);

Despite the locking, some race conditions can occur when this code is run at the same time as calls to Find. Which of the following are possible? Select all that apply.

Find can examine the old value of left->value (before the modification in Line A) but recurse to the new value of left->left (set by Line B), causing it to return the wrong result in some cases
the free on left_left can run before Find inspects left_left's value (causing Find to try to access deallocated memory)
if left_left->right is NULL, then Find might attempt to dereference a NULL pointer

Consider the following (incomplete) C code that uses barriers:

pthread_barrier_t barrier;
int global = 0;
int N, M;
void *f1(void *ignored) {
    for (int i = 0; i < N; i += 1) {
        pthread_barrier_wait(&barrier);
        printf("f1: %d, global: %d\n", i, global);
        global += i + 1;
        pthread_barrier_wait(&barrier);
    }
    return NULL;
}

void *f2(void *ignored) {
    for (int i = 0; i < M; i += 1) {
        pthread_barrier_wait(&barrier);
        if (_____________ /* BLANK 1 */) {
            global += 10;
        }
    }
    return NULL;
}

void run() {
    pthread_barrier_init(&barrier, NULL, 2);
    pthread_t t1, t2;
    N = 100;
    M = _________ /* BLANK 2 */;
    pthread_create(&t1, NULL, f1, NULL);
    pthread_create(&t2, NULL, f2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_barrier_destroy(&barrier);
}

The _s marked "BLANK x" (for some x) represent some omitted code which is the subject of a question below.

Question 3 (4 points) (see above)

In order to make global have the largest value possible without creating race conditions and without adding additional assignments to global, what is an appropriate expression to place in the blank marked BLANK 1?

Answer:

Question 4 (4 points) (see above)

What is an appropriate expression to place in the blank marked BLANK 2 to allow run() to terminate?

Answer:

Consider the following C code that uses the POSIX API (where identifiers in ALL_CAPITALS are integers defined elsewhere):

struct PageVersion {
    pthread_mutex_t lock;

    /* only accessed after locking 'lock' */
    char *label;
    char *data;
    struct Page *page;

    /* only accessed after locking 'page->lock' */
    struct PageVersion *next;
};

struct Page {
    pthread_mutex_t lock;
    struct PageVersion *all_versions;
    struct PageVersion *active_version;
};

struct PageVersion *add_version_to(struct Page *page, char *label, char *data) {
    struct PageVersion *version = malloc(sizeof(struct PageVersion));
    version->label = label;
    version->data = data;
    version->page = page;
    pthread_mutex_init(&version->lock);
    pthread_mutex_lock(&page->lock);
    version->next = page->all_versions;
    page->all_versions = version;
    pthread_mutex_unlock(&page->lock);
    return version;
}

void move_version_to_page(struct PageVersion *version, struct Page *new_page) {
    pthread_mutex_lock(&version->lock);
    pthread_mutex_lock(&version->page->lock);
    pthread_mutex_lock(&new_page->lock);
    struct PageVersion *iterator = version->page->all_versions;
    if (iterator == version) {
        version->page->all_versions = version->next;
    } else {
        while (iterator->next != version) {
            iterator = iterator->next;
        }
        iterator->next = iterator->next->next;
    }
    version->next = new_page->all_versions;
    new_page->all_versions = version;
    pthread_mutex_unlock(&new_page->lock);
    pthread_mutex_unlock(&version->lock);
    pthread_mutex_unlock(&version->page->lock);
}

bool set_active_by_label(struct Page *page, char *label) {
    bool found = false;
    pthread_mutex_lock(&page->lock);
    struct PageVersion *iterator = page->all_versions;
    while (iterator) {
        pthread_mutex_lock(&iterator->lock);
        if (0 == strcmp(iterator->label, label)) {
            found = true;
            page->active_version = iterator;
            break;
        }
        iterator = iterator->next;
        pthread_mutex_unlock(&iterator->lock);
    }
    pthread_mutex_unlock(&page->lock);
    return found;
}

Question 5 (5 points) (see above)

It is possible for simulatenous calls to the functions above to cause a deadlock. Which of the following calls (if run simulatenously from different threads) could result in a deadlock? Select all that apply.

In the calls below A, B, C, etc. represent distinct struct Page* values and A1, A2, A3, etc. represent distinct PageVersion* values that are on the list of all versions in A.

set_active_by_label(A, A1->label) and set_active_by_label(A, A2->label)
move_version_to_page(A1, B) and move_version_to_page(A2, B) and move_version_to_page(A3, C)
move_version_to_page(A1, B) and move_version_to_page(A1, C) and move_version_to_page(B1, C)
move_version_to_page(A1, B) and move_version_to_page(B1, C) and move_version_to_page(C1, A)
set_active_by_label(A, A2->label) and move_version_to_page(A1, B)

Question 6 (4 points) (see above)

One possible deadlock occurs when move_version_to_page(A1, B) and move_version_to_page(B1, A) are called simulatenously (using the conventions for A1, B1, A, B from the previous question).

Which of the following changes would resolve this deadlock? Select all that apply

change pthread_mutex_lock(&version->page->lock) to
while (0 != pthread_mutex_trylock(&version->page->lock)) {
pthread_mutex_unlock(&version->lock);
pthread_mutex_lock(&version->lock); }
}
change the pthread_mutex_lock(&new_page->lock) to while (0 != pthread_mutex_trylock(&new_page->lock)) {}
change the sequence of lock calls to:
pthread_mutex_lock(&version->lock);
if (new_page < version->page) {
  pthread_mutex_lock(&new_page->lock);
  pthread_mutex_lock(&version->page->lock);
} else {
  pthread_mutex_lock(&version->page->lock)
  pthread_mutex_lock(&new_page->lock);
}
move the pthread_mutex_lock(&new_page->lock) to the beginning of the move_version_to_page function

quiz for week 11

Consider the following incomplete code that uses monitors:

pthread_mutex_t lock;
char *files[NUM_FILES];
bool file_is_ready_to_process[NUM_FILES];
bool file_is_processed[NUM_FILES];
pthread_cond_t file_is_ready_cv[NUM_FILES];
pthread_cond_t file_is_processed_cv[NUM_FILES];
pthread_cond_t any_file_processed_cv;

void PrepareFile(int index) {
    SetupFileToBeProcessed(files[index]);
    pthread_mutex_lock(&lock);
    file_is_ready_to_process[index] = true;
    pthread_cond_broadcast(&file_is_ready_cv[index]);
    pthread_mutex_unlock(&lock);
}

void ProcessFileWhenReady(int index) {
    pthread_mutex_lock(&lock);
    while (!file_is_ready_to_process[index]) {
        pthread_cond_wait(&file_is_ready_cv[index], &lock);
    }
    pthread_mutex_unlock(&lock);    /* LINE A */
    ProcessFile(files[index]);
    pthread_mutex_lock(&lock);      /* LINE B */
    file_is_processed[index] = true;
    pthread_cond_broadcast(&any_file_processed_cv);
    pthread_cond_broadcast(&file_is_processed_cv[index]);
    pthread_mutex_unlock(&lock);
}

void WaitForTwoFilesToBeProcessed(int index1, int index2) {
    pthread_mutex_lock(&lock);

    while (_______________________________________________ /* BLANK 1 */) {

        pthread_cond_wait(&file_is_processed_cv[_____ /* BLANK 2 */], &lock);
    }
    while (_______________________________________________ /* BLANK 3 */) {

        pthread_cond_wait(&file_is_processed_cv[_____ /* BLANK 4 */], &lock);
    }
    pthread_mutex_unlock(&lock);
}

/* returns first file_index_list[i] such that file_is_processed[file_index_list[i]]
   is false; if no such entry is exists, returns -1
 */
int FirstNonReadyFile(int *file_index_list, int files_index_list_length) {
    for (int i = 0; i < file_index_list_length; i += 1) {
        if (!file_is_processed[file_index_list[i]]))
            return file_index_list[i];
    }
    return -1;
}

void WaitForManyFilesToBeProcessed(int *file_index_list, int file_index_list_length) {
    pthread_mutex_lock(&lock);
    while (_______________________________________________ /* BLANK 5 */) {

        pthread_cond_wait(&any_file_processed_cv, &lock);  /* LINE C */
    }
    pthread_mutex_unlock(&lock);
}

Assume all relevant header files are included (and NUM_FILES is an integer constant larger than 10).

A line of _s marked with a comment like /* BLANK 1 */ indicates a place where code may need to be added to complete the code above.

When WaitForTwoFilesToBeProcessed(index1, index2) returns, file_is_processed[index1] and file_is_processed[index2] should both be true because they were set by ProecssFileWhenReady() in other threads.

Similarly, when WaitForManyFilesToBeProcessed(index_array, index_array_length) returns, file_is_processed[index_array[i]] should be true for all i from 0 to index_array_length - 1, inclusive.

Question 1 (4 points) (see above)

what code would be most appropriate to place in the blanks marked /* BLANK 1 */ and /* BLANK 3 */?

file_is_processed[index1] and file_is_processed[index2]
!file_is_processed[index1] and !file_is_processed[index2]
!file_is_processed[index1] && !file_is_processed[index2] and !file_is_processed[index1] || !file_is_processed[index2]
file_is_processed[index1] == file_is_processed[index2] and !file_is_processed[index1] || !file_is_processed[index2]
!file_is_processed[index1] && !file_is_processed[index2] and !file_is_processed[index2]
none of the above (explain in comments)

Question 2 (4 points) (see above)

Which of the following code would be most appropriate to place in the blank marked /* BLANK 5 */?

FirstNonReadyFile(file_index_list, file_index_list_length) == -1
FirstNonReadyFile(file_index_list, file_index_list_length) != -1
FirstNonReadyFile(file_index_list, file_index_list_length) == file_index_list[0]
FirstNonReadyFile(file_index_list, file_index_list_length) != file_index_list[0]
file_is_processed[file_index_list[file_index_list_length - 1]]
!file_is_processed[file_index_list[file_index_list_length - 1]]

Question 3 (5 points) (see above)

Assuming the best answer to the previous question, the line marked LINE C could be replaced with which of the following without making WaitForManyFilesToBeProcessed behave incorrectly? Select all that apply.

int index = FirstNonReadyFile(file_index_list, file_index_list_length); if (index != -1) pthread_cond_wait(&file_is_processed_cv[index], &lock)
int index = FirstNonReadyFile(file_index_list, file_index_list_length); if (index != -1) pthread_cond_wait(&any_file_processed_cv, &file_is_processed_cv[index])
for (int i = 0; i < file_index_list_length; i += 1) {
pthread_cond_wait(&file_is_processed_cv[file_index_list[i]], &lock);
}
for (int i = 0; i < file_index_list_length; i += 1) {
if (file_is_processed[file_index_list[i]]) pthread_cond_wait(&file_is_processed_cv[file_index_list[i]], &lock);
}
for (int i = 0; i < file_index_list_length; i += 1) {
if (!file_is_processed[file_index_list[i]]) pthread_cond_wait(&file_is_processed_cv[file_index_list[i]], &lock);
}

Question 4 (4 points) (see above)

If the pthread_mutex_unlock and pthread_mutex_lock calls marked with the comments LINE A and LINE B above were removed, then ____. Select all that apply.

it would be possible for file_is_processed[index] to be modified by two different threads at the same time when it was not previously
only one file would be processed at a time, no matter how many threads called ProcessFileWhenReady
PrepareFile's call to pthread_cond_broadcast would take longer because it would sometimes wait until the corresponding file was processed when it would not previously
it would be possible for two calls to ProcessFileWhenReady to deadlock when they would not previously

Suppose two machines A and B are connected via a network that implements the "mailbox model" we discussed in lecture.

To send data reliably this way, one machine sends a message with that data, and expects the receiving machine to send an acknowledgments of each message with data sent. When acknowledgments are not received after waiting a long time, the data-sending machine is responsible for resending the message with data.

Question 5 (4 points) (see above)

Suppose machine A sends machine B ten messages with data using this scheme. Machine B's network has a problem where every other acknowledgment message (starting with the first one sent) is lost. (So, for example, if machine B attempted to send four acknowledgments in total, the first and third would be lost, regardless of which messages with data they were acknowledging.) Also, the network loses the first message the first time it is sent.

Including messages that are lost, how many messages are sent in total?

Answer:

Question 6 (4 points) (see above)

Suppose it takes 1 time unit to send a message with data, 10 time units for it to cross the network and be received, and 1 time unit to send an acknowledgment of that message, and 10 time units for the acknowledgment to cross the network be received.

If machine A sends machine B ten messages with data, how long (in time units) will it take for machine B to receive all ten messages with data assuming:

there are no lost or corrupted messages
machine A waits until it receives an acknowledgment of message with data N before sending message with data N + 1

Answer:

quiz for week 12

Suppose machine A and machine B are on separate wifi networks. The routers for these wifi networks are connected to a common wired network.

Question 1 (4 points) (see above)

If machine A sends a packet to B and B sends a packet acknowledging reciept of the original packet, what is the minimum number of link-layer frames (link-layer messages) that must be sent as part of this process? (Briefly identify them in the comments.)

Answer:

Question 2 (3 points) (see above)

As part of sending the packet from machine A to machine B, the wifi router for machine B's wifi network will send a frame (link-layer message) to machine B (possibly contained within another message).

What will the source address of this frame represent?

the identity of machine B's wifi router
the identity of machine A's wifi router
the identity of machine A
something else (explain briefly in comments)

Question 3 (3 points) (see above)

As part of sending the packet from machine A to machine B, the wifi router for machine B's wifi network will send a packet (network-layer message) to machine B (which will be contained within link-layer frames as it crosses the network).

What will the source address of this packet represent?

the identity of machine B's wifi router
the identity of machine A's wifi router
the identity of machine A
something else (explain briefly in comments)

Suppose A, B, and C attempt to secure communication using (symmetric) cryptography. To do this, they have three shared keys:

key 1 is known by A, B, and C
key 2 is known by just A and B

Suppose A is sending data to B via C and wants to ensure that C cannot interfere or eavesdrop despite C's role in forwarding the message. To do this, A will encode the message using some cryptographic scheme before providing it to C to forward. Suppose (like the examples in lecture) we have an encryption and decryption functions E(key, message) (which produces a ciphertext) and D(key, ciphertext) (which recovers the original message), and a MAC function E(key, message) (which produces a MAC "tag"). In the options below, E(x, y) means the result of applying E to x and y, and similar for MAC(x, y).

The following questions ask which schemes prevent C from interfereing or eavesdropping in a particular way.

Question 5 (4 points) (see above)

Assuming C only forwards A's message unchanged, which of the following schemes would prevent C from learning information about the contents of the data other than its length that it did not already know? Select all that apply.

A's message consists of E(key 2, the data)
A's message consists of E(key 2, the data) and MAC(key 1, the data)
A's message consists of E(key 1, the data)
A's message consists of E(key 1, the data) and MAC(key 2, the data)

Question 6 (4 points) (see above)

Which of the following schemes would prevent C from changing the data without B being able to easily detect this? (For this question, you may assume that C can guess what the data A intended to send was, even if C cannot infer this from the message.) Select all that apply.

A's message consists of E(key 2, the data)
A's message consists of E(key 2, the data) and MAC(key 1, the data)
A's message consists of E(key 1, the data)
A's message consists of E(key 1, the data) and MAC(key 2, the data)

quiz for week 13

Consider a system for allowing students to submit course registration requests to the registrar.

The system uses the following insecure (as partly revealed in the correct answers to the questions below) scheme.

the registrar and each student has their own private keys (separate keys for encryption and signing; assume the appropriate key is used depending on the operation required), which are not shared with anyone; the corresponding public keys are shared with all parties securely in advance
a student's registration request consists of:
- the student's name; and
- a list of course registration requests, each containing:
  - a cryptographic hash of the identity of the course being registered for
  - a digital signature made using the student's private key over the cryptographic hash
  - a message encrypted using the registrar's public key containing:
    - the student's name
    - the section of the course the student is requesting
    - the identity of the course being registered for
the student sends the request to the registrar over an (insecure) network
the registrar verifies all signatures, that the messages decrypt properly, and that the student names all match when it receives them
after processing the request, the registrar sends back a message containing:
- the current time
- a submessage encrypted using the student's public key containing:
  - a list of the courses the student is registered for
- a digital signature made using the registrar's private key over:
  - the current time and the student's name

Question 1 (7 points) (see above)

An attacker with the ability to read, replace, and introduce new messages on the network can do which of the following? (You may assume that the attacker has access to all relevant public keys and that asymmetric encrypted messages cannot be meaningfully changed by transforming the ciphertext, but they can be replaced by the attacker with new messages or messages obtained from elsewhere. In addition, you may assume the attacker is aware of the scheme described above, including the choice of cryptographic functions (hash, encryption, digital signature, etc.) in use.) Select all that apply.

given a small list of interesting classes, determine which students are registering for that class
given a small list of interesting sections of classes, determine which students are registering for that particular class section
make the registrar think a student requested to register for fewer classes than they actually requested (but still at least one class)
make the registrar think a student requested to register for an additional class they did not request
make the registrar think a student requested to register for a different section of the class than they actually requested
make a student think they were registered for fewer courses than they actually were
make a student think they were registered for courses they were not actually registered for

As an alternative to the asymmetric key exchange scheme we discussed in lecture (which we mentioned was commonly used in TLS), two communicating parties A and B can use asymmetric encryption or signatures in place of the special "asymmetric key exchange" operator. There is a secure way to do this, but the following questions discuss insecure ideas.

In the question below PE and PD represents asymmetric encryption and decryption. Assume that A and B have shared public keys for encryption and signature verification in advance, and that the corresponding private keys are never available to any attacker.

Question 3 (5 points) (see above)

Consider the following insecure protocol:

A chooses a new, secret value K
A sends to B: PE(B's public key, K)
B sends to A: PE(A's public key, K)
then A and B use K as a symmetric key for symmetric authenticated encryption

Which of the following are omissions of this scheme assuming no additional steps are added to mitigate them? Select all that apply.

A cannot verify that it ever received a message from B or successfully sent a message to B
B cannot verify that it ever received a message from A or successfully sent a message to A
if B runs this protocol with an attacker instead of A, then the attacker can trick B into decrypting someone else's encrypted message to B and sending the decrypted data to the attacker
an attacker pretending to be A that intercepted an earlier use of the protocol can cause the K value from that earlier version of the protocol to be reused
an attacker pretending to be B that intercepted an earlier use of the protocol can cause the K value from that earlier version of the protocol to be reused

quiz for week 14

Consider a seven-stage pipelined processor with the following pipeline stages:

fetch
decode (include register file read)
execute part 1
execute part 2
execute part 3
memory
writeback

Assume the stages do the same thing as the stages we discussed in lecture, except some of the stages we discussed in lecture are divided into multiple parts. When this happens, the divided stages need the inputs near the beginning of the stage for the first part and only have the results of that stage near the end of the stage for the last part.

Assume the processor resolves data hazards with a combination of stalling and forwarding.

Question 3 (4 points) (see above)

When executing

addq %r8, %r9
subq %r9, %r10
xorq %r9, %r11
imulq %r8, %r12

if the addq completes its fetch stage in cycle 0 then the imulq will complete its writeback stage in cycle ___. Account for any stalling necessary to resolve data hazards in the code above (and remember that the stalling will be combined with forwarding).

(In each of the instructions above, the first operand is a source and the second is a source and destination of the instruction.)

Answer:

Question 4 (6 points) (see above)

Consider executing the following assembly, with the following pipeline diagram:

instruction / cycle:    0  1  2  3  4  5  6  7  8  9  10 11 12
addq %r8, %r9           F  D  E1 E2 E3 M  W
imulq %r10, %r13           F  D  E1 E2 E3 M  W
movq (%r9), %r10              F  D  E1 E2 E3 M  W
subq %r8, %r9                    F  D  E1 E2 E3 M  W
nop                                 F  D  E1 E2 E3 M  W
nop                                    F  D  E1 E2 E3 M  W
xorq %r10, %r9                            F  D  E1 E2 E3 M  W

the processor will use forwarding alone (the explicit nops are sufficient to avoid stalling) to resolved data hazards. (In each of the instructions above, the first operand is a source and the second is a source and destination of the instruction.) Which of the following forwarding operations will occur? Select all that apply.

Consider the following C code:

int x = 100;
do {
    if (5 == x % 10)
        foo();
    quux();
    x -= 1;
} while (x >= 0);

Question 5 (4 points) (see above)

Suppose the above code is executed with a branch predictor that predicts all conditional branches as taken if and only if they were taken the last time time they were run. (Branches are identified by the address of the corresponding instruction.)

If the loop above is run repeatedly (by surrounding some outer loop) then how many branch mispredictions would occur each time the loop runs?

Assume that:

the initial predictions for each conditional branches are based on what happened at the end of a previous run of the loop;
the foo and quux functions contain no conditional branches;
the compiler generates code that has one conditional branch instruction to implement the do-while condition and one conditional branch instruction to implement the 5 == x % 10 condition;
each conditional branch instruction is taken if the condition written in the C code (5 == x % 10 or x >= 0) is true;
each conditional branch instruction runs once per iteration of the do-while loop;

Answer:

Question 6 (4 points) (see above)

Which (if any) of the following changes to branch prediction would decrease the number of branch mispredictions versus the previous scenario? Select all that apply.

if a branch was predicted as taken if and only if the last conditional branch (regardless of whether it was the same branch) was predicted as taken
if the conditional branch from the if statement was always predicted as not taken
if conditional branches were predicted as taken if and only if they were taken the last two times the same conditional branch was run
if conditional branches were predicted as not taken if and only if they were not taken the last two times the same conditional branch was run

quiz for week 15

For the following questions, consider an out-of-order processor with a similar design to what we discussed in lecture (with register renaming and a single instruction queue).

Suppose the processor executing the following assembly code:

addq %r8, %r9
subq %r9, %r10
xorq %r8, %r12
andq %r8, %r10
imulq %r12, %r10

(In each of the instructions above, the first operand is a source and the second is a source and destination of the instruction.)

Question 1 (4 points) (see above)

The computation for addq %r8, %r9 could occur at the same time as the computation for ____. Select all that apply.

subq %r9, %r10
xorq %r8, %r12
andq %r8, %r10
imulq %r12, %r10

Question 2 (3 points) (see above)

The computation for xorq %r8, %r12 could occur at the same time as the computation for ____. Select all that apply.

subq %r9, %r10
andq %r8, %r10
imulq %r12, %r10

Question 3 (4 points) (see above)

If the processor has two ALUs that each can do the computation for any instruction in one cycle, then the minimum number of cycles it would need to do the computation for the instructions above would be ____ cycles.

Answer:

Question 4 (4 points)

Consider the following C code:

unsigned char array[4096];
for (int i = 0; i < 4096; i += 1) { 
    array[i] += 1;
}
*pointer += 1;

Suppose we run this:

on a system has a 4096-byte direct-mapped cache with 32-byte blocks and where (to simplify this problem) virtual memory is not in use; and
only accesses to *pointer and array use the cache.

In that case, the for loop will fill the data cache with the elements of array, and then the addition to *pointer will evict something something from the cache.

Suppose we don't know the exact value of pointer but have limited it to two possibilities. By timing how long it takes to access parts of array after running this code, we can sometimes distinguish between two values and sometimes cannot. For which pairs of possible values could we distinguish them by timing access to the array? Select all that apply.

0x402100 and 0x401200
0x123789 and 0x234789
0x123700 and 0x123710
0x995533 and 0x994433

Consider the following C code:

unsigned char table[256] = {
    0,
    11 % 256 , (2 * 11) % 256,
    (3 * 11) % 256, (4 * 11) % 256,
    ... /* and so on */
};


int Example(int x, int y) {
    if (x < 256) {
        unsigned char result = table[x];
        return table[(result + y) % 256];
    } else {
        return -1;
    }
}

Suppose table is located at address 0x100000, and the system has an 4-way L1 cache with 64-byte blocks and 256 sets.

To simplify this problem, assume that virtual memory is not in use.

Question 5 (4 points) (see above)

Suppose this function is vulnerable to a Spectre-style attack through which an attacker who can control the values of x and y and has access the ability to detect what's evicted from the cache can determine information about parts of memory they are not supposed to be able to access.

If attacker wants to find out about the 1-byte value in memory at address 0x143000, what value would be best to supply for x to perform this attack? Write your answer as a hexadecimal number.

Answer:

Question 6 (4 points) (see above)

Suppose an attacker finds out that for the value of x that is the correct answer to the previous question, setting y to 15 results in an access that evicts a value from set index 2 of the L1 cache, and setting y to 16 results in an access that evicts a value from set index 3 of the L1 cache.

Based on this behavior, give a possible 1-byte value from memory at 0x143000.

Answer:

S2024 quizzes

quiz for week 2

quiz for week 3

quiz for week 4

quiz for week 5

quiz for week 6

quiz for week 7

quiz for week 9

quiz for week 10

quiz for week 11

quiz for week 12

quiz for week 13

quiz for week 14

quiz for week 15