- context switch and interrupts/exceptions
  - exception: processor runs the OS [kernel] for some reason
  - context switch: the OS unloads the state for one process/thread and loads the state for a different process/thread
  - the only connection between exceptions and context switches is that the OS needs to be running for a context switch to happen
  - so if we see process A was running and then process B was running, that means something needed to run the OS for a little bit of time in between
    - that something is some kind of exception
    - if process A didn't do anything to ask the OS to run, it must have been triggered by an event external to process A, like
      --- input from somewhere
      --- a device is ready for output
      --- a timer [set up previously] expiring
  - process A might ask the OS to run for a reason that isn't explicitly "switch to another program", but the OS might switch anyway
    example: it asks for keyboard input, and the OS might decide to switch away because there's no input right now
- timer exceptions:
  - the OS wants to make sure, as a last resort, that it runs if a program takes too long
    solution: the hardware supplies the OS a timer
  - before running process A, the OS sets the timer for (as an example) 1 ms
  - if the OS doesn't reset the timer later, the processor will trigger an exception in 1 ms and run the OS regardless of what the program was doing
- I/O and exceptions
  - when devices need attention from the OS, they can ask the processor to run the OS
  - when do they need attention?
    - when they have input ready for the OS to do something with
    - when the OS might want to send output and they are now ready to take that output (but weren't before)
  - NB: not every time a program accesses an I/O device --- just when the OS wouldn't otherwise be trying to handle that I/O
- F2024 Q6 signals
  - chat program like the signals lab
  - we send a message; what exceptions happen before we send the message?
    - both programs have set up a shared memory region
    - both programs have set up a signal handler
    - both programs are waiting for terminal input [with a system call]
    - almost certainly the OS is running something that is not these programs (b/c they can't do anything until they get input)
    - I/O exception to tell the OS there's input available for this program
      [optionally (depending on how much the OS likes switching away from programs),
       the OS might get the input but not run the chat program right away;
       if so, we might need a timer interrupt before the OS switches]
    - then the chat program will get the input (return value of a system call)
    - then the chat program will store the input in the shared memory
      - doesn't require any exceptions, assuming the memory is already in our page table
    - then the chat program will send a signal
      - requires a system call
    - the OS will eventually run the signal handler
      - could do this immediately, during the system call to send the signal
      - or could do it later, in response to some other exception
    - then the other chat program will output to the terminal
      - system call
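  [a sketch, not the actual F2024 code: the system calls from the walkthrough above, in C --- install the handler with sigaction, wait for terminal input with read, send the signal with kill; the handler name and the use of getpid() as the "other" process are placeholders]

      /* Minimal sketch of the signal-related system calls from the walkthrough above.
         The handler name, the signal choice, and getpid() standing in for the other
         chat program's pid are all placeholders for illustration. */
      #include <signal.h>
      #include <stdio.h>
      #include <unistd.h>

      static volatile sig_atomic_t got_message = 0;

      /* runs whenever the OS decides to deliver SIGUSR1 to this process */
      static void handle_message(int signum) {
          (void) signum;
          got_message = 1;
      }

      int main(void) {
          struct sigaction sa = {0};
          sa.sa_handler = handle_message;
          sigemptyset(&sa.sa_mask);
          sigaction(SIGUSR1, &sa, NULL);         /* system call: install the handler */

          char buf[512];
          ssize_t n = read(0, buf, sizeof buf);  /* system call: wait for terminal input */
          if (n > 0) {
              /* ...copy buf into the shared memory region (no exception needed if
                 those pages are already in our page table)... */
              pid_t other = getpid();            /* placeholder: really the other chat program's pid */
              kill(other, SIGUSR1);              /* system call: ask the OS to deliver a signal */
          }
          printf("handler ran? %d\n", (int) got_message);
          return 0;
      }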
- cache write policies [write-back, write-allocate]
  - let's say the processor requests to write 2 bytes to 0x104 with value 0x1234,
    and the cache block containing 0x104 ranges from addresses 0x100 through 0x10F inclusive
  - the cache's flowchart:
    * look up address 0x104 in the cache; if it's NOT present, then
      [IF A WRITE-NO-ALLOCATE CACHE] --- STOP! just send the write to the next level
      [IF A WRITE-ALLOCATE CACHE] --- add it to the cache
        if adding it to the cache, we need to read 0x100-0x103 and 0x106-0x10F inclusive from the next level
          problem if we skip that read: if the process later reads from 0x100, we're going to think that
          value is correctly cached, so if we didn't read it, we can't give the processor back the right value
    * modify offsets 4 and 5 of the cache block to contain 0x1234
    * [IF A WRITE-THROUGH CACHE] --- send the write of 0x1234 to bytes 0x104-0x105 to the next level
      (write-through cache ---> no dirty bits, we CANNOT mark the block as dirty)
    * [IF A WRITE-BACK CACHE] --- mark the block as dirty
  - [IF A WRITE-BACK CACHE] when we REPLACE something in the cache (regardless of read or write),
    if it's dirty, we have to write it to the next level
    - we have to write the WHOLE block, because we don't track more detail about what changed
  [a code sketch of this write path appears after the TLB notes below]
- TLB structure

                                                 vvvvvvvvvvvvvvvvvvvvvvvvvvvvv-------- permission bits + physical page number
      TLB mapping: [virtual page numbers] -----> [last-level page table entries]
                   ^^^^^^^^^^^^^^^^^^^^-- everything in a virtual address that's not the page offset

  - the TLB is a cache, organized like our memory caches, but:
    * instead of using an address as input, we use the virtual page number
    * when we split the input [VPN] into tag/index/offset, we have 0 offset bits

                               TLB index
                               vvvvvvvvv
          [virtual page number          ]
           ^^^^^^^^^^
           TLB TAG

    * use the TLB index to find a set
    * look for a matching tag in the set
      if so --- hit, and the block contains our PTE
      if not --- miss, and we load the correct PTE into the TLB
    * because we have 0 offset bits, we store one last-level page table entry per block
  - on a TLB hit:

          [virtual page number][page offset]
           ~~~~~~~~~~~~~~~~~~~  %%%%%%%%%%%
                    |                |
                   TLB               |
                    v                v
          [physical page num  ][page offset]

    (plus we check the permission bits retrieved from the TLB)
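  [going back to the write-policy flowchart above: a minimal simulation of the write path for a WRITE-BACK, WRITE-ALLOCATE cache; the block size, set count, and "next level" array are invented just to make it runnable]

      /* Tiny direct-mapped, write-back, write-allocate cache: just the write path.
         All sizes and structures are invented for illustration; assumes the write
         doesn't cross a block boundary. */
      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      #define BLOCK_SIZE 16          /* so 0x100-0x10F is one block, like the example */
      #define NUM_SETS   4           /* tiny direct-mapped cache */
      #define MEM_SIZE   0x200

      static uint8_t next_level[MEM_SIZE];   /* stand-in for the next level of the hierarchy */

      struct cache_block {
          int valid, dirty;
          uint64_t block_address;            /* address of the first byte in the block */
          uint8_t data[BLOCK_SIZE];
      };
      static struct cache_block cache[NUM_SETS];

      static void cache_write(uint64_t address, const uint8_t *value, unsigned size) {
          uint64_t block_address = address - address % BLOCK_SIZE;
          unsigned offset = address % BLOCK_SIZE;
          struct cache_block *b = &cache[(block_address / BLOCK_SIZE) % NUM_SETS];

          if (!(b->valid && b->block_address == block_address)) {
              /* miss: write-allocate, so the block gets added to the cache */
              if (b->valid && b->dirty)
                  /* write-back: evicting a dirty block writes the WHOLE block out */
                  memcpy(&next_level[b->block_address], b->data, BLOCK_SIZE);
              /* fetch the block (the notes point out we really only need the bytes
                 we're not about to overwrite) so later reads of other bytes are correct */
              memcpy(b->data, &next_level[block_address], BLOCK_SIZE);
              b->valid = 1;
              b->dirty = 0;
              b->block_address = block_address;
          }
          /* modify only the written bytes (offsets 4 and 5 for a 2-byte write to 0x104) */
          memcpy(&b->data[offset], value, size);
          /* write-back: nothing is sent to the next level now, just mark the block dirty */
          b->dirty = 1;
      }

      int main(void) {
          uint8_t value[2] = {0x34, 0x12};   /* 0x1234, little-endian */
          cache_write(0x104, value, 2);
          printf("dirty after write: %d\n", cache[(0x104 / BLOCK_SIZE) % NUM_SETS].dirty);
          return 0;
      }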
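  [and for the TLB notes above: a small sketch of splitting a virtual address into page offset / VPN and the VPN into TLB index / TLB tag; the 4096-byte page size and 16-set TLB are example values, not numbers from the notes]

      /* Splitting a virtual address for a TLB lookup.  Page size (4096 bytes) and
         TLB set count (16) are example values, not from any quiz. */
      #include <stdint.h>
      #include <stdio.h>

      #define PAGE_SIZE 4096u    /* => 12 page-offset bits */
      #define TLB_SETS  16u      /* => 4 TLB index bits    */

      int main(void) {
          uint64_t virtual_address = 0x12345678;

          uint64_t page_offset = virtual_address % PAGE_SIZE;
          uint64_t vpn         = virtual_address / PAGE_SIZE;

          /* 0 offset bits within the TLB "block": the VPN splits only into index + tag */
          uint64_t tlb_index = vpn % TLB_SETS;
          uint64_t tlb_tag   = vpn / TLB_SETS;

          printf("VPN=0x%llx  TLB index=0x%llx  TLB tag=0x%llx  page offset=0x%llx\n",
                 (unsigned long long) vpn, (unsigned long long) tlb_index,
                 (unsigned long long) tlb_tag, (unsigned long long) page_offset);

          /* on a hit, the matching TLB entry supplies the physical page number and the
             permission bits; the page offset passes through unchanged:
             physical address = PPN * PAGE_SIZE + page_offset */
          return 0;
      }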
- threads and heap/stack/etc.
  - when one process has multiple threads, they all have the same page table
    - any pointer in one thread could be used by another thread
  - each thread in a process has a different stack, with a different address
    - new "empty" stack created by pthread_create()
    - usually we share data by passing pointers through pthread_create(), and pthread_create() won't copy anything the pointer points to
  - each thread in a process can pass pointers to the others and dereference a pointer created by another thread
    -- BUT if the pointer points to something that went out of scope, it won't work
    -- most commonly comes up with things on another thread's stack
    -- usually not a problem to have pointers to global variables/heap, b/c those won't go out of scope by accident
    [see the first sketch after the UDP v TCP notes below]
- broadcast v signal for condition variables
  - condition variable = list of waiting threads
  - broadcast = let all of them run and see if they can do something
  - signal = let one of them run and see if it can do something
  - both do nothing if no one is waiting
    [but that shouldn't be a problem, because we always check whether we need to wait before waiting,
     and do it with a lock to make sure whether we need to wait does not change while we're checking]
  - to use pthread_cond_wait correctly, we always need to double-check the condition:
        while (still need to wait)
            pthread_cond_wait(...);
  - this means that it's "safe" to broadcast from a correctness point of view
    - the worst that can happen is a bunch of threads realize they still need to wait when they double-check the condition
  - ideally, we'd like to not wake up threads that will just start waiting again
    SOMETIMES we can use signal instead of broadcast as part of doing this:
    - we have to know that at most one thread can go as a result of our action
    - example: if we placed one item on a queue and we signal each time, to threads that will each remove one item
      - we know in this case we're always waking up *enough* threads to handle the one item [what we need to know for correctness]
      - it might be the case that the thread we wake up doesn't get to go, because another thread snuck in before it (and was never waiting)
      - but in the typical case, we'll wake up exactly the right number of waiting threads
    [see the second sketch after the UDP v TCP notes below]
- UDP v TCP
  - "transport" layer protocols
  - the transport layer has two responsibilities that maybe should've been separated (but aren't on the Internet)
    - organizing data into streams or similar [even though it might not be sent that way] that are potentially reliable
      - TCP does this ~~ implements reliable transmission
      - UDP does NOT do this
    - getting data to the correct program on a machine (when there might be multiple)
      - TCP and UDP both do this
  - why would you choose UDP over TCP?
    - if you don't care about reliable transmission (and want lower overhead/latency)
      - example: for a video stream, you might prefer to NOT resend old data that was lost and instead get the most recent video frames
    - if you think you can do better than TCP at implementing reliable stream transmission
      - example: QUIC (used by some web browsers)
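  [first sketch, for the threads/stack notes above: the classic mistake of handing a new thread a pointer to something on the creating thread's stack; all names are invented; compile with -pthread]

      /* Passing pointers through pthread_create(); pthread_create() copies only the
         pointer, never what it points to. */
      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>

      static void *print_int(void *arg) {
          int *p = arg;            /* dereferencing a pointer made by another thread is fine... */
          printf("got %d\n", *p);  /* ...as long as what it points to is still alive */
          return NULL;
      }

      static pthread_t start_broken(void) {
          pthread_t t;
          int local = 42;                              /* lives on THIS thread's stack */
          pthread_create(&t, NULL, print_int, &local);
          return t;   /* 'local' goes out of scope here, so the new thread may read garbage */
      }

      static pthread_t start_ok(void) {
          pthread_t t;
          int *heap_value = malloc(sizeof *heap_value);   /* heap doesn't go out of scope by accident */
          *heap_value = 42;
          pthread_create(&t, NULL, print_int, heap_value);
          return t;   /* (leaked for brevity; real code would free it in the new thread) */
      }

      int main(void) {
          pthread_join(start_ok(), NULL);       /* prints 42 */
          pthread_join(start_broken(), NULL);   /* undefined behavior: may print garbage or crash */
          return 0;
      }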
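  [second sketch, for broadcast v signal above: a tiny queue where each added item lets at most one waiter go, so pthread_cond_signal is enough; names are invented; compile with -pthread]

      /* A tiny shared queue protected by a lock and a condition variable. */
      #include <pthread.h>

      #define CAPACITY 16

      static int queue[CAPACITY];
      static int count = 0;    /* how many items are currently in the queue */
      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t item_available = PTHREAD_COND_INITIALIZER;

      void add_item(int value) {
          pthread_mutex_lock(&lock);
          queue[count++] = value;          /* (ignoring the queue-full case for brevity) */
          /* one item added => at most one waiter can make progress, so signal is enough;
             broadcast would also be correct, just wasteful (extra threads wake up,
             recheck the condition, and go right back to waiting) */
          pthread_cond_signal(&item_available);
          pthread_mutex_unlock(&lock);
      }

      int remove_item(void) {
          pthread_mutex_lock(&lock);
          /* always a while, never an if: we might wake up and find another thread
             (possibly one that was never waiting at all) already took the item;
             checking under the lock also keeps "do we need to wait?" from changing
             while we're checking it */
          while (count == 0)
              pthread_cond_wait(&item_available, &lock);
          int value = queue[--count];
          pthread_mutex_unlock(&lock);
          return value;
      }

      static void *consumer(void *arg) {
          (void) arg;
          return (void *) (long) remove_item();
      }

      int main(void) {
          pthread_t t;
          pthread_create(&t, NULL, consumer, NULL);
          add_item(42);   /* if the consumer isn't waiting yet, the signal does nothing,
                             but that's fine: the consumer checks count before waiting */
          pthread_join(t, NULL);
          return 0;
      }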
- certificates/certificate authorities --- why useful
  - we can communicate securely with a machine if we have its public keys
  - problem: how do we get public keys?
  - part 1 of the modern solution to this:
    - we ship a list of public keys with OSes and browsers
    - BUT:
      - we can't include all the public keys for every machine we want to talk to
      - we can't change this list of public keys as fast as the public keys of the machines we want to talk to change
  - part 2 of the modern solution:
    - the public keys we ship are for certificate authorities
    - those public keys are used to verify (via a digital signature) that those certificate authorities are sending us another public key
    - for this to make sense, we need to trust the certificate authority to verify other public keys
      [open question: are they actually trustworthy?]
    - rather than having the certificate authorities send the public key directly, the certificate authorities generate a message
      for foo.com saying "I, this Certificate Authority, think foo.com's public key is X", and foo.com forwards that message to the browser/etc.
      - that message comes with a digital signature so we can verify it actually comes from the certificate authority
      - the message is called a "certificate"
  - part 3 of the modern solution:
    - it's not practical to have certificate authorities directly sign the messages for all the foo.coms, so they delegate it
    - we use another message from the certificate authority so browsers can know that it's delegated to a "sub-certificate authority"
    - result: "certificate chains"
      - the browser is sent:
        - certificate 1: "Certificate Authority A delegates to Certificate Authority B with public key Y"
          [signed by Certificate Authority A's private key]
        - certificate 2: "Certificate Authority B thinks foo.com's public key is X"
          [signed by Certificate Authority B's private key, the one corresponding to public key Y]
- branch prediction --- when does squashing happen
  - pipelined processor: we keep running the incorrect prediction until we compute the actual result of the jump in the pipeline HW

                   v--------- first time we can fetch based on what we computed
                v------------ computed where the jump should really go
      F  D  E  M  W           jmp
         F  D                 prediction
            F                 after prediction
               F              finally fetching based on what was actually computed
                              [b/c this was the first time we had the information about where to go in time to fetch something]

- Q1 on this week's quiz [Sidechannel]
  - a function that checks whether its input matches some rotated version of a hidden_string
  - we learned that it returns true for ABCDEFGH
    - so hidden_string is ABCDEFGH OR BCDEFGHA OR ...
  - the function works by trying rotations until one matches, then returns true
  - so we can figure out which rotation is actually in hidden_string by seeing which candidate is fastest to return true
    - the candidate equal to hidden_string should have the shortest time, because for it the correct rotation is tested first
  [see the timing sketch after Q2 below]
- Q2 on this week's quiz [Spectre]

      Prime()
      *pointer += ...;
      Probe()

  - and we found out that array index 512 was slow in Probe()
  - and array index 512 was at 0x14000000 + 512 --> cache set index 2 (b/c the cache has 256-byte blocks)
  - so the correct answers are: addresses in cache set 2 that are not in the array
    - cache set 2 = could evict array[512]
    - not in the array = needed to be added to the cache
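  [the timing sketch for Q1: timing each candidate rotation; the hidden string, the checking function, and the repetition count are invented stand-ins for the quiz's code, and a heavily optimizing compiler may distort the measurement]

      /* Timing each candidate rotation for Q1; everything here is a stand-in. */
      #include <stdio.h>
      #include <string.h>
      #include <time.h>

      static const char hidden_string[] = "CDEFGHAB";   /* stand-in secret */

      /* stand-in checker: tries rotating the input 0, 1, 2, ... positions until one
         rotation equals hidden_string, so its running time leaks how many rotations
         it had to try before succeeding */
      static int matches_rotated(const char *input) {
          size_t n = strlen(input);
          for (size_t r = 0; r < n; r++) {
              int ok = 1;
              for (size_t i = 0; i < n && ok; i++)
                  ok = (input[(i + r) % n] == hidden_string[i]);
              if (ok)
                  return 1;
          }
          return 0;
      }

      int main(void) {
          const char *candidates[] = {"ABCDEFGH", "BCDEFGHA", "CDEFGHAB", "DEFGHABC",
                                      "EFGHABCD", "FGHABCDE", "GHABCDEF", "HABCDEFG"};
          volatile int sink = 0;   /* keep the calls from being optimized away entirely */
          for (int c = 0; c < 8; c++) {
              struct timespec start, end;
              clock_gettime(CLOCK_MONOTONIC, &start);
              for (int rep = 0; rep < 100000; rep++)   /* one call is far too noisy to time */
                  sink += matches_rotated(candidates[c]);
              clock_gettime(CLOCK_MONOTONIC, &end);
              long ns = (end.tv_sec - start.tv_sec) * 1000000000L
                        + (end.tv_nsec - start.tv_nsec);
              /* the fastest candidate should be the one equal to hidden_string:
                 for that input, the matching rotation is the first one tried */
              printf("%s: %ld ns\n", candidates[c], ns);
          }
          return (int) sink;
      }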
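  [and the set-index arithmetic from Q2 (used again in Q3 below), written out; the 256-byte block size comes from the notes, but the number of sets is an assumption just to make the example concrete]

      /* Address -> cache set index, as in Q2.  Block size is from the notes (256 bytes);
         the number of sets (1024) is an assumption for illustration. */
      #include <stdint.h>
      #include <stdio.h>

      #define BLOCK_SIZE 256u
      #define NUM_SETS   1024u   /* assumed, not given in the notes */

      int main(void) {
          uint64_t address = 0x14000000 + 512;              /* address of the slow array index */
          uint64_t block_number = address / BLOCK_SIZE;     /* which block the address is in */
          uint64_t set_index = block_number % NUM_SETS;     /* which set that block maps to */
          printf("address 0x%llx -> set %llu\n",
                 (unsigned long long) address, (unsigned long long) set_index);
          /* 0x14000200 / 0x100 = 0x140002; 0x140002 mod 1024 = 2, matching "cache set index 2" */
          return 0;
      }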
- Q3 on this week's quiz [Spectre]

      array[mystery * 2]
      array[mystery * 3]

  - these accessed cache set indices 514, 515, 739

                                               cache set 6xx
                                               vvvvvvvvvvvvvvvvvvvvvvv
  - the array was address 0x10004000 through __ + 1000 * sizeof(int)
                          ^^^^^^^^^^-- cache set index 512

  - 739 must be outside the array (probably mystery)
  - array[0 to 3] = cache set 512
    array[4 to 7] = cache set 513
    ...
  - we can use this to figure out the ranges of array indices that the cache set 514 and 515 accesses could have come from
  - we know that the 514 access was mystery * 2 and the 515 access was mystery * 3
    [so mystery * 2 must be one of indices 8-11 and mystery * 3 one of indices 12-15, which narrows mystery down to 4 or 5]
- Q4 on this week's quiz [Spectre]

      int GetLocalPortFor(int file_number) {
          if (file_number < 0 || file_number >= 4096) {
              return -1;
          } else if (!all_files[file_number].is_socket) {
              return -1;
          } else {
              // IN ATTACK: the processor will branch predict that this should run
              // (but it will be wrong)
              int socket_index = all_files[file_number].socket_index;
              //                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ --- the address will be computed based on this
              // IN ATTACK: all_files[file_number].socket_index will be out of bounds
              // and chosen by the attacker to match some value they want to learn about
              struct SocketInfo *socket_info = &all_sockets[socket_index];
              return socket_info->local_port;
              //     ^^^^^^^^^^^^^^^^^^^^^^^ --- we will learn about the address of this cache access
          }
      }
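  [a sketch tying Q4 back to the prime/probe idea from Q2: which cache set the mispredicted all_sockets access touches, as a function of the out-of-bounds socket_index; every number here (base address, struct size, cache geometry) is an assumption, not something given in the quiz]

      /* Which cache set the mispredicted all_sockets[socket_index] access touches.
         All constants below are invented for illustration. */
      #include <stdint.h>
      #include <stdio.h>

      #define ALL_SOCKETS_BASE 0x20000000u   /* assumed address of all_sockets[] */
      #define SOCKET_INFO_SIZE 64u           /* assumed sizeof(struct SocketInfo) */
      #define BLOCK_SIZE       64u           /* assumed cache block size */
      #define NUM_SETS         1024u         /* assumed number of cache sets */

      int main(void) {
          /* the value the attacker wants to learn ends up in socket_index */
          for (uint64_t socket_index = 0; socket_index < 4; socket_index++) {
              uint64_t address = ALL_SOCKETS_BASE + socket_index * SOCKET_INFO_SIZE;
              uint64_t set = (address / BLOCK_SIZE) % NUM_SETS;
              printf("socket_index %llu -> touches cache set %llu\n",
                     (unsigned long long) socket_index, (unsigned long long) set);
          }
          /* the attacker primes the cache, triggers the mispredicted access with an
             out-of-bounds file_number, then probes to see which set got slower;
             inverting the arithmetic above turns that set number back into socket_index */
          return 0;
      }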