- last time
    - using cache set to solve for values
    - writing your own attack code
        - normal with Meltdown
        - JavaScript compiled; indirect branch predictor mistraining
- test-taking strategies
    - if you think a question is unclear, writing the assumption you had to make will allow us to do partial credit/etc.
- formulas
    - on the schedule page, there is a reference sheet linked
    - personally, for e.g. cache formulas, I try to remember the functional picture of what happens and be able to derive the formulas I know
        example: I know offset bits choose where in a block; from that I know there are 2^(offset bits) bytes in a block
- make syntax

    target: dependency1 dependency2 ...
        command1
        command2
        ...

    target: what file will be created by the commands
    dependency1, etc.: what things will be used by the commands, so that we should rerun the commands if they change
    make handles chains of dependencies, so, e.g.

        foo.exe: foo.o
            ... # linking command
        foo.o: foo.c
            ... # compiling command

        I don't need to write "foo.exe: foo.c"

    ---- variables for convenience

        FOO=xxx
        ...
        $(FOO)    # instead of writing "xxx"

        most commonly "FOO" is something like "CC" or "CFLAGS" and "xxx" is something like "clang" or "-Wall -pedantic"

    ---- pattern rules: instead of writing

        foo.o: foo.c
            gcc -c foo.c
        bar.o: bar.c
            gcc -c bar.c

        we can write

        %.o: %.c
            gcc -c $<

        (we won't expect you to memorize $< $^ $@ --- will supply in question/etc. if needed)

- kernel v user mode
    - processors want to support OSes restricting what programs can do
    - primary mechanism: the OS code runs in "kernel mode" where it can do stuff user (normal) code can't
    - the processor switches to kernel mode on "exceptions"
    - the OS can switch out of kernel mode when it wants to run normal code
    - typical operations that can only happen in kernel mode:
        - talking directly to I/O devices (mostly)
            - normal program code talks to I/O devices only by asking the OS to do that operation on its behalf
            - example: the only way to talk to the keyboard on most consumer OSes is by calling existing OS code to do it
              (see the small system call sketch after the exception notes below)
        - modifying the page table base pointer
        - accessing memory that has been marked non-user-accessible in the page tables
        - changing where exceptions start running code
        - ...
- what is an exception / when do exception handlers run
    - an exception is a way for the hardware to run the OS
    - does an exception handler run? ---> does the hardware need to run the OS?
    - different types of reasons the hardware would run the OS:
        - the current program requested it because it wants to do an operation that the OS needs to do for it (any of the kernel-mode-only operations) --- a "system call"
        - the current program did something weird (segfault, divide-by-zero, ...)
        - an external device (like keyboard, network interface, ...) needs attention
            - this is what we sometimes call an "I/O exception"
            - only needs to happen if we wouldn't otherwise be talking to that I/O device
            - typically: finishing some operation we started earlier or bringing new input to our attention
        - a timer the OS set has gone off
    - what the exception handler (the OS code that runs) does doesn't change whether or not it's an exception
    - common points of confusion:
        - if a timer goes off and the OS decides to switch to another program, or the OS decides to do nothing about the timer and set a new timer --> same number of exceptions
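- a minimal sketch of the "asking the OS" / system call idea above (my own illustration, assuming a Unix-like OS): the program never touches the keyboard or display hardware itself; it only calls read()/write(), which trap into the kernel, and the OS's exception handler does the actual device access

      /* sketch: user code doing I/O only by asking the OS via system calls */
      #include <unistd.h>

      int main(void) {
          char buf[16];
          /* read() traps into the kernel; the OS talks to the keyboard/terminal for us */
          ssize_t n = read(STDIN_FILENO, buf, sizeof buf);
          if (n > 0) {
              /* write() is also a system call; we never touch the display hardware directly */
              write(STDOUT_FILENO, buf, (size_t) n);
          }
          return 0;
      }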
- where is process information (PCB, etc.) stored?
    - PCB = process control block (a term academia likes using for information about a process)
    - TCB = thread control block
    - information about a process:
        - open files
        - process ID
        - virtual memory layout (page tables, and mapping information used to fill in page tables)
        - which signal handlers are registered
        - which signals are blocked
        - ...
    - information about a thread:
        - registers including PC and stack pointer
        - ...
    - information about a process/thread is "mostly" stored in the OS's memory
        exception: when a thread is running, its registers/PC/etc. are stored in the processor
- signal unsafety
    - if a signal handler interrupts an "unsafe" function, we can't call an "unsafe" function
    - our example in lecture of an unsafe function is malloc()
    - big problem: most functions are not written to be called from within themselves (they are not "reentrant")
        - with malloc(): a signal could happen while we're in the middle of manipulating bookkeeping about where the next memory to return is, and we could end up modifying that inconsistently
        - alternate idea with malloc(): a signal could happen while we have a lock on something, and the signal handler could try to acquire that lock --> deadlock
    - avoiding this:
        - write all our functions thinking very carefully about the order in which we change bookkeeping/etc. information
        - block signals while the function is doing its work on persistent state
    - the Unix designers decided --- avoidance is not worth it for most functions
    - Unix specifies a list of "async-signal-safe" functions that are written in a way to avoid these problems (but it is a very small list)
- week6 quiz Q3
    - 3-level table
    - second-level PT entry at 0x2011a
    - 10-bit page offset
    - 256-entry page tables w/ 4-byte entries
        - each VPN part is 8 bits
    - VA 0x12345678
        - least sig 10 bits: page offset
        - next 8 bits: 3rd VPN part
        - next 8 bits: 2nd VPN part
        - next 8 bits: 1st VPN part
    - PTBR @ 0x10000
    - PT format: MSB [v][w][e][u][unused][PPN]
    - answer: impossible
        - the first-level page table entry indicated the location of the second-level PT with a PPN
        - so the byte address of that table was PPN * page size (2^10)
        - the byte address of the PTE = table address + 4 * (VPN part)
        - 0x2011a = PPN * page size + 4 * VPN part is not possible if PPN and VPN part are integers:
            PPN * page size is a multiple of 4
            4 * VPN part is a multiple of 4
            0x2011a is not a multiple of 4
    - if 0x2011a were replaced with a possible address, what would be the PTE value the question was looking for?
        - PTE address = (PPN from the 1st-level PTE) * page size + 4 * (VPN part 2), where VPN part 2 is some bits of VA 0x12345678
        - we can find VPN part 2 from the VA and we know the page size, so we could solve for the PPN from the 1st-level PTE
        - the PTE value found at the first level had this format: MSB [v][w][e][u][unused][PPN]
            - PPN: fill in from solving that equation
            - [v][w][e][u]: set based on knowing the access succeeded
                so: definitely set Valid and User-mode accessible
                if it was just a read, we don't know Writeable or Executable (since the Q asked for "a value", choose arbitrarily)
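- since 0x2011a itself is impossible, here is a small worked sketch of that calculation using a made-up possible second-level PTE address (0x20234 is purely my hypothetical choice) and the question's parameters

      /* worked sketch for the week6 quiz Q3 setup: 10-bit page offset, three 8-bit
         VPN parts, 256-entry tables with 4-byte PTEs.  The second-level PTE address
         used here (0x20234) is a hypothetical, possible stand-in for the impossible
         0x2011a. */
      #include <stdio.h>

      int main(void) {
          unsigned va = 0x12345678;
          unsigned page_size = 1u << 10;        /* 10 offset bits -> 1024-byte pages */

          unsigned vpn1 = (va >> 26) & 0xFF;    /* 1st (most significant) VPN part */
          unsigned vpn2 = (va >> 18) & 0xFF;    /* 2nd VPN part = 0x8D */
          unsigned vpn3 = (va >> 10) & 0xFF;    /* 3rd VPN part */

          /* PTE address = (PPN from 1st-level PTE) * page_size + 4 * vpn2, so: */
          unsigned pte_addr = 0x20234;          /* hypothetical valid PTE address */
          unsigned ppn = (pte_addr - 4 * vpn2) / page_size;   /* works out to 0x80 */

          /* PTE format: MSB [v][w][e][u][unused][PPN] in a 32-bit entry.
             The access succeeded, so set valid + user; w/e chosen arbitrarily as 0. */
          unsigned pte_value = (1u << 31) | (1u << 28) | ppn;

          printf("vpn1=%x vpn2=%x vpn3=%x ppn=%x pte=%08x\n",
                 vpn1, vpn2, vpn3, ppn, pte_value);
          return 0;
      }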
- temporal/spatial locality
    - temporal locality: if I access Memory[X], I'm likely to access Memory[X] in the future
        the primary thing caches are taking advantage of
    - spatial locality: if I access Memory[X], I'm likely to access Memory[X+c] for some small c soon
        a way caches take advantage of this (but not the only way to) is by having large(ish) cache blocks
- quiz week06 Q5: spatial locality for

        for (int i = 0; i < SIZE; i += 1) {
            for (int j = 0; j < SIZE; j += 1) {
                A[i] = A[i] + B[i * 4] * C[j * SIZE + i];
            }
        }

    if: swapping the lines "for (int i = 0; i < SIZE; i += 1) {" and "for (int j = 0; j < SIZE; j += 1) {"
        for A: changes from
            A[0], A[0], A[0], ..., A[1], A[1], A[1], ..., A[2], A[2], ...
          to
            A[0], A[1], A[2], A[3], A[4], ..., A[0], A[1], A[2], A[3], A[4], ...
        for B: changes from
            B[0], B[0], B[0], ..., B[4], B[4], B[4], ..., B[8], B[8], ...
          to
            B[0], B[4], B[8], B[12], B[16], ..., B[0], B[4], B[8], B[12], B[16], ...
          I'd argue: worse temporal locality, but not much change to spatial
        for C: changes from
            C[0], C[SIZE], C[2*SIZE], ..., C[1], C[1+SIZE], C[1+2*SIZE], ...
          to
            C[0], C[1], C[2], ..., C[SIZE-1], C[0+SIZE], C[1+SIZE], C[2+SIZE], ...
          I'd argue: C has gotten way, way better spatial locality
    replacing j * SIZE + i with i * SIZE + j does the same change to C without changing A, B
    replacing i * 4 with i * 2 means that instead of doing B[0], ..., B[4], ..., B[8], ..., we do B[0], ..., B[2], ..., B[4], ...
        better spatial locality (closer to adjacent accesses)
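- a compilable sketch of the swapped-loop version discussed above (SIZE and the array declarations are made-up assumptions just so it stands alone):

      #define SIZE 256

      static double A[SIZE], B[4 * SIZE], C[SIZE * SIZE];

      void interchanged(void) {
          /* with j as the outer loop, each pass of the inner loop touches
             C[j*SIZE + 0], C[j*SIZE + 1], C[j*SIZE + 2], ... -- adjacent addresses,
             i.e. the much better spatial locality for C described above */
          for (int j = 0; j < SIZE; j += 1) {
              for (int i = 0; i < SIZE; i += 1) {
                  A[i] = A[i] + B[i * 4] * C[j * SIZE + i];
              }
          }
      }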
- dirty bit
    - with write-back, we want to keep a modified value only in the cache
    - BUT this means we eventually have to update memory
    - the dirty bit tracks that "memory still needs an update"
    - we want to track this so we don't update memory for things that haven't changed
        - empirically: most programs do many more reads than writes
- TLB
    - with page tables, we have a way of taking a virtual page number and getting the final page table entry
    - this process is long and often slow
    - TLBs are a cache for this
    - uses the same structure as "normal" caches except:
        - lookup with a VPN instead of a memory address
        - have one page table entry per block (--> no offset bits)
        - store page table entries, not actual program data
- counting the number of threads that exist
    - given that threads can run at different speeds:
    - potentially one more thread for each create call
    - that thread is potentially still running until there's a join call
    - yes, sometimes threads might finish early, but they'll still have bookkeeping info around so join works, unless they were detached
- Quiz 10 Q2 [mutexes]

    void reader(void) {
        pthread_mutex_lock(&lock);
        while (writing)
            pthread_cond_wait(&proceed, &lock);
        reading++;
        pthread_mutex_unlock(&lock);
        // reading code here
        pthread_mutex_lock(&lock);
        reading--;
        pthread_cond_signal(&proceed);
        pthread_mutex_unlock(&lock);
    }

    void writer(void) {
        pthread_mutex_lock(&lock);
        while (reading)
            pthread_cond_wait(&proceed, &lock);
        writing++;
        pthread_mutex_unlock(&lock);
        // writing code here
        pthread_mutex_lock(&lock);
        writing--;
        pthread_cond_signal(&proceed);
        pthread_mutex_unlock(&lock);
    }

    B, C: a reader/writer can wait indefinitely if there are enough writers/readers
        yes, because there's no guarantee that a reader/writer gets the lock first when reading or writing is 0
            no progress guarantee because pthread_mutex_lock + pthread_cond_signal don't say who runs when
        yes, because pthread_cond_signal might not wake up a thread that can actually go, and since cond_signal only wakes up ONE thread, even if there is a thread that can go it may keep waiting indefinitely
    D: multiple threads can write at a time
        as long as reading == 0, we can have multiple threads reach "writing code here" at the same time
    E: readers/writers can run at the same time
        not true, because the writing/reading counts are accurate and
            readers wait for writing == 0 (or wait indefinitely)
            writers wait for reading == 0 (or wait indefinitely)
- pthread_cond_signal v pthread_cond_broadcast
    - if only some of the waiting threads can go, even if only one can make progress, signal might not choose one of the ones that can make progress
        - example: one CV for readers and writers, but only one writer can go; signal might select a reader
        - Quiz week10 Q2 is an example where cond_signal could wake up the wrong thread
        - cond_broadcast would avoid this by definitely waking up at least one of the threads that could go
        - typically the "real" fix for this is more CVs for different conditions
    - if multiple threads can now make progress, MUST use broadcast
        ("now make progress" means they can all stop waiting, and when they double-check their condition they'll all be able to go)
        - example from lecture: WaitForFinished --- if multiple threads are waiting for this, we should broadcast because they all can go, not just one
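- a minimal sketch of that WaitForFinished idea (assumes pthreads; the exact variable and function names here are my own, not necessarily the lecture's code): once finished becomes true, every waiter can go, so the finishing thread broadcasts

      #include <pthread.h>

      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t finished_cv = PTHREAD_COND_INITIALIZER;
      static int finished = 0;

      void WaitForFinished(void) {
          pthread_mutex_lock(&lock);
          while (!finished)                      /* always re-check the condition in a loop */
              pthread_cond_wait(&finished_cv, &lock);
          pthread_mutex_unlock(&lock);
      }

      void Finish(void) {
          pthread_mutex_lock(&lock);
          finished = 1;
          /* broadcast: ALL waiting threads can now make progress;
             pthread_cond_signal would wake only one of them */
          pthread_cond_broadcast(&finished_cv);
          pthread_mutex_unlock(&lock);
      }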
- network address translation
    - Internet ---- [small set of public addresses] Router ---- [large set of private addresses]
    - make machines with private addresses think: they use their private address to talk to addresses on the Internet, w/o changes to any of the programs doing this
    - make machines on the Internet think: they are talking to one of the small set of public addresses (since they don't know how to send things to our private addresses themselves)
    - mechanism: a table of {public address + port, private address + port}
        edit addresses and port numbers in incoming packets and outgoing packets using that table
        (a tiny sketch of this table appears at the end of these notes)
- when can M [machine-in-the-middle] change a message from A to B?
    - if there's no MAC or digital signature, probably they can
    - also, they can reuse other messages if we don't take precautions like a number-used-once (and/or we don't include needed context like "this message is for B")
    - note: "only" encryption doesn't prevent this
- TLS handshake
    - key idea: use "key agreement" to choose symmetric keys
        - the client combines the server KeyShare + their data (that produced the client KeyShare)
        - the server combines the client KeyShare + their data (that produced the server KeyShare)
        - both get the same set of symmetric keys
    - use certificate + digital signature to verify the key agreement was done by the correct server
        - the certificate says "here's a public key for digital signatures and it really belongs to website.com"
        - the certificate is verified by a certificate authority (who we've chosen to trust)
            verified means it's signed and we can check the signature
        - the server makes a digital signature over its key share to prove it's website.com's keyshare and not evil.com's keyshare
    - messages:
        > ClientHello, KeyShare
        < ServerHello, KeyShare, Certificate, CertificateVerify
        > Finished
        < Finished
      Finished confirms that we've all seen the same messages in the whole handshake
        ~ makes sure that we didn't downgrade the TLS version
- pipeline: when to/not to forward
    - don't forward: if we read the value from a register and it's right!
    - do forward: if the value is available somewhere (because we computed it) but not in a register yet
        - need to make sure it's the _current_ version of that value
        - need to make sure we need the value (example: not just overwriting it)
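- going back to the NAT mechanism above, a tiny sketch of the translation-table idea (struct layout and function names are my own illustration, not a real router implementation):

      #include <stdint.h>
      #include <stddef.h>

      /* one mapping: {public address + port} <-> {private address + port} */
      struct nat_entry {
          uint32_t public_addr;  uint16_t public_port;
          uint32_t private_addr; uint16_t private_port;
      };

      static struct nat_entry table[1024];
      static size_t table_size;

      /* incoming packet (from the Internet): rewrite public destination -> private */
      int rewrite_incoming(uint32_t *dst_addr, uint16_t *dst_port) {
          for (size_t i = 0; i < table_size; i++) {
              if (table[i].public_addr == *dst_addr && table[i].public_port == *dst_port) {
                  *dst_addr = table[i].private_addr;
                  *dst_port = table[i].private_port;
                  return 1;
              }
          }
          return 0;   /* no mapping known */
      }

      /* outgoing packet (to the Internet): rewrite private source -> public */
      int rewrite_outgoing(uint32_t *src_addr, uint16_t *src_port) {
          for (size_t i = 0; i < table_size; i++) {
              if (table[i].private_addr == *src_addr && table[i].private_port == *src_port) {
                  *src_addr = table[i].public_addr;
                  *src_port = table[i].public_port;
                  return 1;
              }
          }
          return 0;
      }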