- last time
    - using cache set to solve for values
    - writing your own attack code
        - normal with Meltdown
        - JavaScript compiled; indirect branch predictor mistraining
- test-taking strategies
    - if you think a question is unclear, writing the assumption you had to make will allow us to do partial credit/etc.
- formulas
    - on the schedule page, there is a reference sheet linked
    - personally, for e.g. cache formulas, I try to remember the functional picture of what happens and be able to derive the formulas I know
        example: I know offset bits choose where in a block; from that I know there are 2^(offset bits) bytes in a block
- make syntax

    target: dependency1 dependency2 ...
        command1
        command2
        ...

    target: what file will be created by the commands
    dependency1, etc.: what things will be used by the commands, so that we should rerun the commands if they change
    make handles chains of dependencies, so, e.g.

        foo.exe: foo.o
            ... # linking command
        foo.o: foo.c
            ... # compiling command

        I don't need to write "foo.exe: foo.c"

    ---- variables for convenience

        FOO=xxx
        ...
        $(FOO)    # instead of writing "xxx"

        most commonly "FOO" is something like "CC" or "CFLAGS" and "xxx" is something like "clang" or "-Wall -pedantic"

    ---- pattern rules: instead of writing

        foo.o: foo.c
            gcc -c foo.c
        bar.o: bar.c
            gcc -c bar.c

        we can write

        %.o: %.c
            gcc -c $<

        (we won't expect you to memorize $< $^ $@ --- will supply in question/etc. if needed)

- kernel v user mode
    - processors want to support OSes restricting what programs can do
    - primary mechanism: the OS code runs in "kernel mode" where it can do stuff user (normal) code can't
    - the processor switches to kernel mode on "exceptions"
    - the OS can switch out of kernel mode when it wants to run normal code
    - typical operations that can only happen in kernel mode:
        - talking directly to I/O devices (mostly)
            - normal program code talks to I/O devices only by asking the OS to do that operation on its behalf
            - example: the only way to talk to the keyboard on most consumer OSes is by calling existing OS code to do it
              (see the small system call sketch after the exception notes below)
        - modifying the page table base pointer
        - accessing memory that has been marked non-user-accessible in the page tables
        - changing where exceptions start running code
        - ...
- what is an exception / when do exception handlers run
    - an exception is a way for the hardware to run the OS
    - does an exception handler run? ---> does the hardware need to run the OS?
    - different types of reasons the hardware would run the OS:
        - the current program requested it because it wants to do an operation that the OS needs to do for it (any of the kernel-mode-only operations) --- a "system call"
        - the current program did something weird (segfault, divide-by-zero, ...)
        - an external device (like keyboard, network interface, ...) needs attention
            - this is what we sometimes call an "I/O exception"
            - only needs to happen if we wouldn't otherwise be talking to that I/O device
            - typically: finishing some operation we started earlier or bringing new input to our attention
        - a timer the OS set has gone off
    - what the exception handler (the OS code that runs) does doesn't change whether or not it's an exception
    - common points of confusion:
        - if a timer goes off and the OS decides to switch to another program, or the OS decides to do nothing about the timer and set a new timer --> same number of exceptions
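- a minimal sketch of the "asking the OS" / system call idea above (my own illustration, assuming a Unix-like OS): the program never touches the keyboard or display hardware itself; it only calls read()/write(), which trap into the kernel, and the OS's exception handler does the actual device access

      /* sketch: user code doing I/O only by asking the OS via system calls */
      #include <unistd.h>

      int main(void) {
          char buf[16];
          /* read() traps into the kernel; the OS talks to the keyboard/terminal for us */
          ssize_t n = read(STDIN_FILENO, buf, sizeof buf);
          if (n > 0) {
              /* write() is also a system call; we never touch the display hardware directly */
              write(STDOUT_FILENO, buf, (size_t) n);
          }
          return 0;
      }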
- where is process information (PCB, etc.) stored?
    - PCB = process control block (a term academia likes using for information about a process)
    - TCB = thread control block
    - information about a process:
        - open files
        - process ID
        - virtual memory layout (page tables, and mapping information used to fill in page tables)
        - which signal handlers are registered
        - which signals are blocked
        - ...
    - information about a thread:
        - registers including PC and stack pointer
        - ...
    - information about a process/thread is "mostly" stored in the OS's memory
        exception: when a thread is running, its registers/PC/etc. are stored in the processor
- signal unsafety
    - if a signal handler interrupts an "unsafe" function, we can't call an "unsafe" function
    - our example in lecture of an unsafe function is malloc()
    - big problem: most functions are not written to be called from within themselves (they are not "reentrant")
        - with malloc(): a signal could happen while we're in the middle of manipulating bookkeeping about where the next memory to return is, and we could end up modifying that inconsistently
        - alternate idea with malloc(): a signal could happen while we have a lock on something, and the signal handler could try to acquire that lock --> deadlock
    - avoiding this:
        - write all our functions thinking very carefully about the order in which we change bookkeeping/etc. information
        - block signals while the function is doing its work on persistent state
    - the Unix designers decided --- avoidance is not worth it for most functions
    - Unix specifies a list of "async-signal-safe" functions that are written in a way to avoid these problems (but it is a very small list)
- week6 quiz Q3
    - 3-level table
    - second-level PT entry at 0x2011a
    - 10-bit page offset
    - 256-entry page tables w/ 4-byte entries
        - each VPN part is 8 bits
    - VA 0x12345678
        - least sig 10 bits: page offset
        - next 8 bits: 3rd VPN part
        - next 8 bits: 2nd VPN part
        - next 8 bits: 1st VPN part
    - PTBR @ 0x10000
    - PT format: MSB [v][w][e][u][unused][PPN]
    - answer: impossible
        - the first-level page table entry indicated the location of the second-level PT with a PPN
        - so the byte address of that table was PPN * page size (2^10)
        - the byte address of the PTE = table address + 4 * (VPN part)
        - 0x2011a = PPN * page size + 4 * VPN part is not possible if PPN and VPN part are integers:
            PPN * page size is a multiple of 4
            4 * VPN part is a multiple of 4
            0x2011a is not a multiple of 4
    - if 0x2011a were replaced with a possible address, what would be the PTE value the question was looking for?
        - PTE address = (PPN from the 1st-level PTE) * page size + 4 * (VPN part 2), where VPN part 2 is some bits of VA 0x12345678
        - we can find VPN part 2 from the VA and we know the page size, so we could solve for the PPN from the 1st-level PTE
        - the PTE value found at the first level had this format: MSB [v][w][e][u][unused][PPN]
            - PPN: fill in from solving that equation
            - [v][w][e][u]: set based on knowing the access succeeded
                so: definitely set Valid and User-mode accessible
                if it was just a read, we don't know Writeable or Executable (since the Q asked for "a value", choose arbitrarily)
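- since 0x2011a itself is impossible, here is a small worked sketch of that calculation using a made-up possible second-level PTE address (0x20234 is purely my hypothetical choice) and the question's parameters

      /* worked sketch for the week6 quiz Q3 setup: 10-bit page offset, three 8-bit
         VPN parts, 256-entry tables with 4-byte PTEs.  The second-level PTE address
         used here (0x20234) is a hypothetical, possible stand-in for the impossible
         0x2011a. */
      #include <stdio.h>

      int main(void) {
          unsigned va = 0x12345678;
          unsigned page_size = 1u << 10;        /* 10 offset bits -> 1024-byte pages */

          unsigned vpn1 = (va >> 26) & 0xFF;    /* 1st (most significant) VPN part */
          unsigned vpn2 = (va >> 18) & 0xFF;    /* 2nd VPN part = 0x8D */
          unsigned vpn3 = (va >> 10) & 0xFF;    /* 3rd VPN part */

          /* PTE address = (PPN from 1st-level PTE) * page_size + 4 * vpn2, so: */
          unsigned pte_addr = 0x20234;          /* hypothetical valid PTE address */
          unsigned ppn = (pte_addr - 4 * vpn2) / page_size;   /* works out to 0x80 */

          /* PTE format: MSB [v][w][e][u][unused][PPN] in a 32-bit entry.
             The access succeeded, so set valid + user; w/e chosen arbitrarily as 0. */
          unsigned pte_value = (1u << 31) | (1u << 28) | ppn;

          printf("vpn1=%x vpn2=%x vpn3=%x ppn=%x pte=%08x\n",
                 vpn1, vpn2, vpn3, ppn, pte_value);
          return 0;
      }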
- temporal/spatial locality
    - temporal locality: if I access Memory[X], I'm likely to access Memory[X] in the future
        the primary thing caches are taking advantage of
    - spatial locality: if I access Memory[X], I'm likely to access Memory[X+c] for some small c soon
        a way caches take advantage of this (but not the only way to) is by having large(ish) cache blocks
- quiz week06 Q5: spatial locality for

        for (int i = 0; i < SIZE; i += 1) {
            for (int j = 0; j < SIZE; j += 1) {
                A[i] = A[i] + B[i * 4] * C[j * SIZE + i];
            }
        }

    if: swapping the lines "for (int i = 0; i < SIZE; i += 1) {" and "for (int j = 0; j < SIZE; j += 1) {"
        for A: changes from
            A[0], A[0], A[0], ..., A[1], A[1], A[1], ..., A[2], A[2], ...
          to
            A[0], A[1], A[2], A[3], A[4], ..., A[0], A[1], A[2], A[3], A[4], ...
        for B: changes from
            B[0], B[0], B[0], ..., B[4], B[4], B[4], ..., B[8], B[8], ...
          to
            B[0], B[4], B[8], B[12], B[16], ..., B[0], B[4], B[8], B[12], B[16], ...
          I'd argue: worse temporal locality, but not much change to spatial
        for C: changes from
            C[0], C[SIZE], C[2*SIZE], ..., C[1], C[1+SIZE], C[1+2*SIZE], ...
          to
            C[0], C[1], C[2], ..., C[SIZE-1], C[0+SIZE], C[1+SIZE], C[2+SIZE], ...
          I'd argue: C has gotten way, way better spatial locality
    replacing j * SIZE + i with i * SIZE + j does the same change to C without changing A, B
    replacing i * 4 with i * 2 means that instead of doing B[0], ..., B[4], ..., B[8], ..., we do B[0], ..., B[2], ..., B[4], ...
        better spatial locality (closer to adjacent accesses)
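- a compilable sketch of the swapped-loop version discussed above (SIZE and the array declarations are made-up assumptions just so it stands alone):

      #define SIZE 256

      static double A[SIZE], B[4 * SIZE], C[SIZE * SIZE];

      void interchanged(void) {
          /* with j as the outer loop, each pass of the inner loop touches
             C[j*SIZE + 0], C[j*SIZE + 1], C[j*SIZE + 2], ... -- adjacent addresses,
             i.e. the much better spatial locality for C described above */
          for (int j = 0; j < SIZE; j += 1) {
              for (int i = 0; i < SIZE; i += 1) {
                  A[i] = A[i] + B[i * 4] * C[j * SIZE + i];
              }
          }
      }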
- dirty bit
    - with write-back, we want to keep a modified value only in the cache
    - BUT this means we eventually have to update memory
    - the dirty bit tracks that "memory still needs an update"
    - we want to track this so we don't update memory for things that haven't changed
        - empirically: most programs do many more reads than writes
- TLB
    - with page tables, we have a way of taking a virtual page number and getting the final page table entry
    - this process is long and often slow
    - TLBs are a cache for this
    - uses the same structure as "normal" caches except:
        - lookup with a VPN instead of a memory address
        - have one page table entry per block (--> no offset bits)
        - store page table entries, not actual program data
- counting the number of threads that exist
    - given that threads can run at different speeds:
    - potentially one more thread for each create call
    - that thread is potentially still running until there's a join call
    - yes, sometimes threads might finish early, but they'll still have bookkeeping info around so join works, unless they were detached
- Quiz 10 Q2 [mutexes]

    void reader(void) {
        pthread_mutex_lock(&lock);
        while (writing)
            pthread_cond_wait(&proceed, &lock);
        reading++;
        pthread_mutex_unlock(&lock);
        // reading code here
        pthread_mutex_lock(&lock);
        reading--;
        pthread_cond_signal(&proceed);
        pthread_mutex_unlock(&lock);
    }

    void writer(void) {
        pthread_mutex_lock(&lock);
        while (reading)
            pthread_cond_wait(&proceed, &lock);
        writing++;
        pthread_mutex_unlock(&lock);
        // writing code here
        pthread_mutex_lock(&lock);
        writing--;
        pthread_cond_signal(&proceed);
        pthread_mutex_unlock(&lock);
    }

    B, C: a reader/writer can wait indefinitely if there are enough writers/readers
        yes, because there's no guarantee that a reader/writer gets the lock first when reading or writing is 0
            no progress guarantee because pthread_mutex_lock + pthread_cond_signal don't say who runs when
        yes, because pthread_cond_signal might not wake up a thread that can actually go, and since cond_signal only wakes up ONE thread, even if there is a thread that can go it may keep waiting indefinitely
    D: multiple threads can write at a time
        as long as reading == 0, we can have multiple threads reach "writing code here" at the same time
    E: readers/writers can run at the same time
        not true, because the writing/reading counts are accurate and
            readers wait for writing == 0 (or wait indefinitely)
            writers wait for reading == 0 (or wait indefinitely)
- pthread_cond_signal v pthread_cond_broadcast
    - if only some of the waiting threads can go, even if only one can make progress, signal might not choose one of the ones that can make progress
        - example: one CV for readers and writers, but only one writer can go; signal might select a reader
        - Quiz week10 Q2 is an example where cond_signal could wake up the wrong thread
        - cond_broadcast would avoid this by definitely waking up at least one of the threads that could go
        - typically the "real" fix for this is more CVs for different conditions
    - if multiple threads can now make progress, MUST use broadcast
        ("now make progress" means they can all stop waiting, and when they double-check their condition they'll all be able to go)
        - example from lecture: WaitForFinished --- if multiple threads are waiting for this, we should broadcast because they all can go, not just one
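- a minimal sketch of that WaitForFinished idea (assumes pthreads; the exact variable and function names here are my own, not necessarily the lecture's code): once finished becomes true, every waiter can go, so the finishing thread broadcasts

      #include <pthread.h>

      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t finished_cv = PTHREAD_COND_INITIALIZER;
      static int finished = 0;

      void WaitForFinished(void) {
          pthread_mutex_lock(&lock);
          while (!finished)                      /* always re-check the condition in a loop */
              pthread_cond_wait(&finished_cv, &lock);
          pthread_mutex_unlock(&lock);
      }

      void Finish(void) {
          pthread_mutex_lock(&lock);
          finished = 1;
          /* broadcast: ALL waiting threads can now make progress;
             pthread_cond_signal would wake only one of them */
          pthread_cond_broadcast(&finished_cv);
          pthread_mutex_unlock(&lock);
      }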
- network address translation
    - Internet ---- [small set of public addresses] Router ---- [large set of private addresses]
    - make machines with private addresses think: they use their private address to talk to addresses on the Internet, w/o changes to any of the programs doing this
    - make machines on the Internet think: they are talking to one of the small set of public addresses (since they don't know how to send things to our private addresses themselves)
    - mechanism: a table of {public address + port, private address + port}
        edit addresses and port numbers in incoming packets and outgoing packets using that table
        (a tiny sketch of this table appears at the end of these notes)
- when can M [machine-in-the-middle] change a message from A to B?
    - if there's no MAC or digital signature, probably they can
    - also, they can reuse other messages if we don't take precautions like a number-used-once (and/or we don't include needed context like "this message is for B")
    - note: "only" encryption doesn't prevent this
- TLS handshake
    - key idea: use "key agreement" to choose symmetric keys
        - the client combines the server KeyShare + their data (that produced the client KeyShare)
        - the server combines the client KeyShare + their data (that produced the server KeyShare)
        - both get the same set of symmetric keys
    - use certificate + digital signature to verify the key agreement was done by the correct server
        - the certificate says "here's a public key for digital signatures and it really belongs to website.com"
        - the certificate is verified by a certificate authority (who we've chosen to trust)
            verified means it's signed and we can check the signature
        - the server makes a digital signature over its key share to prove it's website.com's keyshare and not evil.com's keyshare
    - messages:
        > ClientHello, KeyShare
        < ServerHello, KeyShare, Certificate, CertificateVerify
        > Finished
        < Finished
      Finished confirms that we've all seen the same messages in the whole handshake
        ~ makes sure that we didn't downgrade the TLS version
- pipeline: when to/not to forward
    - don't forward: if we read the value from a register and it's right!
    - do forward: if the value is available somewhere (because we computed it) but not in a register yet
        - need to make sure it's the _current_ version of that value
        - need to make sure we need the value (example: not just overwriting it)
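- going back to the NAT mechanism above, a tiny sketch of the translation-table idea (struct layout and function names are my own illustration, not a real router implementation):

      #include <stdint.h>
      #include <stddef.h>

      /* one mapping: {public address + port} <-> {private address + port} */
      struct nat_entry {
          uint32_t public_addr;  uint16_t public_port;
          uint32_t private_addr; uint16_t private_port;
      };

      static struct nat_entry table[1024];
      static size_t table_size;

      /* incoming packet (from the Internet): rewrite public destination -> private */
      int rewrite_incoming(uint32_t *dst_addr, uint16_t *dst_port) {
          for (size_t i = 0; i < table_size; i++) {
              if (table[i].public_addr == *dst_addr && table[i].public_port == *dst_port) {
                  *dst_addr = table[i].private_addr;
                  *dst_port = table[i].private_port;
                  return 1;
              }
          }
          return 0;   /* no mapping known */
      }

      /* outgoing packet (to the Internet): rewrite private source -> public */
      int rewrite_outgoing(uint32_t *src_addr, uint16_t *src_port) {
          for (size_t i = 0; i < table_size; i++) {
              if (table[i].private_addr == *src_addr && table[i].private_port == *src_port) {
                  *src_addr = table[i].public_addr;
                  *src_port = table[i].public_port;
                  return 1;
              }
          }
          return 0;
      }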