Contents
- 1 Your Task
- 2 Testing
-
3 Hints
- 3.1 xv6 book reading
- 3.2 Note on flushing the TLB
- 3.3 Handy xv6 functions/code snippets
- 3.4 On NULL pointers
- 3.5 Handling/avoiding kernel page faults
- 3.6 On synchronization
- 3.7 xv6’s guard page
- 3.8 Allocating pages on demand
- 3.9 Adding copy-on-write support
- 3.10 On debugging
- 3.11 Reminders about bitwise operators (based on common mistakes)
Changelog:
- 30 October 2019: be explicit about allowing swapping of the
U
andW
fields - 30 October 2019: clarify remark about allocations being rounded to a whole number of pages to explain that heap allocations being whole number of pages is a reason why accesses slightly beyond
sz
can be okay - 30 October 2019: correct some errors in the text about nested page faults where it talked about detecting page faults within page faults, but then gave advice assuming students were detecting page faults that occured within any kernel mode operation
- 1 November 2019: correct formatting in hints about reference count
- 4 November 2019: more explicitly suggest modifying
growproc()
in hints - 6 November 2019: add note about workaround for filesystem running out of space issue, which bizarrely seems to sometimes cause
log_write outside of trans
error
If you get an error from
mkfs
on occassion saying something likeread: Success
, then try editingparams.h
to specify a larger value for FSSIZE (e.g. 2000). Also try this to if you get a panic likelog_write outside of trans
orincorrect blockno
(but there are other possible ways that can happen, such as going outside the bounds of an array reference counts).
Your Task
- (required for checkpoint) Add a new system call
int dumppagetable(int pid)
that outputs the page table of the process with pidpid
to the console (like withcprintf()
) as follows:- Only last-level page table entries should be shown;
- Only page table entries for the user part of memory (bytes 0 through sz) should be output;
- Output a line starting with
START PAGE TABLE
before the page table entry information, and one starting withEND PAGE TABLE
immediately after. You may, if you choose, put additional text afterwards on these lines; - Output a line for each page table entry, with fields seperated by whitespace, in order from lowest virtual address to highest (you may use any kind of ASCII whitespace, even if it (like tabs) doesn’t display correctly in qemu’s graphical console);
- Optionally, output a header line describing the fields.
- The following fields (in this order) should be output for each present (valid) page table entry:
- the virtual page number in hexadecimal or address in hexadecimal (your choice which)
- the text ‘P’
- the text ‘U’ if the page is marked as user-mode accessible and ‘-‘ otherwise
- the text ‘W’ if the page is marked as writable and ‘-‘ otherwise
- the physical page number in hexadecimal or address in hexadcimal (your choice which)
- any other fields of your choice (for example, information that will help you debug the later steps)
As exception to the rule that the fields are in the order above, you may swap the order of the
U
andW
fields. (We intended to specify one order, but we were unintentionally inconsistent with the example below originally, so we will accept either order.) - For non-present pages, you may either omit them entirely from the output or output the following
fields in this order:
- the virtual page number in hexadecimal or address in hexadecimal (your choice which)
- the text ‘-‘ (instead of ‘P’)
- any other fields of your choice (for example, information that will help you debug the later steps)
For example, one possible output might look like:
START PAGE TABLE (pid 54) 0 P W U 8a 1 P W U 8b 2 P W - 8c 3 P W U 8d 4 P W U 89 END PAGE TABLE
indicating that a process has five pages allocated, with virtual page numbers 0 through 4, to various physical page numbers in the range
0x89
through0x8d
, and virtual page 2 is marked as non-user-accessible.We do not care what your system call does if the
pid
supplied is invalid or ifpid
supplied has its page table modified or freed while the system call occurs.When successful, your system call should return
0
. -
(required for checkpoint) xv6 currently allocates stack and heap memory immediately. More commonly, OSs will allocate this memory on demand, saving memory when not all of the stack or heap is used immediately. Modify xv6 to allocate heap memory based on page faults rather than immediately. It is okay if non-heap memory is still allocated statically.
Your automatic allocation scheme:
- Should initialize all newly allocated memory to zeroes, like happens with a stock version of xv6.
- Should kill a process when it attempts to access memory outside of its allocation, including memory beyond the end of heap ([rephased to (hopefully) clarify 2019-10-30:] except that heap allocations should be made in whole pages, so it’s okay to allow processes can access slightly beyond the end of their heap if the end of their heap is in the middle of page) or kernel memory, like happens with a stock version of xv6.
- Should kill a process when it attempts to access memory requiring an allocation on demand and no more memory can be allocated. Make sure a message is printed to the console when this happens.
- Should never make the “guard page” xv6 allocates below the user stack accessible in user mode.
- Must make system calls that attempt to read or write to not-yet-allocated parts of the heap work. For example,
malloc()
ing a very large buffer, then usingread()
to fill it must work. - Should not leak memory.
- Add copy-on-write support for xv6’s
fork()
system call. xv6 currently makes a copy of each page of a process when it forks. Instead, you should not copy the page and instead mark each page as read-only. Then, when a protection fault happens, actually make a copy of the page, update the corresponding page table entry, and mark it as writeable. Your copy-on-write scheme:- Should kill a process when it attempts to write to memory, but there is not enough memory to allocate a copy-on-write page. Make sure a message is printed to the console when it happens.
- Should not leak memory.
- Should never make the “guard page” xv6 allocates below the user stack accessible.
- Must make system calls that attempt to read or write to not-yet-allocated parts of the heap work. For example,
malloc()
ing a very large buffer, then usingread()
to fill it must work.
- Run
make submit
to create an archive and upload the result to Collab.
Testing
-
dumppt.c is a sample program that calls
dumppagetable()
. If supplied no arguments, it calls it on its own pid, otherwise, you can pass a pid to it. -
Since the xv6 shell uses
fork
andsbrk
and you might break the implementation of these, it may be very difficult to debug when you break one of these. One strategy for debugging is to runxv6
with a test program acting asinit
(the first program the xv6 kernel runs). We have supplied some suitable test programs that work this way. Each of them uses thispagingtestlib.h
header file which contains functions frompagingtest.c
above and an additionalsetup()
function that duplicates code frominit
needed to make outputting the console work correctly:alloc_small.c
alloc_large.c
copy_on_write_minimal.c
copy_on_write_minimal_parallel.c
copy_on_write_largeplus_alloc.c
To run these files as
init
, make sure you have a copy of ourxv6
repository where you rangit pull
or cloned it after 11 March 2019, put them in yourxv6
directory and instead of runningmake qemu
ormake qemu-nox
run something likemake qemu-nox FS=fs-alloc_small-as-init.img
This will create a virtual disk
fs-alloc_small-as-init.img
which is a copy of the xv6 filesystem, but withinit
program replaced byalloc_small
, then boot xv6 using this virtual disk instead of the normalfs.img
virtual disk.(When these tests work they should print a message like
TEST:PASS:
before exiting. Messages about a Makefile recipe failing are spurious if xv6 is actually run and prints theTEST:PASS:
message.)If you get an error from make about
temp-_alloc_small
or something similar existing, you can remove this temporary directory and try again.When debugging, I would suggest using
cprintf()
in the kernel to print out what is happpening (e.g., what page table entries you are changing for what pid, when page faults happen, etc.). -
We have supplied a program called pagingtest.c as an example of how you can verify that your implementation works. This test is intended to be run to test a complete implementation; we do not recommend using (at least without modification) it to test implementations you know to be partial. We will run other and/or additional tests when grading your submission, so this is not a substitute for doing additional testing.
This includes tests for both allocate on demand and copy-on-write. The tests do a mix of things that should work regardless of whether you’ve implemented these features, and things which will run out of memory unless you’ve implemented allocate-on-demand and/or copy-on-write. When
fork()
ing fails (such as due to running out of memory), some of the copy-on-write tests may print a message about this a hang rather than printing a clear failure message. (Yes, this is probably a bug.)If you get panics about
log_write outside of trans
, see note at top of the writeup.
Hints
xv6 book reading
- Chapter 2 of the xv6 book describes how xv6 manages its page table structures.
Note on flushing the TLB
-
You will need to flush the TLB after a changing any valid page table entries that might be cached in the TLB. One way to do this is by reloading the page table base register CR3 using:
lcr3(V2P(myproc()->pgdir));
(It would be better to use an instruction that only flushes the TLB entries for a particular page, like
invlpg
, but xv6 doesn’t have a convenient way to run that.) -
Not flushing the TLB is one cause of particular werid behavior with copy-on-write, where the TLB (strictly speaking, the virtual TLB qemu implements) may cache a page table entry marked as writeable rather than using the read-only version.
Handy xv6 functions/code snippets
-
walkpgdir
takes a page directory (first-level page table) and returns a pointer to the page table entry for a particular virtual address. It optionally will allocate any needed second-level page tables.You can find a page directory to pass to
walkpgdir
with something likemyproc()->pgdir
.On an error, like failing to allocate a second-level page table,
walkpgdir
return the address 0x0 converted to apte_t*
. You should check for this address explicitly. In particular, accessing address0x0
will not crash, but instead will read memory from the code of the currently running program. Trying to interpret this memory as a page table entry is likely to lead to surprising results. -
mappages
takes a page directory (first-level page table), a virtual address, a physical address, and a size, and modifies that page directory so size bytes of virtual memory starting the specified virtual address point to the specified physical address. It will panic if the specified virtual address is already mapped to a physical address.If you pass mappages a range of address that spans multiple pages, it will remap multiple pages. For example,
mappages(pgdir, 0x1800, PGSIZE, ...)
will try to remap virtual page number 1 (virtual addresses 0x1000 through 0x1FFF) and virtual page number 2 (virtual addresses 0x2000 through 0x2FFF) since they both overlap with the range 0x1800 through 0x1800+PGSIZE = (0x2800). To avoid this problem, you can usePGROUNDDOWN()
to convert an address in the middle of a page to an address at the beginning of the page. (For example,PGROUNDDOWN(0x1800)
is 0x1000.) -
P2V
converts from a physical address to a virtual address in the kernel’s memory. The xv6 kernel makes sure that every page table maps (for the kernel’s use only) virtual address0x80000000+x
to physical addressx
, andP2V
uses these addresses (that is, it adds0x80000000
).V2P
does the opposite mapping. -
kalloc
andkfree
allocate or free a physical page. They both return pointers to the virtual address of the page in the kernel’s memory. -
You can check if a page table entry pointed to by a
pte_t *pte
is present with*pte & PTE_P
. Similarly, you can check if it is writeable with*pte & PTE_W
. You can set that page table entry to point to physical pagemypage
as writeable, present, and accessible in usermode using*pte = mypage | PTE_P | PTE_W | PTE_U;
. You can also call the utility functionmappages
to do this, provided the page is not already marked as present.
On NULL pointers
-
NULL, which is address
0
, is a valid memory address in on xv6 and usually contains some of the machine code for the currently running program. However, some xv6 routines return NULL to indicate errors, likewalkpgdir()
andkalloc()
. Be careful to always check for these routines returning NULL, because otherwise you’re only likely to know there’s a problem because of very weird behavior. -
Compilers generally assume that if you can successful access a pointer that it is not NULL, despite situations like in
xv6
where NULL is a valid pointer. This means, for example, that many compilers will optimize:int *ptr = get_pointer_or_null(); int value = *ptr; if (ptr != NULL) { do_something_with(value); } else { panic(); }
in a way that will never call
panic()
, no matter whatget_pointer_or_null
returns.To avoid this issue, make sure any checks for pointers being NULL occur before you use a potentially NULL pointer.
Handling/avoiding kernel page faults
-
For debugging, I strongly recommend having code to print out a message and/or panicing when a page fault occurs within your page fault handling code. Otherwise, it is extremely difficult to figure out what’s going on when a memory error occurs.
The default handler code has a check
(tf->cs & 3) == 0
to detect exceptions that occur in kernel mode, and triggers a panic in this case. You could use this to detect whether a page fault occurs in kernel mode for debugging purposes. Note, however, that this would be normal for a system calls likeread()
that’s reading into allocate-on-demand mmeory -
xv6 supports nested exception handlers, but you must be careful to not exhaust the single kernel stack, which must have enough space to handle both a system call and a page fault that occurs within the system call.
When an exception occurs in kernel mode, the processor recognizes that it’s running in kernel mode already and doesn’t change the stack pointer to the top of the kernel stack, unlike an exception that occurs in user mode. This means that a page fault that occurs in the middle of a system call won’t overwrite information on the stack used by the system call.
(In contrast, when an exception occurs in user mode, the processor switches to the kernel-mode stack specified in the “Task State Segment”. This is set by
switchuvm()
.) -
If a page fault triggers a second page fault that triggers a third fault, the processor reboots the machine. This is also known as a triple fault.
On synchronization
-
When implementing copy-on-write, there is a potential for synchronization issues when modifying information about pages (like reference counts) shared between different processes from system calls. To avoid this, you can use a spinlock or disable interrupts when accessing such potentially shared data.
xv6 disables interrupts while handling page faults — so on a single core (which we will use), you don’t need to worry about the page fault handlers being interrupted. However, system calls are run without page faults or timer interrupts being disabled.
xv6’s guard page
- xv6 allocates a guard page below the user stack to catch stack overflows. This is the only page in the user’s memory which is marked as present but not user-accessible in the stock version of xv6.
Allocating pages on demand
-
Currently, when a program tries to allocate more heap space it calls the
sbrk()
system call, which callsgrowproc()
. This sets thesz
variable to track the size of the allocation and usesallocuvm()
. to create new page table entries. You can modifygrowproc()
to still set thesz
variable but avoid actually creating new page tables right away. With your modification,growproc()
will no longer need to create new page table entries. Then you’d have make your page fault handler handle creating the new page table entries on demand that would have previously been created when new heap memory was allocated.You could potentially modify
allocuvm()
instead ofgrowproc()
, but you would need to modifyloaduvm()
to handle loading data into a page table entry that is not yet allocated. (loaduvm()
is used to implementexec()
in xv6, for loading data from an executable into memory when the program starts.) -
The xv6 book chapter on trap handling is useful.
-
You can modify the functions in
trap.c
to detect when a page marked invalid in the page table is accessed. -
traps.h
defines the constantT_PGFLT
, which is the code x86 uses to identify page faults. This will be placed in thetrapno
member variable of thetrapframe
. -
When a page fault occurs in x86, the processor sets control register 2 (CR2) to the address the program was attempting to access. You can read this in xv6 using
rcr2()
. -
You need to make sure that your allocate-on-demand code does not trigger for out-of-range memory access, such as attempts to access kernel memory from userspace or accesses to unallocated (beyond end of heap) memory.
-
xv6 contains page allocation and deallocation functions in
kalloc.c
, which arekalloc()
(allocates a page) andkfree()
(deallocates a page). -
You should use code similar to what’s in
allocuvm()
to set each page table entry. But you should not callallocuvm()
since it allocates all pages up to the maximum. If a program accesses address 0x400000, then only the page at address 0x400000 should be created, not all the pages between 0x0 and 0x400000. -
To make
fork()
work correctly,copyuvm()
should only copy pages that are actually allocated. Later on, you will modify this function more to implement copy-on-write.
Adding copy-on-write support
- For copy-on-write, you will essentially be changing:
- the code that copies the pages on fork() in
copyuvm()
to only mark the pages read-only (remember to flush the TLB!), and - moving the copying that was previously done by
copyuvm()
to the page fault handler, which will be run whenever the process attepmts to write to a read-only page. - tracking how many times pages are referenced in order to delete pages only when they are no longer being used
- the code that copies the pages on fork() in
-
When a program attempts to write to a read-only page, it will trigger a fault where the
trapno
member variable oftrapframe
isT_PGFLT
, just like it does for virtual pages which are marked as not present.Most simply, you can distinguish it from a “normal” page fault by looking up the page table entry for the faulting address (the faulting address is available by calling
rcr2()
) and checking if that page table entry is marked as present (valid) but not writable. (It also posssible to determine that a page fault was caused due to a page table entry not being marked writeable through the value oftf->err
, see the Intel manuals, volume 3A, page 6-40, section “Exception Error Code”.)Fortunately, xv6 has no other reason it marks user pages as read-only, so you can always assume that a page fault for a present, read-only user page is because of your copy-on-write scenario. (Even pages that only contain machine code are still loaded as writeable.) Alternately, you could add extra entries to
struct proc
to track where a process’s writeable memory is and look this up when a page fault occurs. -
The xv6 function
memmove
is handy for copying the contents of pages. -
When changing a valid page table entry to another valid page table entry, you may need to clear the TLB (as in the
lcr3()
code snippet above). -
deallocuvm
is called to deallocate user pages from a page table. Currently, it unconditional callkfree
on each page. You should change it to not deallocate pages that other processes have a reference to. -
You will need to track how many times each physical page that might be used for copy-on-write is used, commonly known as its “reference count”. While one could scan all pointers to pages to determine the reference count, this would be extremely slow, so I would recommend tracking the reference count of each page as you edit page tables and/or allocate pages and/or deallocate pages and/or etc.
A simple way to do this is an array like:
unsigned char cow_reference_count[PHYSTOP / PGSIZE];
Then, use the physical address of the page divided by
PGSIZE
(that is, the physical page number) to access the appropriate element of the array.(
PHYSTOP
is the maximum physical memory address xv6 supports.)If you want to store extra information for each physical page, you can use a similar approach but with an array of structs.
- As part of tracking reference counts,
you need to decide which pages you’re going to track the reference count for and what types of references you will count. There are
multiple options which will work, but it is very important to be consistent:
- you could tracking reference counts for each page allocated with
kalloc()
, regardless of whether it is referenced by the user part of a process’s page table or for the kernel’s use - you could track reference counts for pages allocated each time it is placed in a page table
- you could track reference counts only for pages which are marked as read-only for copy-on-write
Which option you choose will change how you initialize, increment, and decrement reference counts. For example:
- for the first two options, you need to set reference counts for pages not yet involved in copy-on-write, which will involve modifying non-copy-on-write-related code; and
- for the last option, you need to make sure you don’t check the (presumably uninitialized) reference counts of non-copy-on-write pages, and therefore accidentally deallocate or leak memory for these pages
- you could tracking reference counts for each page allocated with
On debugging
-
Refer to the notes from the xv6intro assignment on debugging. This includes information about interpreting panic messages.
-
It can be useful to deliberately disable deallocation based on reference counts to determine whether that is an
Reminders about bitwise operators (based on common mistakes)
-
!(foo & CONSTANT)
and(foo & ~CONSTANT)
have different values; make sure you are choosing the appropriate one for your situation. -
foo & CONSTANT_A & CONSTANT_B
does not have the same value as(foo & CONSTANT_A) && (foo & CONSTANT_B)
.