Contents
- 1 Your Task
- 2 Testing
-
3 Hints
- 3.1 xv6 book reading
- 3.2 Note on flushing the TLB
- 3.3 Handy xv6 functions
- 3.4 Identifying free pages
- 3.5 Checking and editing page table entries
- 3.6 Allocating pages on demand
- 3.7 Adding copy-on-write support
- 3.8 On debugging
- 3.9 xv6’s guard page
- 3.10 Examples of page table dumps from tests run with -dump
- 3.11 Handling/avoiding kernel page faults
Your Task
Changelog:
- 23 March 2022: update tests to be more complete, allow testing of guard page accesses, allow large tests to skip certain checks for performance
- 23 March 2022: update pagingtestlib.h to fix problem reading arguments in run-as-init mode
- 23 March 2022: update pp_suite to choose more complete/performant set of tests; fix issue with -fork flag pagingtestlib.h’s alloc test not always working correctly
- 23 March 2022: add some description of expected memory layout that can be observed with
ptetool
- 24 March 2022: update pp_suite command list to catch more errors early
- 24 March 2022: mention
switchuvm()
as a way to flush the TLB in addition tolcr3(...)
- 24 March 2022: add more description of physical memory layout
- 25 March 2022: update
ptetool
to include0x
when outputting hexadecimal numbers from isfree - 26 March 2022: update ptetool.c to fix bug in hextoi
- 27 March 2022: mention freelist being stored in freed pages in hints
- 30 March 2022: mention workaround for assertion failure in mkfs.c caused by too-big executables
- 30 March 2022: add note about using
pp_test
to test forking empty page tables - 31 March 2022: fix bug in
pagingtestlib.h
that prevented-fork-after-alloc
option from working - 31 March 2022: add note about “communicating via pipe failed” messages
- 31 March 2022: update pagingtestlib.h so pp_test alloc handles -read-start being greater than -read-end correctly
- 1 April 2022: update pagingtestlib.h: fix behavior in -as-init mode where last argument was ignored in some cases
- 4 April 2022: change “only copy pages that are actually allocated” to “handle (and not copy) pages that are not allocated”
- 8 April 2022: note how
deallocuvm
is called on process exit
-
(required for checkpoint) Add a new system call
int getpagetableentry(int pid, int address)
that returns the last-level page table entry forpid
at virtual addressaddress
, or0
if there is no such page table entry. -
(required for checkpoint) Add a new system call
int isphysicalpagefree(int ppn)
that returns a true value if physical page numberppn
is on the free list managed bykalloc.c
and a false value (0) otherwise. - (required for checkpoint) Add a new system call
int dumppagetable(int pid)
that outputs the page table of the process with pidpid
to the console (like withcprintf()
) as follows:- Only last-level page table entries should be shown;
- Only page table entries for the user part of memory (bytes 0 through
p->sz
wherep
is the correspondingstruct proc*
) should be output; - Output a line starting with
START PAGE TABLE
before the page table entry information, and one starting withEND PAGE TABLE
immediately after. You may, if you choose, put additional text afterwards on these lines; - Output a line for each page table entry, with fields seperated by whitespace, in order from lowest virtual address to highest (you may use any kind of ASCII whitespace, even if it (like tabs) doesn’t display correctly in qemu’s graphical console);
- Optionally, output a header line describing the fields.
- For each present page table entry, output a series of space-seperated fields
in this order:
- the virtual page number used to lookup the page table entry in hexadecimal or the virtual address corresponding to the first byte of that virtual page number in hexadecimal (your choice which)
- the fields output by the
dump_pte()
function in suppliedptetool.c
(whose code you may copy, modifying it to usecprintf()
):- the text ‘P’
- the text ‘U’ if the page is marked as user-mode accessible and ‘-‘ otherwise
- the text ‘W’ if the page is marked as writable and ‘-‘ otherwise
- the physical page number contained in the page table entry in hexadecimal or the physical address of the first byte of that physical page in hexadecimal (your choice which)
- any number of fields of your choosing
- For non-present pages, you may either omit them entirely from the output or output the following
fields in this order:
- the virtual page number in hexadecimal or address in hexadecimal (your choice which)
- the text ‘-‘ (instead of ‘P’)
- any other fields of your choice (for example, information that will help you debug the later steps)
For example, one possible output might look like:
START PAGE TABLE (pid 54) 0 P U W 8a 1 P U W 8b 2 P - W 8c 3 P U W 8d 4 P U W 89 END PAGE TABLE
indicating that a process has five pages allocated, with virtual page numbers 0 through 4, to various physical page numbers in the range
0x89
through0x8d
, and virtual page 2 is marked as non-user-accessible.We do not care what your system call does if the
pid
supplied is invalid or ifpid
supplied has its page table modified or freed while the system call occurs.When successful, your system call should return
0
. -
(required for checkpoint) xv6 currently allocates stack and heap memory immediately. More commonly, OSs will allocate this memory on demand, saving memory when not all of the stack or heap is used immediately. Modify xv6 to allocate heap memory based on page faults rather than immediately. It is okay if non-heap memory is still allocated statically.
Your automatic allocation scheme:
- Should initialize all newly allocated memory to zeroes, like happens with a stock version of xv6.
- Should kill a process when it attempts to access memory outside of its allocation, including memory beyond the end of heap ( except that heap allocations should be made in whole pages, so it’s okay to allow processes can access slightly beyond the end of their heap if the end of their heap is in the middle of page) or kernel memory, like happens with a stock version of xv6.
- Should kill a process when it attempts to access memory requiring an allocation on demand and no more memory can be allocated. Make sure a message is printed to the console when this happens.
- Should never make the “guard page” xv6 allocates below the user stack accessible in user mode.
- Must make system calls that attempt to read or write to not-yet-allocated parts of the heap work. For example,
malloc()
ing a very large buffer, then usingread()
to fill it must work. We do not care, however, what happens if the process runs out of memory during such a system call and must be killed. (For the most part, this should just require treating page faults triggered by the kernel the same as page faults triggered from user mode.) - Should not leak memory.
For suggestions on how to accomplish this, see the “allocating pages on demand” section of the hints below.
- Add copy-on-write support for xv6’s
fork()
system call. xv6 currently makes a copy of each page of a process when it forks. Instead, you should not copy the page and instead mark each page as read-only. Then, when a protection fault happens, actually make a copy of the page, update the corresponding page table entry, and mark it as writeable. Your copy-on-write scheme:- Should kill a process when it attempts to write to memory, but there is not enough memory to allocate a copy-on-write page. Make sure a message is printed to the console when it happens.
- Should not leak memory.
- Should never make the “guard page” xv6 allocates below the user stack accessible.
- Must make system calls that attempt to read or write to not-yet-allocated parts of the heap work. For example,
malloc()
ing a very large buffer, then usingread()
to fill it must work. We do not care, however, what happens if the process runs out of memory during such a system call. (For the most part, this should just require treating page faults triggered by the kernel the same as page faults triggered from user mode.)
For suggestions on how to accomplish this, see the “adding copy-on-write support” section of the hints below.
- Run
make submit
to create an archive and upload the result to the submission site.
Testing
system call test program
If you downloaded
ptetool.c
on/before 26 March 2022, your version may have a bug in how it processes hexadecimal digits A-F.
ptetool.c [last updated 26 March 2022] is a a program that
calls the getpagetableentry
, isphysicalpagefree
and dumppagetable
system calls.
If you run it with no arguments, it shows this usage message:
ptetool pte PID VA show what getpagetableentry() entries for the contents of the (last-level) page table entry for pid PID (in decimal) and virtual address VA (in hexadecimal) ptetool dump PID call dumppagetable(PID). PID is specified in decimal ptetool isfree PPN call isphysicalpagefree(PPN). PPN is specified in hexadecimal
To help determine a PID to specify,
the xv6 kernel shows a list of active programs when you type control-P. Usually,
pid 1 is init
, and pid 2 is the shell (sh
). Additional pids are assigned sequentially.
expected xv6 memory layout
Note that xv6 processes generally have a layout where the lowest addreses (starting
with address 0) are assigned to code and program data (which will be
marked as present and user accessible), then there is a guard page
(which will be marked as present, and not user accessible), then there is
a stack and one or more pages of heap (which will be marked as present and user accessible).
You should be able to observe this using pte
and dump
if they are working correctly.
Also, the kernel part of xv6 memory will have virtual page 0x80000 (virtual addresses 0x80000000 through 0x80000FFF)
mapped to physical page 0x0
0x80001 to physical page 0x1 and so on, which you should be able to observe using the pte
command if it is working correctly.
In physical memory, the xv6 loads the kernel code and data at physical address 0x100000 (physical page number 0x100; constant EXTMEM
in memlayout.h
),
and the memory between the kernel code and data and physical address 0xE000000 (physical page number 0xE000, constant PHYSTOP
in memlayout.h
)
is managed by kalloc()
and kfree()
. (The memory managed by kalloc()/kfree()
is what is used to allocate pages for
user program data, page tables, kernel stacks, etc.)
tests for allocate-on-demand and copy-on-write
pp_test
tool
pagingtestlib.h
was updated around 7pm 31 March 2022 to fixpp_test
unintentionally discarding the-fork-after-alloc
option, and later to fix more minor issues (see changelog at top)
pp_test.c,
which requires this pagingtestlib.h
header file [last updated 1 April 2022],
will use the getpagetableentry
and isphysicalpagefree
system calls to check that
your allocate-on-demand and copy-on-write implementations appear to be behaving correctly.
pp_test -help
shows information about what kinds of test it supports, and
pp_test TEST-TYPE -help
shows information about options that test type supports.
One place you can start to look for specific commands to use is the list of commands done by pp_suite
, described below.
assertion failure in mkfs?
If you get an error like mkfs: mkfs.c:279: iappend: Assertion `fbn < (12 + (512 / sizeof(unit)))` failed
after adding pp_test.c
or pp_alloc.c
to the Makefile, this means that your compiler
made pp_test
or pp_alloc
or some other executable too big for xv6’s filesystem to support.
Commonly this is because the compiler generated a lot of debug information. Try editing the Makefile
rule that starts _%: %.o $(ULIB)
to add the command strip $@
; when this is done that
Makefile rule should read:
_%: %.o $(ULIB)
$(LD) $(LDFLAGS) -T program.ld --gc-sections -o $@ $^ $(LIBGCC_A)
$(OBJDUMP) -S $@ > $*.asm
$(OBJDUMP) -t $@ | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$$/d' > $*.sym
strip $@
if you see “error communicating … via pipe”
pp_test
internally often forks child processes and checks things in both the parent and child processes.
When a check fails, sometimes, the parent or child process will terminate abnormally after printing a message.
In this case, you may also see some message like “communicating with child process via pipe” in addition
to the test failure.
If you see such messages without a test failure, then probably something else happened that prevented the parent or child process from running properly.
running tests when the shell doesn’t work
Make sure you have a version of
pagingtestlib.h
from 1 April or later; arguments are dropped in some cases in prior versions.
Since the xv6 shell uses fork
and sbrk
, and you might break the implementation
of these when developing your allocate-on-demand or copy-on-write functionality,
it may be helpful to run tests without the shell. To support this,
our test programs support being run as the init
(first processs run), by doing
something like
make qemu-nox FS=fs-pp_test-as-init.img
where pp_test
matches the name of the pp_test.c
file you saved in your
xv6 directory.
(You can also use this technique for pp_suite
below.)
This will create a virtual disk fs-pp_test-as-init.img
which is a copy of the xv6 filesystem, but with init
program
replaced by pp_test
, then boot xv6 using this virtual disk instead of the normal fs.img
virtual disk.
dumping page tables for debugging
The pp_test
’s alloc
and cow
subcommands support the -dump
option, which will call
dumppagetable()
at several points during the text. This can be helpful for debugging
what your implementation is doing. We have some example outputs with brief explanations
near the end of the hints section.
a test suite
pp_suite.c [last updated 24 March 2022] (which
requires the same pagingtestlib.h file used by pp_test.c
)
is a set of tests that is relatively comprehensive. It is equivalent to running
several pp_test
commands; running pp_suite -list
shows which ones:
pp_test exec
pp_test oob -offset=0x1000
pp_test oob -offset=0x80000000 -write
pp_test oob -guard -offset=0x800
pp_test oob -guard -offset=0x800 -write
pp_test alloc -size=0x1000 -read-start=0x40 -read-end=0x100 -write-start=0x0 -write-end=0x0
pp_test alloc -size=0x2000 -read-start=0x40 -read-end=0x100 -write-start=0x0 -write-end=0x0
pp_test alloc -size=0x1000 -read-start=0x40 -read-end=0x100 -write-start=0x0 -write-end=0x0 -fork-after-alloc
pp_test alloc -size=0x1000 -read-start=0x0 -read-end=0x0 -write-start=0x40 -write-end=0x100
pp_test alloc -size=0x3000 -read-start=0x1040 -read-end=0x1080 -write-start=0x2010 -write-end=0x2020
pp_test alloc -size=0x500000 -read-start=0x402000 -read-end=0x404000 -write-start=0x404000 -write-end=0x406000 -fork -fork-after-alloc
pp_test alloc -size=0x50000000 -read-start=0x0 -read-end=0x0 -write-start=0x40000000 -write-end=0x49000000 -skip-free-check -skip-pte-check -fork
pp_test cow -forks=1 -size=0x1000 -parent-never -no-child-write -start=0x80 -end=0x100
pp_test cow -forks=1 -size=0x1000 -parent-first -no-child-write -start=0x80 -end=0x100
pp_test cow -forks=1 -size=0x1000 -parent-first -start=0x80 -end=0x100
pp_test cow -forks=1 -size=0x1000 -parent-last -start=0x80 -end=0x100
pp_test cow -forks=2 -size=0x4000 -parent-middle -start=0x2020 -end=0x3040
pp_test cow -size=0x500000 -forks=1 -parent-last -start=0x402040 -end=0x40a000 -pre-fork
pp_test cow -size=0x60000000 -forks=3 -child-write-except=1 -start=0x5f080000 -end=0x5f0f0000 -pre-fork
pp_test cow -size=0x4000000 -forks=4 -parent-middle -child-write-only=2 -start=0x0000000 -end=0x3FFFFFF -skip-free-check -skip-pte-check -pre-fork
pp_test alloc -size=0x1000 -read-start=0x0 -read-end=0x0 -write-start=0x40 -write-end=0x41 -use-sys-read
pp_test cow -forks=1 -size=0x1000 -parent-never -start=0x80 -end=0x81 -use-sys-read-child
This test is intended to be run to test a complete implementation; we do not recommend using (at least without modification) it to test implementations you know to be partial. Note that we may run other and/or additional tests when grading your submission.
on testing forking with unallocated memory
[note added 30 March:] After releasing the test suite, I noticed that although it tests forking with unallocated pages, it only does so on larger tests, which may make it more difficult to debug if there’s a problem with how you handle this issue. If you want simpler tests for this, you might try something like:
pp_test alloc -size=0x2000 -read-start=0x1040 -read-end=0x1100 -write-start=0x0 -write-end=0x0 -fork-after-alloc
or
pp_test alloc -size=0x1000000 -read-start=0x800040 -read-end=0x800080 -fork-after-alloc
Hints
xv6 book reading
- Chapter 2 of the xv6 book describes how xv6 manages its page table structures.
Note on flushing the TLB
-
You will need to flush the TLB after a changing any valid page table entries that might be cached in the TLB. One way to do this is by reloading the page table base register CR3 using:
switchuvm(myproc())
or
lcr3(V2P(myproc()->pgdir));
(It would be better to use an instruction that only flushes the TLB entries for a particular page, like
invlpg
, but xv6 doesn’t have a convenient way to run that.) -
Not flushing the TLB is one cause of weird behavior with copy-on-write, where the TLB (strictly speaking, the virtual TLB qemu implements) may cache a page table entry marked as writeable rather than using the read-only version.
Handy xv6 functions
walkpgdir
(finding page table entries)
-
walkpgdir
takes a page directory (first-level page table) and returns a pointer to the page table entry for a particular virtual address. It optionally will allocate any needed second-level page tables. -
walkpgdir
is declared asstatic
invm.c
and is not declared in a header file likedefs.h
, so it can’t be used from other files. You are welcome to change this if you think that would be convenient. -
You can find a page directory to pass to
walkpgdir
with something likemyproc()->pgdir
. -
Make sure you check for
walkpgdir
returning NULL. Do not rely on a crash occuring.On an error, like failing to allocate a second-level page table,
walkpgdir
return the address 0x0 (NULL
) converted to apte_t*
.Accessing address
0x0
will not crash, but instead will read memory from the code of the currently running program. Trying to interpret this memory as a page table entry is likely to lead to surprising results.
mappages
and other ways to set page table entries
-
In my reference solution, I set page table entries by using the pointer returned by
walkpgdir
rather than usingmappages
. (But most students have not done this in their implementations.) -
The version of
mappages
supplied with xv6 will panic (with the messagepanic: remap
) if the specified virtual address is already mapped to a physical address. This is because in unmodified xv6, once a virtual address has a valid mapping, that mapping is never changed. If you don’t want this behavior, then you would need to modifymappages
. -
mappages
takes a page directory (first-level page table), a virtual address, a physical address, and a size, and modifies that page directory so size bytes of virtual memory starting the specified virtual address point to the specified physical address. -
If you pass mappages a range of address that spans multiple pages, it will remap multiple pages. For example,
mappages(pgdir, 0x1800, PGSIZE, ...)
will try to remap virtual page number 1 (virtual addresses 0x1000 through 0x1FFF) and virtual page number 2 (virtual addresses 0x2000 through 0x2FFF) since they both overlap with the range 0x1800 through 0x1800+PGSIZE = (0x2800). To avoid this problem, you can usePGROUNDDOWN()
to convert an address in the middle of a page to an address at the beginning of the page. (For example,PGROUNDDOWN(0x1800)
is 0x1000.)
P2V
and V2P
(converting physical to kernel virtual addresses and vice-versa)
P2V
converts from a physical address to a virtual address in the kernel’s memory. The xv6 kernel makes sure that every page table maps (for the kernel’s use only) virtual address0x80000000+x
to physical addressx
, andP2V
uses these addresses (that is, it adds0x80000000
).V2P
does the opposite mapping.
kalloc
and kfree
: memory allocation
kalloc
andkfree
allocate or free a physical page. They both return pointers to the virtual address of the page in the kernel’s memory.
Identifying free pages
-
kfree
inkalloc.c
maintains a linked list of free pages, whosehead
is stored inkmem.freelist
. -
The linked list nodes are stored in the free pages themselves.
-
Items on the linked list are kernel virtual addresses, as you might get by using
P2V
on a physical byte address. Note that the argument toisphysicalpagefree()
is a physical page number rather than a physical byte address.
Checking and editing page table entries
-
You can check if a page table entry pointed to by a
pte_t *pte
is present with*pte & PTE_P
. Similarly, you can check if it is writeable with*pte & PTE_W
.If you want to check if a page table entry is present and writeable use something like
(*pte & PTE_P) && (*pte & PTE_W)
. ((*pte & PTE_P & PTE_W)
has a very different effect.) -
If you want to check if a page table entry is not present, you can use something like
!(*pte & PTE_P)
. ((*pte & !PTE_P)
and(*pte & ~PTE_P)
have very different effects.) -
You can set that page table entry to point to physical page
mypage
as writeable, present, and accessible in usermode using*pte = mypage | PTE_P | PTE_W | PTE_U;
. -
If you want to clear the PTE_W bit from a page table entry pointed to by
pte
, you could use something like*pte &= ~ PTE_W;
Allocating pages on demand
-
Currently, when a program tries to allocate more heap space it calls the
sbrk()
system call, which callsgrowproc()
. This sets thesz
variable to track the size of the allocation and usesallocuvm()
. to create new page table entries. You can modifygrowproc()
to still set thesz
variable but avoid actually creating new page tables right away. With your modification,growproc()
will no longer need to create new page table entries. Then you’d have make your page fault handler handle creating the new page table entries on demand that would have previously been created when new heap memory was allocated.You could potentially modify
allocuvm()
instead ofgrowproc()
, but you would need to modifyloaduvm()
to handle loading data into a page table entry that is not yet allocated. (loaduvm()
is used to implementexec()
in xv6, for loading data from an executable into memory when the program starts.)
Adding a page fault handler
-
When a page fault occurs, the
trap()
function in traps.c is called. Currently, the default case in theswitch
statement is reached, which crashes the current program (or panics, if the page fault occurred in kernel mode). -
traps.h
defines the constantT_PGFLT
, which is the code x86 uses to identify page faults. This will be placed in thetrapno
member variable of thetrapframe
. -
When a page fault occurs in x86, the processor sets control register 2 (CR2) to the address the program was attempting to access. You can read this in xv6 using
rcr2()
. -
You need to make sure that your allocate-on-demand code does not trigger for out-of-range memory access, such as attempts to access kernel memory from userspace or accesses to unallocated (beyond end of heap) memory.
-
You should use code similar to what’s in
allocuvm()
to set each page table entry. But you should not actually callallocuvm()
since it allocates all pages up to the maximum. If a program accesses address 0x400000, then only the page at address 0x400000 should be created, not all the pages between 0x0 and 0x400000. -
To make
fork()
work correctly,copyuvm()
should handle (and not copy) pages that are not allocated. Later on, you will probably modify this function more to implement copy-on-write.
Adding copy-on-write support
- For copy-on-write, you should:
- change the code that copies the pages on fork() in
copyuvm()
to instead map each virtual page in the child process to the same physical page as the parent process and to mark the pages read-only in both processes, - moving the copying (and setting of a new page table entry) that was previously done by
copyuvm()
to the page fault handler, which will be run whenever the process attepmts to write to a read-only page, and - tracking how many times pages are referenced (“reference counting”) in order to delete pages only when they are no longer being used
- change the code that copies the pages on fork() in
-
You will need to flush the TLB whenever you change a valid page table entry that the processor may have cached. This most likely will be in both your
copyuvm()
implementation and your page fault handler. - Before you implement reference counting (or are sure of your reference counting implementation),
I recommend testing your code with
freeing (e.g. in
deallocuvm
) disabled. This will leak memory – which you eventually need to fix – but will allow you test the rest of your copy-on-write logic without worrying as much about your reference counting implementation.
Distinguishing copy-on-write page faults from others
-
In your copy of xv6, the only reason for a user page to be marked as a read-only is because it is involved in copy-on-write. (In a more featureful OS, there would be other types of read-only pages to consider, but xv6 does not presently have any other sort of read-only pages, even for code.)
So, you can use one of two ways of detecting a page fault was triggered by a page being read-only:
-
Looking up the page table entry for the page that triggered the fault and is marked as present but not writable. You can use
rcr2()
to determine the faulting address, and then usewalkpgdir()
to find the corresponding page table entry (if it exists). -
Using the value of the processor’s “error code” (stored in
tf->err
) which will have a particular bit set if the page fault was caused by a page table entry not being writable. You can lookup the fields in this error code in the Intel manuals, volume 3A, page 6-40, section “Exception Error Code”.)
-
-
If one were to support having some memory read-only for reasons other than copy-on-write, one would need to add additional information to
struct proc
indicating where writable and truly read-only pages were located. (For example, it might have a table indicating where code is loaded.)
Copying data
- The xv6 function
memmove
is handy for copying the contents of pages.
Where pages are deallocated
-
deallocuvm
is called to deallocate user pages from a page table. Currently, it unconditionally callskfree
on each page. If pages are shared between processes, this may incorrectly free a page that other processes have a reference (via a page table entry) to. -
To free pages when a process exits,
wait
(eqiuvalent ofwaitpid
) callsfreevm
which callsdeallocuvm
. -
If you make a copy of a page from the page fault handler, and, after you make this copy, there are no references to the original page, you would need to free the original page. Alternately, you could avoid making the copy in this case.
Counting references
-
You will need to track how many times each physical page that might be used for copy-on-write is used, commonly known as its “reference count”. While one could scan all pointers to pages to determine the reference count, this would be extremely slow, so I would recommend tracking the reference count of each page as you edit page tables and/or allocate pages and/or deallocate pages and/or etc.
A simple way to do this is an array like:
unsigned char cow_reference_count[PHYSTOP / PGSIZE];
Then, use the physical address of the page divided by
PGSIZE
(that is, the physical page number) to access the appropriate element of the array.(
PHYSTOP
is the maximum physical memory address xv6 supports.)If you want to store extra information for each physical page, you can use a similar approach but with an array of structs.
-
To aid in debugging reference count tracking, I would strongly recommend printing out reference counts from your
dumppagetable
functions. (You can add extra columns to the output.) - As part of tracking reference counts,
you need to decide which pages you’re going to track the reference count for and what types of references you will count. There are
multiple options which will work, but it is very important to be consistent:
- you could track reference counts for each page allocated with
kalloc()
- you could track reference counts for pages each time they are placed in a the user part of page table
- you could track reference counts only for pages which are marked as read-only for copy-on-write
Which option you choose will change how you initialize, increment, and decrement reference counts. For example:
- for the first two options, you need to change reference counts for pages not (yet) involved in copy-on-write, which will involve modifying non-copy-on-write-related code; and
- for the last option, you need to make sure you don’t check the (presumably uninitialized) reference counts of non-copy-on-write pages, and therefore accidentally deallocate or leak memory for these pages
- you could track reference counts for each page allocated with
-
I strongly recommend having explicit checks that:
- you do not try to decrement a reference count that is already 0. (Note that checking if an
unsigned
reference count is negative after decrementing it will not do this.) - the initial value of a “new” page’s reference count is what you expect
- the value of reference counts in
copyuvm()
is in the range you expect
- you do not try to decrement a reference count that is already 0. (Note that checking if an
On synchronization
-
When implementing copy-on-write, there is a potential for synchronization issues when modifying information about pages (like reference counts) shared between different processes from system calls. To avoid this, you can use a spinlock or disable interrupts when accessing such potentially shared data. However, it is extremely rare that I have seen this matter in our tests (and we don’t intend to take off credit if you are missing synchronization in your implementation).
xv6 disables interrupts while handling page faults — so on a single core (which we will use), you don’t need to worry about the page fault handlers being interrupted. However, system calls are run without page faults or timer interrupts being disabled.
Note on how kalloc works at boot
-
If you modify kalloc and kfree as part of your reference counting implementation, note that
kfree
is used at boot to setup the free list thatkalloc
reads from.In particular,
kfree
is called from the functionskinit1
andkinit2
before any calls tokalloc
have happened. In this way, rather than writing special code to initialize the list of free pages, xv6 gets to reuse the code it has for adding a newly free’d page to the list for this purpose.
On debugging
-
Refer to the notes from the xv6intro assignment on debugging. This includes information about interpreting panic messages.
-
It can be useful to deliberately disable deallocation based on reference counts to determine whether the problem you are experiencing is due to deallocating memory when it is still in use.
-
It may be useful to call your dumppagetable() code and/or your isphysicalpagefree() code for debugging issues with allocate-on-demand or copy-on-write.
xv6’s guard page
- xv6 allocates a “guard page” below the user stack to catch stack overflows. This is the only page in the user’s memory which is marked as present but not user-accessible in the stock version of xv6.
Examples of page table dumps from tests run with -dump
pp_test alloc -dump
Example output from pp_test alloc -dump
:
$ pp_test alloc -dump
Testing allocating 0x1000 bytes of memory
and checking that its pages aren't allocated before use
and reading 0x80 bytes from offsets 0x80 through 0x100
and writing 0x100 bytes from offsets 0x200 through 0x300
>> About to call dumppagetable() for allocation-pre-allocate
START PAGE TABLE pid=6
0 P U W deea
1 P U W dee8
2 P U W dee7
3 P U W dee6
4 P U W dee5
5 P U W dee4
6 P U W dee3
7 P U W dee2
8 P - W dee1
9 P U W dee0
>> Finished call to dumppagetable() for allocation-pre-allocate
>> About to call dumppagetable() for allocation-pre-access
START PAGE TABLE pid=6
0 P U W deea
1 P U W dee8
2 P U W dee7
3 P U W dee6
4 P U W dee5
5 P U W dee4
6 P U W dee3
7 P U W dee2
8 P - W dee1
9 P U W dee0
a - - - 0
>> Finished call to dumppagetable() for allocation-pre-access
>> About to call dumppagetable() for allocation-post-access
START PAGE TABLE pid=6
0 P U W deea
1 P U W dee8
2 P U W dee7
3 P U W dee6
4 P U W dee5
5 P U W dee4
6 P U W dee3
7 P U W dee2
8 P - W dee1
9 P U W dee0
a P U W df2c
>> Finished call to dumppagetable() for allocation-post-access
In the example above, initially the test program has 10 pages (virtual page numbers 0 through 9, page 8
is the guard page, and page 9 in the stack), resulting in the page
table dump labeled allocation-pre-allocate
.
Then, after sbrk()
was called, the size of the program’s memory was increased, but nothing was added to the page
table, resulting in the page table dump labeled allocation-pre-access
.
The dumppagetable()
implementation that produced the output above choose not to omit non-present pages
from the output, so one additional line is shown compared to the first dump,
reflecting that heap size was increased for the process.
Then, the process accesses the memory in that newly allocated virtual address. This results in the final dump, labeled
allocation-post-access
. Here, a new page is made present, writable, and user-accessble,and a distinct physical
page is allocated for it.
pp_test cow -parent-never -dump
Example output:
Running copy-on-write test:
allocating 0x1000 bytes on heap
writing to bytes 0x200 through 0x400 of it
using fork() to start 1 child processes
writing to those bytes from parent process
writing to those bytes from child process 0
and checking that appropriate pages are shared/not shared
and checking that pages in child which should not be shared are freed
>> About to call dumppagetable() for copy-on-write-parent-before
START PAGE TABLE pid=3
0 P U W deea
1 P U W dee8
2 P U W dee7
3 P U W dee6
4 P U W dee5
5 P U W dee4
6 P U W dee3
7 P U W dee2
8 P - W dee1
9 P U W dee0
a P U W dfbc
>> Finished call to dumppagetable() for copy-on-write-parent-before
>> About to call dumppagetable() for copy-on-write-child-before-writes
START PAGE TABLE pid=4
0 P U - deea
1 P U - dee8
2 P U - dee7
3 P U - dee6
4 P U - dee5
5 P U - dee4
6 P U - dee3
7 P U - dee2
8 P - - dee1
9 P U W dede
a P U - dfbc
>> Finished call to dumppagetable() for copy-on-write-child-before-writes
>> About to call dumppagetable() for copy-on-write-child-after-write
START PAGE TABLE pid=4
0 P U - deea
1 P U - dee8
2 P U - dee7
3 P U - dee6
4 P U - dee5
5 P U - dee4
6 P U - dee3
7 P U - dee2
8 P - - dee1
9 P U W dede
a P U W dee0
>> Finished call to dumppagetable() for copy-on-write-child-after-write
>> About to call dumppagetable() for copy-on-write-parent-after
START PAGE TABLE pid=3
0 P U - deea
1 P U - dee8
2 P U - dee7
3 P U - dee6
4 P U - dee5
5 P U W dedd
6 P U - dee3
7 P U - dee2
8 P - - dee1
9 P U W dedf
a P U - dfbc
>> Finished call to dumppagetable() for copy-on-write-parent-after
Test successful.
This test first includes a dump in the parent before forking, labeled copy-write-parent-before
. In this dump,
one page of heap is allocated, in this case virtual page 0xA (decimal 10).
The next two dumps are from the child. The first copy-write-child-before-writes
is before the child runs anything that writes to the heap.
In this case, the memory of the child is the same as the parent was before forking except:
all pages were marked read-only in the page table to allow for copy-on-write.
After that is a dump of the child’s memory, labeled child-write-child-after-write
,
after the child writes to the heap. In the example above, we see
that virtual page 0xA, which is part of the child’s heap has been copied to a new physical page and made writable.
The physical page chosen hpapens to have the same physical page number as the parent’s stack originally occupied, but this is not
a problem since the parent’s stack was copied to a new physical page (as we can see form the dump labeled copy-on-write-parent-after
).
In addition the child’s stack (virtual page 0x9)
has been copied to a new physical page and made writable,
presumably because the child process used its stack at some point.
The final dumps show the state of the parent’s memory as the test completes. This shows that nothing changed further in the child. In the parent, we see the effects of the parent needing to write to its stack and its heap.
Handling/avoiding kernel page faults
-
For debugging, I strongly recommend having code to print out a message when a page fault occurs within your page fault handling code. Otherwise, it is extremely difficult to figure out what’s going on when you have a memory error in your page fault handling code.
The default handler code has a check
(tf->cs & 3) == 0
to detect exceptions that occur in kernel mode, and triggers a panic in this case. You could use this to detect whether a page fault occurs in kernel mode for debugging purposes. Note, however, that this would be normal for a system calls likeread()
that’s reading into allocate-on-demand mmeory -
xv6 supports nested exception handlers, but you must be careful to not exhaust the single kernel stack, which must have enough space to handle both a system call and a page fault that occurs within the system call.
When an exception occurs in kernel mode, the processor recognizes that it’s running in kernel mode already and doesn’t change the stack pointer to the top of the kernel stack, unlike an exception that occurs in user mode. This means that a page fault that occurs in the middle of a system call won’t overwrite information on the stack used by the system call.
(In contrast, when an exception occurs in user mode, the processor switches to the kernel-mode stack specified in the “Task State Segment”. This is set by
switchuvm()
.) -
If a page fault triggers a second page fault that triggers a third fault, the processor reboots the machine. This is also known as a triple fault.