Changelog:

  • 9 Feb 2023: edit example code to use 0x456789abcdef instead of 0x123456789abcdef to avoid virtual addresses having unused bits.
  • 11 Feb 2023: fix page table offsets to page offsets; explicitly indicate that page tables are all the same size and start at the beginning of a page
  • 11 Feb 2023: mention that page tables are one page in the Your task section in addition to the sections below.
  • 13 Feb 2023: replace present with valid in Your task to be consistent with rest of writeup
  • 13 Feb 2023: explicitly indicate that page_allocate also allocates pages which don’t hold page tables in order to make the mapping work
  • 13 Feb 2023: explicitly note that not all of size_t va is used for the address translation because of the page table sizes
  • 14 Feb 2023: add hints section with some common issues
  • 18 Feb 2023: copy diagrams from lecture into hints section; along with new diagram for 2-level lookup; more prominenantly point to reading.
  • 20 Feb 2023: note more prominently that example code assumes LEVELS = 4
  • 27 Feb 2023: correct URL for reading link at bottom of hints

Your task:

  1. This assignment has multiple submissions.

    For the first submission, you must complete:

    • item 2 below with LEVELS only set to 1 and POBITS only set to 12.

    and two of:

    • item 2 with LEVELS being set to 2 (in addition to 1) and POBITS only set to 12
    • item 2 with LEVELS and POBITS having any valid value
    • item 3 below (a Makefile)
    • item 4 below (a README)
    • item 5 below (a LICENSE and licenses.txt file)

    For all parts of this assignment, follow the coding style described below and make sure your code does not give warnings when compiled with -Wall1.

    For the second submission, which is in preparation for a code review, you must complete all parts except part 5 and 6 below.

    For the final submission, you must complete all parts below (and should fix any problems identified in the code review).

    (This assignment will be worth considerably more than a normal one-week assignment. A vast majority of the grade for the assignment will be based on what you submit on the final submission. For the final submission, we will grade (again) tasks which should have been completed on earlier submissions.)

  2. Implement a simulation of multi-level page table lookup and allocation. For the purposes of this simulation physical addresses are the ones your simulation program can access via pointers.

    The layout of this page table must be configurable via config.h, which will #define the constants:

    • LEVELS – number of levels in the page table
    • POBITS – the size of page offsets

    Page table entries will take up 8 bytes, with the least significant bit being the valid bit. Each page table (at each level) will take up one page.

    Your code must implement API provided by a header file called mlpt.h, an version of which is supplied here. It includes:

    • extern size_t ptbr — page table base register; must be initially 0. Later, will contain the address of a page table your code allocates as a result of calls to page_allocate.

    • size_t translate(size_t va) — translate the virtual address va using the currently setup page table. If the address is not allocated, return a value with all bits set to 1.

    • void page_allocate(size_t va) — use posix_memalign to create page tables and other pages sufficient to have a mapping between the given virtual address and some physical address. If there already is such a page, does nothing.

  3. Prepare a Makefile for your code that: - produces, as its default target, a library called libmlpt.a that exports the API defined in mlpt.h (and does not export a function called main)

    • has CC, CFLAGS and LDFLAGS defined
    • uses defined arguments in its work
    • has correct dependencies for each target
    • has all and clean targets
  4. Write a README explaining (at a minimum):

    • How to customize config.h, with some guidance on how to pick values for that file.
    • An example use case for this code. Note that this cannot be to implement virtual memory since this code relies on posix_memalign, which cannot work without virtual memory already being in place.
    • Any known bugs or limitations (if you know of none, there is no need to say so; if you have an incomplete implementation so far, you’ll probably have some to document).

    You are encouraged to add additional detail, such as

    • Any limitations, aspects you think are incomplete, or suggestions for future expansion.
    • Big-O analysis (time, space, or both).
    • Description of any testing hooks you added.
    • Code samples for how to use this library.

    If anyone or any resource helped your implementation, also include an ACKNOWLEDGEMENTS file acknowledging those people and/or resources.

  5. Include a LICENSE and licenses.txt files as described below.

  6. See if you can add some kind of de-allocate interface without substantially reworking your prior implementation. If so, add it to your implementation (including mlpt.h) and document how it is used. If it seems to tricky to add easily, include some explanation of what complicates it in the README instead of implementing it.

1 Specification

We supply two header files.

1.1 config.h

config.h does nothing other than #defineing two constants.

/** LEVELS = number of PTEs used to translate one address. */
#define LEVELS  4

/** POBITS = number of bits used for the page offset. */
#define POBITS  12

Your code must use these constants, must not redefine them, and must continue to work even if we change their values. For example, if we compile with a different config.h that #defines LEVELS as 3 instead of 4, your code should create a 3-level page table instead of a 4-level page table. Your code should work for all integer values of LEVELS between 1 and 6 inclusive and all integer values of POBITS between 4 and 18 inclusive. As an exception, it does not need to (but may) support cases where (POBITS − 3) × (LEVELS + 1) > 60.

1.2 mlpt.h

mlpt.h defines the public API your code will use. You are welcome to edit it, provided such edits will not cause testing code to fail to compile.

/**
 * Page table base register.
 * Declared here so tester code can look at it; because it is extern
 * you'll need to define it (without extern) in exactly one .c file.
 */
extern size_t ptbr;

/**
 * Given a virtual address, return the physical address.
 * Return a value consisting of all 1 bits
 * if this virtual address does not have a physical address.
 */
size_t translate(size_t va);

/**
 * Use posix_memalign to create page tables and other pages sufficient
 * to have a mapping between the given virtual address and some physical address.
 * If there already is such a page, does nothing.
 */
void page_allocate(size_t va);

1.3 Behavior

Prior to the first invocation of page_allocate, ptbr should be NULL; thereafter it should be a pointer to a heap-allocated array, cast as a size_t. It should not change after the first page_allocate invocation.

If ptbr is not NULL, it should point to an array of size_ts occupying 2POBITS bytes. Each such size_t should be interpreted as a page table entry.

Each other page table should be the same size. All page tables should start at the beginning of a page. Since page tables are limited in size, probably not all bits of the size_t va passed to translate and page_allocate will be used in the address translation process.

Some texts refer to a PTBR containing a physical address; others to it containing a physical page number. The above definition asserts it contains a physical address, not page number. As such, it will always have 0 in its low-order POBITS bits.

Each page table entry should be either

0 in the low-order bit

There is no physical page.

The remaining bits have no defined meaning; you may use them however you wish.

PTE 0x1234567812345678 has a zero in the low-order bit, and thus indicates the absence of a physical page.

1 in the low-order bit

The bits above the POBITS low-order bits are the physical page number of either the next level page table or the physical page of memory. The POBITS − 1 low-order bits (excluding the 1 in the lowest-order bit) have no defined meaning; you may use them however you wish.

PTE 0x1234567812345679 has a one in the low-order bit, and thus indicates the presence of a physical page or another level page table. If POBITS is 12, the physical page number (of the page or next page table level) is 0x1234567812345.

No pages should be allocated unless requested by a call to page_allocate.

The comments in the following code correctly describe the number of pages that have ever been allocated assuming that LEVELS is 4 and POBITS is 12. (With smaller LEVELS values, either you would not use all the non-zero bits of the virtual addresses being tested or the virtual page numbers would be out of range.)


#include <stdio.h>
#include <assert.h>
#include "mlpt.h"
#include "config.h"

int main() {
    // 0 pages have been allocated
    assert(ptbr == 0);

    page_allocate(0x456789abcdef);
    // 5 pages have been allocated: 4 page tables and 1 data
    assert(ptbr != 0);

    page_allocate(0x456789abcd00);
    // no new pages allocated (still 5)
    
    int *p1 = (int *)translate(0x456789abcd00);
    *p1 = 0xaabbccdd;
    short *p2 = (short *)translate(0x456789abcd02);
    printf("%04hx\n", *p2); // prints "aabb\n"

    assert(translate(0x456789ab0000) == 0xFFFFFFFFFFFFFFFF);
    
    page_allocate(0x456789ab0000);
    // 1 new page allocated (now 6; 4 page table, 2 data)

    assert(translate(0x456789ab0000) != 0xFFFFFFFFFFFFFFFF);
    
    page_allocate(0x456780000000);
    // 2 new pages allocated (now 8; 5 page table, 3 data)
}

2 Coding Style

3 LICENSE and licenses.txt

You need to choose a license for the code you supply. Include the chosen license as LICENSE and a file licenses.txt indicating your exploration of licenses.

This should be based on an existing family of license agreements. Unless you have had formal training in commercial law, you should use an existing license, not try to create your own.

3.1 licenses.txt

In licenses.txt, list at least three licenses you considered; include a short (sentence or two) summary of what they seemed to say; and why you chose to or not to use each as your license.

Your review should include at least three licenses, including

All should be software licenses; this excludes licenses such as the various CC licenses for creative works, the SIL Open Font License, operating system licenses, etc.

3.2 LICENSE

Include a LICENSE file with the license agreement under which you provide your code to us.

4 Not required: create a manual page

If you want to understand the real world a bit better, you might also try writing a manual page. These are written in a language known as roff; to see an example of what this looks like, try running

cp /usr/share/man/man3p/poll.3p.gz ./
gunzip poll.3p.gz
less poll.3p

to see the roff source of the manual page man 3p poll. I don’t personally know anyone who writes roff by hand; generally it is output from a source-code processing tool like doxygen2 or a converter like pandoc3 from an easier-to-write format like markdown or reStrucuredText.

Including man page page_allocate.3.gz and translate.3.gz in your submission will earn you kudos. To test that these work properly,

  1. create a directory man3 (i.e., mkdir man3)

  2. gzip the roff file and move it to man3/term_to_look_up.3.gz (e.g., gzip translate.roff; mv translate.roff.gz man3/translate.3.gz) – note that some authoring tools may do some of this for you

  3. use the -M flag with man pointing to the parent directory of man3 (i.e., man -M. 3 translate)

5 Hints

5.1 Common C issues

  1. posix_memalign(&x, ...) is similar to x = malloc(...) except that the memory is guarenteed at a multiple of a particular value. The pointer to pointer passed into posix_memalign is used in lieu of returning a pointer to the memory allocated. (The return value of posix_memalign indicates success (0) or failure (non-zero error code).)

  2. (size_t*) ptbr uses the value contained in ptbr as a pointer — in other words, it uses ptbr as a pointer. In contrast, &ptbr gets a pointer to the value contained in ptbr; in other words, it makes a pointer to ptbr.

5.2 Page table format and page numbers

  1. Our testing code will expect you to follow the page table format specified above. We specify the format in terms of page numbers rather than full addresses (which would have a page number and page offset). If our testing code seems to think that page table entries don’t contain the addresses you think they do, using a different format might be why.

5.3 Slide diagrams of translate and lookup

All these assuming ptbr variable stored at address 0x5898, containing 0x10000, as well as some other value stored in the page tables found.

5.3.1 1-level translate

5.3.2 1-level page_allocate

5.3.3 2-level translate

5.4 reading

In addition to materials from slides, our reading on multi-level page tables work generally also include some relevant diagrams and pseudocode.


  1. …because you resolved the warnings, not because you disabled them with warning-ignoring #pragmas or the like.↩︎

  2. which is installed on department machines; see man doxygen for more.↩︎

  3. a very old version (1.3) is installed on the department servers without man support; visit https://pandoc.org/ to get a copy yourself.↩︎