Changelog:
- 12 Sep 2023: require page_allocate/translate to work for arbitrary
va
(not just page offset = 0) for part 1 - 17 Sep 2023: add back unintentionally removed
config.h
example from Specification - 18 Sep 2023: note
page hasn’t been used before
in page_allocate testing =virtual page not valid in the page table
- 18 Sep 2023: in sample code for testing page_allocate, cast to int for using %d to avoid warning
- 19 Sep 2023: write the value with all 1-bits as
~0
(instead of-1
) and indicate what it is - 19 Sep 2023:
when no page
→when no page tables are allocated
- 20 Sep 2023: add item to hints on if
posix_memalign
is not defined - 22 Sep 2023: remove stray semicolon in == expression in translate testing advice
- 30 Sep 2023: note explicitly that
posix_memalign
does not initialize memory
Your task:
This assignment has multiple submissions.
For all parts of this assignment, follow the coding style described below and make sure your code does not give warnings when compiled with
-Wall
1.For the first submission, you must complete one of the following:
ptbr
andtranslate
from item 2 for LEVELS=1 and POBITS=12. We describe below how you can test translate without workingpage_allocate
.ptbr
andpage_allocate
from item 2 for LEVELS=1 and POBITS=12, whereptbr
is manually set to an allocated, empty page table as described below (i.e., its lower 12 bits are 0).
(You can choose which to complete; if you attempt both, we will take the grade based on whichever option’s implementation is more complete.)
For the second submission, which is in preparation for a code review, you must complete all of items 2 and 3 below.
For the final submission, you must complete all parts below (and should fix any problems identified in the code review).
(This assignment will be worth considerably more than a normal one-week assignment. A vast majority of the grade for the assignment will be based on what you submit on the final submission. For the final submission, we will grade (again) tasks which should have been completed on earlier submissions.)
Implement a simulation of multi-level page table lookup and allocation.
For the purposes of this simulation
physical
addresses are the ones your simulation program can access via pointers.The layout of this page table must be configurable via
config.h
(example shown below), which will#define
the constants:LEVELS
– number of levels in the page tablePOBITS
– the size of page offsets
Page tables will follow the format described below.
Although pointers will be represented using 64-bit
size_t
s, usually not all bits of the virtual addresses will be used; you may assume that any unused bits will be set to0
when your code is called.Your code must implement API provided by a header file called
mlpt.h
, an version of which is supplied here. It includes:extern size_t ptbr
— page table base register; must be initially 0. Later, will contain the address of a page table your code allocates as a result of calls topage_allocate
.size_t translate(size_t va)
— translate the virtual addressva
using the currently setup page table. If the address is not allocated, return a value with all bits set to 1.void page_allocate(size_t va)
— setup a mapping in the page tables pointed to byptbr
so that it contains a mapping from the virtual addressva
to physical address (if such a mapping does not already exist). Any page tables and pages needed and not yet allocated must be allocated usingposix_memalign
. (Any pages or page tables already created, such as by a prior call topage_allocate
, should be reused. If the mapping is already entirely setup, the function should do nothing.)
Prepare a
Makefile
for your code that: - produces, as its default target, a library called libmlpt.a that exports the API defined inmlpt.h
(and does not export a function calledmain
)- has CC, CFLAGS and LDFLAGS defined
- uses defined arguments in its work
- has correct dependencies for each target
- has
all
andclean
targets
Write a
README
explaining (at a minimum):- How to customize
config.h
, with some guidance on how to pick values for that file. - Any known bugs or limitations (if you know of none, there is no need to say so; if you have an incomplete implementation so far, you’ll probably have some to document).
You are encouraged to add additional detail, such as
- Any limitations, aspects you think are incomplete, or suggestions for future expansion.
- Big-O analysis (time, space, or both).
- Description of any testing hooks you added.
- Code samples for how to use this library.
If anyone or any resource helped your implementation, also include an
ACKNOWLEDGEMENTS
file acknowledging those people and/or resources.- How to customize
Include a
LICENSE
andlicenses.txt
files as described below.See if you can add some kind of de-allocate interface without substantially reworking your prior implementation. If so, add it to your implementation (including
mlpt.h
) and document how it is used. If it seems to tricky to add easily, include some explanation of what complicates it in theREADME
instead of implementing it.
1 Specification
We supply two header files.
1.1 config.h
config.h
does nothing other than #define
ing
two constants.
/** LEVELS = number of PTEs used to translate one address. */
#define LEVELS 1
/** POBITS = number of bits used for the page offset. */
#define POBITS 12
and must continue to work even if we change their values. For
example, if we compile with a different config.h
that
#define
s LEVELS
as 3
instead of
1
, your code should create a 3-level page table instead of
a 1-level page table. Your code should work for all integer values of
LEVELS
between 1
and 6
inclusive
and all integer values of POBITS
between 4
and
18
inclusive. As an exception, it does not need to (but
may) support cases where (POBITS
− 3) ×
(LEVELS
+ 1) > 60.
1.2 mlpt.h
mlpt.h
defines the public API your code will use. You
are welcome to edit it, provided such edits will not cause testing code
to fail to compile.
/**
* Page table base register.
* Declared here so tester code can look at it; because it is extern
* you'll need to define it (without extern) in exactly one .c file.
*/
extern size_t ptbr;
/**
* Given a virtual address, return the physical address.
* Return a value consisting of all 1 bits
* if this virtual address does not have a physical address.
*/
size_t translate(size_t va);
/**
* Use posix_memalign to create page tables and other pages sufficient
* to have a mapping between the given virtual address and some physical address.
* If there already is such a page, does nothing.
*/
void page_allocate(size_t va);
2 Pages, page table and ptbr format
Since
POBITS
is the size of page offsets, pages occupy 2POBITS
bytes (wherePOBITS
is a constant defined inconfig.h
that we may change).Each page table will be an array of 8-byte page table entries. Each page table will occupy one physical page. (For example, with
POBITS
set to 12, this means that page tables will contain 212/8 = 512 entries.)When interpreted as a
size_t
, a page table entry’s least significant bit will indicate whether it is valid:If that bit is 0, then that indicates that there is no corresponding physical page. The remaining bits have no meaning, so you may use them however you wish.
If that bit is 1, then most significant
(64-POBITS)
are the physical page number of the corresponding physical page. For a non-last-level page table entry, this will be the location of the next level’s page table; otherwise, this will be a location where (non-page-table) data could be stored.If
POBITS
is set to 12, then PTE0x1234567812345679
has a one in the low-order bit, and thus indicates the presence of a physical page or another level page table. IfPOBITS
is 12, the physical page number (of the page or next page table level) is0x1234567812345
. The corresponding physical address of the first byte of that page is0x1234567812345000
.
There will be
LEVELS
level of page tables (whereLEVELS
is a constant defined inconfig.h
that we may change).The global variable
ptbr
represents the page table register. It should be0
(a null pointer) when no page tables are allocated
3 Coding Style
4-space indentation
{
on the same line as)
and}
on a line by itselfNo empty statements
Spaces around operators
Space before
(
if and only if it is preceded by a control construct (e.g.,if (
buttranslate(
)No postfix
++
or--
unless part of an expression that uses the return value (such asa[i++]
). Replacing with either prefix++
or+= 1
is OK.Lower-case identifiers with underscores separating words
Upper-case
#define
constantstypedef
such that each type name uses a single identifier (e.g., nounsigned char
except as part of atypedef
).Use variable names that indicate the meaning of the contents of the variable; function names that indicate the purpose of the function; etc.
Have a readable code organization; as rough (not strict) guidelines,
- If a sequence of lines of code depend upon one another more than they depend on the code around them and appear together in multiple places, they should be in their own function.
- Functions should be small enough to look at on one screen unless
that have a very orderly structure such as a long
switch
with simple code in thecase
s. - Comments should be present to describe any esoteric code decisions.
- All code should have a clear purpose.
4 LICENSE
and
licenses.txt
You need to choose a license for the code you supply. Include the
chosen license as LICENSE
and a file
licenses.txt
indicating your exploration of licenses.
This should be based on an existing family of license agreements. Unless you have had formal training in commercial law, you should use an existing license, not try to create your own.
4.1 licenses.txt
In licenses.txt
, list at least three licenses you
considered; include a short (sentence or two) summary of what they
seemed to say; and why you chose to or not to use each as your
license.
Your review should include at least three licenses, including
- at least one of GPL or LGPL
- at least one of MIT, BSD, Boost, Apache, SQLite
All should be software licenses; this excludes licenses such as the various CC licenses for creative works, the SIL Open Font License, operating system licenses, etc.
4.2 LICENSE
Include a LICENSE
file with the license agreement under
which you provide your code to us.
5 Not required: create a manual page
If you want to understand the real world a bit better, you might also
try writing a manual page. These are written in a language known as
roff
; to see an example of what this looks like, try running
cp /usr/share/man/man3p/poll.3p.gz ./
gunzip poll.3p.gz
less poll.3p
to see the roff source of the manual page man 3p poll
. I
don’t personally know anyone who writes roff by hand; generally it is
output from a source-code processing tool like doxygen
2 or a converter like
pandoc
from an easier-to-write format like markdown or
reStrucuredText.
Including man page page_allocate.3.gz
and
translate.3.gz
in your submission will earn you kudos. To
test that these work properly,
create a directory
man3
(i.e.,mkdir man3
)gzip the
roff
file and move it toman3/term_to_look_up.3.gz
(e.g.,gzip translate.roff; mv translate.roff.gz man3/translate.3.gz
) – note that some authoring tools may do some of this for youuse the
-M
flag withman
pointing to the parent directory ofman3
(i.e.,man -M. 3 translate
)
6 Hints
6.1 Common C issues
6.1.1 posix_memalign issues
posix_memalign(&x, ...)
is similar tox = malloc(...)
except that the memory is guarenteed at a multiple of a particular value. The pointer to pointer passed intoposix_memalign
is used in lieu of returning a pointer to the memory allocated. (The return value ofposix_memalign
indicates success (0
) or failure (non-zero error code).)If
posix_memalign
is not defined after#include <stdlib.h>
, adding#define _XOPEN_SOURCE 700
or similar before any#include
may help.posix_memalign
is not part of standard C; it’s something Linux and other Unix-like systems have added. If you are compiling with-std=c11
or similar, then by default<stdlib.h>
only includes things from the C standard. You can override this to request additional functions withfeature test macros
like_XOPEN_SOURCE
. On Linux, the commandman feature_test_macros
brings up the full documentation about thea available feature test macros.posix_memalign
does not initialize the memory it allocates to any particular value (so you cannot rely on it being all 0s, for example).
6.1.2 using ptbr
(size_t*) ptbr
uses the value contained inptbr
as a pointer — in other words, it usesptbr
as a pointer. In contrast,&ptbr
gets a pointer to the value contained inptbr
; in other words, it makes a pointer toptbr
.
6.2 Page table format and page numbers
- Our testing code will expect you to follow the page table format specified above. We specify the format in terms of page numbers rather than full addresses (which would have a page number and page offset). If our testing code seems to think that page table entries don’t contain the addresses you think they do, using a different format might be why.
6.3 Testing translate
without page_allocate
You can manually set
ptbr
to a hard-coded page table array to testtranslate
withoutpage_allocate
being implemented. The below items show examples of how to do this with POBITS = 12 and LEVELS = 1.In GCC and Clang, you can use
__attribute__((aligned(4096)))
to make a variable be allocated at an address that is a multiple of 4096. Using this, you can declare an array ofsize_t
s which takes up a (simulated)physical
page as follows:__attribute__((aligned(4096))) static size_t page_of_data[512];
We could also declare it as an array of 1-byte
char
s as follows:__attribute__((aligned(4096))) static char page_of_data[4096];
though this would make it more difficult to set individual page table entries later.
By default, global-scope variables in C (and C++) will be zero-initialized, so
page_of_data
declared as above will initially be an array of 0s. Since these have a valid bit of 0 in our page table format, you can testtranslate()
with an empty page table by doing something like:__attribute__((aligned(4096))) static size_t testing_page_table[512]; static void set_testing_ptbr(void) { ptbr = (size_t) &testing_page_table[0]; }
(and calling
set_testing_ptbr()
)With a proper implementation of
translate()
,translate(X)
will be~0
(all 1 bits) for every address (that is not too big to be a virtual address).Since page table entries are 8 bytes (the size of
size_t
), setting index X of the array declared will set index X of the page table. So, let’s say (with POBITS = 12 and LEVELS = 1), we want to set virtual page 3 to point to a (simulated)physical
page of data. We’ll first allocate memory for that physical page of data as a global variable:__attribute__((aligned(4096))) static char data_for_page_3[4096];
Then, we need to convert the physical address of
data_for_page_3
into a page table entry. To do this, we can first extract the page number from the address:size_t address_of_data_for_page_3_as_integer = (size_t) &data_for_page_3[0]; size_t physical_page_number_of_data_for_page_3 = address_of_data_for_page_3_as_integer >> 12; // instead of >> 12, we could have written: // address_of_data_for_page_3_as_integer / 4096 size_t page_table_entry_for_page_3 = ( // physical page number in upper (64-POBITS) bits (physical_page_number_of_data_for_page_3 << 12) | // valid bit in least significant bit, set to 1 1 );
Then, we can store that page number at index 3 in a page table:
// assuming testing_page_table initialized as above and ptbr points to it testing_page_table[3] = page_table_entry_for_page_3;
After this, we should observe that passing an address with virtual page number 3 to
translate()
will returnphysical
addresses that are part ofdata_for_page_3
. For example, we would expecttranslate(0x3045) == (size_t) &data_for_page_3[0x45]
6.4 Testing
page_allocate
Since
page_allocate
’s primary job is to allocate memory when neeeded, for testing, I would recommend having some way to count how many allocations occur.With
LEVELS=1
, there are three cases to test forpage_allocate
:- on the first call, two physical pages should be allocated — one to
create the page table (and set
ptbr
to a non-zero value) and one to allocate a page for data - on later calls:
- if the page number supplied to
page_allocate
hasn’t been used before (is not marked valid in the page table), then one page for data should be allocated; - otherwise, no pages should be allocated
- if the page number supplied to
- on the first call, two physical pages should be allocated — one to
create the page table (and set
page_allocate
should set page table entries pointed to byptbr
. You can treatptbr
like an array ofsize_t
s to examine them. For example, to print out the page table entry with index 3 (with POBITS=12, LEVELS=1):size_t *pointer_to_table; pointer_to_table = (size_t *) ptbr; size_t page_table_entry = pointer_to_table[3]; printf("PTE @ index 3: valid bit=%d physical page number=0x%lx\n", (int) (page_table_entry & 1), (long) (page_table_entry >> 12) );
6.5 Example for LEVELS=4, POBITS=12
The comments in the following code correctly describe the number of
pages that have ever been allocated assuming that
LEVELS
is 4 and POBITS
is 12. (With
smaller LEVELS values, either you would not use all the non-zero bits of
the virtual addresses being tested or the virtual page numbers would be
out of range.)
If you want to use this code to test your solution, it would be
useful to have a counter of how often posix_memalign
is
called.
#include <stdio.h>
#include <assert.h>
#include "mlpt.h"
#include "config.h"
int main() {
// 0 pages have been allocated
(ptbr == 0);
assert
(0x456789abcdef);
page_allocate// 5 pages have been allocated: 4 page tables and 1 data
(ptbr != 0);
assert
(0x456789abcd00);
page_allocate// no new pages allocated (still 5)
int *p1 = (int *)translate(0x456789abcd00);
*p1 = 0xaabbccdd;
short *p2 = (short *)translate(0x456789abcd02);
("%04hx\n", *p2); // prints "aabb\n"
printf
(translate(0x456789ab0000) == 0xFFFFFFFFFFFFFFFF);
assert
(0x456789ab0000);
page_allocate// 1 new page allocated (now 6; 4 page table, 2 data)
(translate(0x456789ab0000) != 0xFFFFFFFFFFFFFFFF);
assert
(0x456780000000);
page_allocate// 2 new pages allocated (now 8; 5 page table, 3 data)
}
6.6 Diagrams of translate and lookup
All these assume a ptbr
variable stored at address
0x5898
, containing 0x10000
and a
POBITS
setting of 12.
The diagrams show physical
memory addresses, with lowest
address at the bottom of the diagram and the address of each thing
labeled on the left-hand side. For example, translate(va)
labeling a line indicates that the translate(va)
(that is, the
return value of translate) is equal to the address of this line. Lines
with arrows at both ends marking a portion of the address space indicate
that the size of that space is based on the value of something. For
example, the region labeled virtual page number from va
indicates
that the size of that region is computed from the virtual page number
stored in va.
6.6.1 1-level translate
6.6.2 1-level page_allocate
6.6.3 2-level translate
6.7 reading
In addition to materials from slides, our reading on multi-level page tables work generally also include some relevant diagrams and pseudocode.