Contents
Changelog:
- 9 April 2022: refer to the “important specification details” from the “references” section.
- 13 April 2022: clarify that “disk image” and “filesystem image” mean the same thing; make it clearer that comment on path formats refers to function descriptions; move comment about
/
separating directories from fat_open description to general description of paths passed to commands - 13 April 2022: add note about
fat_test
testing things before callingfat_mount
- 13 April 2022: add explicit note about beginning of disk
- 15 April 2022: make example paths in section on
.
and..
have leading/
s like we require for paths
Your Task
-
Download the skeleton code (link — last updated 27 March 2022) containing library function declarations, the beginnings of some testing code, and a sample Makefile.
- Write a C++ library to read from a FAT32 filesystem
(without relying on the underlying
OS supporting FAT itself), except that
- You may assume filenames only contain 7-bit ASCII characters.
- You may assume that the filesystem is small enough to map into memory, such as with
mmap
. - You do not need to support “Long Directory Entries” (including any filenames longer than 8 characters plus a 3 character extension, case preserving filenames, etc.)
You will be given the contents of the FAT32 disk as a file to read from, which we call a “disk image” or “filesystem image” below.
Your library should support a POSIX-like API (but using “file descriptors” your library tracks internally rather than ones tracked by the OS) including the following functions, which are declared in
fat.h
(and much exactly match these prototypes).Refer to the references below for precise information about the layout of the FAT filesystem, since our lecture slides gloss over some important details, like the size of FAT entries or the actual mapping from cluster numbers to locations on the disk.
Whenever a path within the FAT filesystem is required for the functions below, paths must start with a
/
, and are relative to the root directory of the filesystem, and seperate directories with a/
. For example/foo.txt
refers to a file namedfoo.txt
in the root directory of the filesystem, and/bar/quux.txt
refers to a file namedquux.txt
in thebar
directory in the root directory of the filesystem, and/foo/bar/quux.txt
to a filequux.txt
inbar
infoo
in the root directory. The path/
always refers to the root directory.bool fat_mount(const std::string &path)
– open the specified FAT disk image and set it up to be used by all subsequentfat_*
calls. In this function onlypath
is a path on the underlying OS’s filesystem instead of a path in the FAT volume. This should return true if the disk image is successfully opened, and false on any error.int fat_open(const std::string &path)
– open the specified file within the filesystem image previously passed tofat_mount
. Returns an integer file descriptor (one used by your library, not a real file descriptor on the underlying OS) that your library will accept for other calls. On failure, returns -1 instead; you must fail if you ask to open a file that does not exist. You must support opening at least 128 files at a time; after 128 files are opened you may return an error if more are opened. You may (but are not required to) also return an error when a directory is opened.bool fat_close(int fd)
– close the specified file descriptor returned byfat_open
. Returnsfalse
on failure (e.g. file not open).int fat_pread(int fd, void *buffer, int count, int offset)
– copycount
bytes starting with byte numberoffset
in the previously opened file intobuffer
. Byte offsets start with0
, so reading with an offset of0
and acount
equal to the size of the file will read the entire file. Returns the number of bytes read, which will be less thancount
if the caller asks to read past the end-of-file. Returns -1 if the read fails (e.g.fd
was not opened). Reading 0 bytes succesfully, such as whencount
is 0 or whenoffset
is past the end of file should return0
, not-1
.vector<AnyDirEntry> fat_readdir(const std::string & path)
– return an vector of directory entries for a directory identified bypath
within the filesystem image previously passed tofat_mount
.AnyDirEntry
is a struct declared in the supplied filefat.h
. These directory entries should be exactly as found on disk. You need not filter out unused entries. In the event of an error, return an empty vector.
You should include a Makefile that builds your library as libfat.a, like our sample Makefile will.
- For the checkpoint, implement any three of the following:
fat_readdir
of a single-cluster root directoryfat_readdir
of single-cluster directories contained within a single-cluster root directoryfat_readdir
of a multi-cluster root directoryfat_readdir
of single-cluster subdirectories of directories contained within a single-cluster root directoryfat_open
of a single-cluster file in a single-cluster root directory, followed byfat_pread
of the file using an offset of0
and a size equal to its real sizefat_open
of a single-cluster file in subdirectories of the root directory, followed by afat_pread
of the file using an offset of0
and size equal to its real sizefat_open
of a multi-cluster file in the root directory, followed byfat_pread
of the file using an offset of0
and a size equal to its real sizefat_open
of a multi-cluster file in the root directory, followed byfat_pread
to read an arbitrary part of the file, possibly using a size or offset that should result in thepread
call returning less than thecount
requested
- Create a tar.gz archive like the one our supplied Makefile does with
make archive
, and submit it via the submission site
References
-
Refer to the FAT specification. This specification documents FAT32, FAT16, and FAT12, but you are only required to implement FAT32.
- Note particularly the section on “important specification details” below.
-
https://wiki.osdev.org/FAT may be a convenient reference.
-
Here are some disk images to test with:
- sampledisk32.raw
- testdisk1.raw (modified version of sampledisk32.raw, expected by included
fat_test
)
For reference this is a archive of the files you should expect to find in sampledisk32.raw (with the root directory of the disk represented by the ‘sampledisk32’ directory in the archive), and this is an archive of the filesin testdisk1.raw.
Included files
-
We include a
fat.h
which includes the prototypes necessary forAnyDirEntry
.AnyDirEntry
has the exact same layout in memory that the directory entries have on disk, so you do not need to convert field-by-field. -
We include a
fat_internal.h
which includes a struct that has the same layouts as the Bios Parameter Block to keep you from needing to worry about the details of converting that to a C struct. -
The structs declared in
fat.h
andfat_internal.h
are laid out so that values in memory have the exact same layout that have one disk. This allows you to convert the bytes on disk directly to the struct rather than copying variable-by-variable.To make this work, in addition to setting the size of each variable in the struct, we’ve used
__attribute__((packed))
to tell GCC to not add any extra space between each of the values in the struct. -
We include two testing programs that will be built and linked by the supplied Makefile:
-
one called
fat_shell
that lets you have a shell-like interface to the filesystem.Type
help
in this utility to see a brief description of the commands it offers.This utility, or custom test programs you write (perhaps by modifying this or our other test program), should be the primary way you check that your filesystem works correctly.
You are welcome to read the source code to
fat_shell
. If you find something you want to use in yourfat.cc
, this is fine so long as you clearly indicate that this in your submission. -
one called
fat_test
that runs some fixed tests for thetestdisk1.raw
image linked above. Note that we may run different tests when grading your submission, including tests that cover functionality not exercised by this test. Some of the tests included in this are:- calling various
fat_*
functions before callingfat_mount
to make sure they return an error - reading from the multi-cluster root directory in
testdisk1.raw
- reading the single-cluster file
congrats.txt
in the root directory - reading from the contiguous multi-cluster file
gamecopy.txt
in the root directory - reading from the noncontiguous multi-cluster file
gamefrag.txt
in the root directory (which contains the same text asgamecopy.txt
) - reading from files with non-3-character extensions like
foo.a
- reading from files in subdirectories and subsubdirectories which requires traversing multi-cluster directories and directory entries that are marked as deleted
Selected notes on included tests:
- before calling
fat_mount
,fat_test
first checks that other functions return an error when called before anything is mounted — often students who don’t handle this situation will get errors from derferencing a null pointer or similar; fat_test
includes a test labelled “readdir of root dir – past 0xE5”. This test will not work unless multicluster root directory reading works.
Look at the source code (in particular the
mounted_tests()
function) for complete information. - calling various
-
Hints
Viewing FAT filesystems for testing/debugging
-
You may find a hex editor like
bless
helpful to examine the contents of a FAT32 disk image. -
You can mount a FAT disk image in your VM (that is, setup the disk image so you can access its files like other files in the VM) with the mount command. After creating a directory like
/mnt
to which to mount it, run something likesudo mount -o umask=0000 the-fat-disk-image.img /mnt
to mount it. Then you can look in
/mnt
to view its files. Theumask=0000
option ensures that the files in the filesystem are accessible to all users, which is probably simplest on a VM, but not secure on a multiuser system. Alternately, you could replaceumask=0000
withuid=XXX
whereXXX
is the user ID number printed out with theid
command.You can then unmount it with
sudo umount /mnt
. If you’re worried about accidentally modifying the disk image this way, you can also mount it read-only with a command like
sudo mount -o ro,umask=0000 the-fat-disk-image.img /mnt
Note that this will show “long” filenames you are not required to support.
Important specification details
Locating clusters by cluster number
-
Our slides suggested that cluster 0 was located at the beginning of the disk. The way the FAT specification specifies to interpret cluster numbers in the filesystem is a bit more complex than this. (In the section about the general layout of the disk, the FAT specification describes how to find a sector (what an OS would use to read part of a disk) that contains the data for a cluster with a particular number. The OSdev wiki we point to has the same information duplicated in its outline of how to read a directory.)
-
What the specification considers sector 0 is at the beginning of the disk.
Locating the end of a list of clusters
- The FAT format specifies what special values (there is more than one possibility) indicate the end of a list of clusters, which the specification calls a “cluster chain”. See the FAT specification for details.
Representing filenames
-
In directory entries, filenames are represented using 11-character arrays which are not nul-terminated. Page 24 of the specification has examples of how filenames are mapped to array values.
-
One way of loading a non-nul-terminated character array into a C++
string
is to use the constructor that takes two arguments, a pointer to the start of the character array and a pointer to the end of the character array like:char some_array[11] = {'A','B','C','D','E','F','G','H','I','J','K'}; std::string array_as_string(&some_array[0], &some_array[11]);
There are also several other constructors that could be useful, or you could work with character arrays directly instead of C++
string
s, or you could copy a non-nul-terminated character array into a nul-termianted character array and then use functions that expect nul-terminated C-style strings.
Possible order of implementation
This is what I used for my reference implementation, you can do something different.
-
Load the BPB (the “header” indicating where the FAT and root directory are) from the disk into the supplied struct.
-
Write a function to read a particular cluster; use this to read the first cluster of the root directory (whose location is specified by information in the BPB) and implement
readdir
(for only the root directory, and only its first cluster). -
Write a function to read a particular entry from the FAT. Use this to implement
readdir
for when the root directory takes up multiple clusters. -
Write code for open and close that handles allocating and deallocating file descriptors, tracked by some global data structure. For now, don’t use a path yet, just open a file based on the number of its first cluster (and whatever else you need to track a file descriptor). The purpose of this step is just to have the code which chooses file descriptor numbers working.
-
Write code for open that handles paths in the root directory only and fails if the filename would be in the root directory but does not exist.
-
Write code for pread that handles reading from the first cluster of files only. You will need to modify your open code to store information needed to find this cluster.
-
Write code for pread that handles reading from more than the first cluster of files.
-
Modify
readdir
andopen
to traverse directories. You will probably want to create a shared utility function that looks up directory entries by pathname for both of these.
C++ tricks
Handy library functions
-
You may find the std::toupper function in
<cctype>
helpful for comparing filenames. -
std::string
has afind
method andsubstr
method you might consider using (see documentation). Or you can simply iterate through character-by-character for string opertaions. -
You can get the size of a file by seeking to the end and reading the location then seeking back to the beginning. You can also use the POSIX functions
fstat
orstat
.
Reading into structs
- Given a
char*
that contains data read from disk that contains the bytes of an array of structsFoo
, you can convert it into aFoo*
by casting. This is how I would recommend reading the array of directory entries from a directory.
Reading binary files generally in C/C++
-
Some options for reading files as binary data (like you’d want to do for the disk image file) include:
- using the POSIX functions open() and mmap()
- using the C standard library functions fopen() and fread() and fseek()
- using the C++ standard library class ifstream
- using the POSIX functions open() and pread()
- using the POSIX functions open() and read() and lseek()
On paths with .
and ..
-
FAT directories except the root directories should contain
.
and..
entries that point to the itself and the parent directory, so your filesystem should treat paths like/foo/../bar
as equivalent to/bar
“for free” — except that:- you will need to treat these paths specially because they don’t have an extension like
.txt
and don’t use the extension part of the filename field of the directory entry - FAT uses cluster number
0
in directory entries to represent the root directory, even if it is stored in a different cluster number.
We do not care if you handle the directories
/..
or/.
(though a real OS would do this, even though the root directory does not contain these directory entries) - you will need to treat these paths specially because they don’t have an extension like