Changelog:
- (while still tentative) 2 Feb 2022 6pm: update testing script to increase portability by looking for false in both /bin and /usr/bin and to identify likely incompatible tests; add note for those who started early
- 8 Feb 2022: consistently recommend module load gcc for getting a recent GCC on department machines
- 9 Feb 2022: add notes about using module load python on department machines for access to python3
- 9 Feb 2022: add duplicated or corrupted arguments to common problems list in hints
- 10 Feb 2022: point to testing tool section of hints from “Your Task”; move Linux-specificness discussion to that section
- 11 Feb 2022: add note about test timeouts to hints about the test section; add note about simulating test script tests with cat file | ./msh
- 11 Feb 2022: add note about how to imitate “fork fails” test manually; update test compatibility note to mention head being different on OS X
- 13 Feb 2022: add note about updated shell_test.py for test/PD issues (related to permissions working differently on a non-Linux filesystem); update shell-skeleton.tar.gz to include this updated shell_test.py
- 14 Feb 2022: update shell_test.py to deal with more cases where shell_test.py does not work (related to permissions working differently on a non-Linux filesystem)
- 16 Feb 2022: update shell_test.py to avoid using the head command in a way that doesn’t work with OS X’s head command
- 17 Feb 2022: in description of the testing tool, remove comment about exit status 255 that seems inconsistent with specification
Your Task
- Download the skeleton code (last updated 2022-02-16 4:30pm), which contains:
  - a starter Makefile which has a target to build a binary called msh
  - a source file main.cc which implements a prompt that prints a “Not implemented” error on any command but “exit”.
- Modify msh to be a Unix-like shell that supports running commands as described below, and that:
  - prompts for commands using > (greater than, followed by space)
  - supports running simple commands (e.g. /bin/cat foo.txt bar.txt)
  - supports input redirection from a file (e.g. commands like /usr/bin/gcc -E - < somefile.txt)
  - supports output redirection to a file (e.g. commands like /usr/bin/gcc -E file.cc > somefile.txt)
  - supports pipelines of multiple commands (e.g. commands like /bin/cat foo.txt | /bin/grep bar | /bin/grep baz or /bin/cat foo | /bin/grep baz > output.txt)
  - supports the builtin command exit
  - outputs the exit status of each command as described below
  - prints out error messages to stderr (e.g. via std::cerr) as described below
  - does not invoke the system’s shell to implement any of this functionality

  The sections below attempt to precisely describe the syntax of commands and the procedure to be followed when running commands.

  We strongly recommend creating partially working versions such as those described below. A full solution will be around 200 lines of code.

  You may use additional source files, rename source files, switch from C++ to C, etc., so long as you modify the Makefile appropriately and still produce an executable called msh.
- For the checkpoint, you must be able to run simple commands with no arguments (like /bin/something) and wait for them to finish before executing the next command, and implement at least three of the following:
  - printing the exit status of each command (like /bin/foo exit status: 4) or printing that a command terminated other than by exiting
  - running commands with one or more arguments (/bin/something argument1 argument2)
  - running commands with an input redirection (/bin/something < input.txt)
  - running commands with an output redirection (/bin/something > output.txt)
  - running a pipeline of two commands (/bin/first-program | /bin/second-program)
  - running a pipeline of three or more commands (/bin/first-program | /bin/second-program | /bin/third-program)

  For the checkpoint, we do not care how your program behaves for commands that you do not implement. For example, if you do not implement pipelines of two or more commands, you may reject all commands with a | token, interpret the | as a normal argument, or do something else.
- We intend to test the shell you submit on a Linux environment similar to the course VM or to the department machines (e.g. portal.cs.virginia.edu). (If you are using the department machines, run module load gcc before compiling to use a recent version of GCC, and module load python to make python3 available for tests.) Please make sure your submitted shell will build and work in this environment.

  To aid you in testing your final submission, we have supplied tests in shell_test.py, which can be run via make test. (See the more detailed explanation of these tests, and certain caveats about them, below.)
- Prepare a .tar.gz file like the one built using make archive in the given Makefile. This should have all your source files and a Makefile which can produce an msh binary. It should not contain any object files or a pre-built msh executable.
- Submit the .tar.gz file on the submission site for the final submission or for the checkpoint submission.
Specification
Shell language
Shell commands are lines which consist of a sequence of whitespace-separated tokens. Whitespace characters are space (' ' in C or C++), form feed ('\f'), newline ('\n'), carriage return ('\r'), horizontal tab ('\t'), and vertical tab ('\v').
Each line has a maximum length of 100 characters. We do not care what your shell does with longer input lines. This limit implies certain limits on the number of arguments, number of commands in a pipeline, etc. that may simplify your implementation.
Each token is either
- an operator, if it is < or > or |, or
- a word, otherwise.
Each line of input is a pipeline, which consists of a series of commands (possibly just one) separated by | tokens.
Each well-formed command consists of a series of (in any order):
- up to one input redirection operation, which consists of a < token followed by a word token,
- up to one output redirection operation, which consists of a > token followed by a word token, and
- one or more words (not part of any redirection operation), which are used to form the command for an exec system call.

Every command must have at least one word which is not part of a redirection operation; otherwise it is malformed.

Every > or < operator must be followed by a word token; otherwise, the command is malformed.

Any command which has more than one input redirection operation or more than one output redirection operation is malformed.
As a special case, you may optionally accept and execute no programs for a line containing no tokens, or you may treat it as a malformed command.
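As a concreteness check, the malformedness rules above for a single command’s token list could be written as follows. This is only a sketch; the function name and representation are our own, not something the skeleton provides.

    #include <string>
    #include <vector>

    // Return true if one command's token list (the tokens between "|"s) is
    // well-formed: every "<" or ">" is followed by a word token, there is at
    // most one redirection of each kind, and at least one word is left over.
    bool is_well_formed(const std::vector<std::string> &tokens) {
        int inputs = 0, outputs = 0, words = 0;
        for (std::size_t i = 0; i < tokens.size(); ++i) {
            if (tokens[i] == "<" || tokens[i] == ">") {
                if (tokens[i] == "<") ++inputs; else ++outputs;
                // The operator must be followed by a word token.
                if (i + 1 >= tokens.size() || tokens[i + 1] == "<" ||
                    tokens[i + 1] == ">" || tokens[i + 1] == "|") {
                    return false;
                }
                ++i;  // skip the redirection target word
            } else {
                ++words;  // a word that is not part of a redirection
            }
        }
        return inputs <= 1 && outputs <= 1 && words >= 1;
    }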
Running commands
To run a pipeline, the shell runs each command in the pipeline, then waits for all the commands to terminate. If any command in the pipeline is malformed, you may instead print an error and not execute any command in the pipeline.
To run a command, the shell:
- checks whether the command is malformed according to the specification above,
- checks whether it is the built-in command listed below, and if so does something special,
- forks off a subprocess for the command,
- if it is not the first command in the pipeline, connects its stdin (file descriptor 0) to the stdout of the previous command in the pipeline. The output should go through a pipe created by pipe() (see man pipe) before the subprocess was forked. (You may not, for example, create a normal temporary file, even if this sometimes works.)
- if it is not the last command in the pipeline, connects its stdout (file descriptor 1) to the stdin of the next command in the pipeline,
- if there is an output redirection operation, reopens stdout (file descriptor 1) to that file. The file should be created if it does not exist, and truncated if it does.
- if there is an input redirection operation, reopens stdin (file descriptor 0) from that file,
- uses the first word (ignoring words in redirection operations) as the path of the executable to run. The shell should not search the PATH environment variable.
- if an error occurs that prevents running the executable after forking, exits either with exit status 255 or in some way that permits printing out a special exit status message.
Both redirection operations above must occur after connecting file descriptors for pipelines, so running a command like foo > test.txt | bar should result in foo’s standard output being the file test.txt and bar’s input being a pipe that immediately indicates end-of-file.
You may assume that enough file descriptors are available to run any one pipeline, even if you were to attempt to create all pipes and open all files necessary to run the pipeline all at once. However, you must close file descriptors between commands, so that you do not run out of file descriptors after running many commands that use pipes and/or redirection. You should also not exec programs with extra file descriptors open, so that they have the full number of file descriptors available to them.
Any errors that occur executing or preparing to execute a command should result in error messages as described below. If an error occurs while preparing to execute a command, then the executable should not be run. However, it is okay if you fork() and print an exit status in this case; this means that it is permissible to detect errors after fork()ing and exit with an error code.
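To illustrate the ordering requirement for redirections and pipes, here is a sketch of what the child process for foo in foo > test.txt | bar might do; the function name is ours and error checking is omitted:

    #include <fcntl.h>
    #include <unistd.h>

    // The child for foo first points its stdout at the pipe (the pipeline
    // connection), then at the file (the redirection). The second dup2 wins,
    // so foo writes to test.txt and bar's read end sees immediate end-of-file.
    void setup_foo_stdout(int pipe_write_fd) {
        dup2(pipe_write_fd, 1);   // pipeline connection first
        close(pipe_write_fd);

        int fd = open("test.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
        dup2(fd, 1);              // redirection second; it overrides the pipe
        close(fd);
    }

A full version would also close any other pipe file descriptors the child inherited before calling an exec function.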
Outputting exit statuses
After running each command in the pipeline and before prompting for the next pipeline, the shell should wait for all commands in the pipeline and output their exit statuses. To output their exit statuses, for each command in the order the commands appeared in the pipeline, the shell should print out to stdout information about how the command terminated on its own line:
- if the command terminates due to a signal (e.g. a segfault), then print out a line starting with the name of the program, followed by any text of your choice that describes what happened.

  Testing this with control-C probably won’t work: the command will terminate with an abnormal status, but your shell will probably also be terminated. We do not require you to change this behavior.
- if you fork but then detect an error after forking, such as not being able to successfully exec the program, then either print out the same message you would print out if the program ran and exited normally with exit status 255, or print the program name, optionally followed by its arguments, followed by a colon, followed by a message of your choice. (If you choose a custom message, it may not be the same as for a program exiting normally. The supplied testing script doesn’t always handle custom messages for this case; this is an oversight and we will handle them when grading.)
- otherwise, output the name of the program, optionally followed by its arguments, followed by a space, then the text “exit status:”, then another space, then the numerical exit status. For example, if running test/foo argument results in the foo executable calling exit(99) (where exit is the C exit function), then you may output test/foo exit status: 99 or test/foo argument exit status: 99.
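For example, one way to produce these lines (a sketch only; the function and its parameters are our own names) is to inspect the status from waitpid with the standard macros:

    #include <sys/wait.h>
    #include <iostream>
    #include <string>

    // Wait for one child and print how it terminated, following the
    // "NAME exit status: N" format described above. name is whatever string
    // you choose to identify the command (for example, its first word).
    void report_status(pid_t pid, const std::string &name) {
        int status;
        waitpid(pid, &status, 0);
        if (WIFEXITED(status)) {
            std::cout << name << " exit status: " << WEXITSTATUS(status) << "\n";
        } else if (WIFSIGNALED(status)) {
            std::cout << name << " terminated by signal " << WTERMSIG(status) << "\n";
        }
    }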
Built-in command
Your shell should support the built-in command exit. When this command is run, your shell should terminate normally (with exit status 0). You should not run a program called exit, even if one exists.
Handling errors
Your shell should print out all error messages to stderr (file descriptor 2).
You must use the following error messages:
- If an executable does not exist, print an error message containing “Command not found” or “No such file or directory” (case insensitive). You may include other text in the error message (perhaps the name of the executable the user was trying to run) or print out additional messages in this case. Note that “No such file or directory” is what perror or strerror will output for an errno value of ENOENT, as will happen when execv is passed an executable path that does not exist. (See also the manual pages you can get by running man perror or man strerror.)
- If a command is malformed according to the language above, print an error message containing “Invalid command” (case insensitive).

You must also print out an error message (but you may choose what text to output) if:
- exec fails for another reason (e.g. the executable was found but is not executable),
- fork or pipe fail for any reason (your program must not crash in this case), or
- opening an input or output redirected file fails for any reason.
If a command results in an error and you detect this error after successfully fork()ing, then in addition to printing out the error, you may also print out an exit status message for the program (as described above, most likely indicating exit status 255), even though it was not executed successfully.
If one command in a pipeline results in an error, you must print out at least one error message, but it is okay if other commands in the pipeline are executed even though some error messages are printed out (e.g. it’s okay if something_that_does_not_exist | something_real prints an error about something_that_does_not_exist after starting something_real). It is also acceptable for you to execute none of the commands in the pipeline in this case.
If multiple errors could occur while executing a command, then you may print out messages for any or all of the possible errors. For example, when running the command foo < input.txt > output.txt, if opening both input.txt and output.txt would fail, then you may output an error message about opening input.txt failing, about opening output.txt failing, or both.
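As one sketch of how the required wording can fall out of errno (the helper name is ours, and exiting with 255 is just one of the permitted options), the child might report an exec failure like this:

    #include <cerrno>
    #include <cstring>
    #include <iostream>
    #include <unistd.h>

    // In the child process: try to run the program, and if execv fails,
    // report the reason. For a missing executable, errno is ENOENT and
    // strerror(errno) is "No such file or directory", which satisfies the
    // error-message requirement above.
    void exec_or_report(char *const argv[]) {
        execv(argv[0], argv);
        // Only reached if execv failed.
        std::cerr << argv[0] << ": " << std::strerror(errno) << "\n";
        _exit(255);  // lets the parent report an exit status of 255
    }

A malformed command, by contrast, would be reported with a message containing “Invalid command” before any fork happens.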
Example shell session
- When your shell is complete, a typical session might look like:

      > /bin/ls
      main.cc  main.o  Makefile  msh  msh.cc  msh.h  shell_test.py  test
      /bin/ls exit status: 0
      > /bin/cat Makefile | /bin/grep msh:
      msh: main.o
      /bin/cat exit status: 0
      /bin/grep exit status: 0
      > /usr/bin/wc < Makefile > wc.out
      /usr/bin/wc exit status: 0
      > /bin/cat wc.out | /bin/sed -e s/^/wc:/g > wc.filtered.out
      /bin/cat exit status: 0
      /bin/sed exit status: 0
      > /bin/cat wc.filtered.out does-not-exist
      wc: 19 47 362
      /bin/cat: does-not-exist: No such file or directory
      /bin/cat exit status: 1
      > exit

  where text after a > prompt is the input to the shell.
Approximate grade breakdown
- about 25%: commands without redirection or pipes
- about 25%: commands with redirection
- about 33%: commands with pipes
- about 16%: error handling, parsing corner cases, etc.
Hints
Testing tool
- We have supplied a shell_test.py program, which you can run by running make test. This will often produce a lot of output, so you might try redirecting its output to a file, as with make test >test-output.txt.

  These tests are intended for a Linux system; some of the tests will not work on, for example, OS X (outside a VM). As of 2 Feb 2022 6pm, the test script supplied in the skeleton code should allow many tests to work correctly on OS X, but several remain incompatible. Until an update on 16 Feb, this included, notably, the test involving the command head, which uses an argument to head that the version of head typically installed on OS X will not handle. Earlier versions are less compatible; if you have one of them, you can download an updated shell_test.py.

  If you experience problems relating to removing test/PD, this was an incompatibility with non-Linux filesystems; if you download an updated shell_test.py, it should work around this issue. ([updated 14 Feb 3pm:] My first attempt to work around this issue did not handle some configurations, so if you downloaded an updated shell_test.py before 14 Feb around 3pm, try again.)

  We strongly advise against making changes and just looking at how they affect make test’s count of passed tests, except perhaps when you are very close to finishing. Some tests may be looking for things which you could pass “by accident” (e.g. not producing excess output in some circumstances, which both a very incomplete implementation and a buggy but complete implementation might do), so you must look at which tests in particular you are failing.

  Note that we may use different tests when we grade your submission. The supplied tests are intended to help you test your final submission and not to substitute for doing your own testing.

  The supplied Makefile builds with AddressSanitizer to help you detect memory errors, such as out-of-bounds accesses and memory leaks (at the cost of making your shell run a bit slower). You should expect any submission with memory errors not to get full credit (even if you disable AddressSanitizer in the Makefile you submit). However, we do expect some tests to give a “LeakSanitizer has encountered a fatal error” message (see below).
- The testing tool includes some “fork fails” tests, which arrange for calls to fork() to return an error by setting a limit on the number of processes that can be created before running your shell. There are two issues you may encounter related to this setting:
  - LeakSanitizer, the memory leak detector that is part of AddressSanitizer, does not work when it is not possible to use fork(). You may see an error message from LeakSanitizer because of this that looks like:

        Sanitizer output (main process):
        ==23797==LeakSanitizer has encountered a fatal error.
        ==23797==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
        ==23797==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)

    This is normal and does not indicate a problem with your code, just a limitation of LeakSanitizer.
  - The Windows Subsystem for Linux (WSL) apparently does not support the limit on the number of processes, so the test will not work properly on it. If you are concerned about this test, you may wish to try your shell on another system instead, such as by SSH’ing into portal.cs.virginia.edu (password reset; run module load gcc before compiling to use a recent C compiler and module load python to get access to python3) or by using a Linux VM.
- Some of the tests in shell_test.py have time limits. I think the most common cause of exceeding time limits is code that hangs. However, in some cases it seems that students’ environments run substantially slower than the machine I used to set these time limits, so students will exceed the time limit even though their code isn’t hanging or doing something especially slow. You can edit the time limits in shell_test.py by adding something like 'timeout': 10, (for a 10 second timeout) to a test case. I would recommend verifying that you don’t appear to be doing something quadratic or otherwise especially slow if this seems necessary.
- Our test script runs your shell with its input coming from a pipe, so a good way to emulate a test outside the test script is to put its input into a file and do cat file | ./msh.
- If you want to imitate the “fork fails” test (where we run your shell in an environment where “fork” always fails) manually, you can set a limit on the number of processes using something like ulimit -u 1, then in the system shell do something like exec ./msh to run ./msh in place of the system shell. The resulting shell should not be able to successfully fork.
A possible order of operations
(This roughly matches how I implemented my reference solution.)
- Implement and test parsing commands into whitespace-separated tokens. Collect the tokens into an array or vector to make future steps easier.
- Implement and test running commands without pipelines or redirection. In this case the list of tokens you have will be exactly the arguments to pass to the command.
- Add support for redirection.
- Add support for pipelines.
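To show how these steps fit together, here is a minimal sketch of the outer loop after step 1, behaving like the skeleton’s “Not implemented” prompt for everything except exit. The structure and names are our own; steps 2 through 4 would replace the “Not implemented” line.

    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    // Step 1: split a line into whitespace-separated tokens.
    static std::vector<std::string> tokenize(const std::string &line) {
        std::vector<std::string> tokens;
        std::istringstream stream(line);
        std::string token;
        while (stream >> token) {
            tokens.push_back(token);
        }
        return tokens;
    }

    int main() {
        std::string line;
        while (true) {
            std::cout << "> " << std::flush;
            if (!std::getline(std::cin, line)) break;  // end of input
            std::vector<std::string> tokens = tokenize(line);
            if (tokens.empty()) {
                continue;  // treat an empty line as "run nothing"
            }
            if (tokens.size() == 1 && tokens[0] == "exit") {
                return 0;  // the built-in exit command
            }
            // Steps 2-4: parse the tokens into a pipeline, fork and exec each
            // command, wait for all of them, and report their exit statuses.
            std::cerr << "Not implemented\n";
        }
        return 0;
    }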
Parsing
- In C++, you can read a full line of input with std::getline. In C, you can read a full line of input with fgets.
- In C++, one way to divide a line of input into tokens is by using std::istringstream like

      std::istringstream s(the_string);
      while (s >> token) {
          processToken(token);
      }

- In C, one way to divide a line of input into tokens is by using strsep like

      char *p = the_string;
      char *token;
      while ((token = strsep(&p, " \t\v\f\r\n")) != NULL) {
          ...
      }

  Note that strsep changes the string passed to it.
- My reference implementation creates a class to represent each command in a pipeline (a |-separated list of things to run), and a class to represent the pipeline as a whole. I first collect each command line into a vector of tokens, then iterate through that vector to create and update command objects. (A sketch of one possible shape for this appears after this list.)
- Our specification does not require redirection operations to appear in any particular place in commands. This means, for example, that foo bar < input.txt and foo < input.txt bar and < input.txt foo bar are all equivalent.
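Here is a sketch of that token-walking approach under our own names (Command, parse_pipeline); it is not the reference code, and it omits the duplicate-redirection check for brevity:

    #include <string>
    #include <vector>

    // One possible representation of a single command in a pipeline.
    struct Command {
        std::vector<std::string> args;  // words used to build execv's argv
        std::string input_file;         // empty if there is no < redirection
        std::string output_file;        // empty if there is no > redirection
    };

    // Split a token list into commands at | tokens, recording redirections as
    // they are seen. Returns false for malformed input (a redirection with no
    // target word, or a command with no words at all).
    bool parse_pipeline(const std::vector<std::string> &tokens,
                        std::vector<Command> &out) {
        out.assign(1, Command());
        for (std::size_t i = 0; i < tokens.size(); ++i) {
            const std::string &t = tokens[i];
            if (t == "|") {
                out.push_back(Command());
            } else if (t == "<" || t == ">") {
                if (i + 1 >= tokens.size()) return false;  // no target word
                const std::string &target = tokens[++i];
                if (target == "<" || target == ">" || target == "|") return false;
                if (t == "<") {
                    out.back().input_file = target;
                } else {
                    out.back().output_file = target;
                }
            } else {
                out.back().args.push_back(t);
            }
        }
        for (const Command &c : out) {
            if (c.args.empty()) return false;  // every command needs a word
        }
        return true;
    }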
Converting vectors of strings to arrays of char* for execv
- You can convert a std::string to a nul-terminated const char * (like execv expects, except for const-ness; see the next item) using its .c_str() method:

      std::string example = "something";
      const char *example_as_char_ptr = example.c_str();

  but be careful: the return value of c_str() is a pointer into the string it comes from. It is only valid as long as the string it comes from is still in scope and is not moved. (This means, for example, that calling .c_str() on a string in a vector and then resizing the vector is not safe.)
- For no good reason, execv() expects an array of nul-terminated char * instead of an array of const char * for its argv array. You can convert from const char * to char * using a cast such as

      (char *) some_string.c_str()

  or

      const_cast<char*>(some_string.c_str())

  which is safe as long as you don’t try to modify the string through the resulting char*. (This casting is not needed for the path argument.)
- Given a vector<char*> v, you can get a pointer to the underlying array using something like v.data() or &v[0]. Provided that the last element of the vector is a NULL pointer, this should be suitable to pass to execv. (A sketch putting these pieces together appears after this list.)
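For instance, a sketch of building execv’s argument array from a vector of strings might look like the following (make_argv is our own helper name; the pointers it returns are only valid while the original strings are alive and unmodified):

    #include <string>
    #include <vector>
    #include <unistd.h>

    // Build a NULL-terminated argv array whose entries point into args.
    std::vector<char *> make_argv(std::vector<std::string> &args) {
        std::vector<char *> argv;
        for (std::string &arg : args) {
            argv.push_back(const_cast<char *>(arg.c_str()));
        }
        argv.push_back(nullptr);  // execv requires a trailing NULL pointer
        return argv;
    }

    // Usage in the child process, after any redirection setup:
    //   std::vector<char *> argv = make_argv(command_args);
    //   execv(argv[0], argv.data());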
Running commands
- Pseudocode for running commands is as follows:

      for each command in the line {
          pid = fork();
          if (pid == 0) {
              do redirection stuff
              execv(command, args);
              oops, print out a message about exec failing and exit
          } else {
              store pid somewhere
          }
      }
      for each command in the line {
          waitpid(stored pid, &status);
          check return code placed in status;
      }
- To implement redirection, probably the most useful function is dup2, which can replace stdin (file descriptor 0) or stdout (file descriptor 1) with another file you have opened. When redirecting to a file, you will most commonly use open() to open the file, call dup2 to replace stdin or stdout with a copy of the newly opened file descriptor, then close() the original file descriptor. This would typically be done just before the call to execv, as in the pseudocode above.
- To open a file for reading, I recommend something like:

      int fd = open("filename.txt", O_RDONLY);

- To open a file for writing, I recommend something like:

      int fd = open("filename.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);

  The argument 0666 specifies to allow reading and writing of the created file (subject to a configuration setting called the “umask”). The O_CREAT flag specifies to create the file if it doesn’t exist, and O_TRUNC specifies to truncate the file if it does exist.
- To implement pipelines, the typical technique is to call pipe to obtain a connected pair of file descriptors, use dup2 to assign these to stdout or stdin, and close the original file descriptors just before calling execv. (A sketch of the parent-side bookkeeping appears after this list.)
- To convert from a const char** or array of const char*s to the type expected by execv, you can use a cast like (char**) pointer_to_first_element.
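The following sketch expands the pseudocode above to show that parent-side pipe bookkeeping. It ignores < and > redirections for brevity, and assumes each command is already a NULL-terminated argv array (the run_pipeline name and that representation are our own):

    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>
    #include <vector>

    // Run a pipeline given one NULL-terminated argv per command. The key
    // point is where the pipes are created and, importantly, closed.
    void run_pipeline(const std::vector<std::vector<char *>> &argvs) {
        std::vector<pid_t> pids;
        int prev_read = -1;  // read end of the pipe from the previous command
        for (std::size_t i = 0; i < argvs.size(); ++i) {
            int fds[2] = {-1, -1};
            if (i + 1 < argvs.size() && pipe(fds) < 0) {
                perror("pipe");
                return;
            }
            pid_t pid = fork();
            if (pid < 0) {
                perror("fork");
                return;
            }
            if (pid == 0) {  // child: wire up the pipe ends, then exec
                if (prev_read != -1) { dup2(prev_read, 0); close(prev_read); }
                if (fds[1] != -1)    { dup2(fds[1], 1);    close(fds[1]); }
                if (fds[0] != -1)    close(fds[0]);  // next command's read end
                execv(argvs[i][0], argvs[i].data());
                perror(argvs[i][0]);
                _exit(255);
            }
            pids.push_back(pid);
            // The parent must close its copies of pipe ends it no longer needs,
            // or the readers will never see end-of-file.
            if (prev_read != -1) close(prev_read);
            if (fds[1] != -1) close(fds[1]);
            prev_read = fds[0];  // handed to the next command in the pipeline
        }
        for (std::size_t i = 0; i < pids.size(); ++i) {
            int status;
            waitpid(pids[i], &status, 0);
            // ... print the exit status line for argvs[i][0] here ...
        }
    }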
Printing Error Messages
- In C++, one way to print to stderr is using cerr, which works like cout:

      #include <iostream>
      ...
      std::cerr << "Some message.\n";
- In C or C++, one way to print to stderr is using fprintf:

      #include <stdio.h>
      ...
      fprintf(stderr, "Some message.\n");
Close-on-exec
- As an alternative to manually close()ing file descriptors before calling exec*() in a child process, you can set them as “close-on-exec” using the fcntl library call, as in this example code from the POSIX documentation:

      #include <unistd.h>
      #include <fcntl.h>
      ...
      int flags = fcntl(fd, F_GETFD);
      if (flags == -1)
          /* Handle error */;
      flags |= FD_CLOEXEC;
      if (fcntl(fd, F_SETFD, flags) == -1)
          /* Handle error */;

  Setting this flag won’t prevent you from having to close the file descriptor in the parent process, which does not call an exec function (assuming it is opened before forking).

  You may (or may not) find this simpler than manually inserting close calls before every exec.
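If you take this approach, one convenient pattern (a sketch; the helper name is ours) is to wrap the fcntl calls in a small helper and apply it to both ends of each pipe right after creating it:

    #include <fcntl.h>
    #include <cstdio>

    // Mark a file descriptor close-on-exec so it is closed automatically in
    // any child process that successfully calls an exec function. Descriptors
    // that are dup2()ed onto 0 or 1 afterwards do not inherit the flag, so the
    // redirected stdin/stdout still survive the exec.
    void set_cloexec(int fd) {
        int flags = fcntl(fd, F_GETFD);
        if (flags == -1 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) {
            perror("fcntl");
        }
    }

    // Usage after creating a pipe:
    //   int fds[2];
    //   pipe(fds);
    //   set_cloexec(fds[0]);
    //   set_cloexec(fds[1]);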
Common problems
My shell hangs
- If pipelines hang, then a likely cause is neglecting to close the ends of the pipes in the parent process. (Reads on the pipe will not indicate end-of-file until all of the write ends of the pipe are closed.)
My shell stops working after I run too many commands
- A likely cause is running out of file descriptors by failing to close all file descriptors.
Sometimes execv fails with Bad address

- execv needs a NULL pointer at the end of the array of arguments. If you did not explicitly supply one, then execv may try to read past the end of your array and read a bogus pointer.
Arguments passed to commands are all copies of a single argument or otherwise corrupted
- If you are using C++’s string class, the most common cause of this issue is getting a char * pointer using c_str() and allowing that pointer to become invalid. See the first item in the section on converting vectors of strings to arrays.
General sources for documentation
- All the Unix functions have “manpages” (short for manual pages) retrieved with the man command. For example, to get documentation on the pipe() function, you can, from within a Linux environment, run

      man pipe

  The man command retrieves documentation both for commands and for C functions. If both a command and a function exist with the same name, man by default shows information about the command. For example, there is a command called write and a function called write. Running

      man write

  gives only the documentation about the command. There are two ways to get the documentation about the function. One is to run

      man -a write

  which shows the documentation for all things named “write”: it will show the documentation for the command and then for the function. Alternately, the documentation retrieved by man is divided into “sections”. You can get a list of all the entries for a word like “write” along with their sections by running

      man -k write

  On my system this shows

      write (1)            - send a message to another user
      write (2)            - write to a file descriptor
      write (1posix)       - write to another user
      write (3posix)       - write on a file

  The text in parentheses gives the section numbers/names. For example, you can access the write entry in section 2 using

      man 2 write

  Generally, Linux divides its documentation into sections as follows:

  * section 1: commands intended for normal users
  * section 1posix: commands, but shows the POSIX standard’s description (things that should be the same on all Unix-like OSs) rather than Linux-specific information about a command
  * section 2: “low-level” functions that usually wrap a specific system call
  * section 3: other functions
  * section 3posix: functions, but shows the POSIX standard’s description (things that should be the same on all Unix-like OSs) rather than Linux-specific information about a function
  * section 8: commands intended for system administrators