Contents
Changelog:
- 27 January 2019: updated the “background command, then command-killing command” test in
shell_test.py
to be much less likely to spuriously fail due to a race condition/bad timing. If you already downloaded the skeleton, you can just download a new version ofshell_test.py
here. - 28 January 2019: say to run ‘python3 shell_test.py non-pipe’ instead of target not in Makefile. Also updated Makefile of template to include
make test-nonpipe
for any who download it late. - 28 January 2019: update
shell_test.py
to remove “empty command is invalid” test which should not have been included. - 31 January 2019: correctly note where to submit
- 6 Feburary 2019: modify the “background command, then command-killing command” test to be even more less likely to spuriously fail due to a race condition/bad timing.
Before 6 February 2019,
shell_test.py
had a test for background processes which was unintentionally flakey, you can try getting an updated version here. Before 28 Jan 2019, it also had a test requiring empty commands to be considered invalid, which you can ignore.
Your Task
- Download the skeleton code last updated 2019-01-27, which contains:
- a starter
Makefile
which has a target to build a binary calledmsh
- a source file
main.cc
which implements a prompt that prints a “Not implemented” error on any command but “exit”.
- a starter
- Modify
msh
to be a Unix-like shell that supports running commands as described below, that includes:- running simple commands (e.g.
/bin/cat foo.txt bar.txt
) - input redirection (e.g. commands like
/usr/bin/gcc -E - < somefile.txt
) - output redirection (e.g. commands like
/usr/bin/gcc -E file.cc > somefile.txt
) - supports background commands that are started without waiting for them to complete (e.g. commands like
/bin/foo argument > output.txt &
) - (not required for checkpoint) pipelines of multiple commands (e.g. commands like
/bin/cat foo.txt | /bin/grep bar | /bin/grep baz
or/bin/cat foo | /bin/grep baz > ouptut.txt &
) - supports the builtin command
exit
In addition, your shell must
- not search for executables on
PATH
; all commands will include the absolute path to the executable or a relative path from the current directory - prompt for commands using
>
- print out error messages (to stderr):
- including the text “Invalid command” if a command is malformed according to the language specification below;
- including either the text “Command not found” or “No such file or directory” if the executable does not exist;
- including some error message (we do not care which one) for errors when fork, pipe, exec, or opening a file for redirection fails.
-
output to stdout the exit status(es) of every command using lines containing the name of the executable run, optionally followed by its arguments, followed by “ exit status: “ followed by the numerical exit status of each command (if the command terminates normally; if the command exits from a signal, you must output something, but we do not care what is output).
For example, a command like
/bin/ls /foo
which exits with a status of 0 could result in outputting/bin/ls exit status: 0
or/bin/ls /foo exit status: 0
.For non-background commands, this exit status should be output before the next prompt is printed.
-
For background commands, before each prompt, the shell should output the exit statuses of all background processes which terminated since the previous prompt was output. (The exit status should be output exactly once; if the shell detects a background command terminates earlier, it should not print out its exit status until the next prompt.)
The exit status of background processes should be output after the exit status of the non-background process that was just run.
- (not required for checkpoint) for a foreground pipeline that includes multiple commands, outputs exit statuses for the parts of the pipeline in the order they appear in the command. (If a command cannot be run due to the command being malformed or the executable not existing, you may print no exit status or a non-zero exit status for the not-found or invalid command, at your option.)
- not leave any extra file descriptors open (ones other than stdin, stdout, stderr, or any file descriptors that were open before the shell started) just before it calls executes a command.
We strongly recommend creating partially working versions as described below. A full solution will be around 200 lines of code.
You may use additional source files, rename source files, switch from C++ to C, etc. so long as you modify the Makefile appropriately and still produce an executable called
msh
. - running simple commands (e.g.
-
We have supplied a
shell_test.py
program, which you can run by runningmake test
(all tests) orpython3 shell_test.py non-pipe
(excludes some pipeline related tests). This will often produce a lot of output (especially when you haven’t implemented all shell features yet), so you might try redirecting its output to a file like withmake test >test-output.txt
.The original version of the
shell_test.py
program had a flakey “background command, then command-killing command” test (it might spuriously fail due to bad luck in timing); you can download ashell_test.py
with an updated version of that test here.Note that we may use additional and/or different tests when we grade your submission.
The supplied Makefile builds with AddressSanitizer to help you detect memory errors, such as out-of-bounds accesses and memory leaks (at the cost of making your shell run a bit slower). You should expect any submission with memory errors not to get full credit (even if you disable AddressSanitizer in the Makefile you submit).
We intend to test the shell you submit on a Linux enviornment similar to the course VM. Please make sure your submitted shell will build and work in this environment.
-
Prepare a
.tar.gz
file like the one built usingmake archive
in the given Makefile. This should have all your source files and a Makefile which can produce anmsh
binary. It should not contain any object files or a pre-builtmsh
executable. - Submit the
.tar.gz
file on the -001 submission site
Specification
Shell language
Shell commands are lines which consist of a sequence of whitespace-seperated tokens. Whitespace characters are space (' '
in C or C++), form feed ('\f'
), newline ('\n'
), carriage return ('\r'
), horizontal tab ('\t'
), vertical tab ('\v'
).
Each line has a maximum length of 100 characters. We do not care what your shell does with longer input lines.
Each token is either
-
an operator, if it is
<
or>
or|
or&
, or -
a word otherwise
Each line of inputs is a pipeline, which consists of a series of a commands (possibly just one)
seperated by |
tokens, optionally followed by a &
. (For the checkpoint, you do not need to handle any commands
including |
tokens.)
If a pipeline ends with a &
, then it should be run in the background (without waiting for them to complete),
Each well-formed command consists of a series of (in any order):
-
up to one input redirection operation, which consists of a
<
token followed by a word token -
up to one output redirection operation, which consists of a
>
token followed by a word token, and -
one or more words (not part of any redirection operation), which are used to form the command for an exec system call
Any command which has a <
or >
operator not followed by a word token is malformed. Any command which has no word which is not
part of a redirection operation is malformed. Any command which has an &
not at the end of the command is malformed.
As a special case, you may optionally accept and execute no programs for a line containing no tokens, or you may treat it as a malformed
command. (If you have a version of the tests before 28 January 2019, it may include a “empty command is invalid” test, which you can ignore;
it is removed from updated versions of shell_test.py
.)
Running commands
If a command contains excess input or output redirection operations, or operators followed immediately by other operator, that is an error.
To run a pipeline, the shell runs each command in the pipeline, then waits for all the commands to terminate. You may decide what to do if one of the commands in the pipeline cannot be executed as long as your shell does not crash or otherwise stop accepting new commands.
To run a command, the shell:
-
first checks if it is one of the built-in commands listed below, and if so does something special.
-
forks off a subprocess for the command
-
if it is not the first command in the pipeline, connect its stdin (file descriptor 0) to the stdout of the previous command in the pipeline. The output should go to a pipe created by
pipe()
(seeman pipe
). (You may not, for example, create a normal temporary file, even if this sometimes works.) -
if it is not the last command in the pipeline, connect its stdout (file descriptor 1) to the stdin of the next command in the pipeline
-
if there is an output redirection operation, reopen stdout (file descriptor 1) to that file. The file should be created if it does not exist, and truncated if it does.
-
if there is an input redirection operation, reopen stdin (file descriptor 0) from that file
-
uses the first word (ignoring words in redirection operations) as the name of the executable run, and
After running each command in the pipeline, unless it is a background pipeline, the shell should wait for each command and print out their exit statuses.
Built-in command
Your shell should support the built-in command exit
. When this command is run your shell should terminate normally (with exit status 0
). You should not run a program called exit
, even if one exists.
Handling errors
Your shell should print out error messages to stderr (file descriptor 2).
You must use the following error messages:
- If an executable does not exist, print an error message containing “Command not found” or “No such file or directory”. You may include other text in the error message (perhaps the name of the executable the user was trying to run). Note that “No such file or directory” is what
perror
orstrerror
will output for anerrno
value of ENOENT. (See their manpages by runningman perror
orman strerror
.) - If a command is malformed according to the language above, print an error message containing “Invalid command”
You must also print out an error message if:
exec
fails for another reason (e.g. executable found but not executable)fork
orpipe
fail for any reason. Your program must not crash in this case.open
ing a input or output redirected file fails for any reason.
If multiple commands in a pipeline fail, you must print out error messages, but it may be the case that commands in the pipeline are executed even though error messages are printed out (e.g. it’s okay if something_that_does_not_exist | something_real
prints an error about something_that_does_not_exist
after starting something_real
.)
Hints
Recommended order of operations
-
Implement and test parsing commands into whitespace-separated tokens. Collect the tokens into an array or vector to make future steps easier.
-
Implement and test running commands without pipelines or redirection. In this case the list of tokens you have will be exactly the arguments to pass to the command.
-
Add support for redirection.
-
Add support for pipelines.
Parsing
-
In C++, you can read a full line of input with
std::getline
. In C, you can read a full line of input with ``fgets`. -
In C++, one way to divide a line of input into tokens is by using
std::istringstream
likestd::istringstream s(the_string); while (s >> token) { processToken(token); }
-
In C, one way to divide a line of input into tokens is by using
strsep
likechar *p = the_string; char *token; while ((token = strsep(&p, " \t\v\f\r\n")) != NULL) { ... }
Note that
strsep
changes the string passed to it. -
My reference implementation creates a class to represent each command in a pipeline (
|
-seperated list of things to run), and a class to represent the pipeline as a whole. I first collect each command line into a vector of tokens, then iterate through that vector to create and update command objects.
Running commands
-
Pseudocode for running comands is as follows:
for each command in the line { pid = fork(); if (pid == 0) { do redirection stuff execv ( command, args ); oops, print out a message about exec failing and exit } else { store pid somewhere } } for each command in the line { waitpid(stored pid, &status); check return code placed in status; }
-
To implement redirection, probably the most useful function is
dup2
, which can replace stdin (file descriptor 0) or stdout (file descriptor 1) with another file you have opened. When redirecting to a file, you will most commonly useopen()
to open the file, calldup2
to replace stdin or stdout with a copy of the newly opened file descriptor, thenclose()
the original file descriptor. This occurs typically would be done just before the call toexecve
as in the pseudocode above. -
To implement pipelines, the typical technique is to call
pipe
to obtain a connected pair of file descriptors, usedup2
to assign these to stdout or stdin, and close the original file descriptor just before callingexecve
. -
To convert from a
const char**
or array ofconst char*
s to the type expected byexecv
, you can use a cast like(char**) pointer_to_first_element
. -
To implement background commands, it is probably easiest to take advantage of
waitpid
supporting an “options” argument that allows you to use waitpid to check if a process has finished, without actually waiting for it if it has not.
Printing Error Messages
-
In C++, one way to print to stderr is using
cerr
, which works likecout
:#include <iostream> ... std::cerr << "Some message.\n";
-
In C or C++, one way to print to stderr is using
fprintf
:#include <stdio.h> ... fprintf(stderr, "Some message.\n");
Common problems
My shell hangs
-
If pipelines hang, then a likely cause is neglecting to
close
the ends of the pipes the parent process.(
read
s on the pipe will not indicate end-of-file until all of the write ends of the pipe are closed.)
My shell stops working after I run too many commands
- A likely cause is running out of file descriptors by failing to close all file descriptors.
General sources for documentation
-
All the Unix functions have “manpages” (short for manual pages) retrieved with the
man
command. For example, to get documentation on thepipe()
function, you can run, from within a Linux enviornment, runman pipe
The
man
command retrieves documentation both for commands and C functions. If both a command and a function exist with the same name, man by default shows information about the command. For example, there is a command calledwrite
and a function calledwrite
. Runningman write
gives only the documentation about command. There are two ways to get the documentation about the function. One is to run
man -a write
which shows the documentation for all things named “write” — it will show the documentation for the command and then for the function. Alternately, the documentation retrieved by man is divided into “sections”. You can get a list of all the entries for a word like “write” along with their sections by running
man -k write
On my system this shows
write (1) - send a message to another user write (2) - write to a file descriptor write (1posix) - write to another user write (3posix) - write on a file
The text in parenthesis are the section numbers/names. For example, you can access the write entry in section 2 using
man 2 write
Generally, Linux divides its documentation into sections as follows:
* section 1: commands intended for normal users * section 1posix: commands, but shows the POSIX standard's description (things that should be the same on all Unix-like OSs) rather than Linux-specific information about a command * section 2: "low-level" functions that usually wrap a specific system call * section 3: other functions * section 3posix: functions, but shows the POSIX standard's description (things that should be the same on all Unix-like OSs) rather than Linux-specific information about a function * section 8: commands intended for system adminstrators