Changelog:

7 September 2018: Clarify that we may use tests that aren’t in the supplied test program when grading.
7 September 2018: Note that we will test your shell on a Linux environment similar to the course VM.
8 September 2018: Remove mention of “the checkpoint”
8 September 2018: Change jipeline example in “Your Task” to use absolute paths.
8 September 2018: Update supplied shell_test.py to fix bug in “redirect output truncates file” test case.
10 September 2018: Update supplied shell_test.py to fix bug in how output was compared for test cases where not all output was supplied. Also make output for test cases with setup code a little less confusing and make exit status checking tests consistently allow extra text after the exit status.
10 September 2018 08:25: Update shell_test.py to avoid some false test failures where command prompts could be confused with program outputs.
11 September 2018: replace execve with execv to be more consistent with slides.
11 September 2018 20:30: Incorrectly update shell_test.py to deal better with hangs.
11 September 2018 20:47: Correctly update shell_test.py to deal better with hangs.
12 September 2018: “by” -> “by or as if by” for increased clarity
12 September 2018: fix a cases where shell_test.py would get confused by a broken shell and crash, and make “redirect output then use normal input” test less sensitive.
12 September 2018: add hint about converting to char ** when running execv
13 September 2018: clarify first word – first word (not in redirection operation)
13 September 2018 21:20: provide updated version of shell_test.py that is more robust to unexpected non-UTF-8 output.
13 September 2018: provide updated version of shell_test.py which doesn’t care how many times invalid command is printed for a sequence contaiing a pipe.
16 September 2018: clarify that seperated by | means seperated by a | token

Note: Before 2018-09-10 07:30, the shell_test.py included with the skeleton had a bug where some tests that only specified partial expected output (particular to stderr) would sometimes pass if there was only one line of output to compare to.

Later on we fixed some other issues with the shell testing script (see above changelog).

You can update just shell_test.py to get the latest verison of the testing script.

Your Task

Download the skeleton code last updated 2018-09-11 20:47 (see note above), which contains:
- a starter Makefile which has a target to build a binary called msh
- a source file main.cc which implements a prompt that prints a “Not implemented” error on any command but “exit”.
Modify msh to be a Unix-like shell that supports running commands as described below, that includes:
- running simple commands (e.g. /bin/cat foo.txt bar.txt)
- input redirection (e.g. commands like /usr/bin/gcc -E - < somefile.txt)
- output redirection (e.g. commands like /usr/bin/gcc -E file.cc > somefile.txt)
- supports the builtin command exit In addition, your shell must
- not search for executables on PATH; all commands will include the absolute path to the executable or a relative path from the current directory
- prompt for commands using >
- print out error messages (to stderr):
  - including the text “Invalid command” if a command is malformed according to the language specification below;
  - including either the text “Command not found” or “No such file or directory” if the executable does not exist;
  - including some error message (we do not care which one) for errors when fork, pipe, exec, or opening a file for redirection fails.
- implement pipelines (e.g. commands like /bin/cat foo.txt | /bin/grep bar | /bin/grep baz)
- output to stdout the exit status(es) of every command using lines containing “exit status: “ followed by the numerical exit status of each command (if the command terminates normally; if the command exits from a signal, we do not care what is output). For a pipeline that includes multiple commands, output exit statuses for the parts of the pipeline in the order they appear in the command.
- not leave any extra file descriptors open (ones other than stdin, stdout, stderr) just before it calls executes a command.
We strongly recommend creating partially working versions as described below. A full solution will be around 200 lines of code.

You may use additional source files, rename source files, switch from C++ to C, etc. so long as you modify the Makefile appropriately and still produce an executable called msh.
We have supplied a shell_test.py program, which you can run by running make test. This will often produce a lot of output (especially when you haven’t implemented all shell features yet), so you might try redirecting its output to a file like with make test >test-output.txt.

Note that we may use additional and/or different tests when we grade your submission.

We intend to test the shell you submit on a Linux enviornment similar to the course VM. Please make sure your submitted shell will build and work in this environment.
Prepare a .tar.gz file like the one built using make archive in the given Makefile. This should have all your source files and a Makefile which can produce an msh binary. It should not contain any object files or a pre-built msh executable.
Submit the .tar.gz file on Collab.

Specification

Shell language

Shell commands are lines which consist of a sequence of whitespace-seperated tokens. Whitespace characters are space (' ' in C or C++), form feed ('\f'), newline ('\n'), carriage return ('\r'), horizontal tab ('\t'), vertical tab ('\v').

Each line has a maximum length of 100 characters. We do not care what your shell does with longer input lines.

Each token is either

an operator, if it is < or > or |, or
a word otherwise

Each line of inputs is a pipeline, which consists of a series of a commands (possibly just one) seperated by | tokens.

Each command consists of a series of:

one or more words, which are used to form the command for an exec system call
up to one input redirection operation, which consists of a < token followed by a word token
up to one output redirection operation, which consists of a > token followed by a word token

Running commands

If a command contains excess input or output redirection operations, or operators followed immediately by other operator, that is an error.

To run a pipeline, the shell runs each command in the pipeline, then waits for all the commands to terminate. You may decide what to do if one of the commands in the pipeline cannot be executed as long as your shell does not crash or otherwise stop accepting new commands.

To run a command, the shell:

first checks if it is one of the built-in commands listed below, and if so does something special.
forks off a subprocess for the command
if it is not the first command in the pipeline, connect its stdin (file descriptor 0) to the stdout of the previous command in the pipeline. The connection should be created by or as if by pipe() (see man pipe). (You may not, for example, create a normal temporary file, even if this sometimes works.)
if it is not the last command in the pipeline, connect its stdout (file descriptor 1) to the stdin of the next command in the pipeline
if there is an output redirection operation, reopen stdout (file descriptor 1) to that file. The file should be created if it does not exist, and truncated if it does.
if there is an input redirection operation, reopen stdin (file descriptor 0) from that file
uses the first word (ignoring words in redirection operations) as the name of the executable run.

Built-in command

Your shell should support the built-in command exit. When this command is run your shell should terminate normally (with exit status 0). You should not run a program called exit, even if one exists.

Handling errors

Your shell should print out error messages to stderr (file descriptor 2).

You must use the following error messages:

If an executable does not exist, print an error message containing “Command not found” or “No such file or directory”. You may include other text in the error message (perhaps the name of the executable the user was trying to run). Note that “No such file or directory” is what perror or strerror will output for an errno value of ENOENT. (See their manpages by running man perror or man strerror.)
If a command is malformed according to the language above, print an error message containing “Invalid command”

You must also print out an error message if:

exec fails for another reason (e.g. executable found but not executable)
fork or pipe fail for any reason. Your program must not crash in this case.
opening a input or output redirected file fails for any reason.

If multiple commands in a pipeline fail, you must print out error messages, but it may be the case that commands in the pipeline are executed even though error messages are printed out (e.g. it’s okay if something_that_does_not_exist | something_real prints an error about something_that_does_not_exist after starting something_real.)

Hints

Recommended order of operations

Implement and test parsing commands into whitespace-separated tokens. Collect the tokens into an array or vector to make future steps easier.
Implement and test running commands without pipelines or redirection. In this case the list of tokens you have will be exactly the arguments to pass to the command.
Add support for redirection.
Add support for pipelines.

Parsing

In C++, you can read a full line of input with std::getline. In C, you can read a full line of input with ``fgets`.
In C++, one way to divide a line of input into tokens is by using std::istringstream like
```
std::istringstream s(the_string);
while (s >> token) {
    processToken(token);
}
```
In C, one way to divide a line of input into tokens is by using strsep like
```
char *p = the_string;
char *token;
while ((token = strsep(&p, " \t\v\f\r\n")) != NULL) {
    ...
}
```
Note that strsep changes the string passed to it.
My reference implementation creates a class to represent each command in a pipeline (|-seperated list of things to run), and a class to represent the pipeline as a whole. I first collect each command line into a vector of tokens, then iterate through that vector to create and update command objects.

Running commands

Pseudocode for running comands is as follows:

for each command in the line {
    pid = fork();
    if (pid == 0) {
        do redirection stuff
        execv ( command, args );
        oops, print out a message about exec failing and exit
    } else {
        store pid somewhere
    }
}
for each command in the line {
    waitpid(stored pid, &status);
    check return code placed in status;
}

To implement redirection, probably the most useful function is dup2, which can replace stdin (file descriptor 0) or stdout (file descriptor 1) with another file you have opened. When redirecting to a file, you will most commonly use open() to open the file, call dup2 to replace stdin or stdout with a copy of the newly opened file descriptor, then close() the original file descriptor. This occurs typically would be done just before the call to execve as in the pseudocode above.
To implement pipelines, the typical technique is to call pipe to obtain a connected pair of file descriptors, use dup2 to assign these to stdout or stdin, and close the original file descriptor just before calling execve.
To convert from a const char** or array of const char*s to the type expected by execv, you can use a cast like (char**) pointer_to_first_element.

Printing Error Messages

In C++, one way to print to stderr is using cerr, which works like cout:
```
#include <iostream>
...
std::cerr << "Some message.\n";
```

In C or C++, one way to print to stderr is using fprintf:

#include <stdio.h>
...
fprintf(stderr, "Some message.\n");

Common problems

My shell hangs

If pipelines hang, then a likely cause is neglecting to close the ends of the pipes the parent process.

(reads on the pipe will not indicate end-of-file until all of the write ends of the pipe are closed.)

My shell stops working after I run too many commands

A likely cause is running out of file descriptors by failing to close all file descriptors.

General sources for documentation

All the Unix functions have “manpages” (short for manual pages) retrieved with the man command. For example, to get documentation on the pipe() function, you can run, from within a Linux enviornment, run
```
man pipe
```
The man command retrieves documentation both for commands and C functions. If both a command and a function exist with the same name, man by default shows information about the command. For example, there is a command called write and a function called write. Running
```
man write
```
gives only the documentation about command. There are two ways to get the documentation about the function. One is to run
```
man -a write
```
which shows the documentation for all things named “write” — it will show the documentation for the command and then for the function. Alternately, the documentation retrieved by man is divided into “sections”. You can get a list of all the entries for a word like “write” along with their sections by running
```
man -k write
```
On my system this shows
```
write (1)            - send a message to another user
write (2)            - write to a file descriptor
write (1posix)       - write to another user
write (3posix)       - write on a file
```
The text in parenthesis are the section numbers/names. For example, you can access the write entry in section 2 using
```
man 2 write
```
Generally, Linux divides its documentation into sections as follows:
```
*  section 1: commands intended for normal users
*  section 1posix: commands, but shows the POSIX standard's description (things that should be the same on all Unix-like OSs) rather than Linux-specific information about a command
*  section 2: "low-level" functions that usually wrap a specific system call
*  section 3: other functions
*  section 3posix: functions, but shows the POSIX standard's description (things that should be the same on all Unix-like OSs) rather than Linux-specific information about a function
*  section 8: commands intended for system adminstrators
```

Contents

Your Task

Specification

Shell language

Running commands

Built-in command

Handling errors

Hints

Recommended order of operations

Parsing

Running commands

Printing Error Messages

Common problems

My shell hangs

My shell stops working after I run too many commands

General sources for documentation