Changelog:
- 9 Oct 2023: do not omit trailing newline in example output from parallelgetoutput (though it is fine if your implementation strips it)
- 11 Oct 2023: mention testing 0 + lots of arguments for parallelgetoutput
- 13 Oct 2023: consistently call test program in example args.py, not both args.py and foo.py
Write and submit a single file fork_run.c that defines the following two functions, specified below:
char *getoutput(const char *command)
char *parallelgetoutput(int count, const char **argv_base)
You should not submit any other files (i.e., no .h files, Makefile, etc). You may include helper functions in the file, but it should not contain main.
1 char *getoutput(const char *command)
This should behave something like system, except that instead of letting the child print to stdout, it should collect what the child prints and return it as a \0-terminated, malloc-allocated char *. If the command’s output includes a \0 itself, it should return your choice of the output up to the first \0 or the entire output (with an additional terminating \0).
I am not aware of a standard library function that does this, but if you find one do not use it; do this by forking, execing, and piping yourself.
We will supply an example implementation of my_system from the prior fork lab as a potential starting point sometime after the late deadline for that lab, on Canvas under the Files tab.
The following main function
int main() {
    printf("Hi!\n");
    printf("Text: [[[%s]]]\n",
           getoutput("echo 1 2 3; sleep 2; echo 5 5"));
    printf("Bye!\n");
}
should print
Hi!
then wait for 2 seconds before printing
Text: [[[1 2 3
5 5
]]]
Bye!
Note that this main also has a memory leak: getoutput invokes malloc and main is not invoking free.
To do this, use the following outline.
- Create a pipe. A pipe looks like a pair of file descriptors, one opened for reading and the other for writing, and is a tool used extensively to help processes talk to each other. See man 2 pipe for details.
- Make sure you invoke pipe before you invoke fork so that both processes have access to the same pipe.
- In the child,
  - replace stdout with the write end of the pipe. The dup2 function is used for this; it copies one file descriptor to a new number. You want to copy the write end of the pipe to 1, stdout.
  - close both of the pipe file descriptors. You don’t need the read end in the child, and the write end is now duplicated as stdout.
  - exec, etc., as you did for my_system.
- In the parent,
  - close the write end of the pipe; only the child needs that.
  - read all contents from the read end of the pipe, mallocing enough space to store it all.
  - close the read end of the pipe when you are done reading.
  - waitpid on the child after reading everything. If you wait before reading everything, then you are relying on the OS buffering the program’s output for you. This will work when the program’s output is not too long, but when the program has a lot of output, the program will block until more buffer space is available. If you waited first, you would not be reading to free up buffer space, so the program would hang.
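The outline above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the reference solution: the name capture_output is hypothetical, the command is run through /bin/sh -c (an assumption about how the child is exec'd), and error handling is kept to a minimum.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not the assignment's required function name):
 * runs `command` via /bin/sh -c and returns what it wrote to stdout as a
 * malloc-allocated, \0-terminated string, or NULL on failure. */
char *capture_output(const char *command) {
    assert(command != NULL);
    int fds[2];                        /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) != 0) return NULL;   /* create the pipe before forking */

    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return NULL; }
    if (pid == 0) {
        /* Child: make stdout (fd 1) refer to the write end of the pipe. */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                    /* reached only if exec failed */
    }

    /* Parent: close the write end so read() can eventually report EOF. */
    close(fds[1]);
    size_t used = 0, cap = 4096;
    char *buf = malloc(cap);
    ssize_t n = 0;
    while (buf != NULL && (n = read(fds[0], buf + used, cap - used - 1)) > 0) {
        used += (size_t)n;
        if (cap - used < 2) {          /* keep room for data plus the final \0 */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) { free(buf); buf = NULL; break; }
            buf = bigger;
            cap *= 2;
        }
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);             /* wait only after reading everything */
    if (buf != NULL) buf[used] = '\0';
    return buf;
}
```

With this sketch, capture_output("echo 1 2 3; sleep 2; echo 5 5") would return after about two seconds; the caller is responsible for calling free on the result.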
You may assume that the command exists and executes normally; no need to add any error-handling logic.
As a tip, one of the easiest ways to read everything there is to read is to use getdelim with the delimiter '\0'. getdelim wants a FILE *, not a file descriptor; see fdopen for how to wrap a file descriptor in a FILE *.
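As a rough illustration of this tip, the sketch below wraps a file descriptor with fdopen and reads it with getdelim. The helper names read_all_from_fd and roundtrip_through_pipe are hypothetical; the second exists only to exercise the first without forking, and assumes the text is small enough to fit in the pipe's buffer.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes fdopen and getdelim */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads fd until end-of-file (or through the first \0
 * byte) and returns the data as a malloc-allocated string. Closes fd. */
char *read_all_from_fd(int fd) {
    assert(fd >= 0);
    FILE *stream = fdopen(fd, "r");
    if (stream == NULL) return NULL;
    char *buffer = NULL;               /* getdelim allocates and grows this */
    size_t capacity = 0;
    ssize_t length = getdelim(&buffer, &capacity, '\0', stream);
    fclose(stream);                    /* also closes the underlying fd */
    if (length < 0) {
        /* getdelim returns -1 when nothing was read: return "" instead. */
        free(buffer);
        buffer = calloc(1, 1);
    }
    return buffer;
}

/* Hypothetical demo: writes a short text into a pipe and reads it back.
 * The text must fit in the pipe's buffer, since nothing is reading yet. */
char *roundtrip_through_pipe(const char *text) {
    assert(text != NULL);
    int fds[2];
    if (pipe(fds) != 0) return NULL;
    size_t length = strlen(text);
    if (write(fds[1], text, length) != (ssize_t)length) {
        close(fds[0]);
        close(fds[1]);
        return NULL;
    }
    close(fds[1]);                     /* no writers left, so reads reach EOF */
    return read_all_from_fd(fds[0]);
}
```

Because the delimiter is '\0', this approach returns the output up to the first \0 byte if the data happens to contain one, which is one of the two behaviors the specification allows.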
2 char *parallelgetoutput(int count, const char **argv_base)
Run count child processes simultaneously and collect their output into a single string, returning only after all the child processes have finished. Each of the child processes should run a command specified by the NULL-pointer-terminated array argv_base as follows:
- the executable run shall be argv_base[0]
- the arguments (argv value) for the program run shall be the elements of argv_base followed by the 0-based index of the child process converted to a string
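One way to build the argument array described above is sketched below. The helper name build_child_argv is hypothetical, and the 16-byte buffer for the index text is an assumption that int is at most 32 bits wide.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc-allocated argv holding the elements
 * of argv_base, then `index` as a decimal string, then a NULL terminator.
 * The base strings are shared with argv_base, not copied. */
char **build_child_argv(const char **argv_base, int index) {
    assert(argv_base != NULL && index >= 0);
    size_t base_count = 0;
    while (argv_base[base_count] != NULL) base_count++;

    /* Room for the base elements, the index string, and the NULL. */
    char **argv = malloc((base_count + 2) * sizeof *argv);
    char *index_text = malloc(16);     /* assumed enough for a 32-bit int */
    if (argv == NULL || index_text == NULL) {
        free(argv);
        free(index_text);
        return NULL;
    }
    snprintf(index_text, 16, "%d", index);

    for (size_t i = 0; i < base_count; i++) {
        argv[i] = (char *)argv_base[i];
    }
    argv[base_count] = index_text;
    argv[base_count + 1] = NULL;
    return argv;
}
```

In a child process, the result could then be passed to execv(argv[0], argv). If the array is built in the child after fork, there is nothing to free in the parent.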
The output collected may interleave the outputs of the child processes (and an implementation that does this is simpler than one that does not).
[added 9 Oct]: You may optionally strip a trailing newline from the output; in the examples below we do not do this (but we did do this in the first version of this writeup).
In order to ensure that the processes run in parallel, if they are writing to a pipe, you need to be reading from that pipe to prevent them from hanging (if they write too much). With multiple processes, the easiest way of doing this would be to use a single pipe for all the programs, which will result in their output being interleaved as if you ran them simultaneously in a terminal.
You may assume the executable argv_base[0] exists and executes normally and that we supply the full path of the executable.
Before returning, parallelgetoutput must waitpid for each child process.
For example, a main() like:
int main() {
    const char *argv_base[] = {
        "/bin/echo", "running", NULL
    };
    const char *output = parallelgetoutput(2, argv_base);
    printf("Text: [%s]\n", output);
}
would start two child processes. One of them would run something equivalent to:
const char *argv[] = {"/bin/echo", "running", "0", NULL};
execv("/bin/echo", argv);
And another would run something equivalent to:
const char *argv[] = {"/bin/echo", "running", "1", NULL};
execv("/bin/echo", argv);
Then, it would wait for both child processes to finish and collect their combined output into a single string output. On a system with a /bin/echo program like the one on portal, the output would probably be either:
Text: [running 0
running 1
]
or:
Text: [running 1
running 0
]
(but on some systems, maybe other interleaved outputs would be possible like:
Text: [running running 1
0
]
)
Like with the getoutput example, the program above has a memory leak; output is dynamically allocated and never freed.
You can test more directly that your parallelgetoutput passes the correct arguments by writing a program that checks the arguments it gets. For example, the following Python program:
#!/usr/bin/python3
import sys
print("args =",sys.argv)
prints out its arguments. (You could also write and compile a similar C program.) If you save this as a file called args.py and make that file executable, then a main() like:
int main() {
    const char *argv_base[] = { "./args.py", "first", "second", "third", NULL };
    const char *output = parallelgetoutput(3, argv_base);
    printf("Text: [%s]\n", output);
}
should output something like:
Text: [args = ['./args.py', 'first', 'second', 'third', '0']
args = ['./args.py', 'first', 'second', 'third', '1']
args = ['./args.py', 'first', 'second', 'third', '2']
]
but possibly with the lines in a different order and/or interleaved.
3 Hints
3.1 Catching memory errors reliably
Memory errors such as using uninitialized memory unintentionally are a common problem on this assignment. To help avoid these problems, I recommend testing by compiling and linking using the flags:
-fsanitize=address -ftrivial-auto-var-init=pattern -Og -g
- -fsanitize=address will enable AddressSanitizer, which checks for accessing out-of-bounds memory;
- -ftrivial-auto-var-init=pattern will initialize local variables with a pattern that should more consistently trigger errors (rather than often working by accident due to leftover values on the stack). If you are using GCC, note that this requires GCC version at least 12 (available on portal/NX via gcc-12 or via module load gcc). (It’s been supported in Clang for much longer.)
- -Og enables compiler optimizations that should work well with debugging
- -g enables debugging information. (In addition to being useful when using the debugger, this should allow AddressSanitizer to give better information about errors it finds.)
3.2 fork lab code
You may find it useful to consult your code for the fork lab. If you did not complete it, an example solution will be available on Collab, under the resources tab.
3.3 Testing that you waitpid
If your getoutput and parallelgetoutput call waitpid properly, then after they return, running waitpid(-1, NULL, 0) should return -1 and set errno to ECHILD (indicating that there are no child processes to wait for).
3.4 On reading all data from a pipe
The operating system maintains a limited amount of space to hold values which have been written to a pipe but not read yet. When it runs out of space, writes to the pipe will hang (until space is freed up by reading from the pipe). For this reason, you should not wait for one of the child processes to terminate before reading their output — otherwise, they might never terminate.
Reading from a pipe will indicate end-of-file once all the file descriptors referencing the write end of the pipe are closed and any remaining buffered data has been read. When a child process exits, exiting closes all its file descriptors as a side effect. So, if you set up the pipes so the write end is only open in the child processes, then you can read the pipe until it indicates end-of-file.
For parallelgetoutput, I would strongly recommend using just one pipe. Since you need to read from a pipe to prevent processes producing a lot of output from hanging, if you have multiple pipes and read from them one at a time, it is very unlikely you’ll actually run them in parallel in general.
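To illustrate the single-pipe idea and the end-of-file behavior described in this section, here is a small sketch in which the children write to the shared pipe directly instead of exec'ing a program. The function name collect_from_children and the limit of 64 children are assumptions made for this demo only.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: forks `count` children that each write one line to a
 * shared pipe, reads the pipe until end-of-file, then waits for every child.
 * Returns a malloc-allocated string, or NULL on failure. */
char *collect_from_children(int count) {
    assert(count >= 0 && count <= 64);
    int fds[2];
    pid_t pids[64];
    if (pipe(fds) != 0) return NULL;

    for (int i = 0; i < count; i++) {
        pids[i] = fork();
        if (pids[i] == 0) {
            close(fds[0]);             /* children never read */
            char line[32];
            int len = snprintf(line, sizeof line, "child %d\n", i);
            if (write(fds[1], line, (size_t)len) < 0) _exit(1);
            close(fds[1]);
            _exit(0);
        }
    }

    /* The parent must close its copy of the write end; otherwise read()
     * would never report end-of-file. */
    close(fds[1]);

    char *out = malloc(1);
    size_t used = 0;
    char chunk[4096];
    ssize_t n;
    while (out != NULL && (n = read(fds[0], chunk, sizeof chunk)) > 0) {
        char *bigger = realloc(out, used + (size_t)n + 1);
        if (bigger == NULL) { free(out); out = NULL; break; }
        out = bigger;
        memcpy(out + used, chunk, (size_t)n);
        used += (size_t)n;
    }
    close(fds[0]);

    for (int i = 0; i < count; i++) {
        if (pids[i] > 0) waitpid(pids[i], NULL, 0);
    }
    if (out != NULL) out[used] = '\0';
    return out;
}
```

The order of the lines in the result depends on how the children are scheduled, so tests should not rely on it.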
3.5 Some corner cases to test
I would recommend testing:
- parallelgetoutput with 0 arguments after the program name or with a bunch of arguments after the program name
- programs that output a lot (at least tens of kilobytes).
- programs that output something, then wait, then output another thing, then wait, etc. If you’re writing programs to simulate this, I recommend using fflush() or similar to make sure the output is sent when you expect it.
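A test program for the last case might be built around a function like the one below. The name emit_slowly and its parameters are hypothetical; the point is only the fflush call after each line, since stdout is normally fully buffered when it refers to a pipe and the lines would otherwise tend to arrive together when the program exits.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical body for a test program: prints `rounds` lines to `out`,
 * flushing after each one and sleeping `delay_seconds` between lines. */
void emit_slowly(FILE *out, int rounds, unsigned delay_seconds) {
    assert(out != NULL && rounds >= 0);
    for (int i = 0; i < rounds; i++) {
        fprintf(out, "line %d\n", i);
        fflush(out);                   /* send this line now, not at exit */
        if (i + 1 < rounds) {
            sleep(delay_seconds);
        }
    }
}
```

A main that calls emit_slowly(stdout, 3, 1) and is compiled into its own executable could then be run through getoutput or parallelgetoutput.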