Changelog:
- 31 Mar 2023: add some hints regarding reading all data from a pipe
- 2 Apr 2023: be more explicit that
parallelgetoutput
requires reading from all processes at once and that therefore most likely the output will be interleaved - 2 Apr 2023: specify that
my_system
implementation is on resources tab on collab - 3 Apr 2023: include required NULL on execv’s in examples
Write and submit a single file
fork_run.c
that defines the following two functions, specified below:char *getoutput(const char *command)
char *parallelgetoutput(int count, const char **argv_base)
You should not submit any other files (i.e., no
.h
files,Makefile
, etc). You may include helper functions in the file, but it should not containmain
.
1
char *getoutput(const char *command)
This should behave something like system
, except that
instead of letting the child print to stdout, it should collect what the
child prints and return it as a \0
-terminated
malloc
-allocated char *
. If the command’s
output includes a \0
itself, it should return your choice
of the output up to the first \0
or the entire output (with
an additional terminating \0
).
I am not aware of a standard library function that does this, but if you find one do not use it; do this by forking, execing, and piping yourself.
We will supply an example implementation of my_system
from the prior fork lab as a potential starting point sometime after the
late deadline for that lab on the resources tab on Collab.
The following main function
int main() {
("Hi!\n");
printf("Text: [[[%s]]]\n",
printf("echo 1 2 3; sleep 2; echo 5 5"));
getoutput("Bye!\n");
printf}
Should print
Hi!
then wait for 2 seconds before printing
Text: [[[1 2 3
5 5
]]]
Bye!
Note that this main
also has a memory leak:
my_getoutput
invokes malloc
and
main
is not invoking free
.
To do this, use the following outline.
Create a pipe. A pipe looks like a pair of file decriptors, one opened for reading and the other for writing, and is a tool used extensively to help processes talk to each other. See
man 2 pipe
for details.Make sure you invoke
pipe
before you invokefork
so that both processes have access to the same pipe.In the child,
- replace stdout with the write-end of the pipe. The
dup2
command is used for this, copying one file descriptor with a new number. You want to copy the write-end of the pipe to1
, stdout. - close both of the pipe file descriptors. You don’t need the read end in the child, and the write-end is now duplicated as stdout.
- exec, etc, as you did for
my_system
.
- replace stdout with the write-end of the pipe. The
In the parent,
- close the write end of the pipe – only the child needs that.
- read all contents from the read end of the pipe,
malloc
ing enough space to store it all. - close the read end of the pipe when you are done reading.
waitpid
on the child after reading everything. If you wait before reading everything, then you are relying on the OS buffering the program’s output for you. This will work when the program’s output is not too long, but when the program has a lot of output, the program will wait for more buffer space to be available. Since if you waited first, you wouldn’t be reading to help free up some buffer space, the program will hang.
You may assume that the command exists and executes normally; no need to add any error-handling logic.
As a tip, one of the easiest ways to read everything there is to read
is to use getdelim
with the delimiter
'\0'
.
getdelim
wants a FILE *
,
not a file descriptor; see fdopen
for
how to wrap a file descriptor in a FILE *
.
2
char *parallelgetoutput(int count, const char **argv_base)
Run count
child processes simulatenously and collect
their output into a single string, returning only after all the child
processes have finished. Each of the child processes should run a
command specified by the NULL-pointer-terminated array
argv_base
as follows:
- the executable run shall be
argv_base[0]
- the arguments (
argv
value) for the program run shall be the elements ofargv_base
followed by the 0-based index of the child process converted to a string
The output collected may interleave the output from each of the child processes. [Added 2 April:] In order to ensure that the processes run in parallel, if they are writing to a pipe, you need to be reading from that pipe to prevent them from hanging (if they write too much). With multiple processes, the easiest way of doing this would be to use a single pipe for all the programs, which will result in their output being interleaved as if you ran them simulatenously in a terminal.
You may assume the executable argv_base[0]
exists and
executes normally and that we supply the full path of the
executable.
Before returning, parallelgetoutput
must
waitpid
for each child process.
For example, a main()
like:
int main() {
const char *argv_base[] = {
"/bin/echo", "running", NULL
};
const char *output = parallelgetoutput(2, argv_base);
printf("Text: [%s]\n", output);
}
would start two child processes. One of them would run something equivalent to:
const char *argv[] = {"/bin/echo", "running", "0", NULL};
execv("/bin/echo", argv);
And another would run something equivalent to:
const char *argv[] = {"/bin/echo", "running", "1", NULL};
execv("/bin/echo", argv);
Then, it would wait for both child processes to finish and collect
their combined output into a single string output
. On a
system with a /bin/echo
program like exists in portal, the
output would probably be either:
Text: [running 0
running 1]
or:
Text: [running 1
running 0]
(but on some systems, maybe other interleaved outputs would be possible like:
Text: [running running 1
0]
)
Like with the getoutput
example, the program above has a
memory leak; output
is dynamically allocated and never
free
d.
3 Hints
3.1 fork lab code
You may find it useful to consult your code for the fork lab. If you did not complete it, an example solution is available on Collab, under the resources tab.
3.2 Testing that you
waitpid
If your getoutput
and parallelgetoutput
call waitpid
properly, then after they return running
waitpid(-1, NULL, 0)
should return -1
and set errno
to ECHILD
(indicating that there are no child processes to wait for).
3.3 On reading all data from a pipe
The operating system maintains a limited amount of space to hold values which have been written to a pipe but not read yet. When it runs out of space, writes to the pipe will hang (until space is freed up by reading from the pipe). For this reason, you should not wait for one of the child processes to terminate before reading their output — otherwise, they might never terminate.
Reading from a pipe will result indicate end-of-file if the all the file descriptors referencing the write end of the pipe are closed. If the child processes have exited, then exiting will, as a side effect, close all their file descriptors. So, if you setup the pipes so the write end is only open in the child processes, then you can read the pipe until it indicates end-of-file.
For
parallelgetoutput
, I would strongly recommend using just one pipe. Since you need to read from a pipe to preevent processes producing a lot of output from hanging, if you have multiple pipes and read from then one at a time, it is very unlikely you’ll actually run them in parallel in general.
3.4 Some corner cases to test
I would recommend testing:
- programs that output a lot (tens of kilobytes) at least.
- programs that output something, then wait, then output another
thing, then wait, etc. If you’re writing programs to simulate this, I
recommend using
fflush()
or similar to make sure the output is sent when you expect it.