1 Your task
Create C files as follows:
In split.c, write an implementation of the function
    char **string_split(const char *input, const char *sep, int *num_words);
You can see examples of what this function should do below.
The function takes a \0-terminated string input and a \0-terminated string sep listing the separating characters. The function interprets input as a sequence of words (which it returns as described in the next paragraph). These words are separated by one or more of the separating characters in sep. input should be considered to always start and end with a word. If input starts or ends with a separating character, that word will be empty (zero-length). Empty words must not be generated in any other circumstance.
The function should return the words using a dynamically allocated array of \0-terminated strings. It must be possible to free the array by calling free() on each element of the array and then on the array itself. The function must store the length of the array in *num_words.
In addition:
- The function may not modify the input strings.
- The function may not leak memory. (When the function returns, the only newly allocated memory should be pointed to by the value returned.)
- We do not care what your function does if input is an empty string or if sep is an empty string.
- We do not care what your function does if allocating memory fails (provided that your function does not try to allocate a huge amount of memory relative to the size of its arguments).
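The rules above can be met in a single left-to-right scan. The following is a minimal sketch, not a reference solution: it assumes the strspn/strcspn approach suggested in the Hints, grows the array with realloc by doubling, and uses two hypothetical helpers, free_words and copy_span, that are not part of the specification.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: free n heap-allocated strings and then the array. */
void free_words(char **words, int n) {
    for (int i = 0; i < n; i++) free(words[i]);
    free(words);
}

/* Hypothetical helper: copy the n bytes at src into a new '\0'-terminated string. */
static char *copy_span(const char *src, size_t n) {
    char *s = malloc(n + 1);
    if (s == NULL) return NULL;
    memcpy(s, src, n);
    s[n] = '\0';
    return s;
}

char **string_split(const char *input, const char *sep, int *num_words) {
    size_t cap = 4, count = 0;
    char **words = malloc(cap * sizeof *words);
    if (words == NULL) return NULL;

    const char *p = input;
    for (;;) {
        /* The current word runs up to the next separator (or the '\0'). */
        size_t len = strcspn(p, sep);

        if (count == cap) {
            /* Double the capacity so the input only has to be scanned once. */
            char **bigger = realloc(words, 2 * cap * sizeof *bigger);
            if (bigger == NULL) {
                free_words(words, (int)count);
                return NULL;
            }
            words = bigger;
            cap *= 2;
        }
        words[count] = copy_span(p, len);
        if (words[count] == NULL) {
            free_words(words, (int)count);
            return NULL;
        }
        count++;

        p += len;
        if (*p == '\0') break;  /* that was the last word */
        /* Skip the whole run of separators; whatever follows (possibly just
           the terminating '\0', giving an empty final word) is the next word. */
        p += strspn(p, sep);
    }
    *num_words = (int)count;
    return words;
}
```

Because a word is emitted before every separator run and once more after the last one, empty words appear only at the start or end of input, as required. The input strings are only read, never written.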
Your implementation may use additional utility functions, provided those are also in split.c (so we can easily test your split.c with our own main.c). It may also use library functions that are part of the C standard or the POSIX standard. This includes most functions you may be familiar with that are declared in <string.h> and <stdio.h>, with the notable exceptions of strsep, index, and rindex. You can determine whether a function meets these criteria by checking for either:
- the CONFORMING TO section of its manpage (called STANDARDS in newer manpages) listing some version of POSIX or C89 or C99 or C11 (or several of these); or
- its appearing in the list of functions from the 2018 POSIX specification.
In split.h, write an appropriate header file declaring string_split(), with #include guards to protect against multiple inclusion.
In main.c, write a main() function that:
- constructs a value for sep from the command-line arguments or, if no command-line arguments are provided, uses " \t" (a space followed by a tab). When there are command-line arguments, the value of sep should be the result of concatenating all of them. For example, if main() is compiled into an executable split, then running ./split a b c or ./split abc or ./split ab c should choose a sep value of abc.
- reads lines of input (without prompting). For each line, it should:
  - exit if the line is a single period (.) with no other text or whitespace;
  - otherwise, call string_split with that line and the chosen sep value, print the resulting array with each word surrounded by square brackets ([ and ]) and no spaces between words, followed by a newline, and then free the resulting array.
Your program must support lines of input of at least 4000 bytes. If an input line is more than 4000 bytes, your program must not access out-of-bounds memory (such as might cause a segfault), but otherwise we do not care what it does.
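The per-line steps above might be organized as the helpers below. This is only a sketch: the names build_sep, strip_and_check_quit, and print_and_free are hypothetical, and main() itself is left out so each piece can be read and tested on its own.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: concatenate argv[1..argc-1] into one heap string,
   or return a heap copy of " \t" when there are no arguments.
   The caller frees the result. */
char *build_sep(int argc, char **argv) {
    if (argc < 2) {
        char *def = malloc(3);          /* space, tab, '\0' */
        if (def != NULL) strcpy(def, " \t");
        return def;
    }
    size_t total = 1;                   /* room for the '\0' */
    for (int i = 1; i < argc; i++) total += strlen(argv[i]);
    char *sep = malloc(total);
    if (sep == NULL) return NULL;
    sep[0] = '\0';
    for (int i = 1; i < argc; i++) strcat(sep, argv[i]);
    return sep;
}

/* Hypothetical helper: drop one trailing '\n' left by fgets and report
   whether the remaining line is exactly ".". */
int strip_and_check_quit(char *line) {
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n') line[len - 1] = '\0';
    return strcmp(line, ".") == 0;
}

/* Hypothetical helper: print words as [w0][w1]... plus a newline,
   then free each word and the array itself. */
void print_and_free(FILE *out, char **words, int n) {
    for (int i = 0; i < n; i++) {
        fprintf(out, "[%s]", words[i]);
        free(words[i]);
    }
    fputc('\n', out);
    free(words);
}
```

A main() built from these could read into a buffer such as char line[4002] (4000 bytes plus a newline and the '\0') with fgets(line, sizeof line, stdin), stop when strip_and_check_quit returns nonzero, and otherwise pass line to string_split and the result to print_and_free, freeing sep before returning. Since fgets never writes more than sizeof line bytes, an over-long line is read in pieces rather than overflowing the buffer.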
We intend to test your main.c with our own split.c, so please put any functions other than string_split that main.c needs within main.c. Your implementation may use functions defined in the C or POSIX standards. (Some examples of expected transcripts are shown below.)
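For the split.h requirement, a header can be as small as the following sketch. The guard macro name SPLIT_H is an assumption (any identifier unique to this header works); both split.c and main.c would then #include "split.h".

```c
/* split.h: declares string_split. The #ifndef/#define/#endif guard makes
   a second #include of this file expand to nothing. SPLIT_H is an
   assumed, conventional guard name. */
#ifndef SPLIT_H
#define SPLIT_H

char **string_split(const char *input, const char *sep, int *num_words);

#endif /* SPLIT_H */
```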
Create a Makefile such that typing make will build (if necessary) split.o and main.o and link them into an executable split.
Make sure your C files do not produce any warnings when compiled with -D_XOPEN_SOURCE=700 -Og -g -std=c11 -Wall -pedantic (assuming GCC or Clang). (It is okay if your Makefile uses different options.)
Make sure your code does not have memory leaks or memory errors. We will test your code with AddressSanitizer enabled to help check for this (it can be enabled by compiling and linking with -fsanitize=address).
We may use automated tests to assess your submission. Your solution may not hard-code the solution for any of the test cases or intentionally interfere with the testing environment. (By "hard-code the solution", we mean code that looks for our specific test inputs (such as checking whether the input contains XfooX because one of the test cases is XfooXbar) rather than the more general situation being tested (such as checking for separator characters at the beginning of the input); it is not a prohibition on having less specific special cases.)
Submit your solution to the submission site.
2 Examples
2.1 string_split
Running
    char **result;
    int size = ANY_VALUE;  /* the initial value of size does not matter */
    result = string_split("foo", ":", &size);
should have the same effect as
    char **result;
    int size;
    result = calloc(1, sizeof(char *));
    result[0] = malloc(4);
    strcpy(result[0], "foo");
    size = 1;
Running
    char **result;
    int size;
    result = string_split("foo:bar:quux", ":", &size);
or
    char **result;
    int size;
    result = string_split("foo:bar!quux", "!:", &size);
or
    char **result;
    int size;
    result = string_split("foo:bar!quux", ":!", &size);
should have the same effect as
    char **result;
    int size;
    result = calloc(3, sizeof(char *));
    result[0] = malloc(4);
    strcpy(result[0], "foo");
    result[1] = malloc(4);
    strcpy(result[1], "bar");
    result[2] = malloc(5);
    strcpy(result[2], "quux");
    size = 3;
Running
    char **result;
    int size;
    result = string_split(":foo!:bar::quux!", ":!", &size);
should have the same effect as
    char **result;
    int size;
    result = calloc(5, sizeof(char *));
    result[0] = malloc(1);
    strcpy(result[0], "");
    result[1] = malloc(4);
    strcpy(result[1], "foo");
    result[2] = malloc(4);
    strcpy(result[2], "bar");
    result[3] = malloc(5);
    strcpy(result[3], "quux");
    result[4] = malloc(1);
    strcpy(result[4], "");
    size = 5;
2.2 split executable
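In the following example transcripts, $ represents the shell's prompt. After the ./split command, each line typed in as input is followed by the bracketed line of output it produces; the final line consisting of a single . is typed in to end the program.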
2.2.1 Example 1
$ ./split
foo bar baz
[foo][bar][baz]
quux-no-space quux-with space !
[quux-no-space][quux-with][space][!]
  indented
[][indented]
.
$
2.2.2 Example 2
$ ./split XY Z
fooXXXXbarZXYXYXZbazYYYYY
[foo][bar][baz][]
XXXXXXXXXXXXX
[][]
X.X
[][.][]
.
$
3 Hints
I found the C standard library functions strspn and strcspn useful in my solution. To avoid scanning the string multiple times, you can use realloc to change the size of a dynamically allocated array. (But note that you might need to initialize the space added by realloc; it is not guaranteed to be 0/NULL.) You may find it helpful to refer to the CSO1 labs on using C; see, for example, labs 9 through 12 of the Spring 2023 offering of CSO1.
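The realloc caveat in the hint above can be handled with a small helper like this sketch. The name grow_words is hypothetical; the point it illustrates is that realloc leaves the added slots uninitialized, so the helper sets them to NULL explicitly.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: grow a char* array from old_cap to new_cap slots
   (new_cap > old_cap) and set every added slot to NULL, because realloc
   does not initialize the new memory. On failure it returns NULL and the
   original array is still valid and owned by the caller. Passing
   words == NULL with old_cap == 0 behaves like a fresh allocation. */
char **grow_words(char **words, size_t old_cap, size_t new_cap) {
    if (new_cap > SIZE_MAX / sizeof *words) return NULL;  /* size overflow */
    char **bigger = realloc(words, new_cap * sizeof *bigger);
    if (bigger == NULL) return NULL;
    for (size_t i = old_cap; i < new_cap; i++) bigger[i] = NULL;
    return bigger;
}
```

Assigning the result to a separate variable before overwriting the old pointer is what keeps the original array reachable (and freeable) if realloc fails.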
In addition to the problems caught by AddressSanitizer, you might have memory errors that arise from assuming local variables are initialized in a particular way even though you don't set them explicitly. With recent versions of GCC, and with Clang, you can use the -ftrivial-auto-var-init=pattern flag to help catch this: it makes the compiler initialize any uninitialized local variables to a pattern likely to cause problems if the value is used.