1 Your task

  1. Create C files as follows:

    • in split.c, write an implementation of the function

       char **string_split(const char *input, const char *sep, int *num_words);

      You can see examples of what this function should do below.

      The function takes a \0-terminated string input and a \0-terminated string listing separating characters in sep.

      The function splits input into a sequence of words (which it returns as described in the next paragraph). These words are separated by one or more of the separating characters in sep. input should be considered to always start and end with a word. If the input starts or ends with a separating character, this word will be empty (zero-length). Empty words must not be generated in any other circumstance.

      The function should return the words using a dynamically allocated array of \0-terminated strings. It must be possible to free the array by calling free() on each element of the array and then on the array itself. The function must store the length of the array in *num_words.

      In addition:

      • The function may not modify the input strings.
      • The function may not leak memory. (When the function returns, the only newly allocated memory should be pointed to by the value returned.)
      • We do not care what your function does if input is an empty string or if sep is an empty string.
      • We do not care what your function does if allocating memory fails (provided that your function does not try to allocate a huge amount of memory relative to the size of its arguments).

      Your implementation may use additional utility functions, provided those are also in split.c (so we can easily test your split.c with our own main.c). It may also use library functions that are part of the C standard or the POSIX standard. This includes most functions you may be familiar with declared in <string.h> and <stdio.h>, with the notable exceptions of strsep, index, and rindex. You can determine if a function meets these criteria by looking for

      • the CONFORMING TO section of its manpage listing some version of POSIX or C89 or C99 or C11 (or multiple of these); or
      • it appearing in this list of functions from the 2018 POSIX specification
    • in split.h, write an appropriate header file declaring string_split() and include #include guards to protect against multiple inclusion.

    • in main.c, write a main() function that:

      • constructs a value for sep from the command line arguments, or, if no command line arguments are provided, uses " \t". When there are command-line arguments, the value of sep should be the result of concatenating all the command line arguments. For example, if the main() is compiled into an executable split, then running

          ./split a b c

        or

         ./split abc

        or

         ./split ab c

        should choose a sep value of abc.

      • reads lines of input (without prompting). For each line, it should:

        • exit if the line is one period (.) with no other text or whitespace;
        • otherwise call string_split with that input and the chosen sep value, then print out the resulting array with each word surrounded by square brackets ([ and ]) and without spaces between words, followed by a newline, then free the resulting array.

    Your program must support lines of input of at least 4000 bytes. If an input line is more than 4000 bytes, your program may not access out of bounds memory (such as might cause a segfault), but otherwise we do not care what it does.

    We intend to test your main.c with our own split.c, so please put any functions other than string_split that it needs within main.c. Your implementation may use functions defined in the C or POSIX standards.

    (Some examples of expected transcripts are shown below.)

  2. Create a Makefile such that typing make will build (if necessary) split.o, main.o and link them into an executable split.

  3. Make sure your C files do not produce any warnings when compiled with -D_XOPEN_SOURCE=700 -Og -g -std=c11 -Wall -pedantic (assuming GCC or Clang). (It is okay if your Makefile uses different options.)

  4. Make sure your code does not have memory leaks or errors. We will test your code with AddressSanitizer enabled to help check for this (which can be enabled by compiling and linking with -fsanitize=address).

  5. We may use automated tests to assess your submission. Your solution may not hard-code the solution for any of the test cases or intentionally interfere with the testing environment. (By hard-code the solution, we mean code that looks for our specific test inputs (such as checking if the input contains XfooX because one of the test cases is XfooXbar) rather than the more general situation being tested (such as checking for separator characters at the beginning of the input); it’s not a prohibition on having less specific special cases.)

  6. Submit your solution to the submission site.
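
As an illustration of the separator rule for main.c described in item 1, the following is a minimal sketch of one way to concatenate the command-line arguments into sep. The helper name build_sep, its signature, and the choice to return a heap-allocated string that the caller frees are all assumptions made for this example; they are not required by the assignment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name and signature are assumptions, not part of the
 * assignment). Builds the separator string from the command-line arguments.
 * Returns a malloc'd string that the caller must free, or NULL if the
 * allocation fails. */
char *build_sep(int argc, char **argv) {
    /* No arguments after the program name: default to space and tab. */
    if (argc <= 1) {
        char *sep = malloc(3);
        if (sep != NULL) strcpy(sep, " \t");
        return sep;
    }

    /* First pass: total length of all arguments. */
    size_t total = 0;
    for (int i = 1; i < argc; i++) total += strlen(argv[i]);

    char *sep = malloc(total + 1);
    if (sep == NULL) return NULL;

    /* Second pass: append each argument to the buffer in order. */
    size_t pos = 0;
    for (int i = 1; i < argc; i++) {
        size_t len = strlen(argv[i]);
        memcpy(sep + pos, argv[i], len);
        pos += len;
    }
    assert(pos == total);
    sep[pos] = '\0';
    return sep;
}
```

With this sketch, ./split a b c, ./split abc, and ./split ab c would all produce the sep value abc, and ./split with no arguments would produce " \t".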

2 Examples

2.1 string_split

  1. Running

    char **result;
    int size = ANY_VALUE;
    result = string_split("foo", ":", &size);

    should have the same effect as

    char **result;
    int size;
    result = calloc(1, sizeof(char *));
    result[0] = malloc(4);
    strcpy(result[0], "foo");
    size = 1;
  2. Running

    char **result;
    int size;
    result = string_split("foo:bar:quux", ":", &size);

    or

    char **result;
    int size;
    result = string_split("foo:bar!quux", "!:", &size);

    or

    char **result;
    int size;
    result = string_split("foo:bar!quux", ":!", &size);

    should have the same effect as

    char **result;
    int size;
    result = calloc(3, sizeof(char *));
    result[0] = malloc(4);
    strcpy(result[0], "foo");
    result[1] = malloc(4);
    strcpy(result[1], "bar");
    result[2] = malloc(5);
    strcpy(result[2], "quux");
    size = 3;
  3. Running

    char **result;
    int size;
    result = string_split(":foo!:bar::quux!", ":!", &size);

    should have the same effect as

    char **result;
    int size;
    result = calloc(5, sizeof(char *));
    result[0] = malloc(1);
    strcpy(result[0], "");
    result[1] = malloc(4);
    strcpy(result[1], "foo");
    result[2] = malloc(4);
    strcpy(result[2], "bar");
    result[3] = malloc(5);
    strcpy(result[3], "quux");
    result[4] = malloc(1);
    strcpy(result[4], "");
    size = 5;
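
One way to check your own string_split against the examples above is to compare its result with a list of expected words and then free it element by element. The sketch below shows such helpers; the names words_equal, free_words, and make_example3 are hypothetical and chosen only for this illustration. make_example3 builds by hand the array that example 3 describes, so the helpers can be tried before string_split exists.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical test helper: returns 1 if the num_words strings in words
 * match the num_expected strings in expected, and 0 otherwise. */
int words_equal(char **words, int num_words,
                const char *const *expected, int num_expected) {
    if (num_words != num_expected) return 0;
    for (int i = 0; i < num_words; i++) {
        if (strcmp(words[i], expected[i]) != 0) return 0;
    }
    return 1;
}

/* Frees each element and then the array itself, which is the way the
 * specification says a result must be freeable. */
void free_words(char **words, int num_words) {
    for (int i = 0; i < num_words; i++) free(words[i]);
    free(words);
}

/* Hypothetical stand-in for string_split(":foo!:bar::quux!", ":!", &n):
 * builds the array from example 3 by hand. */
char **make_example3(int *num_words) {
    static const char *const parts[] = {"", "foo", "bar", "quux", ""};
    const int n = 5;
    char **result = calloc((size_t)n, sizeof(char *));
    assert(result != NULL);
    for (int i = 0; i < n; i++) {
        result[i] = malloc(strlen(parts[i]) + 1);
        assert(result[i] != NULL);
        strcpy(result[i], parts[i]);
    }
    *num_words = n;
    return result;
}
```

In a test program, the call to make_example3 would be replaced by a call to your string_split; running that program with AddressSanitizer enabled then also checks that free_words releases everything.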

2.2 split executable

In the following example transcripts, bold represents input typed in and $ represents the shell’s prompt:

2.2.1 Example 1

$ ./split
foo     bar   baz
[foo][bar][baz]
quux-no-space quux-with space !
[quux-no-space][quux-with][space][!]
   indented
[][indented]
.
$

2.2.2 Example 2

$ ./split XY Z
fooXXXXbarZXYXYXZbazYYYYY
[foo][bar][baz][]
XXXXXXXXXXXXX
[][]
X.X
[][.][]
.
$
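
The output format in these transcripts can be sketched as follows. This version formats into a caller-supplied buffer so that the result is easy to check; printing each word directly with printf would work equally well. The name format_words and its return convention are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "[w0][w1]...[wN-1]\n" into buf.
 * Returns the number of characters written (not counting the terminating
 * \0), or -1 if the output does not fit in bufsize bytes. */
int format_words(char *buf, size_t bufsize, char **words, int num_words) {
    assert(buf != NULL);
    size_t pos = 0;
    for (int i = 0; i < num_words; i++) {
        /* snprintf never writes more than bufsize - pos bytes. */
        int n = snprintf(buf + pos, bufsize - pos, "[%s]", words[i]);
        if (n < 0 || (size_t)n >= bufsize - pos) return -1;
        pos += (size_t)n;
    }
    /* Need room for the newline and the terminating \0. */
    if (bufsize - pos < 2) return -1;
    buf[pos++] = '\n';
    buf[pos] = '\0';
    return (int)pos;
}
```

For the words foo, bar, baz, and an empty word, this produces the line [foo][bar][baz][] followed by a newline, matching the first output line of Example 2.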

3 Hints

  1. I found the C standard library functions strspn and strcspn useful in my solution.

  2. To avoid scanning the string multiple times, you can use realloc to change the size of a dynamically allocated array. (But note that you might need to initialize space made by realloc — it might not default to 0/NULL.)

  3. You may find the CSO1 labs on using C helpful; see, for example, labs 9 through 12 of the Spring 2023 offering of CSO1.

  4. In addition to the problems caught by AddressSanitizer, you might have memory errors that arise from assuming local variables are initialized in a particular way even though you don’t set them explicitly. With recent versions of GCC and with clang you can use the -ftrivial-auto-var-init=pattern flag to help catch this: it makes the compiler initialize any otherwise uninitialized local variables to a pattern that is likely to cause visible problems if they are used.
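
To make hints 1 and 2 concrete, here is a small sketch of how strspn, strcspn, and realloc behave. Both helper names are hypothetical. count_nonempty_words is deliberately not string_split: it only counts runs of non-separator characters and ignores the rule about empty words at the start and end of the input.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the maximal runs of non-separator characters
 * in input. strspn(p, sep) is the length of the leading run of characters
 * that ARE in sep; strcspn(p, sep) is the length of the leading run of
 * characters that are NOT in sep. */
int count_nonempty_words(const char *input, const char *sep) {
    int count = 0;
    const char *p = input;
    while (*p != '\0') {
        p += strspn(p, sep);           /* skip a run of separators      */
        size_t len = strcspn(p, sep);  /* length of the following word  */
        if (len > 0) count++;
        p += len;
    }
    return count;
}

/* Hypothetical helper: grows an array of string pointers from old_cap to
 * new_cap slots. realloc does not initialize the added space, so the new
 * slots are set to NULL explicitly. Returns the new array, or NULL on
 * failure, in which case the old array is still valid. */
char **grow_array(char **array, size_t old_cap, size_t new_cap) {
    assert(new_cap > 0 && new_cap >= old_cap);
    char **bigger = realloc(array, new_cap * sizeof(char *));
    if (bigger == NULL) return NULL;
    for (size_t i = old_cap; i < new_cap; i++) bigger[i] = NULL;
    return bigger;
}
```

Passing NULL as the array with an old_cap of 0 makes realloc behave like malloc, so the same helper can create the array and later enlarge it.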