Changelog:
- 5 Apr 2023: correct wrong URL for readings link in tips
- 6 Apr 2023: update Makefile to actually build life-tsan and have a
main.cc
that properly processes its arguments - 6 Apr 2023 5pm: correct link to main.c to not be broken link
- 6 Apr 2023: specify to create one fewer thread if the main thread is doing its share of the computations
- 6 Apr 2023 11pm: add links to copies of sanitizer-test.sh, which was also missing from the skeleton code
- 7 Apr 2023: add note about restrictions on timing of pthread_barrier_destroy due to libc issue
- 7 Apr 2023: be more explicit that paralle version needs to match serial one
- 10 Apr 2023: add information re: using debugger to diagnose hangs to hints
If you downloaded the skeleton code before 6 April around 4pm, then there are some minor issues with that skeleton code:
- the Makefile does not build the ThreadSanitizer-version of the testing program
life-tsan
. You can download an updated Makefile to correct this.- the
main.c
doesn’t properly distinguish between thetime
andtime-and-result
arguments; you can download an updated main.c to correct this.- the
sanitizer-test.sh
file was not included
1 Your Task
1.1 Before you code
Examine the supplied skeleton
code [last updated 2023-04-06 3:50pm] which implements Conway’s
Game of Life. You should only edit life-parallel.c
, but
that interacts with many other files in the package.
Make sure you look at input/README.md
, as well as
life.h
and any other files you wish.
The provided code will compile and run. We encourage experimenting with it before adding any code of your own.
Make sure you understand how simulate_life_serial
works.
You’ll be making a (more complicated) parallel version of this, and are
unlikely to be successful if you don’t understand this starting
point.
1.2 Implement parallel Life
Create a parallel version (using pthreads) in
life-parallel.c
. You must use the provided header
void simulate_life_parallel(int threads, LifeBoard *state, int steps)
life-serial.c
contains a correct, working
single-threaded implementation you are welcome to borrow from. You
should write helper functions as appropriate to keep your code
well-organized.
Additional constraints:
Your implementation must be correct (result in the expected final board configuration, matching the serial version) for all boards and iteration counts.
You must call
pthread_create
exactlythreads
times (assuming you are computing cell values only in new threads you create; if you also compute cell values in the main (initial) thread, call itthreads-1
times instead). Do not create more or fewer threads, nor create new threads for each iteration of the simulation.We must be able to call
simulate_life_parallel
from multiple threads concurrently. Do not use global variables or other thread-unsafe practices.You must have no memory leaks or other memory errors. The provided
Makefile
builds a version with the address sanitizer enabled automatically aslife-asan
; this must run without errors.ThreadSanitizer should not detect likely race conditions or other thread API errors in your code. The provided
Makefile
builds a version with the thread sanitizer enabled automatically aslife-tsan
.You must call the appropriate
pthread_***_destroy
for eachpthread_***_init
you call to reclaim any pthreads-allocated memory.- On portal/our testing machines,
pthread_barrier_destroy
won’t work properly unless called after allpthread_barrier_wait
calls have returned. (This is a bug in the pthread_barrier implementation on those machines.)
- On portal/our testing machines,
On a multi-core machine, your parallel code must be noticeably faster than the serial code when run on sufficiently large boards for sufficiently many iterations.
On sufficiently large boards (both enough rows and enough columns), you should split up the work roughly evenly between the threads. (With large sizes, no thread should end up doing more than a few percent more computations than another.)
And, of course, you are still bound by all the usual course policies.
You must not use OpenMP (subject of one of our labs) in your solution: we are trying to measure your understanding of the lower-level tools OpenMP uses.
1.3 Test your code.
Uncomment the lines of main.c
that are marked as
appropriate to uncomment after simulate_life_parallel
is
written. Then test your code, as e.g. by running
./life
to see the usage hints.life-asan 10 input/make-a time
to check for memory leaks and other memory errors (but do not trust its timing, the address sanitizer slows things down a lot).life-tsan 10 input/make-a time
to check for race conditions and some other types of synchornization errors (but do not trust its timing, the address sanitizer slows things down a lot).- if you get an error about a race condition with pthread_barrier_destroy and pthread_barrier_wait, see the first caveat below.
If you downloaded the skeleton code before 6 April, your Makefile may not properly build life-tsan; you can download an updated Makefile here.
./life 0 input/o0075 serial-result
to ensure you can load an example file../life 110 input/o0075 serial-result
to ensure you can simulate an example file../life 10 input/make-a time
to time themake-a
file with 10 steps../time-examples.sh
to get a sense of how you are doing on parallel performance../sanitizer-test.sh
to try a bunch of sizes in AddressSanitizer, ThreadSanitizerIf you downloaded the skeleton code before 6 April, your Makefile may not properly build life-tsan or include
sanitizer-test.sh
; you can download an updated Makefile here andsanitizer-test.sh
here.
2 Tips
This assignment was designed to be a natural fit for barriers. You are strongly encourage to get a barrier-based solution working first before attempting any other approaches.
You should choose some way to split up the grid between threads. You will get best performance if each thread works on a part of the grid that is contiguous in memory, so you get better locality within a thread. This also will avoid performance problems associated with two processors trying to modify data in different parts of the same cache block.
To compute the value of a cell in generation i, one needs the value of its neighbors in generation i-1. Barriers are one way to make sure that the values from generation i-1 are available before any thread starts computing generation i.
The supplied code calls
LB_swap()
to exchange boards. If one uses that code as is, one must ensure that all threads stop accessing the boards before the swap happens and don’t start accessing the boards again until the swap finishes.An alternative, which calls wait on barriers fewer times each iteration, is to avoid swapping by having an
even
state andodd
state and choose which board to write to based on the generation number. (In even iterations, threads would use theodd
version of the state to write to theeven
state; and in odd iterations, threads would use theeven
version of the state to write to theodd
state.) In this way, rather than waiting for the swap to finish before starting the next generation, threads just need to wait for all other threads to have finished the current generation.You should be able to reuse almost all of the
simulate_life_serial
code. You will probably need to split it into at least two functions — one that represents the work done in separate threads and one that spawns the threads and waits for them to finish.If you find yourself tempted to use a global variable (such as a global mutex or barrier), you can usually get away with adding it as a field of a
struct
passed in by reference to your thread function’svoid *
parameter instead.If your code hangs, you can run it in a debugger until it hangs, and then interrupt it (like with control-C) and examine where each thread is.
For doing this in LLDB, see their tutorial on examining thread state.
In GDB, you can use a command like
thread apply all backtrace
to get a backtrace from all threads, andthread THREAD-ID
to switch to debugging the thread with a particular ID; see also the manual for debugging with threads in GDB.
3 Caveats
The version of
pthread_barrier_destroy
on portal and our testing machines requires that you do not call pthread_barrier_destroy until you are sure that all threads have first returned fromptherad_barrier_wait
. This is a bug in the those machine’s version of thelibc
, see this mailing list post.MacOS and OS X ship with only a subset of the pthreads library, notably excluding
pthread_barrier_t
and its associated functions. You should probably use portal, a virtual machine, etc.