Contents
Changelog:
- 1 October 2018: clarify “avoids some synchronization” in hints to “calls wait on barriers fewer times”
- 5 October 2018: add note that “./barriertest” is not expected to be a complete test.
- 7 October 2018: correct barrier API to implement to actual match
barrier.h
and actually passmy_barrier*
to barrier_wait, etc.
Note: Before 7 October 2018, this writeup had function prototypes for the barrier API that disagreed with the supplied
barrier.h
. Please follow the API in the suppliedbarrier.h
, which is now what is listed below.
Your Task
-
(Required for checkpoint) Examine our supplied skeleton code (last updated 28 September 2018 16:05) which implements Conway’s Game of Life. See below for a summary of hat our skeleton code does and how to use it.
-
(Required for checkpoint) Create a parallel version that uses pthreads and pthread’s barriers. You can reuse the existing
LifeBoard
class. Your implementation must have the same prototype as the unimplemented version inlife-parallel.cc
(though you can implement it in other files, so long as you modify the Makefile accordingly). Your implementation can (and should!) span across multiple functions.The functions you need to implement take a number of threads that should do the computation. Your code should create the appropriate number of threads, then do as many iterations of the simulation as requested, then let the threads finish. You may not create new threads for each iteration of the simulation.
One way you can test your version is by uncommenting the corresponding entries in the
test_functions
table inmain.cc
and running scripttime-examples.sh
. When you are first developing, you may wish to simply run some of the./life
commands fromtime-examples.sh
with a smaller iteration count.When running on a machine with multiple cores (if your VM only has one virtual CPU, it doesn’t count), you should notice speedup when running your barrier implementation with more threads.
We may test your code by substituting our own versions of
main.cc
, so please do not add anything tomain.cc
that would prevent this from working. -
Write, probably in
barrier.cc
, your own implementation of barriers in terms of either semaphore or monitors (mutexes and condition variables) with the following API, declared in the suppliedbarrier.h
:extern "C" my_barrier *barrier_init(int num_threads)
extern "C" void barrier_wait(my_barrier*)
extern "C" void barrier_destroy(my_barrier*)
(Your implementation must not use the Pthreads barrier implementation.)
Running
make libbarrier.a
should create a statically linked library containing your implementation. We may test your code by runningmake libbarrier.a
and then linking our own programs against this library.You can use the supplied program
./barriertest
to help test your implementation, or write additional programs. (We do not gaurentee that passing the tests in./barriertest
on your system (especially if you run it with one core) means your implementation is correct.) -
Create a new parallel version of the Game of Life that uses your barrier implementation. Your implementation must have the same prototype as the unimplemented version in
life-parallel.cc
(though you can have the implementation in other files, so long as you modify the Makefile accordingly).Test this implementation like you did the barrier implementation, by modifying
main.cc
to add it to thetest_functions
list and running the scripttime-exampes.sh
-
(Requires for both checkpoint and final submission) Submit your code as a .tar.gz archive. You may find the supplied
make archive
target helpful for this. (If you created any new source files for your solution, please verify that they are included in this .tar.gz archive.)
On the skeleton code
Running the supplied code
-
Conway’s Game of Life is a simulation over a grid of cells, each of which is “dead” or “alive”. The simulation is done in “generations”, with a simple rule (see the wikipedia article or our code) determining what state a cell is in in the next iteration, based on the state of the cells around it in the current iteration. Our supplied code has a working serial implementation, except that the grid is fixed size and cells at the edge of the grid are always dead to avoid simulating an arbitrarily large grid.
-
If you download the skeleton code and run
make
this will produce a binary called./life
. If you run./life
you will receive a usage message. -
We have some example input files in the
input
directory. You can open these up in a text editor. The first line of these files consists of the size of the pattern grid and the following lines consist of its contents, where.
or ` ` represents a “dead” cell and anything else represents a “live” cell.This input format is parsed by the
operator<<
andoperator>>
(for iostreams) inlife.cc
. -
When run in
serial-result
mode the./life
binary runs the serial implementation and produces the result, for example:./life 0 input/make-a serial-result
outputs
input/make-a
after 0 generations, which will be the same as the original input../life 1 input/make-a serial-result
outputs
input/make-a
after 1 generations, and you can supply larger numbers to see later. For themake-a
pattern, generations numbers around 120 are interesting.
The Serial Implementation
-
The class
LifeBoard
, definedlife.h
, represents the grid of cells as a one-dimensional array. Theat
member function of this class can be used to access the cell at a givenx
andy
coordinate. In this representation rows are stored together. -
life.cc
containssimulate_life_serial
, our serial implementation, which looks in at 3x3 “window” around each cell to determine the next state. This update code works by making two life boards, one representing the next state, and one representing the current state. After each generation, these boards are swapped. The implementation of swapping forLifeBoard
allows swapping to occur without actually copying the cell values (the pointers in the corresponding vectors are swapped), so this is efficient.
Running and Timing Parallel Implementations
-
./life
does not run your parallel code until you changemain.cc
and run something like./life input/SOME-FILENAME time
(notserial-result
). -
main.cc
has an array of functionstest_functions
that you should change to include the parallel implementation you are implementing. We have supplied commented-out definitions you can use. -
After changing the array of
test_functions
inmain.cc
,./life 10 input/make-a time
will time all the versions of the life simulation listed in
test_functions
and verify that they have the correct result. When running an implementation, first each version is run once and the result is checked for correctness. The implementation is not timed if this produces an error. Then, timing runs each implementation many times in attempt to obtain a result that does not have “noise” from the machine. Each result is checked for correctness versus the reference implementation. -
If you run with
time-and-result
instead oftime
, the main function will print out the final results of your parallel versions whenever they fail to match the serial implemnetation. -
The list of test functions in
main.cc
expects each test function to have the prototypevoid single_threaded_function(LifeBoard &board, int num_iterations)
but your multithreaded versions will have the prototype
void multithreaded_function(int num_threads, LifeBoard &board, int num_iterations)
to accomodate this the commented out code uses the C++ standard library’s std::bind to create wrapper functions for
multithreaded_function
for various numbers of threads. For example:function = std::bind(&foo, 42, _1, _2); function(a, b);
is equivalent to doing
foo(42, a, b);
Scripts to Assist Testing
-
time-examples.sh
runs several./life
timing commands which may be useful for testing. -
convert_from_rle.py
is a program converts from the popular “RLE” format for life patterns you may be able to find online into the input format our code expects.
Hints
Pthreads documentation
-
The official POSIX documentation is available at http://pubs.opengroup.org/onlinepubs/9699919799/.
You can make this official documentation available through commands like
man pthread_cond_wait
on your VM (or any sufficiently Ubuntu-like system) by runningsudo apt install manpages-posix-dev
. -
This tutorial from LLNL is one, perhaps friendlier source of information (though it does not cover barriers, see the next section for that.)
-
Another reference source is Chapter 27 of Operating Systems: Three Easy Pieces along surrounding chapters
Understanding POSIX’s barriers
-
POSIX barriers have the following API:
-
int pthread_barrier_init(pthread_barrier_t *barrier, const pthread_barrierattr_t *attr, unsigned count)
.Initialize the barrier.
count
is the number of threads that will use the barrier.attr
are “attributes” to configure how the barrier works. For this assignment, you can pass “NULL” forattr
. -
int pthread_barrier_destroy(pthread_barrier_t *barrier)
Deallocate any resources allocated for a barrier.
-
int pthread_barrier_wait(pthread_barrier_t *barrier)
Wait for exactly
count
(set byinit
) threads to callwait()
before returning. Aftercount
threads have called wait,wait
will returns for all threads and the state of the barrier is reset. These threads can then immediately use the barrier again to wait for all threads to reach a certain place. -
All these functions return an int that can, among other things, be used to know if an error occurred.
-
Parallelizing Life
-
You should choose some way to split up the grid between threads. You will get best performance if each thread works on a part of the grid that is contiguous in memory, so you get better locality within a thread. This also will avoid performance problems associated with two processors trying to modify data in different parts of the same cache block.
-
To compute the value of a cell in generation i, one needs the value of its neighbors in generation i-1. Barriers are one way to make sure that the values from generation i-1 are available before any thread starts computing generation i.
-
The supplied code calls
swap()
to exchange boards. If one uses that code as is, one must ensure that all threads stop accessing the boards before the swap happens and don’t start accessing the boards again until the swap finishes.An alternative, which calls wait on barriers fewer times each iteration, is to avoid swapping by having an “even” state and “odd” state and choose which board to write to based on the generation number. In this way, rather than waiting for the swap to finish before starting the next generation, threads just need to wait for all other threads to have finished the current generation.
-
You should be able to reuse almost all of the
simulate_life_serial
code. You will probably need to split it into at least two functions — one that represents the work done in seperate threads and one that spawns the threads and waits for them to finish.
Implementing a Barrier
-
If one is using mutexes and condition variables, one must deal with spurious wakeups, so there is a little more work than keeping a count of threads remaining.
Specifically: Suppose two threads A and B using a barrier. It is possible for thread B to finish waiting for the barrier and start waiting again before thread A completely exits the wait routine the first time:
A B start barrier_wait start waiting . start barrier_wait . notice all threads are ready, tell OS to wakeup thread A [OS marked thread A as runnable, but decides not to schedule it immediately] . finish barrier_wait . ... . start barrier_wait . should start waiting OS runs thread A again should stop waiting
When using mutexes and condition variables, this means that one needs more state than the number of threads that have joined the barrier this iteration — otherwise, one cannot distinguish a spurious wakeup with this situation.
One way to deal with this problem is to record something related to how many iterations the barrier has been used for when they start waiting and comparing this to the current iteration when they are worken up.
-
The same above problem arises when using semaphores; solutions might involve using more semaphores than might seem useful at first.