Changelog:
- 9 Oct 2023: add explicit pointer to synchronization reading; point to relevant additional resources
This document is intended to give a practical overview of using pthreads. It leaves out a lot of details.
For brevity, code in this document does not check error codes. This is bad coding practice! You should check error codes in your code.
1 Jumping-off point
For those eager to start coding, here’s a parallel summation program you might find to be a useful starting point.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Allow compiler -DTHREADCOUNT=4 but have a default */
#ifndef THREADCOUNT
#define THREADCOUNT 16
#endif

/** Defines a particular task to handle */
typedef struct {
    size_t from;
    size_t to;
    double (*getnum)(size_t);
} task_description;

/**
 * Function invoked by each new thread.
 * The argument must be a task_description *;
 * the return value is a malloced double *.
 */
void *sum_array(void *args) {
    task_description *task = (task_description *)args;
    printf(" Summing from %zu to %zu...\n",
           task->from, task->to);
    double work = 0;
    for(size_t i=task->from; i<task->to; i+=1) {
        work += task->getnum(i);
    }
    printf(" ... sub-sum from %zu to %zu = %g\n",
           task->from, task->to, work);
    double *sum = malloc(sizeof(double));
    *sum = work;
    return (void *)sum;
}

/** A simple reciprocal function */
double fraction(size_t i) {
    return 1.0/(i+1);
}

/**
 * Sum all fractions 1/n from 1 to pow(2,28)
 * in THREADCOUNT parallel threads
 */
int main(int argc, const char *argv[]) {
    // set up task sizes to take a few seconds on 2020-era laptops
    size_t max = 1<<28;
    size_t step = max / THREADCOUNT;

    // store per-thread information (don't re-use, memory is shared)
    pthread_t id[THREADCOUNT];
    task_description tasks[THREADCOUNT];

    // spawn the threads
    for(int i=0; i<THREADCOUNT; i+=1) {
        tasks[i].from = i*step;
        tasks[i].to = (i+1)*step;
        tasks[i].getnum = fraction;
        pthread_create(id+i, NULL, sum_array, tasks+i);
    }

    // wait for and combine the results
    double result = 0;
    for(int i=0; i<THREADCOUNT; i+=1) {
        void *ans;
        pthread_join(id[i], &ans);
        result += *(double *)ans;
        free(ans); // was malloced in just-joined thread
    }

    printf("The sum is %g\n", result);
    return 0;
}
To see the time impact of threading, try comparing
clang -O3 -pthread -DTHREADCOUNT=8 thiscode.c && time ./a.out
and
clang -O3 -pthread -DTHREADCOUNT=1 thiscode.c && time ./a.out
2 Managing thread existence
Every process has at least one thread, the one that invoked main. Each other thread is created by invoking a system call, wrapped by the various pthread_ library functions.
2.1 pthread_create
The library function pthread_create makes a new thread. It is given four arguments:
Type | Kind | Purpose |
---|---|---|
pthread_t * | output | Set to the ID of the created thread |
const pthread_attr_t * | input | Rules about how the new thread will behave |
void *(*)(void *) | input | Pointer to a function the new thread runs |
void * | input | Value passed as argument to the new thread |
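As a minimal sketch of how the four arguments fit together, the following squares a number on a new thread. The names square and square_in_thread are made up for illustration, and, like the rest of this document, the sketch skips error checking.

```c
#include <pthread.h>

/* Hypothetical start function; its type must be void *(*)(void *). */
static void *square(void *arg) {
    int *value = (int *)arg;   /* the 4th argument of pthread_create arrives here */
    *value = (*value) * (*value);
    return NULL;
}

/* Hypothetical helper: squares n on a newly created thread. */
int square_in_thread(int n) {
    pthread_t id;                           /* output: set by pthread_create */
    pthread_create(&id, NULL, square, &n);  /* NULL means default attributes */
    pthread_join(id, NULL);                 /* wait, so that n is safe to read */
    return n;
}
```

Calling square_in_thread(7) from main gives 49. The pthread_join call matters: without it, the function could return (and n could disappear) while the new thread is still using the pointer.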
2.1.1 pthread_attr_t
The second argument of pthread_create controls how the thread behaves. Much of this is fairly specialized, and passing NULL will work in many cases. If you want more control, though, you need a few extra functions.
A pthread_attr_t must be initialized using pthread_attr_init before invoking pthread_create and cleaned up using pthread_attr_destroy afterwards. pthread_attr_init is permitted to malloc fields inside the pthread_attr_t, and if it did, pthread_attr_destroy will free them.
The following is a skeleton of how to create a thread.
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_t id;
void *argument = /* ... initialize this here */;
/* ... use various pthread_attr_setXXXX functions to set behavior ... */
pthread_create(&id, &attr, run_this, argument);
pthread_attr_destroy(&attr);
Many of the attributes that can be placed into a pthread_attr_t have to do with scheduling priority (how often the thread gets CPU time) or stack organization (for example, how large the thread’s stack is) and can be ignored by the casual thread user. However, one (the detached or joinable state of the thread) is important enough to deserve its own section.
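For the curious, here is a rough sketch of one of the stack attributes in use: it asks for a particular stack size with pthread_attr_setstacksize and reads it back with pthread_attr_getstacksize. The helper name run_with_stack_size is hypothetical, and the sketch assumes the requested size is acceptable to your system (sizes below PTHREAD_STACK_MIN are rejected).

```c
#include <pthread.h>
#include <stddef.h>

static void *do_nothing(void *arg) {
    return arg;
}

/* Hypothetical helper: runs a thread with the requested stack size and
   returns the size recorded in the attribute object (0 if it was rejected). */
size_t run_with_stack_size(size_t requested) {
    pthread_attr_t attr;
    size_t recorded = 0;
    pthread_attr_init(&attr);
    if (pthread_attr_setstacksize(&attr, requested) == 0) {
        pthread_attr_getstacksize(&attr, &recorded);
        pthread_t id;
        if (pthread_create(&id, &attr, do_nothing, NULL) == 0) {
            pthread_join(id, NULL);
        }
    }
    pthread_attr_destroy(&attr);   /* the attribute object is no longer needed */
    return recorded;
}
```

For example, run_with_stack_size((size_t)1 << 20) requests a 1 MiB stack.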
2.2 pthread_attr_setdetachstate and pthread_join
Every created thread is either detached or joinable.
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE)
- A joinable thread (the default if not otherwise set) will, when it exits, continue to exist until pthread_join is called to retrieve its exit status and reclaim its resources.
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED)
- A detached thread will, when it exits, immediately be reclaimed by the OS. Its exit status is lost, and other threads cannot use pthread_join to wait for it to terminate.
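The two states can be sketched as follows. The helper names spawn_detached and default_is_joinable are hypothetical; pthread_attr_getdetachstate is the standard companion of pthread_attr_setdetachstate for reading the state back.

```c
#include <pthread.h>

static void *background_work(void *arg) {
    (void)arg;     /* uses no caller-owned memory, so it may outlive the caller */
    return NULL;   /* discarded: nobody can join a detached thread */
}

/* Hypothetical helper: starts a detached thread.
   Returns 0 on success, else the error number from pthread_create. */
int spawn_detached(void) {
    pthread_attr_t attr;
    pthread_t id;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    int rc = pthread_create(&id, &attr, background_work, NULL);
    pthread_attr_destroy(&attr);
    return rc;     /* no pthread_join: the thread is reclaimed when it exits */
}

/* Hypothetical helper: returns 1 if a freshly initialized
   attribute object describes a joinable thread. */
int default_is_joinable(void) {
    pthread_attr_t attr;
    int state;
    pthread_attr_init(&attr);
    pthread_attr_getdetachstate(&attr, &state);
    pthread_attr_destroy(&attr);
    return state == PTHREAD_CREATE_JOINABLE;
}
```

Because nothing waits for a detached thread, its start function should not rely on local variables of the function that created it.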
2.2.1 Return values
If a thread with pthread_t id is joinable, then invoking pthread_join(id, &retval) will cause the invoking thread to suspend operation until thread id terminates; when thread id terminates, retval stores the results of the thread’s computation.
The result is one of
- The return value of the start function of the terminated thread.
- The argument passed into pthread_exit to terminate the thread early.
- The special value PTHREAD_CANCELED if the thread was canceled by another thread.
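The three cases above can be seen side by side in a sketch like the following. All function names here are invented for the example. It assumes the default (deferred) cancellation mode, in which a canceled thread stops at its next cancellation point; pause() is one such point.

```c
#include <pthread.h>
#include <unistd.h>

static int normal_result = 1;
static int early_result = 2;

static void *returns_normally(void *arg) {
    (void)arg;
    return &normal_result;          /* case 1: return from the start function */
}

static void *exits_early(void *arg) {
    (void)arg;
    pthread_exit(&early_result);    /* case 2: terminate the thread early */
    return NULL;                    /* not reached */
}

static void *waits_forever(void *arg) {
    (void)arg;
    for (;;) {
        pause();                    /* a cancellation point */
    }
}

/* Hypothetical helper: runs one of the three threads (which = 0, 1, or 2)
   and reports what pthread_join saw; -1 stands for PTHREAD_CANCELED. */
int join_outcome(int which) {
    pthread_t id;
    void *retval = NULL;
    if (which == 0) {
        pthread_create(&id, NULL, returns_normally, NULL);
    } else if (which == 1) {
        pthread_create(&id, NULL, exits_early, NULL);
    } else {
        pthread_create(&id, NULL, waits_forever, NULL);
        pthread_cancel(id);         /* case 3: stopped by another thread */
    }
    pthread_join(id, &retval);
    if (retval == PTHREAD_CANCELED) {
        return -1;
    }
    return *(int *)retval;
}
```

Note that the returned pointers refer to static variables, not to locals of the thread functions, since a thread’s stack is gone by the time pthread_join returns.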
If you have a joinable thread, you need to join it before exiting. If you have detached threads, you cannot wait for them to terminate and the program will exit when the main thread does.
2.3 Crashing
What happens if one thread crashes? Since a crash means an unhandled signal, and since the behavior of an unhandled signal is to terminate the process, the whole program crashes.
However, signals are delivered to specific threads, so if you add a signal handler, it will be run by the thread that the OS believes is the recipient of the signal.
2.4 Debugging
Debugging threaded programs can be tricky. Debuggers like lldb work fine on multithreaded programs, but with multiple threads there is more information to display. Additionally, some bugs (e.g., race conditions and deadlock) can depend on exact instruction scheduling, which may be different in a debugger than when run normally.
We will not have time in this course to dive into multithreaded debugging in any great detail.
3 Synchronization with pthreads
See first: the reading on synchronization generally.
3.1 Mutex
Recall that a mutex only lets one thread have it locked at a time, excluding others until it is unlocked.
pthread_mutex_t mutex;

void *thread_function(void *args) {
    /* ... */
    pthread_mutex_lock(&mutex);
    /* only one thread can get here at a time */
    pthread_mutex_unlock(&mutex);
    /* ... */
}

int main(int argc, const char *argv[]) {
    /* ... */
    pthread_mutex_init(&mutex, NULL);
    for(int i=0; i<THREADCOUNT; i+=1) {
        /* ... */
        pthread_create(&id[i], NULL, thread_function, &arg[i]);
    }
    /* ... */
}
See also:
- pthread_mutex_trylock for the “acquire if possible” behavior
- pthread_mutex_destroy for destroying mutexes (freeing any resources allocated by pthread_mutex_init).
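To make the mutex pattern concrete, here is a small self-contained sketch in which several threads increment one shared counter. The names (WORKERS, INCREMENTS, add_many, count_with_mutex) are chosen for this example only.

```c
#include <pthread.h>

#define WORKERS 4
#define INCREMENTS 100000

static pthread_mutex_t counter_lock;
static long counter;

static void *add_many(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; i += 1) {
        pthread_mutex_lock(&counter_lock);
        counter += 1;                 /* only one thread at a time runs this */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Hypothetical helper: returns the final value of the shared counter. */
long count_with_mutex(void) {
    pthread_t id[WORKERS];
    counter = 0;
    pthread_mutex_init(&counter_lock, NULL);
    for (int i = 0; i < WORKERS; i += 1) {
        pthread_create(&id[i], NULL, add_many, NULL);
    }
    for (int i = 0; i < WORKERS; i += 1) {
        pthread_join(id[i], NULL);
    }
    pthread_mutex_destroy(&counter_lock);   /* no thread uses it any more */
    return counter;
}
```

With the lock in place the result is always WORKERS * INCREMENTS. If you remove the lock and unlock calls, the increments race with each other and the total will typically come out too small.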
3.2 On sharing mutexes (and other synchronization objects)
Note that operations on mutexes take the pthread_mutex_t by pointer because they modify the mutex object itself. This means that the threads need to share the same pthread_mutex_t object, as is achieved with the global variable above. You could also do this without global variables, such as by passing pointers to a mutex stored elsewhere:
typedef struct {
    pthread_mutex_t *ptr_to_mutex;
    /* ... */
} task_description;

void *thread_function(void *args) {
    task_description *task = (task_description *)args;
    /* ... */
    pthread_mutex_lock(task->ptr_to_mutex);
    /* only one thread can get here at a time */
    pthread_mutex_unlock(task->ptr_to_mutex);
    /* ... */
}

int main(int argc, const char *argv[]) {
    pthread_mutex_t mutex;
    task_description tasks[THREADCOUNT];
    /* ... */
    pthread_mutex_init(&mutex, NULL);
    for(int i=0; i<THREADCOUNT; i+=1) {
        tasks[i].ptr_to_mutex = &mutex;
        /* ... */
        pthread_create(&id[i], NULL, thread_function, tasks+i);
    }
    /* ... */
}
However, passing by value (like modifying the above code to have a pthread_mutex_t in the task_description struct) would not work: each thread would have a separate mutex.
Most of POSIX’s other synchronization objects (such as barriers, reader-writer locks, semaphores) work similarly.
3.3 Barrier
Recall that a barrier acts like a meet-up: no one moves until everyone expected arrives. In pthreads, that “everyone” criterion is determined by a count: once that number of threads arrive, all will be allowed to proceed.
pthread_barrier_t barrier;

void *thread_function(void *args) {
    /* ... */
    /* each thread reaches here at its own time */
    pthread_barrier_wait(&barrier);
    /* all threads proceed from here together */
    /* ... */
}

int main(int argc, const char *argv[]) {
    /* ... */
    pthread_barrier_init(&barrier, NULL, THREADCOUNT);
    for(int i=0; i<THREADCOUNT; i+=1) {
        /* ... */
        pthread_create(&id[i], NULL, thread_function, &arg[i]);
    }
    /* ... */
}
As with pthread_mutex_t objects, the pthread_barrier_wait operation modifies the barrier object, so barrier objects need to be shared between threads; they cannot be passed by value.
See also pthread_barrier_destroy for destroying barriers (freeing any resources allocated by pthread_barrier_init).
Note that older versions of pthread_barrier_destroy (such as on portal/our testing machines as of 2023) require that all calls to pthread_barrier_wait have completed (returned) before the barrier is destroyed.
3.4 Reader-writer lock
Recall that a reader-writer lock has two modes: either it can be used by exactly one writer (like a mutex) or by any number of readers (like unsynchronized data), but not both at once.
The relevant functions are documented in the following manual pages:
- pthread_rwlock_init – initialize the lock; this is complicated because reader-writer locks have several attributes that control what happens when a writer is waiting for the readers to finish and a new reader arrives.
- pthread_rwlock_rdlock and pthread_rwlock_wrlock – acquire the lock in two different ways.
- pthread_rwlock_timedrdlock and pthread_rwlock_timedwrlock – try to acquire the lock, but if a specified time passes without success, return an error code instead.
- pthread_rwlock_unlock – release the lock (no matter how it was acquired).
- pthread_rwlock_destroy – free any resources allocated by pthread_rwlock_init
Aside from these details, the overall usage looks similar to how mutexes are used with pthreads.
3.5 Monitors
Pthreads supplies separate condition variables that can be used with a mutex to construct a monitor.
The relevant functions are documented in the following manpages:
- pthread_cond_init – create a condition variable
- pthread_cond_wait – atomically unlock a mutex and add the current thread to the condition variable’s wait queue. When the thread is woken up from the wait queue, lock the mutex again before returning.
- pthread_cond_timedwait – like pthread_cond_wait, but stop waiting on the wait queue if a specified time passes and return an error
- pthread_cond_destroy – free any resources allocated by pthread_cond_init
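A minimal sketch of a monitor built this way is a one-slot mailbox: putting waits while the slot is full, and taking waits while it is empty. The mailbox type and function names are invented for this example. Besides the functions listed above, it uses pthread_cond_broadcast, the standard call for waking every thread on a condition variable’s wait queue.

```c
#include <pthread.h>

#define ITEMS 100

typedef struct {
    pthread_mutex_t lock;     /* protects full and value */
    pthread_cond_t changed;   /* waited on whenever the slot is in the wrong state */
    int full;
    int value;
} mailbox;

static void mailbox_put(mailbox *m, int value) {
    pthread_mutex_lock(&m->lock);
    while (m->full) {                       /* re-check after every wakeup */
        pthread_cond_wait(&m->changed, &m->lock);
    }
    m->value = value;
    m->full = 1;
    pthread_cond_broadcast(&m->changed);    /* wake anyone waiting to take */
    pthread_mutex_unlock(&m->lock);
}

static int mailbox_take(mailbox *m) {
    pthread_mutex_lock(&m->lock);
    while (!m->full) {
        pthread_cond_wait(&m->changed, &m->lock);
    }
    int value = m->value;
    m->full = 0;
    pthread_cond_broadcast(&m->changed);    /* wake anyone waiting to put */
    pthread_mutex_unlock(&m->lock);
    return value;
}

static void *producer(void *arg) {
    mailbox *m = (mailbox *)arg;
    for (int i = 1; i <= ITEMS; i += 1) {
        mailbox_put(m, i);
    }
    return NULL;
}

/* Hypothetical helper: sends 1..ITEMS through the mailbox, returns their sum. */
int sum_through_mailbox(void) {
    mailbox m;
    pthread_t id;
    m.full = 0;
    m.value = 0;
    pthread_mutex_init(&m.lock, NULL);
    pthread_cond_init(&m.changed, NULL);
    pthread_create(&id, NULL, producer, &m);
    int sum = 0;
    for (int i = 0; i < ITEMS; i += 1) {
        sum += mailbox_take(&m);
    }
    pthread_join(id, NULL);
    pthread_cond_destroy(&m.changed);
    pthread_mutex_destroy(&m.lock);
    return sum;
}
```

The while loops around pthread_cond_wait are deliberate: a thread can be woken when the condition it cares about is still false, so it must check again before proceeding.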
4 Additional resources
- LLNL’s POSIX Threads Programming Guide
- Dive Into Systems chapter 14
- OS:TEP’s chapters 28–31