Changelog:
- 25 March 2023: add aside describing what atomic operations are
- 25 March 2023: add section on reordering/optimizations
- 30 March 2023: fix a syntax error that truncated a lot of the text here (sorry!)
1 Day-to-day life
When you make a comment in a conversation, it is (usually) treated as an atomic operation: you can divide the conversation into what happened before and what happened after your comment, but during it nothing else occurred. This is not true in most text-based chats, though, where it is common for multiple comments to be created concurrently.
Study rooms, lavatory stalls, TV remotes, and many other resources are handled as if protected by a mutex: anyone can use them, but by using them you are excluding others from using them at the same time. Semaphores and monitors have some special properties but are roughly variations on the mutex theme.
One person walking along a sidewalk does not prevent others from entering it as well, but one construction team repairing the sidewalk does, because maintenance staff tend to use a reader-writer lock to access sidewalks and many other parts of our lived environment: they may be used either by a single repair team (a writer) or by any number of pedestrians (readers), but not both at the same time.
If you’ve ever told a group of friends we’ll meet up by the statue and then go out from there, you’ve implemented a barrier: everyone arriving at the statue waits there until everyone has arrived, and then you all move forward together from there.
When you go to a store, a purchase only occurs if both you give them money and they give you the product. If either step fails, neither step occurs. This is an example of a transaction: multiple steps, between which other things can occur, but guaranteed to either all happen or none happen.
2 Atomic Operation
An atomic operation is an action or set of actions made to appear to happen all at once, with no ability for any other action to occur in the middle. For non-parallel concurrency, this can be guaranteed by the OS refusing to respond to interrupts until the atomic operation is completed. When there is parallelism, this generally requires extra support from the hardware. Either way, atomic operations are generally slower than their non-atomic counterparts, and only a limited number of them are provided on any given system.
Atomic operations are used to implement all of the other synchronization tools on this page.
read/modify/write atomic operations
Typically, processors guarantee that loading and storing pointer-sized values from memory into a register is atomic. Technically, along with disabling interrupts, this is sufficient to implement the synchronization constructs described below on a multi-core system (see, for example, Peterson’s algorithm to find out how). But to make such implementations more practical, processors usually provide atomic instructions that load, modify, and write back a value in a single step.
Which operations are supported varies between processors. One example is an atomic swap operation (lock xchg on x86-64), which swaps a register with a value in memory. Unlike a normal swap, no other processor can write the memory value in between when it is copied into the register and when the register value overwrites the value in memory. This can be achieved in hardware by ensuring that the value is stored in a cache during the swap and that no other processor’s cache obtains the value until the swap completes.
reordering and other optimizations
When compiling or running programs, it’s common for compilers and processors to perform memory accesses in a different order than is written.
For example, given code like:
global_variable = 0;
for (int i = 0; i < 1000; ++i) {
global_variable += array[i];
}
it is much better if this ends up writing global_variable just once after the loop rather than reading and writing global_variable multiple times. A compiler might do this by keeping global_variable temporarily in a register. Processors can do this kind of optimization, too.
A result of this is that the reads and writes to global_variable effectively occur later than the reads of the array, even though that’s not what the code specifies.
When global_variable isn’t shared between cores, this change is invisible. But when other cores are acting on the value, this can change the result of the program. In an example like the above, this is not a problem, but many naive ways of achieving synchronization end up not working because of this, even if they use atomic operations. For example, one common idea is to use a global flag variable set by one thread to stop another thread from doing something:
counter = 0;
while (!flag && counter < 100000) {
counter += array[counter];
}
A problem here is that the compiler is allowed to read the value of flag just once and keep it in a register. Or, alternately, an advanced processor will often do the same thing itself. It is not sufficient that the operation of reading the flag is atomic: it is still being done atomically, just not at the time the code appeared to specify.
To avoid these kinds of optimizations, we need to tell the compiler and processor that some operations cannot be reordered like this. Implementations of the synchronization constructs described below will almost always do this, so you can use them without worrying about these problems.
3 Semaphore
A variable indicating how many (a counting semaphore) or if any (a boolean semaphore) resources are available. Because simple variables are not modified atomically, semaphores are typically implemented as an abstract datatype with three methods for modifying them:
- Acquire or wait: Attempt to acquire a resource, decrementing the count or setting the boolean to false. If the count is 0, suspend operation until it becomes 1 again.
  This operation is sometimes called P, wait, down, acquire, pend, procure, or decrement.
- Acquire if possible: Attempt to acquire a resource, decrementing the count or setting the boolean to false, and return a success code if successful. If the count was already 0, instead do nothing to the semaphore and return a failure code.
  This operation is present in many semaphore libraries, but rarely has its own name. Generally it is achieved by some kind of don’t-block flag given to the same function that performed acquire or wait.
- Release: Increment the count, possibly causing a waiting process to become active.
  This operation is sometimes called V, signal, up, release, post, vacate, or increment.
The naming of these operations is complicated by having first been defined by a Dutch team (led by Edsger Dijkstra) who gave different words for P and V to stand for in different reports; for a brief summary of names, see https://en.wikipedia.org/wiki/Semaphore_(programming)#Operation_names.
Although I described counting semaphores as a count of how many resources are available, that can be adjusted; for example, some systems use a negative count to indicate how many processes are waiting for a resource. While these changes do not impact the overall functionality of a semaphore, they can impact initialization state and so on for a particular library, so be sure to read the manual for any library you use if the specific numbers are meaningful in your application.
4 Mutex / Lock
A mutex (from mutual exclusion) or lock is any device that ensures one and only one thread or process has access to some resource at a time. Mutexes can be implemented as a binary semaphore, provided an additional limitation is imposed: only the thread that has acquired and not yet released the semaphore may release it. More commonly, mutexes are built as more complicated constructs to provide extra features, including enforcing the only-the-acquirer-can-release property, automatic release if the acquirer terminates without releasing, etc.
Operations on mutexes use similar vocabulary to the corresponding operations on semaphores.
5 Condition Variable
A condition variable, despite having variable in its name, is not a variable, nor like one in any meaningful way. Instead, it is composed of at least two fairly complicated constructs:
- An assertion1; that is, a condition that must be satisfied before operation continues. Some implementations do not actually store the assertion in any way, instead assuming that it is handled elsewhere in the code and that the associated monitor will invoke wait only if it was false and signal any time it becomes true.
- A wait-queue; that is, a collection (not necessarily a queue) of suspended threads awaiting the moment when the assertion will become true.
- Optionally, a ready queue; that is, a collection (not necessarily a queue) of suspended threads for which the assertion is true and the thread is ready to run but has not yet been scheduled to do so.
Condition variables are almost always discussed coupled with a mutex, the pair forming a monitor. Because condition variables are so often used in monitors, I have sometimes seen a monitor called a condition variable colloquially.
6 Monitor
See also: pthreads API for condition variables
A monitor is a paired mutex and condition variable with the following operations defined upon the pair:
- wait: Used to wait until the condition variable’s assertion becomes true.
  Suspend operation of the current thread, leaving the mutex unlocked; place the thread into the condition variable’s wait-queue.
- signal or notify: Used to indicate that the condition variable’s assertion became true, and that a waiting thread may thus execute.
  Pick one thread from the condition variable’s wait-queue and resume it, having it lock the mutex before it resumes operation.
- broadcast or notifyAll: Used to tell all sleeping threads to wake up. Often used if the assertion of the condition variable was implicitly defined and the broadcasting code is unsure which, if any, sleeping threads are now ready to operate.
Since each waking thread is supposed to acquire the mutex associated with the monitor, monitors that have a broadcast function can benefit from a ready queue to store the set of threads that are ready to run but are waiting for the mutex. Depending on the implementation of the mutex, that ready queue may be implicitly handled by the mutex instead.
Monitors are sometimes implemented at a programming-language level as the most user-friendly technique for achieving mutual exclusion. Java, for example, decided that every object would be a monitor and includes the synchronized keyword and associated control constructs for automatically waiting on and signaling that monitor.
7 Reader-writer lock
See also: pthreads API for reader/writer locks
A reader-writer lock is used in two modes. If used by a writer, it is a mutex; if used by a reader, it is not. In particular, it has the following four operations:
- Begin read: If there are no writers, increment the number of readers. Otherwise, wait for the writer to finish.
- End read: Decrement the number of readers.
- Begin write: If there are neither readers nor writers, set the number of writers to 1. Otherwise, wait for the numbers of readers and writers to become 0.
- End write: Set the number of writers to 0.
There are implementations of this using multiple mutexes or a monitor, both augmented with a few extra variables. Several designs exist in part because of the multiple possible answers to the following question:
- If a writer is waiting for all the readers to be done, and a new reader arrives, should that reader be allowed in or made to wait for the writer to finish first?
Both read-preferring and write-preferring reader-writer locks have pros and cons, and both are commonly implemented.
8 Barrier
See also: pthreads API for barriers
A barrier is somewhat similar to a counting semaphore, in that it can be thought of as a simple integer. However, it needs to be configured in advance with the number of processes it is to wait for, and once initialized, it has just one action:
- arrive: Increment the counter.
  If that increment caused the counter to reach the preconfigured number of processes this barrier is to wait for, resume all processes suspended at this barrier and reset the counter to 0. Otherwise, suspend the arriving process.
In practice, somewhat more complicated structures are used to help this design scale and to ensure both that the increment-and-check happens atomically and that all waiting processes pass the barrier before others arrive.
9 Transaction
A transaction, or more formally an atomic transaction, is a group of operations that is guaranteed to occur as if it was a single atomic operation. Appropriate use of mutexes and monitors can be used to implement transactions, but often when transactions are identified by name, they are instead implemented optimistically, using something like the read-copy-update mechanism. The principle behind read-copy-update is that a process checks to make sure none of the inputs of a computed value have changed before placing that value into memory.
Suppose you wanted to compute x = a*x*x + b*x + c as a transaction. You would

1. make a copy of the current values of a, b, c, and x as A, B, C, and X.
2. compute the new value of x as x_new = A*X*X + B*X + C.
3. use a mutex to ensure you are the only one accessing memory while you
   - verify that the value now in a is still A, and similarly for b and B, c and C, and x and X.
   - if so, write x = x_new and release the mutex.
   - if not, release the mutex and return to step 1 to try again.
10 Further resources
- OS:TEP’s chapters 26–32
- Dive Into Systems, chapter 14
Note that this is not the usual programming definition of assertion; in code, an assertion is a function that checks a Boolean value and terminates the program with an error message if it is false.↩︎