<section> |
<title>Introduction</title> |
|
<para>The HelenOS operating system is designed to make use of the |
parallelism offered by the hardware and to exploit concurrency of both the |
kernel and userspace tasks. This is achieved through multiprocessor |
support and several levels of multiprogramming: multitasking,
multithreading and userspace pseudo threads. However, such a
highly concurrent environment needs safe and efficient ways to handle |
mutual exclusion and synchronization of many execution flows.</para> |
</section> |
|
<section> |
<section> |
<title>Spinlocks</title> |
|
<para>The basic mutual exclusion primitive is the spinlock. The spinlock |
implements active waiting for the availability of a memory lock (i.e. |
simple variable) in a multiprocessor-safe manner. This safety is |
achieved through the use of a specialized, architecture-dependent, |
atomic test-and-set operation which either locks the spinlock (i.e. sets |
the variable) or, provided that it is already locked, leaves it |
unaltered. In any case, the test-and-set operation returns a value, thus |
signalling either success (i.e. zero return value) or failure (i.e. |
non-zero value) in acquiring the lock. Note that this makes a |
fundamental difference between the naive algorithm that doesn't use the |
atomic operation and the spinlock algorithm. While the naive algorithm
is prone to race conditions on SMP configurations and thus is completely |
SMP-unsafe, the spinlock algorithm eliminates the possibility of race |
conditions and is suitable for mutual exclusion use.</para> |
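<para>The difference between the naive algorithm and the atomic test-and-set can be sketched in C. This is an illustrative fragment only, not the HelenOS code: C11's <emphasis>atomic_flag_test_and_set</emphasis> stands in for the architecture-dependent test-and-set instruction, and a boolean return value replaces the zero/non-zero convention used in the text.</para>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sketch, not the actual HelenOS code: the spinlock is a
 * simple variable; C11's atomic_flag stands in for the
 * architecture-specific atomic test-and-set instruction. */
typedef struct {
    atomic_flag flag;   /* clear = unlocked, set = locked */
} spinlock_sketch_t;

/* SMP-unsafe naive algorithm: the test and the set are two separate
 * operations, so two processors can both observe "unlocked" and both
 * enter the critical section. */
static volatile bool naive_lock = false;

bool naive_try_acquire(void)
{
    if (!naive_lock) {      /* test ... */
        naive_lock = true;  /* ... and set: another CPU can slip in between */
        return true;
    }
    return false;
}

/* SMP-safe variant: the test and the set happen as one indivisible
 * operation. Returns true on success (analogous to the zero return
 * value described in the text). */
bool atomic_try_acquire(spinlock_sketch_t *lock)
{
    return !atomic_flag_test_and_set(&lock->flag);
}
```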
|
<para>The semantics of the test-and-set operation is that the spinlock |
remains unavailable until this operation called on the respective |
spinlock returns zero. HelenOS builds two functions on top of the |
test-and-set operation. The first function is the unconditional attempt |
to acquire the spinlock and is called |
<emphasis>spinlock_lock</emphasis>. It simply loops until the |
test-and-set returns a zero value. The other function, |
<emphasis>spinlock_trylock</emphasis>, is the conditional lock operation |
and calls the test-and-set only once to find out whether it managed to |
acquire the spinlock or not. The conditional operation is useful in
situations in which an algorithm cannot acquire more spinlocks in the
proper order and a deadlock would otherwise be unavoidable. In such a case, the
algorithm would detect the danger and instead of possibly deadlocking |
the system it would simply release some spinlocks it already holds and |
retry the whole operation with the hope that it will succeed next time. |
The unlock function, <emphasis>spinlock_unlock</emphasis>, is quite easy |
- it merely clears the spinlock variable.</para> |
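<para>The three functions can be sketched as follows. This is a simplified illustration, not the actual HelenOS implementation: a C11 <emphasis>atomic_flag</emphasis> plays the role of the simple lock variable and of the architecture-specific test-and-set.</para>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative sketch of the three operations described above; the
 * names mirror the text but this is not the kernel code. */
typedef struct {
    atomic_flag flag;
} spinlock_t;

void spinlock_lock(spinlock_t *lock)
{
    /* Unconditional attempt: loop until the test-and-set reports that
     * the flag was previously clear. */
    while (atomic_flag_test_and_set(&lock->flag))
        ;  /* busy wait */
}

bool spinlock_trylock(spinlock_t *lock)
{
    /* Conditional attempt: call the test-and-set exactly once. */
    return !atomic_flag_test_and_set(&lock->flag);
}

void spinlock_unlock(spinlock_t *lock)
{
    /* Merely clear the spinlock variable. */
    atomic_flag_clear(&lock->flag);
}
```

<para>In the deadlock-avoidance scenario above, a caller whose <emphasis>spinlock_trylock</emphasis> fails would release the spinlocks it already holds and retry the whole sequence.</para>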
|
<para>Nevertheless, there is a special issue related to hardware |
optimizations that modern processors implement. Particularly problematic |
is the out-of-order execution of instructions within the critical |
section protected by a spinlock. The processors are always |
self-consistent so that they can carry out speculatively executed |
instructions in the right order with regard to dependencies among those |
instructions. However, the dependency between instructions inside the |
critical section and those that implement locking and unlocking of the |
respective spinlock is not implicit on some processor architectures. As |
a result, the processor needs to be explicitly told about each |
occurrence of such a dependency. Therefore, HelenOS adds |
architecture-specific hooks to all <emphasis>spinlock_lock</emphasis>, |
<emphasis>spinlock_trylock</emphasis> and |
<emphasis>spinlock_unlock</emphasis> functions to prevent the |
instructions inside the critical section from permeating out. On some |
architectures, these hooks can be empty because the dependencies are
implicitly guaranteed by the special properties of the locking and
unlocking instructions. However, other architectures need to instrument
these hooks with different memory barriers, depending on what operations |
could permeate out.</para> |
|
<para>Spinlocks have one significant drawback: when held for longer time |
periods, they harm both parallelism and concurrency. The processor |
executing <emphasis>spinlock_lock</emphasis> does not do any fruitful |
work and is effectively halted until it can grab the lock and proceed. |
Similarly, other execution flows cannot execute on the processor that
holds the spinlock because the kernel disables preemption on that
processor when a spinlock is held. Preemption is disabled in order to
avoid the priority inversion problem. For the same reason,
threads are strongly discouraged from sleeping when they hold a |
spinlock.</para> |
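<para>The preemption-disabling behaviour can be sketched with a simple counter; in the real kernel the counter is kept per processor and consulted by the scheduler, and the names below are illustrative only.</para>

```c
#include <assert.h>

/* Illustrative sketch: a nesting counter that the scheduler would
 * consult before preempting the current execution flow. Per-CPU in a
 * real kernel; hypothetical names. */
static unsigned int preemption_disabled;

void preemption_disable(void)
{
    preemption_disabled++;
}

void preemption_enable(void)
{
    preemption_disabled--;
}

int preemption_is_disabled(void)
{
    return preemption_disabled > 0;
}

/* A locking wrapper would disable preemption before spinning: */
void spinlock_lock_sketch(void)
{
    preemption_disable();
    /* ... spin on the test-and-set here ... */
}

void spinlock_unlock_sketch(void)
{
    /* ... clear the lock variable here ... */
    preemption_enable();
}
```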
|
<para>To summarize, spinlocks represent a very simple yet essential mutual
exclusion primitive for SMP systems. On the other hand, spinlocks scale
poorly because of the active loop they are based on. Therefore, |
spinlocks are used in HelenOS only for short-time mutual exclusion and |
in cases where the mutual exclusion is required out of thread context. |
Lastly, spinlocks are used in the construction of passive |
synchronization primitives.</para> |
<title>Wait queues</title> |
|
<para>A wait queue is the basic passive synchronization primitive on |
which all other passive synchronization primitives are built. Simply |
put, it allows a thread to sleep until an event associated with the |
particular wait queue occurs. Multiple threads are notified about |
incoming events in a first come, first served fashion. Moreover, should |
the event come before any thread waits for it, it is recorded in the |
wait queue as a missed wakeup and later forwarded to the first thread |
that decides to wait in the queue. The inner structures of the wait |
queue are protected by a spinlock.</para> |
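<para>The missed-wakeup bookkeeping can be sketched in C. The fragment below is a deliberately single-threaded illustration, not the HelenOS code: the spinlock-protected list of sleeping threads is reduced to a pair of counters.</para>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of a wait queue's missed-wakeup accounting. */
typedef struct {
    int missed_wakeups;   /* events that arrived with no one waiting */
    int sleepers;         /* threads currently asleep in the queue */
} waitq_sketch_t;

/* Deliver an event: wake the first sleeper, or record a missed wakeup. */
void waitq_wakeup(waitq_sketch_t *wq)
{
    if (wq->sleepers > 0)
        wq->sleepers--;            /* first come, first served */
    else
        wq->missed_wakeups++;
}

/* Attempt to wait: returns false if a recorded event lets the caller
 * continue immediately, true if it would really go to sleep. */
bool waitq_sleep_would_block(waitq_sketch_t *wq)
{
    if (wq->missed_wakeups > 0) {
        wq->missed_wakeups--;      /* consume the missed wakeup */
        return false;
    }
    wq->sleepers++;                /* the caller would now block */
    return true;
}
```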
|
<para>The thread that wants to wait for a wait queue event uses the |
<emphasis>waitq_sleep_timeout</emphasis> function. The algorithm then |
<listitem> |
<para>another thread calls <emphasis>waitq_wakeup</emphasis> and the |
thread is the first thread in the wait queue's list of sleeping |
threads;</para> |
</listitem> |
|
<listitem> |
<para>another thread calls |
<emphasis>waitq_interrupt_sleep</emphasis> on the sleeping |
thread;</para> |
</listitem> |
|
<listitem> |
<para>the sleep times out provided that none of the previous |
occurred within a specified time limit; the limit can be |
infinity.</para> |
</listitem> |
</orderedlist> |
|
<para>All five possibilities (immediate return on success, immediate |
return on failure, wakeup after sleep, interruption and timeout) are |
distinguishable by the return value of |
<emphasis>waitq_sleep_timeout</emphasis>. Being able to interrupt a |
sleeping thread is essential for externally initiated thread |
termination. The ability to wait only for a certain amount of time is |
used, for instance, to passively delay thread execution by several |
microseconds or even seconds in <emphasis>thread_sleep</emphasis> |
function. Due to the fact that all other passive kernel synchronization |
primitives are based on wait queues, they also have the option of being |
interrupted and, more importantly, can time out. All of them also
implement the conditional operation. Furthermore, this very fundamental
interface reaches up to the implementation of futexes, the userspace
synchronization primitive, which makes it possible for a userspace
thread to request a synchronization operation with a timeout or a |
conditional operation.</para> |
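<para>The five outcomes could be modelled as a result code that callers branch on; the enumerator names below are illustrative and do not correspond to HelenOS' actual return values.</para>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result codes for the five distinguishable outcomes. */
typedef enum {
    SLEEP_SUCCESS_IMMEDIATE,  /* missed wakeup consumed, no sleep needed */
    SLEEP_FAILURE_IMMEDIATE,  /* conditional call, event not available */
    SLEEP_WOKEN,              /* woken by waitq_wakeup after sleeping */
    SLEEP_INTERRUPTED,        /* woken by waitq_interrupt_sleep */
    SLEEP_TIMED_OUT           /* the time limit expired first */
} sleep_result_t;

/* A caller then branches on the result, e.g. to honour direct hand-off
 * or externally initiated termination. */
const char *describe(sleep_result_t r)
{
    switch (r) {
    case SLEEP_SUCCESS_IMMEDIATE:
    case SLEEP_WOKEN:
        return "event arrived";   /* direct hand-off: no re-check needed */
    case SLEEP_FAILURE_IMMEDIATE:
        return "would block";
    case SLEEP_INTERRUPTED:
        return "interrupted";
    default:
        return "timed out";
    }
}
```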
|
<para>From the description above, it should be apparent that when a
sleeping thread is woken by <emphasis>waitq_wakeup</emphasis> or when |
<emphasis>waitq_sleep_timeout</emphasis> succeeds immediately, the |
thread can be sure that the event has occurred. The thread need not and |
should not verify this fact. This approach is called direct hand-off and |
is characteristic for all passive HelenOS synchronization primitives, |
with one exception described below.</para>
</section> |
|
<section> |
missed wakeups is equal to the number of threads that will not block in |
<emphasis>waitq_sleep_timeout</emphasis> and would immediately succeed
instead. On the other hand, semaphores are synchronization primitives |
that will let a predefined number of threads into the critical section and
block any other threads above this count. However, these two cases are |
exactly the same. Semaphores in HelenOS are therefore implemented as |
wait queues with a single semantic change: their wait queue is |
<section> |
<title>Mutexes</title> |
|
<para>Mutexes are sometimes referred to as binary semaphores. That means
that mutexes are like semaphores that allow only one thread into the
critical section. Indeed, mutexes in HelenOS are implemented exactly in |
this way: they are built on top of semaphores. From another point of |
view, they can be viewed as spinlocks without busy waiting. Their |
semaphore heritage provides good basics for both conditional operation |
and operation with timeout. The locking operation is called |
<emphasis>mutex_lock</emphasis>, the conditional locking operation is |
called <emphasis>mutex_trylock</emphasis> and the unlocking operation is |
called <emphasis>mutex_unlock</emphasis>.</para> |
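<para>The layering of mutexes on top of semaphores can be sketched as follows; the counter-based semaphore below is a single-threaded stand-in for the wait-queue-backed HelenOS semaphore, and only the conditional operation is shown.</para>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counting-semaphore stand-in. */
typedef struct {
    int count;  /* number of threads still allowed in */
} semaphore_sketch_t;

bool semaphore_trydown(semaphore_sketch_t *sem)
{
    if (sem->count > 0) {
        sem->count--;
        return true;
    }
    return false;  /* a real semaphore would sleep in its wait queue */
}

void semaphore_up(semaphore_sketch_t *sem)
{
    sem->count++;
}

/* A mutex is a semaphore initialized to one. */
typedef struct {
    semaphore_sketch_t sem;
} mutex_sketch_t;

void mutex_initialize(mutex_sketch_t *mtx)
{
    mtx->sem.count = 1;  /* binary semaphore: one thread at a time */
}

bool mutex_trylock(mutex_sketch_t *mtx)
{
    return semaphore_trydown(&mtx->sem);
}

void mutex_unlock(mutex_sketch_t *mtx)
{
    semaphore_up(&mtx->sem);
}
```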
|
<para>Reader/writer locks, or rwlocks, are by far the most complicated |
synchronization primitive within the kernel. The goal of these locks is |
to improve the concurrency of applications in which threads need to
synchronize access to a shared resource and where that access can be
partitioned into a read-only mode and a write mode. Reader/writer locks |
should make it possible for several, possibly many, readers to enter the |
critical section, provided that no writer is currently in the critical |
section, or to be in the critical section simultaneously. Writers are
allowed to enter the critical section only individually, provided that |
no reader is in the critical section already. Applications in which the
majority of operations can be done in the read-only mode can benefit
from increased concurrency created by reader/writer locks.</para> |
|
<para>During reader/writer lock construction, a decision should be made |
whether readers will be preferred over writers or whether writers will be
preferred over readers in cases when the lock is not currently held and
both a reader and a writer want to gain the lock. Some operating systems
prefer one group over the other, thus creating a possibility for
starving the non-preferred group. In the HelenOS operating system, none of
the two groups is preferred. The lock is granted on a first come, first
served basis with the additional note that readers are granted the lock |
in the biggest possible batch.</para> |
|
<para>With this policy and the timeout modes of operation, the direct |
hand-off becomes much more complicated. For instance, a writer leaving |
the critical section must wake up all leading readers in the rwlock's |
wait queue or one leading writer or no-one if no thread is waiting. |
Similarly, the last reader leaving the critical section must wake up the
sleeping writer if there are any sleeping threads left at all. As |
another example, if a writer at the beginning of the rwlock's wait queue |
times out and the lock is held by at least one reader, the writer which |
has timed out must first wake up all readers that follow him in the |
queue prior to signalling the timeout itself and giving up.</para> |
|
<para>Due to the issues mentioned in the previous paragraph, the
reader/writer lock implementation needs to walk the rwlock wait queue's
list of sleeping threads directly in order to find out the type of
access that the queueing threads demand. This makes the code difficult |
to understand and dependent on the internal implementation of the wait |
queue. Nevertheless, it remains unclear to the authors of HelenOS |
the exclusive mutex. If it succeeds, the writer is granted the rwlock. |
However, if the operation fails (e.g. times out), the writer must check |
for potential readers at the head of the list of sleeping threads |
associated with the mutex's wait queue and then proceed according to the |
procedure outlined above.</para> |
|
<para>The exclusive mutex plays an important role in reader |
|
<orderedlist> |
<listitem> |
<para>there are other readers in the critical section and</para> |
</listitem> |
|
<listitem> |
<para>there are no sleeping threads waiting for the exclusive |
mutex.</para> |
</listitem> |
</orderedlist> |
|
<para>If both conditions are true, the reader will bypass the mutex, |
increment the number of readers in the critical section and then enter |
the critical section. Note that if there are any sleeping threads at the |
beginning of the wait queue, the first must be a writer. If the |
conditions are not fulfilled, the reader normally waits until the |
exclusive mutex is granted to it.</para> |
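<para>The reader-bypass decision can be sketched with counters standing in for the state of the exclusive mutex and its wait queue; this is an illustration of the two conditions above, not the actual implementation.</para>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the rwlock state relevant to the bypass. */
typedef struct {
    int readers;           /* readers currently in the critical section */
    int sleeping_threads;  /* threads waiting for the exclusive mutex */
} rwlock_sketch_t;

/* Returns true if a reader may bypass the exclusive mutex and enter
 * immediately; on success the reader count is incremented. */
bool rwlock_reader_try_bypass(rwlock_sketch_t *rw)
{
    if (rw->readers > 0 && rw->sleeping_threads == 0) {
        rw->readers++;     /* join the readers already inside */
        return true;
    }
    return false;          /* must wait for the exclusive mutex normally */
}
```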
</section> |