Subversion Repositories HelenOS-doc

Rev

Rev 48 | Rev 62 | Go to most recent revision | Blame | Compare with Previous | Last modification | View Log | Download | RSS feed

  1. <?xml version="1.0" encoding="UTF-8"?>
  2. <chapter id="sync">
  3.   <?dbhtml filename="sync.html"?>
  4.  
  5.   <title>Mutual exclusion and synchronization</title>
  6.  
  7.   <section>
  8.     <title>Introduction</title>
  9.  
  10.     <para>The HelenOS operating system is designed to make use of the
  11.     parallelism offered by the hardware and to exploit concurrency of both the
  12.     kernel and userspace tasks. This is achieved through multiprocessor
  13.     support and several levels of multiprogramming such as multitasking,
  14.     multithreading and also through userspace pseudo threads. However, such a
  15.     highly concurrent environment needs safe and efficient ways to handle
  16.     mutual exclusion and synchronization of many execution flows.</para>
  17.   </section>
  18.  
  19.   <section>
  20.     <title>Active kernel primitives</title>
  21.  
  22.     <section>
  23.       <title>Spinlocks</title>
  24.  
  25.       <para>The basic mutual exclusion primitive is the spinlock. The spinlock
  26.       implements active waiting for the availability of a memory lock (i.e.
  27.       simple variable) in a multiprocessor-safe manner. This safety is
  28.       achieved through the use of a specialized, architecture-dependent,
  29.       atomic test-and-set operation which either locks the spinlock (i.e. sets
  30.       the variable) or, provided that it is already locked, leaves it
  31.       unaltered. In any case, the test-and-set operation returns a value, thus
  32.       signalling either success (i.e. zero return value) or failure (i.e.
  33.       non-zero value) in acquiring the lock. Note that this makes a
  34.       fundamental difference between the naive algorithm that doesn't use the
  35.      atomic operation and the spinlock algortihm. While the naive algorithm
  36.      is prone to race conditions on SMP configurations and thus is completely
  37.      SMP-unsafe, the spinlock algorithm eliminates the possibility of race
  38.      conditions and is suitable for mutual exclusion use.</para>
  39.  
  40.      <para>The semantics of the test-and-set operation is that the spinlock
  41.      remains unavailable until this operation called on the respective
  42.      spinlock returns zero. HelenOS builds two functions on top of the
  43.      test-and-set operation. The first function is the unconditional attempt
  44.      to acquire the spinlock and is called <code>spinlock_lock</code>. It
  45.      simply loops until the test-and-set returns a zero value. The other
  46.      function, <code>spinlock_trylock</code>, is the conditional lock
  47.      operation and calls the test-and-set only once to find out whether it
  48.      managed to acquire the spinlock or not. The conditional operation is
  49.      useful in situations in which an algorithm cannot acquire more spinlocks
  50.      in the proper order and a deadlock cannot be avoided. In such a case,
  51.      the algorithm would detect the danger and instead of possibly
  52.      deadlocking the system it would simply release some spinlocks it already
  53.      holds and retry the whole operation with the hope that it will succeed
  54.      next time. The unlock function, <code>spinlock_unlock</code>, is quite
  55.      easy - it merely clears the spinlock variable.</para>
  56.  
  57.      <para>Nevertheless, there is a special issue related to hardware
  58.      optimizations that modern processors implement. Particularly problematic
  59.      is the out-of-order execution of instructions within the critical
  60.      section protected by a spinlock. The processors are always
  61.      self-consistent so that they can carry out speculatively executed
  62.      instructions in the right order with regard to dependencies among those
  63.      instructions. However, the dependency between instructions inside the
  64.      critical section and those that implement locking and unlocking of the
  65.      respective spinlock is not implicit on some processor architectures. As
  66.      a result, the processor needs to be explicitly told about each
  67.      occurrence of such a dependency. Therefore, HelenOS adds
  68.      architecture-specific hooks to all <code>spinlock_lock</code>,
  69.      <code>spinlock_trylock</code> and <code>spinlock_unlock</code> functions
  70.      to prevent the instructions inside the critical section from permeating
  71.      out. On some architectures, these hooks can be void because the
  72.      dependencies are implicitly there because of the special properties of
  73.      locking and unlocking instructions. However, other architectures need to
  74.      instrument these hooks with different memory barriers, depending on what
  75.      operations could permeate out.</para>
  76.  
  77.      <para>Spinlocks have one significant drawback: when held for longer time
  78.      periods, they harm both parallelism and concurrency. The processor
  79.      executing <code>spinlock_lock</code> does not do any fruitful work and
  80.      is effectively halted until it can grab the lock and proceed.
  81.      Similarily, other execution flows cannot execute on the processor that
  82.      holds the spinlock because the kernel disables preemption on that
  83.      processor when a spinlock is held. The reason behind disabling
  84.      preemption is priority inversion problem avoidance. For the same reason,
  85.      threads are strongly discouraged from sleeping when they hold a
  86.      spinlock.</para>
  87.  
  88.      <para>To summarize, spinlocks represent very simple and essential mutual
  89.      exclusion primitive for SMP systems. On the other hand, spinlocks scale
  90.      poorly because of the active loop they are based on. Therefore,
  91.      spinlocks are used in HelenOS only for short-time mutual exclusion and
  92.      in cases where the mutual exclusion is required out of thread context.
  93.      Lastly, spinlocks are used in the construction of passive
  94.      synchronization primitives.</para>
  95.    </section>
  96.  </section>
  97.  
  98.  <section>
  99.    <title>Passive kernel synchronization</title>
  100.  
  101.    <section>
  102.      <title>Wait queues</title>
  103.  
  104.      <para>A wait queue is the basic passive synchronization primitive on
  105.      which all other passive synchronization primitives are built. Simply
  106.      put, it allows a thread to sleep until an event associated with the
  107.      particular wait queue occurs. Multiple threads are notified about
  108.      incoming events in a first come, first served fashion. Moreover, should
  109.      the event come before any thread waits for it, it is recorded in the
  110.      wait queue as a missed wakeup and later forwarded to the first thread
  111.      that decides to wait in the queue. The inner structures of the wait
  112.      queue are protected by a spinlock.</para>
  113.  
  114.      <para>The thread that wants to wait for a wait queue event uses the
  115.      <code>waitq_sleep_timeout</code> function. The algorithm then checks the
  116.      wait queue's counter of missed wakeups and if there are any missed
  117.       wakeups, the call returns immediately. The call also returns immediately
  118.       if only a conditional wait was requested. Otherwise the thread is
  119.       enqueued in the wait queue's list of sleeping threads and its state is
  120.      changed to <constant>Sleeping</constant>. It then sleeps until one of
  121.      the following events happens:</para>
  122.  
  123.      <orderedlist>
  124.        <listitem>
  125.          <para>another thread calls <code>waitq_wakeup</code> and the thread
  126.          is the first thread in the wait queue's list of sleeping
  127.           threads;</para>
  128.         </listitem>
  129.  
  130.         <listitem>
  131.           <para>another thread calls <code>waitq_interrupt_sleep</code> on the
  132.           sleeping thread;</para>
  133.         </listitem>
  134.  
  135.         <listitem>
  136.           <para>the sleep times out provided that none of the previous
  137.           occurred within a specified time limit; the limit can be
  138.           infinity.</para>
  139.         </listitem>
  140.       </orderedlist>
  141.  
  142.       <para>All five possibilities (immediate return on success, immediate
  143.       return on failure, wakeup after sleep, interruption and timeout) are
  144.       distinguishable by the return value of <code>waitq_sleep_timeout</code>.
  145.       Being able to interrupt a sleeping thread is essential for externally
  146.       initiated thread termination. The ability to wait only for a certain
  147.       amount of time is used, for instance, to passively delay thread
  148.       execution by several microseconds or even seconds in
  149.       <code>thread_sleep</code> function. Due to the fact that all other
  150.       passive kernel synchronization primitives are based on wait queues, they
  151.       also have the option of being interrutped and, more importantly, can
  152.       timeout. All of them also implement the conditional operation.
  153.       Furthemore, this very fundamental interface reaches up to the
  154.       implementation of futexes - userspace synchronization primitive, which
  155.       makes it possible for a userspace thread to request a synchronization
  156.       operation with a timeout or a conditional operation.</para>
  157.  
  158.       <para>From the description above, it should be apparent, that when a
  159.       sleeping thread is woken by <code>waitq_wakeup</code> or when
  160.       <code>waitq_sleep_timeout</code> succeeds immediately, the thread can be
  161.       sure that the event has occurred. The thread need not and should not
  162.       verify this fact. This approach is called direct hand-off and is
  163.       characteristic for all passive HelenOS synchronization primitives, with
  164.       the exception as described below.</para>
  165.     </section>
  166.  
  167.     <section>
  168.       <title>Semaphores</title>
  169.  
  170.       <para>The interesting point about wait queues is that the number of
  171.       missed wakeups is equal to the number of threads that will not block in
  172.       <code>watiq_sleep_timeout</code> and would immediately succeed instead.
  173.       On the other hand, semaphores are synchronization primitives that will
  174.       let predefined amount of threads into their critical section and block
  175.       any other threads above this count. However, these two cases are exactly
  176.       the same. Semaphores in HelenOS are therefore implemented as wait queues
  177.       with a single semantic change: their wait queue is initialized to have
  178.       so many missed wakeups as is the number of threads that the semphore
  179.       intends to let into its critical section simultaneously.</para>
  180.  
  181.       <para>In the semaphore language, the wait queue operation
  182.       <code>waitq_sleep_timeout</code> corresponds to semaphore
  183.       <code>down</code> operation, represented by the function
  184.       <code>semaphore_down_timeout</code> and by way of similitude the wait
  185.       queue operation waitq_wakeup corresponds to semaphore <code>up</code>
  186.       operation, represented by the function <code>sempafore_up</code>. The
  187.       conditional down operation is called
  188.       <code>semaphore_trydown</code>.</para>
  189.     </section>
  190.  
  191.     <section>
  192.       <title>Mutexes</title>
  193.  
  194.       <para>Mutexes are sometimes referred to as binary sempahores. That means
  195.       that mutexes are like semaphores that allow only one thread in its
  196.       critical section. Indeed, mutexes in HelenOS are implemented exactly in
  197.       this way: they are built on top of semaphores. From another point of
  198.       view, they can be viewed as spinlocks without busy waiting. Their
  199.       semaphore heritage provides good basics for both conditional operation
  200.       and operation with timeout. The locking operation is called
  201.       <code>mutex_lock</code>, the conditional locking operation is called
  202.       <code>mutex_trylock</code> and the unlocking operation is called
  203.       <code>mutex_unlock</code>.</para>
  204.     </section>
  205.  
  206.     <section>
  207.       <title>Reader/writer locks</title>
  208.  
  209.       <para>Reader/writer locks, or rwlocks, are by far the most complicated
  210.       synchronization primitive within the kernel. The goal of these locks is
  211.       to improve concurrency of applications, in which threads need to
  212.       synchronize access to a shared resource, and that access can be
  213.       partitioned into a read-only mode and a write mode. Reader/writer locks
  214.       should make it possible for several, possibly many, readers to enter the
  215.       critical section, provided that no writer is currently in the critical
  216.       section, or to be in the critical section contemporarily. Writers are
  217.       allowed to enter the critical section only individually, provided that
  218.       no reader is in the critical section already. Applications, in which the
  219.       majority of operations can be done in the read-only mode, can benefit
  220.       from increased concurrency created by reader/writer locks.</para>
  221.  
  222.       <para>During reader/writer lock construction, a decision should be made
  223.       whether readers will be prefered over writers or whether writers will be
  224.       prefered over readers in cases when the lock is not currently held and
  225.       both a reader and a writer want to gain the lock. Some operating systems
  226.       prefer one group over the other, creating thus a possibility for
  227.       starving the unprefered group. In the HelenOS operating system, none of
  228.       the two groups is prefered. The lock is granted on a first come, first
  229.       served basis with the additional note that readers are granted the lock
  230.       in the biggest possible batch.</para>
  231.  
  232.       <para>With this policy and the timeout modes of operation, the direct
  233.       hand-off becomes much more complicated. For instance, a writer leaving
  234.       the critical section must wake up all leading readers in the rwlock's
  235.      wait queue or one leading writer or no-one if no thread is waiting.
  236.      Similarily, the last reader leaving the critical section must wakeup the
  237.      sleeping writer if there are any sleeping threads left at all. As
  238.      another example, if a writer at the beginning of the rwlock's wait queue
  239.       times out and the lock is held by at least one reader, the writer which
  240.       has timed out must first wake up all readers that follow him in the
  241.       queue prior to signalling the timeout itself and giving up.</para>
  242.  
  243.       <para>Due to the issues mentioned in the previous paragraph, the
  244.       reader/writer lock imlpementation needs to walk the rwlock wait queue's
  245.      list of sleeping threads directly, in order to find out the type of
  246.      access that the queueing threads demand. This makes the code difficult
  247.      to understand and dependent on the internal implementation of the wait
  248.      queue. Nevertheless, it remains unclear to the authors of HelenOS
  249.      whether a simpler but equivalently fair solution exists.</para>
  250.  
  251.      <para>The implementation of rwlocks as it has been already put, makes
  252.      use of one single wait queue for both readers and writers, thus avoiding
  253.      any possibility of starvation. In fact, rwlocks use a mutex rather than
  254.      a bare wait queue. This mutex is called <code>exclusive</code> and is
  255.      used to synchronize writers. The writer's lock operation,
  256.       <code>rwlock_write_lock_timeout</code>, simply tries to acquire the
  257.       exclusive mutex. If it succeeds, the writer is granted the rwlock.
  258.       However, if the operation fails (e.g. times out), the writer must check
  259.       for potential readers at the head of the list of sleeping threads
  260.       associated with the mutex's wait queue and then proceed according to the
  261.      procedure outlined above.</para>
  262.  
  263.      <para>The exclusive mutex plays an important role in reader
  264.      synchronization as well. However, a reader doing the reader's lock
  265.       operation, <code>rwlock_read_lock_timeout</code>, may bypass this mutex
  266.       when it detects that:</para>
  267.  
  268.       <orderedlist>
  269.         <listitem>
  270.           <para>there are other readers in the critical section and</para>
  271.         </listitem>
  272.  
  273.         <listitem>
  274.           <para>there are no sleeping threads waiting for the exclusive
  275.           mutex.</para>
  276.         </listitem>
  277.       </orderedlist>
  278.  
  279.       <para>If both conditions are true, the reader will bypass the mutex,
  280.       increment the number of readers in the critical section and then enter
  281.       the critical section. Note that if there are any sleeping threads at the
  282.       beginning of the wait queue, the first must be a writer. If the
  283.       conditions are not fulfilled, the reader normally waits until the
  284.       exclusive mutex is granted to it.</para>
  285.     </section>
  286.  
  287.     <section>
  288.       <title>Condition variables</title>
  289.  
  290.       <para>Condition variables can be used for waiting until a condition
  291.       becomes true. In this respect, they are similar to wait queues. But
  292.       contrary to wait queues, condition variables have different semantics
  293.       that allows events to be lost when there is no thread waiting for them.
  294.       In order to support this, condition variables don't use direct hand-off
  295.      and operate in a way similar to the example below. A thread waiting for
  296.      the condition becoming true does the following:</para>
  297.  
  298.      <para><programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
  299. while (!<varname>condition</varname>)
  300.        <function>condvar_wait_timeout</function>(<varname>cv</varname>, <varname>mtx</varname>);
  301. /* <remark>the condition is true, do something</remark> */
  302. <function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></para>
  303.  
  304.      <para>A thread that causes the condition become true signals this event
  305.      like this:</para>
  306.  
  307.      <para><programlisting><function>mutex_lock</function>(<varname>mtx</varname>);
  308. <varname>condition</varname> = <constant>true</constant>;
  309. <function>condvar_signal</function>(<varname>cv</varname>);  /* <remark>condvar_broadcast(cv);</remark> */
  310. <function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></para>
  311.  
  312.      <para>The wait operation, <code>condvar_wait_timeout</code>, always puts
  313.      the calling thread to sleep. The thread then sleeps until another thread
  314.      invokes <code>condvar_broadcast</code> on the same condition variable or
  315.      until it is woken up by <code>condvar_signal</code>. The
  316.      <code>condvar_signal</code> operation unblocks the first thread blocking
  317.      on the condition variable while the <code>condvar_broadcast</code>
  318.      operation unblocks all threads blocking there. If there are no blocking
  319.      threads, these two operations have no efect.</para>
  320.  
  321.      <para>Note that the threads must synchronize over a dedicated mutex. To
  322.      prevent race condition between <code>condvar_wait_timeout</code> and
  323.      <code>condvar_signal</code> or <code>condvar_broadcast</code>, the mutex
  324.      is passed to <code>condvar_wait_timeout</code> which then atomically
  325.      puts the calling thread asleep and unlocks the mutex. When the thread
  326.      eventually wakes up, <code>condvar_wait</code> regains the mutex and
  327.      returns.</para>
  328.  
  329.      <para>Also note, that there is no conditional operation for condition
  330.      variables. Such an operation would make no sence since condition
  331.      variables are defined to forget events for which there is no waiting
  332.      thread and because <code>condvar_wait</code> must always go to sleep.
  333.      The operation with timeout is supported as usually.</para>
  334.  
  335.      <para>In HelenOS, condition variables are based on wait queues. As it is
  336.      already mentioned above, wait queues remember missed events while
  337.      condition variables must not do so. This is reasoned by the fact that
  338.      condition variables are designed for scenarios in which an event might
  339.      occur very many times without being picked up by any waiting thread. On
  340.      the other hand, wait queues would remember any event that had not been
  341.      picked up by a call to <code>waitq_sleep_timeout</code>. Therefore, if
  342.      wait queues were used directly and without any changes to implement
  343.      condition variables, the missed_wakeup counter would hurt performance of
  344.      the implementation: the <code>while</code> loop in
  345.      <code>condvar_wait_timeout</code> would effectively do busy waiting
  346.      until all missed wakeups were discarded.</para>
  347.  
  348.      <para>The requirement on the wait operation to atomically put the caller
  349.      to sleep and release the mutex poses an interesting problem on
  350.      <code>condvar_wait_timeout</code>. More precisely, the thread should
  351.      sleep in the condvar's wait queue prior to releasing the mutex, but it
  352.       must not hold the mutex when it is sleeping.</para>
  353.  
  354.       <para>Problems described in the two previous paragraphs are addressed in
  355.       HelenOS by dividing the <code>waitq_sleep_timeout</code> function into
  356.       three pieces:</para>
  357.  
  358.       <orderedlist>
  359.         <listitem>
  360.           <para><code>waitq_sleep_prepare</code> prepares the thread to go to
  361.           sleep by, among other things, locking the wait queue;</para>
  362.         </listitem>
  363.  
  364.         <listitem>
  365.           <para><code>waitq_sleep_timeout_unsafe</code> implements the core
  366.           blocking logic;</para>
  367.         </listitem>
  368.  
  369.         <listitem>
  370.           <para><code>waitq_sleep_finish</code> performs cleanup after
  371.           <code>waitq_sleep_timeout_unsafe</code>; after this call, the wait
  372.           queue spinlock is guaranteed to be unlocked by the caller</para>
  373.         </listitem>
  374.       </orderedlist>
  375.  
  376.       <para>The stock <code>waitq_sleep_timeout</code> is then a mere wrapper
  377.       that calls these three functions. It is provided for convenience in
  378.       cases where the caller doesn't require such a low level control.
  379.      However, the implementation of <code>condvar_wait_timeout</code> does
  380.      need this finer-grained control because it has to interleave calls to
  381.      these functions by other actions. It carries its operations out in the
  382.      following order:</para>
  383.  
  384.      <orderedlist>
  385.        <listitem>
  386.          <para>calls <code>waitq_sleep_prepare</code> in order to lock the
  387.          condition variable's wait queue,</para>
  388.         </listitem>
  389.  
  390.         <listitem>
  391.           <para>releases the mutex,</para>
  392.         </listitem>
  393.  
  394.         <listitem>
  395.           <para>clears the counter of missed wakeups,</para>
  396.         </listitem>
  397.  
  398.         <listitem>
  399.           <para>calls <code>waitq_sleep_timeout_unsafe</code>,</para>
  400.         </listitem>
  401.  
  402.         <listitem>
  403.           <para>retakes the mutex,</para>
  404.         </listitem>
  405.  
  406.         <listitem>
  407.           <para>calls <code>waitq_sleep_finish</code>.</para>
  408.         </listitem>
  409.       </orderedlist>
  410.     </section>
  411.   </section>
  412.  
  413.   <section>
  414.     <title>Userspace synchronization</title>
  415.  
  416.     <section>
  417.       <title>Futexes</title>
  418.  
  419.       <para></para>
  420.     </section>
  421.   </section>
  422. </chapter>