
<?xml version="1.0" encoding="UTF-8"?>
<chapter id="sync">
  <?dbhtml filename="sync.html"?>

  <title>Mutual exclusion and synchronization</title>

  <section>
    <title>Introduction</title>

    <para>The HelenOS operating system is designed to make use of the
    parallelism offered by the hardware and to exploit concurrency of both the
    kernel and userspace tasks. This is achieved through multiprocessor
    support and several levels of multiprogramming such as multitasking,
    multithreading and also through userspace pseudo threads. However, such a
    highly concurrent environment needs safe and efficient ways to handle
    mutual exclusion and synchronization of many execution flows.</para>
  </section>

  <section>
    <title>Active kernel primitives</title>

    <section>
      <title>Spinlocks</title>

      <para>The basic mutual exclusion primitive is the spinlock. The spinlock
      implements active waiting for the availability of a memory lock (i.e.
      simple variable) in a multiprocessor-safe manner. This safety is
      achieved through the use of a specialized, architecture-dependent,
      atomic test-and-set operation which either locks the spinlock (i.e. sets
      the variable) or, provided that it is already locked, leaves it
      unaltered. In any case, the test-and-set operation returns a value, thus
      signalling either success (i.e. zero return value) or failure (i.e.
      non-zero value) in acquiring the lock. Note that this makes a
      fundamental difference between the naive algorithm that doesn't use the
      atomic operation and the spinlock algorithm. While the naive algorithm
      is prone to race conditions on SMP configurations and thus is completely
      SMP-unsafe, the spinlock algorithm eliminates the possibility of race
      conditions and is suitable for mutual exclusion use.</para>

      <para>The semantics of the test-and-set operation are that the spinlock
      remains unavailable until this operation called on the respective
      spinlock returns zero. HelenOS builds two functions on top of the
      test-and-set operation. The first function is the unconditional attempt
      to acquire the spinlock and is called
      <emphasis>spinlock_lock</emphasis>. It simply loops until the
      test-and-set returns a zero value. The other function,
      <emphasis>spinlock_trylock</emphasis>, is the conditional lock operation
      and calls the test-and-set only once to find out whether it managed to
      acquire the spinlock or not. The conditional operation is useful in
      situations in which an algorithm cannot acquire more spinlocks in the
      proper order and a deadlock cannot be avoided. In such a case, the
      algorithm would detect the danger and instead of possibly deadlocking
      the system it would simply release some spinlocks it already holds and
      retry the whole operation with the hope that it will succeed next time.
      The unlock function, <emphasis>spinlock_unlock</emphasis>, is quite
      simple: it merely clears the spinlock variable.</para>
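
      <para>The interface described above can be sketched with the C11
      <emphasis>atomic_flag</emphasis> type as the test-and-set primitive.
      This is an illustrative model only, not the actual HelenOS code, which
      relies on its own architecture-specific atomic operations:</para>

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative spinlock sketch; atomic_flag_test_and_set returns the
 * previous value, so false means the lock was successfully grabbed. */
typedef struct {
    atomic_flag flag;   /* set = locked, clear = unlocked */
} spinlock_t;

void spinlock_initialize(spinlock_t *lock)
{
    atomic_flag_clear(&lock->flag);
}

/* Unconditional acquire: loop until the test-and-set signals success. */
void spinlock_lock(spinlock_t *lock)
{
    while (atomic_flag_test_and_set(&lock->flag))
        ;  /* active waiting */
}

/* Conditional acquire: a single test-and-set attempt; true on success. */
bool spinlock_trylock(spinlock_t *lock)
{
    return !atomic_flag_test_and_set(&lock->flag);
}

/* Release: merely clears the spinlock variable. */
void spinlock_unlock(spinlock_t *lock)
{
    atomic_flag_clear(&lock->flag);
}
```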

      <para>Nevertheless, there is a special issue related to hardware
      optimizations that modern processors implement. Particularly problematic
      is the out-of-order execution of instructions within the critical
      section protected by a spinlock. The processors are always
      self-consistent so that they can carry out speculatively executed
      instructions in the right order with regard to dependencies among those
      instructions. However, the dependency between instructions inside the
      critical section and those that implement locking and unlocking of the
      respective spinlock is not implicit on some processor architectures. As
      a result, the processor needs to be explicitly told about each
      occurrence of such a dependency. Therefore, HelenOS adds
      architecture-specific hooks to all <emphasis>spinlock_lock</emphasis>,
      <emphasis>spinlock_trylock</emphasis> and
      <emphasis>spinlock_unlock</emphasis> functions to prevent the
      instructions inside the critical section from permeating out. On some
      architectures, these hooks can be void because the dependencies are
      implicitly there because of the special properties of the locking and
      unlocking instructions. However, other architectures need to instrument
      these hooks with different memory barriers, depending on what operations
      could permeate out.</para>
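
      <para>The effect of these hooks can be illustrated with C11 atomics: on
      a weakly ordered architecture, the hook in the lock path corresponds to
      an acquire barrier and the hook in the unlock path to a release barrier.
      The sketch below (an illustration with explicit memory orders, not the
      HelenOS implementation) shows where the barriers belong:</para>

```c
#include <stdatomic.h>

/* Spinlock sketch with explicit barriers, so that instructions inside
 * the critical section cannot permeate out in either direction. */
typedef struct {
    atomic_int locked;  /* 1 = locked, 0 = unlocked */
} barrier_spinlock_t;

void barrier_spinlock_lock(barrier_spinlock_t *lock)
{
    /* acquire barrier: memory accesses in the critical section stay
     * after the locking instruction */
    while (atomic_exchange_explicit(&lock->locked, 1,
        memory_order_acquire) == 1)
        ;  /* active waiting */
}

void barrier_spinlock_unlock(barrier_spinlock_t *lock)
{
    /* release barrier: memory accesses in the critical section stay
     * before the unlocking instruction */
    atomic_store_explicit(&lock->locked, 0, memory_order_release);
}
```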

      <para>Spinlocks have one significant drawback: when held for longer
      periods of time, they harm both parallelism and concurrency. The
      processor executing <emphasis>spinlock_lock</emphasis> does not do any
      fruitful work and is effectively halted until it can grab the lock and
      proceed. Similarly, other execution flows cannot execute on the
      processor that holds the spinlock because the kernel disables preemption
      on that processor when a spinlock is held. The reason for disabling
      preemption is avoidance of the priority inversion problem. For the same
      reason, threads are strongly discouraged from sleeping when they hold a
      spinlock.</para>

      <para>To summarize, spinlocks represent a very simple yet essential
      mutual exclusion primitive for SMP systems. On the other hand, spinlocks
      scale poorly because of the active loop they are based on. Therefore,
      spinlocks are used in HelenOS only for short-term mutual exclusion and
      in cases where the mutual exclusion is required out of thread context.
      Lastly, spinlocks are used in the construction of passive
      synchronization primitives.</para>
    </section>
  </section>

  <section>
    <title>Passive kernel synchronization</title>

    <section>
      <title>Wait queues</title>

      <para>A wait queue is the basic passive synchronization primitive on
      which all other passive synchronization primitives are built. Simply
      put, it allows a thread to sleep until an event associated with the
      particular wait queue occurs. Multiple threads are notified about
      incoming events in a first come, first served fashion. Moreover, should
      the event come before any thread waits for it, it is recorded in the
      wait queue as a missed wakeup and later forwarded to the first thread
      that decides to wait in the queue. The inner structures of the wait
      queue are protected by a spinlock.</para>

      <para>The thread that wants to wait for a wait queue event uses the
      <emphasis>waitq_sleep_timeout</emphasis> function. The algorithm then
      checks the wait queue's counter of missed wakeups and if there are any
      missed wakeups, the call returns immediately. The call also returns
      immediately if only a conditional wait was requested. Otherwise the
      thread is enqueued in the wait queue's list of sleeping threads and its
      state is changed to <emphasis>Sleeping</emphasis>. It then sleeps until
      one of the following events happens:</para>

      <orderedlist>
        <listitem>
          <para>another thread calls <emphasis>waitq_wakeup</emphasis> and the
          thread is the first thread in the wait queue's list of sleeping
          threads;</para>
        </listitem>

        <listitem>
          <para>another thread calls
          <emphasis>waitq_interrupt_sleep</emphasis> on the sleeping
          thread;</para>
        </listitem>

        <listitem>
          <para>the sleep times out provided that none of the previous
          events occurred within a specified time limit; the limit can be
          infinity.</para>
        </listitem>
      </orderedlist>

      <para>All five possibilities (immediate return on success, immediate
      return on failure, wakeup after sleep, interruption and timeout) are
      distinguishable by the return value of
      <emphasis>waitq_sleep_timeout</emphasis>. Being able to interrupt a
      sleeping thread is essential for externally initiated thread
      termination. The ability to wait only for a certain amount of time is
      used, for instance, to passively delay thread execution by several
      microseconds or even seconds in the <emphasis>thread_sleep</emphasis>
      function. Because all other passive kernel synchronization primitives
      are based on wait queues, they also have the option of being
      interrupted and, more importantly, can time out. All of them also
      implement the conditional operation. Furthermore, this very fundamental
      interface reaches up to the implementation of futexes, a userspace
      synchronization primitive, which makes it possible for a userspace
      thread to request a synchronization operation with a timeout or a
      conditional operation.</para>
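
      <para>The missed-wakeup bookkeeping described above can be modeled in a
      single-threaded sketch. The names with the <emphasis>_model</emphasis>
      suffix and the result enumeration are hypothetical stand-ins for
      illustration; the real kernel enqueues and blocks threads instead of
      returning a "would block" value:</para>

```c
#include <stdbool.h>

/* Single-threaded model of wait queue bookkeeping: a wakeup with no
 * sleeper is recorded, and a later sleeper consumes it immediately. */
typedef struct {
    int missed_wakeups;   /* events that arrived with no sleeper */
    int sleeping;         /* threads currently enqueued and asleep */
} waitq_model_t;

typedef enum {
    SLEEP_SUCCESS_IMMEDIATE,  /* a missed wakeup was pending */
    SLEEP_FAILURE_IMMEDIATE,  /* conditional wait, nothing pending */
    SLEEP_WOULD_BLOCK         /* the thread would go to Sleeping state */
} sleep_result_t;

void waitq_model_wakeup(waitq_model_t *wq)
{
    if (wq->sleeping > 0)
        wq->sleeping--;        /* wake the first sleeping thread */
    else
        wq->missed_wakeups++;  /* remember the event for later */
}

sleep_result_t waitq_model_sleep(waitq_model_t *wq, bool conditional)
{
    if (wq->missed_wakeups > 0) {
        wq->missed_wakeups--;  /* consume a recorded event */
        return SLEEP_SUCCESS_IMMEDIATE;
    }
    if (conditional)
        return SLEEP_FAILURE_IMMEDIATE;
    wq->sleeping++;            /* enqueue; state would become Sleeping */
    return SLEEP_WOULD_BLOCK;
}
```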

      <para>From the description above, it should be apparent that when a
      sleeping thread is woken by <emphasis>waitq_wakeup</emphasis> or when
      <emphasis>waitq_sleep_timeout</emphasis> succeeds immediately, the
      thread can be sure that the event has occurred. The thread need not and
      should not verify this fact. This approach is called direct hand-off and
      is characteristic of all passive HelenOS synchronization primitives,
      with one exception described below.</para>
    </section>

    <section>
      <title>Semaphores</title>

      <para>The interesting point about wait queues is that the number of
      missed wakeups is equal to the number of threads that will not block in
      <emphasis>waitq_sleep_timeout</emphasis> and would immediately succeed
      instead. On the other hand, semaphores are synchronization primitives
      that admit a predefined number of threads into their critical section
      and block any other threads above this count. However, these two cases
      are exactly the same. Semaphores in HelenOS are therefore implemented as
      wait queues with a single semantic change: their wait queue is
      initialized to have as many missed wakeups as the number of threads
      that the semaphore intends to let into its critical section
      simultaneously.</para>

      <para>In semaphore terminology, the wait queue operation
      <emphasis>waitq_sleep_timeout</emphasis> corresponds to the semaphore
      <emphasis>down</emphasis> operation, represented by the function
      <emphasis>semaphore_down_timeout</emphasis>, and analogously the wait
      queue operation <emphasis>waitq_wakeup</emphasis> corresponds to the
      semaphore <emphasis>up</emphasis> operation, represented by the function
      <emphasis>semaphore_up</emphasis>. The conditional down operation is
      called <emphasis>semaphore_trydown</emphasis>.</para>
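
      <para>The construction can be modeled by the counter alone. The
      function names mirror the operations named above, but this is a
      non-blocking, single-threaded illustration (the initialization
      function's name and signature are assumptions), not the kernel
      code:</para>

```c
#include <stdbool.h>

/* Model of a semaphore as a wait queue whose missed-wakeup counter
 * starts at the number of threads admitted concurrently. */
typedef struct {
    int missed_wakeups;   /* the underlying wait queue's counter */
} semaphore_model_t;

void semaphore_initialize(semaphore_model_t *s, int count)
{
    s->missed_wakeups = count;   /* count threads may pass at once */
}

/* Conditional down: succeeds iff a missed wakeup is available;
 * in the real kernel an unconditional down would block instead. */
bool semaphore_trydown(semaphore_model_t *s)
{
    if (s->missed_wakeups > 0) {
        s->missed_wakeups--;     /* waitq sleep succeeds immediately */
        return true;
    }
    return false;
}

/* Up: equivalent to waitq_wakeup when no thread is sleeping. */
void semaphore_up(semaphore_model_t *s)
{
    s->missed_wakeups++;
}
```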
    </section>

    <section>
      <title>Mutexes</title>

      <para>Mutexes are sometimes referred to as binary semaphores. That means
      that mutexes are like semaphores that allow only one thread into their
      critical section. Indeed, mutexes in HelenOS are implemented exactly in
      this way: they are built on top of semaphores. From another point of
      view, they can be viewed as spinlocks without busy waiting. Their
      semaphore heritage provides a good basis for both the conditional
      operation and the operation with timeout. The locking operation is
      called <emphasis>mutex_lock</emphasis>, the conditional locking
      operation is called <emphasis>mutex_trylock</emphasis> and the unlocking
      operation is called <emphasis>mutex_unlock</emphasis>.</para>
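
      <para>The mutex-on-semaphore construction amounts to a semaphore
      initialized to admit exactly one thread. The counter model below is an
      illustration only (the real primitives block rather than fail, and the
      type names here are hypothetical):</para>

```c
#include <stdbool.h>

/* A mutex modeled as a binary semaphore: its counter starts at 1. */
typedef struct { int count; } sem_model_t;
typedef struct { sem_model_t sem; } mutex_model_t;

void mutex_initialize(mutex_model_t *mtx)
{
    mtx->sem.count = 1;          /* exactly one thread may enter */
}

/* mutex_trylock delegates to the semaphore's conditional down. */
bool mutex_trylock(mutex_model_t *mtx)
{
    if (mtx->sem.count > 0) {
        mtx->sem.count--;
        return true;
    }
    return false;                /* already locked */
}

/* mutex_unlock delegates to the semaphore's up operation. */
void mutex_unlock(mutex_model_t *mtx)
{
    mtx->sem.count++;
}
```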
    </section>

    <section>
      <title>Reader/writer locks</title>

      <para>Reader/writer locks, or rwlocks, are by far the most complicated
      synchronization primitive within the kernel. The goal of these locks is
      to improve concurrency of applications in which threads need to
      synchronize access to a shared resource and that access can be
      partitioned into a read-only mode and a write mode. Reader/writer locks
      should make it possible for several, possibly many, readers to enter
      and be present in the critical section simultaneously, provided that no
      writer is currently in the critical section. Writers are allowed to
      enter the critical section only individually, provided that no reader
      is in the critical section already. Applications in which the majority
      of operations can be done in the read-only mode can benefit from the
      increased concurrency created by reader/writer locks.</para>

      <para>During reader/writer lock construction, a decision should be made
      whether readers will be preferred over writers or whether writers will
      be preferred over readers in cases when the lock is not currently held
      and both a reader and a writer want to gain the lock. Some operating
      systems prefer one group over the other, thus creating a possibility of
      starving the unpreferred group. In the HelenOS operating system, neither
      of the two groups is preferred. The lock is granted on a first come,
      first served basis with the additional note that readers are granted
      the lock in the largest possible batches.</para>

      <para>With this policy and the timeout modes of operation, the direct
      hand-off becomes much more complicated. For instance, a writer leaving
      the critical section must wake up all leading readers in the rwlock's
      wait queue or one leading writer or no-one if no thread is waiting.
      Similarly, the last reader leaving the critical section must wake up
      the sleeping writer if there are any sleeping threads left at all. As
      another example, if a writer at the beginning of the rwlock's wait
      queue times out and the lock is held by at least one reader, the writer
      which has timed out must first wake up all readers that follow it in
      the queue prior to signalling the timeout itself and giving up.</para>

      <para>Due to the issues mentioned in the previous paragraph, the
      reader/writer lock implementation needs to walk the rwlock wait queue's
      list of sleeping threads directly, in order to find out the type of
      access that the queueing threads demand. This makes the code difficult
      to understand and dependent on the internal implementation of the wait
      queue. Nevertheless, it remains unclear to the authors of HelenOS
      whether a simpler but equivalently fair solution exists.</para>

      <para>The implementation of rwlocks, as already described, makes use of
      one single wait queue for both readers and writers, thus avoiding any
      possibility of starvation. In fact, rwlocks use a mutex rather than a
      bare wait queue. This mutex is called <emphasis>exclusive</emphasis>
      and is used to synchronize writers. The writer's lock operation,
      <emphasis>rwlock_write_lock_timeout</emphasis>, simply tries to acquire
      the exclusive mutex. If it succeeds, the writer is granted the rwlock.
      However, if the operation fails (e.g. times out), the writer must check
      for potential readers at the head of the list of sleeping threads
      associated with the mutex's wait queue and then proceed according to
      the procedure outlined above.</para>

      <para>The exclusive mutex plays an important role in reader
      synchronization as well. However, a reader doing the reader's lock
      operation, <emphasis>rwlock_read_lock_timeout</emphasis>, may bypass
      this mutex when it detects that:</para>

      <orderedlist>
        <listitem>
          <para>there are other readers in the critical section and</para>
        </listitem>

        <listitem>
          <para>there are no sleeping threads waiting for the exclusive
          mutex.</para>
        </listitem>
      </orderedlist>

      <para>If both conditions are true, the reader will bypass the mutex,
      increment the number of readers in the critical section and then enter
      the critical section. Note that if there are any sleeping threads at
      the beginning of the wait queue, the first must be a writer. If the
      conditions are not fulfilled, the reader normally waits until the
      exclusive mutex is granted to it.</para>
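
      <para>The reader fast path can be captured in a small model. The names
      below are hypothetical stand-ins, and the sketch covers only the
      bypass decision; the slow path, in which the reader queues on the
      exclusive mutex, is left out:</para>

```c
#include <stdbool.h>

/* Model of the state the reader's lock operation inspects. */
typedef struct {
    int readers_in;        /* readers currently in the critical section */
    int sleeping_threads;  /* threads sleeping on the exclusive mutex */
} rwlock_model_t;

/* Both bypass conditions from the list above must hold. */
bool rwlock_reader_can_bypass(const rwlock_model_t *rwl)
{
    return (rwl->readers_in > 0) && (rwl->sleeping_threads == 0);
}

/* Fast path of the reader's lock operation: returns true when the
 * lock was obtained via the bypass, false when the reader would have
 * to wait for the exclusive mutex instead. */
bool rwlock_read_lock_fast(rwlock_model_t *rwl)
{
    if (rwlock_reader_can_bypass(rwl)) {
        rwl->readers_in++;   /* join the readers already inside */
        return true;
    }
    return false;            /* slow path: wait for the mutex */
}
```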
    </section>

    <section>
      <title>Condition variables</title>

      <para>Condition variables can be used for waiting until a condition
      becomes true. In this respect, they are similar to wait queues. But
      contrary to wait queues, condition variables have different semantics
      that allow events to be lost when there is no thread waiting for them.
      In order to support this, condition variables don't use direct hand-off
      and operate in a way similar to the example below. A thread waiting for
      the condition to become true does the following:</para>

      <para><programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
while (!<varname>condition</varname>)
        <function>condvar_wait_timeout</function>(<varname>cv</varname>, <varname>mtx</varname>);
/* <remark>the condition is true, do something</remark> */
<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></para>

      <para>A thread that causes the condition to become true signals this
      event like this:</para>

      <para><programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
<varname>condition</varname> = <constant>true</constant>;
<function>condvar_signal</function>(<varname>cv</varname>);  /* <remark>condvar_broadcast(cv);</remark> */
<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></para>

      <para>The wait operation, <emphasis>condvar_wait_timeout</emphasis>,
      always puts the calling thread to sleep. The thread then sleeps until
      another thread invokes <emphasis>condvar_broadcast</emphasis> on the
      same condition variable or until it is woken up by
      <emphasis>condvar_signal</emphasis>. The
      <emphasis>condvar_signal</emphasis> operation unblocks the first thread
      blocking on the condition variable while the
      <emphasis>condvar_broadcast</emphasis> operation unblocks all threads
      blocking there. If there are no blocking threads, these two operations
      have no effect.</para>

      <para>Note that the threads must synchronize over a dedicated mutex. To
      prevent a race condition between
      <emphasis>condvar_wait_timeout</emphasis> and
      <emphasis>condvar_signal</emphasis> or
      <emphasis>condvar_broadcast</emphasis>, the mutex is passed to
      <emphasis>condvar_wait_timeout</emphasis> which then atomically puts
      the calling thread to sleep and unlocks the mutex. When the thread
      eventually wakes up, <emphasis>condvar_wait</emphasis> regains the
      mutex and returns.</para>

      <para>Also note that there is no conditional operation for condition
      variables. Such an operation would make no sense since condition
      variables are defined to forget events for which there is no waiting
      thread and because <emphasis>condvar_wait</emphasis> must always go to
      sleep. The operation with timeout is supported as usual.</para>

      <para>In HelenOS, condition variables are based on wait queues. As
      mentioned above, wait queues remember missed events while condition
      variables must not do so. The reason is that condition variables are
      designed for scenarios in which an event might occur many times
      without being picked up by any waiting thread. On the other hand, wait
      queues would remember any event that had not been picked up by a call
      to <emphasis>waitq_sleep_timeout</emphasis>. Therefore, if wait queues
      were used directly and without any changes to implement condition
      variables, the missed_wakeup counter would hurt performance of the
      implementation: the <code>while</code> loop in
      <emphasis>condvar_wait_timeout</emphasis> would effectively do busy
      waiting until all missed wakeups were discarded.</para>

      <para>The requirement on the wait operation to atomically put the
      caller to sleep and release the mutex poses an interesting problem for
      <emphasis>condvar_wait_timeout</emphasis>. More precisely, the thread
      should sleep in the condvar's wait queue prior to releasing the mutex,
      but it must not hold the mutex when it is sleeping.</para>

      <para>Problems described in the two previous paragraphs are addressed
      in HelenOS by dividing the <emphasis>waitq_sleep_timeout</emphasis>
      function into three pieces:</para>

      <orderedlist>
        <listitem>
          <para><emphasis>waitq_sleep_prepare</emphasis> prepares the thread
          to go to sleep by, among other things, locking the wait
          queue;</para>
        </listitem>

        <listitem>
          <para><emphasis>waitq_sleep_timeout_unsafe</emphasis> implements
          the core blocking logic;</para>
        </listitem>

        <listitem>
          <para><emphasis>waitq_sleep_finish</emphasis> performs cleanup
          after <emphasis>waitq_sleep_timeout_unsafe</emphasis>; after this
          call, the wait queue spinlock is guaranteed to be unlocked by the
          caller.</para>
        </listitem>
      </orderedlist>

      <para>The stock <emphasis>waitq_sleep_timeout</emphasis> is then a mere
      wrapper that calls these three functions. It is provided for
      convenience in cases where the caller doesn't require such low-level
      control. However, the implementation of
      <emphasis>condvar_wait_timeout</emphasis> does need this finer-grained
      control because it has to interleave calls to these functions with
      other actions. It carries its operations out in the following
      order:</para>

      <orderedlist>
        <listitem>
          <para>calls <emphasis>waitq_sleep_prepare</emphasis> in order to
          lock the condition variable's wait queue,</para>
        </listitem>

        <listitem>
          <para>releases the mutex,</para>
        </listitem>

        <listitem>
          <para>clears the counter of missed wakeups,</para>
        </listitem>

        <listitem>
          <para>calls <emphasis>waitq_sleep_timeout_unsafe</emphasis>,</para>
        </listitem>

        <listitem>
          <para>retakes the mutex,</para>
        </listitem>

        <listitem>
          <para>calls <emphasis>waitq_sleep_finish</emphasis>.</para>
        </listitem>
      </orderedlist>
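
      <para>The steps above can be sketched as follows. The helper functions
      are single-threaded stand-ins for the real kernel functions (the stub
      types and names are hypothetical), so the sketch only demonstrates the
      ordering of the operations:</para>

```c
#include <stdbool.h>

/* Minimal stand-ins for the kernel structures involved. */
typedef struct { int missed_wakeups; bool locked; } waitq_stub_t;
typedef struct { bool locked; } mutex_stub_t;
typedef struct { waitq_stub_t wq; } condvar_stub_t;

static void waitq_sleep_prepare(waitq_stub_t *wq) { wq->locked = true; }
static void waitq_sleep_timeout_unsafe(waitq_stub_t *wq)
{
    (void) wq;  /* would block here until wakeup, interrupt or timeout */
}
static void waitq_sleep_finish(waitq_stub_t *wq) { wq->locked = false; }
static void mutex_stub_lock(mutex_stub_t *m) { m->locked = true; }
static void mutex_stub_unlock(mutex_stub_t *m) { m->locked = false; }

/* Sketch of how condvar_wait_timeout interleaves the three pieces of
 * waitq_sleep_timeout with mutex manipulation. */
void condvar_wait_timeout_sketch(condvar_stub_t *cv, mutex_stub_t *mtx)
{
    waitq_sleep_prepare(&cv->wq);         /* 1. lock the wait queue */
    mutex_stub_unlock(mtx);               /* 2. release the mutex */
    cv->wq.missed_wakeups = 0;            /* 3. forget recorded events */
    waitq_sleep_timeout_unsafe(&cv->wq);  /* 4. core blocking logic */
    mutex_stub_lock(mtx);                 /* 5. retake the mutex */
    waitq_sleep_finish(&cv->wq);          /* 6. unlock the wait queue */
}
```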
    </section>
  </section>

  <section>
    <title>Userspace synchronization</title>

    <section>
      <title>Futexes</title>

      <para></para>
    </section>
  </section>
</chapter>