
<?xml version="1.0" encoding="UTF-8"?>
<chapter id="sync">
  <?dbhtml filename="sync.html"?>

  <title>Synchronization</title>

  <section>
    <title>Introduction</title>

    <para>The HelenOS operating system is designed to make use of the
    parallelism offered by the hardware and to exploit the concurrency of
    both kernel and userspace tasks. This is achieved through multiprocessor
    support and several levels of multiprogramming, such as multitasking and
    multithreading, and also through userspace pseudo threads. However, such
    a highly concurrent environment needs safe and efficient ways to handle
    mutual exclusion and synchronization of many execution flows.</para>
  </section>

  <section>
    <title>Active kernel primitives</title>

    <section>
      <indexterm>
        <primary>synchronization</primary>

        <secondary>- spinlock</secondary>
      </indexterm>

      <title>Spinlocks</title>

      <para>The basic mutual exclusion primitive is the spinlock. The spinlock
      implements active waiting for the availability of a memory lock (i.e. a
      simple variable) in a multiprocessor-safe manner. This safety is
      achieved through the use of a specialized, architecture-dependent,
      atomic test-and-set operation which either locks the spinlock (i.e. sets
      the variable) or, provided that it is already locked, leaves it
      unaltered. In any case, the test-and-set operation returns a value, thus
      signalling either success (i.e. zero return value) or failure (i.e.
      non-zero value) in acquiring the lock. Note that this makes a
      fundamental difference between the naive algorithm that doesn't use the
      atomic operation and the spinlock algorithm. While the naive algorithm
      is prone to race conditions on SMP configurations and thus is completely
      SMP-unsafe, the spinlock algorithm eliminates the possibility of race
      conditions and is suitable for mutual exclusion use.</para>

      <para>The semantics of the test-and-set operation is that the spinlock
      remains unavailable until this operation called on the respective
      spinlock returns zero. HelenOS builds two functions on top of the
      test-and-set operation. The first function is the unconditional attempt
      to acquire the spinlock and is called <code>spinlock_lock()</code>. It
      simply loops until the test-and-set returns a zero value. The other
      function, <code>spinlock_trylock()</code>, is the conditional lock
      operation and calls the test-and-set only once to find out whether it
      managed to acquire the spinlock or not. The conditional operation is
      useful in situations in which an algorithm cannot acquire more spinlocks
      in the proper order and a deadlock would otherwise be unavoidable. In
      such a case, the algorithm detects the danger and, instead of possibly
      deadlocking the system, simply releases some spinlocks it already holds
      and retries the whole operation with the hope that it will succeed next
      time. The unlock function, <code>spinlock_unlock()</code>, is quite
      simple: it merely clears the spinlock variable.</para>
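The algorithm described above can be sketched with portable C11 atomics. This is an illustrative model only, not the actual HelenOS code, which relies on architecture-specific atomic instructions; here the conditional operation reports success as a boolean rather than the raw test-and-set return value.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative spinlock model; the real spinlock_t is architecture
 * specific. clear = unlocked, set = locked. */
typedef struct {
    atomic_flag locked;
} spinlock_t;

void spinlock_initialize(spinlock_t *sl)
{
    atomic_flag_clear(&sl->locked);
}

/* Conditional lock: a single test-and-set. The operation returns the
 * previous value, so "false" means the caller won the lock. */
bool spinlock_trylock(spinlock_t *sl)
{
    return !atomic_flag_test_and_set(&sl->locked);
}

/* Unconditional lock: loop until the test-and-set succeeds. */
void spinlock_lock(spinlock_t *sl)
{
    while (atomic_flag_test_and_set(&sl->locked))
        ;  /* active waiting */
}

/* Unlock: merely clear the spinlock variable. */
void spinlock_unlock(spinlock_t *sl)
{
    atomic_flag_clear(&sl->locked);
}
```
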

      <para>Nevertheless, there is a special issue related to hardware
      optimizations that modern processors implement. Particularly problematic
      is the out-of-order execution of instructions within the critical
      section protected by a spinlock. The processors are always
      self-consistent so that they can carry out speculatively executed
      instructions in the right order with regard to dependencies among those
      instructions. However, the dependency between instructions inside the
      critical section and those that implement locking and unlocking of the
      respective spinlock is not implicit on some processor architectures. As
      a result, the processor needs to be explicitly told about each
      occurrence of such a dependency. Therefore, HelenOS adds
      architecture-specific hooks to all <code>spinlock_lock()</code>,
      <code>spinlock_trylock()</code> and <code>spinlock_unlock()</code> functions
      to prevent the instructions inside the critical section from permeating
      out. On some architectures, these hooks can be void because the
      dependencies are implicitly there because of the special properties of
      locking and unlocking instructions. However, other architectures need to
      instrument these hooks with different memory barriers, depending on what
      operations could permeate out.</para>
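The placement of these hooks can be illustrated with C11 fences. The macro names below are illustrative stand-ins for the architecture-specific HelenOS hooks, not the kernel's actual identifiers; on architectures with implicitly ordered lock instructions they would expand to nothing.

```c
#include <stdatomic.h>

/* Illustrative barrier hooks; real names and contents are per-arch. */
#define CS_ENTER_BARRIER()  atomic_thread_fence(memory_order_acquire)
#define CS_LEAVE_BARRIER()  atomic_thread_fence(memory_order_release)

static atomic_flag lock_var = ATOMIC_FLAG_INIT;

void sketch_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&lock_var,
        memory_order_relaxed))
        ;
    /* Keep critical-section loads/stores from moving above the lock. */
    CS_ENTER_BARRIER();
}

void sketch_unlock(void)
{
    /* Keep critical-section loads/stores from moving below the unlock. */
    CS_LEAVE_BARRIER();
    atomic_flag_clear_explicit(&lock_var, memory_order_relaxed);
}
```
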

      <para>Spinlocks have one significant drawback: when held for longer time
      periods, they harm both parallelism and concurrency. The processor
      executing <code>spinlock_lock()</code> does not do any fruitful work and
      is effectively halted until it can grab the lock and proceed.
      Similarly, other execution flows cannot execute on the processor that
      holds the spinlock because the kernel disables preemption on that
      processor when a spinlock is held. The reason behind disabling
      preemption is avoidance of the priority inversion problem. For the same
      reason, threads are strongly discouraged from sleeping when they hold a
      spinlock.</para>

      <para>To summarize, spinlocks represent a very simple yet essential
      mutual exclusion primitive for SMP systems. On the other hand, spinlocks
      scale poorly because of the active loop they are based on. Therefore,
      spinlocks are used in HelenOS only for short-time mutual exclusion and
      in cases where the mutual exclusion is required out of thread context.
      Lastly, spinlocks are used in the construction of passive
      synchronization primitives.</para>
    </section>
  </section>

  <section>
    <title>Passive kernel synchronization</title>

    <section>
      <indexterm>
        <primary>synchronization</primary>

        <secondary>- wait queue</secondary>
      </indexterm>

      <title>Wait queues</title>

      <para>A wait queue is the basic passive synchronization primitive on
      which all other passive synchronization primitives are built. Simply
      put, it allows a thread to sleep until an event associated with the
      particular wait queue occurs. Multiple threads are notified about
      incoming events in a first come, first served fashion. Moreover, should
      the event come before any thread waits for it, it is recorded in the
      wait queue as a missed wakeup and later forwarded to the first thread
      that decides to wait in the queue. The inner structures of the wait
      queue are protected by a spinlock.</para>

      <para>The thread that wants to wait for a wait queue event uses the
      <code>waitq_sleep_timeout()</code> function. The algorithm then checks the
      wait queue's counter of missed wakeups and if there are any missed
      wakeups, the call returns immediately. The call also returns immediately
      if only a conditional wait was requested. Otherwise the thread is
      enqueued in the wait queue's list of sleeping threads and its state is
      changed to <constant>Sleeping</constant>. It then sleeps until one of
      the following events happens:</para>

      <orderedlist>
        <listitem>
          <para>another thread calls <code>waitq_wakeup()</code> and the thread
          is the first thread in the wait queue's list of sleeping
          threads;</para>
        </listitem>

        <listitem>
          <para>another thread calls <code>waitq_interrupt_sleep()</code> on the
          sleeping thread;</para>
        </listitem>

        <listitem>
          <para>the sleep times out, provided that none of the previous
          events occurred within a specified time limit; the limit can be
          infinity.</para>
        </listitem>
      </orderedlist>

      <para>All five possibilities (immediate return on success, immediate
      return on failure, wakeup after sleep, interruption and timeout) are
      distinguishable by the return value of <code>waitq_sleep_timeout()</code>.
      Being able to interrupt a sleeping thread is essential for externally
      initiated thread termination. The ability to wait only for a certain
      amount of time is used, for instance, to passively delay thread
      execution by several microseconds or even seconds in the
      <code>thread_sleep()</code> function. Because all other passive kernel
      synchronization primitives are based on wait queues, they also have the
      option of being interrupted and, more importantly, can time out. All of
      them also implement the conditional operation. Furthermore, this very
      fundamental interface reaches up to the implementation of futexes, the
      userspace synchronization primitive, which makes it possible for a
      userspace thread to request a synchronization operation with a timeout
      or a conditional operation.</para>
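Telling the five outcomes apart might look like the following sketch; the constant names follow the spirit of the text but are assumptions, not necessarily the kernel's actual return codes.

```c
/* Illustrative outcome codes for waitq_sleep_timeout(); the real
 * kernel constants may differ. */
typedef enum {
    ESYNCH_OK_ATOMIC,    /* immediate success: a missed wakeup consumed */
    ESYNCH_WOULD_BLOCK,  /* immediate failure of a conditional wait */
    ESYNCH_OK_BLOCKED,   /* woken by waitq_wakeup() after sleeping */
    ESYNCH_INTERRUPTED,  /* woken by waitq_interrupt_sleep() */
    ESYNCH_TIMEOUT       /* the time limit expired first */
} sleep_result_t;

/* Did the caller actually observe the event it waited for? */
int sleep_succeeded(sleep_result_t rc)
{
    return rc == ESYNCH_OK_ATOMIC || rc == ESYNCH_OK_BLOCKED;
}
```
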

      <para>From the description above, it should be apparent that when a
      sleeping thread is woken by <code>waitq_wakeup()</code> or when
      <code>waitq_sleep_timeout()</code> succeeds immediately, the thread can be
      sure that the event has occurred. The thread need not and should not
      verify this fact. This approach is called direct hand-off and is
      characteristic of all passive HelenOS synchronization primitives, with
      the exceptions described below.</para>
    </section>

    <section>
      <indexterm>
        <primary>synchronization</primary>

        <secondary>- semaphore</secondary>
      </indexterm>

      <title>Semaphores</title>

      <para>The interesting point about wait queues is that the number of
      missed wakeups is equal to the number of threads that will not block in
      <code>waitq_sleep_timeout()</code> and will immediately succeed instead.
      On the other hand, semaphores are synchronization primitives that will
      let a predefined number of threads into their critical section and block
      any other threads above this count. In fact, these two cases are exactly
      the same. Semaphores in HelenOS are therefore implemented as wait queues
      with a single semantic change: their wait queue is initialized to have
      as many missed wakeups as the number of threads that the semaphore
      intends to let into its critical section simultaneously.</para>
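The preloaded-missed-wakeups idea can be modeled in a few lines of C. This single-threaded sketch with illustrative names mirrors only the counting logic, not the real sleeping and waking machinery.

```c
/* Minimal model: the wait queue's missed-wakeup counter is preloaded
 * with the semaphore's capacity. Field and function names are
 * illustrative, not the actual HelenOS definitions. */
typedef struct {
    int missed_wakeups;  /* wakeups recorded before anyone waited */
} waitq_t;

typedef struct {
    waitq_t wq;
} semaphore_t;

void semaphore_initialize(semaphore_t *s, int value)
{
    /* "value" threads may pass down without blocking */
    s->wq.missed_wakeups = value;
}

/* Conditional down: returns 1 when a missed wakeup could be consumed. */
int semaphore_trydown(semaphore_t *s)
{
    if (s->wq.missed_wakeups > 0) {
        s->wq.missed_wakeups--;
        return 1;
    }
    return 0;  /* the real semaphore_down_timeout() would block here */
}

void semaphore_up(semaphore_t *s)
{
    /* with no sleepers, waitq_wakeup() records a missed wakeup */
    s->wq.missed_wakeups++;
}
```
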

      <para>In the semaphore language, the wait queue operation
      <code>waitq_sleep_timeout()</code> corresponds to the semaphore
      <code>down</code> operation, represented by the function
      <code>semaphore_down_timeout()</code>, and, by analogy, the wait queue
      operation <code>waitq_wakeup()</code> corresponds to the semaphore
      <code>up</code> operation, represented by the function
      <code>semaphore_up()</code>. The conditional down operation is called
      <code>semaphore_trydown()</code>.</para>
    </section>

    <section>
      <title>Mutexes</title>

      <indexterm>
        <primary>synchronization</primary>

        <secondary>- mutex</secondary>
      </indexterm>

      <para>Mutexes are sometimes referred to as binary semaphores. That means
      that mutexes are like semaphores that allow only one thread into their
      critical section. Indeed, mutexes in HelenOS are implemented exactly in
      this way: they are built on top of semaphores. From another point of
      view, they can be viewed as spinlocks without busy waiting. Their
      semaphore heritage provides a good basis for both the conditional
      operation and the operation with a timeout. The locking operation is
      called <code>mutex_lock()</code>, the conditional locking operation is
      called <code>mutex_trylock()</code> and the unlocking operation is called
      <code>mutex_unlock()</code>.</para>
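A minimal sketch of the mutex-as-binary-semaphore construction, with the semaphore reduced to its bare counter; the names are illustrative and no blocking is modeled.

```c
/* A mutex is a semaphore initialized to one. Illustrative model. */
typedef struct {
    int count;  /* stands in for the semaphore's wait queue state */
} semaphore_t;

typedef struct {
    semaphore_t sem;
} mutex_t;

void mutex_initialize(mutex_t *mtx)
{
    mtx->sem.count = 1;  /* binary: one thread in the critical section */
}

/* Conditional lock, i.e. semaphore trydown: 1 on success, 0 otherwise. */
int mutex_trylock(mutex_t *mtx)
{
    if (mtx->sem.count > 0) {
        mtx->sem.count--;
        return 1;
    }
    return 0;
}

/* Unlock, i.e. semaphore up. */
void mutex_unlock(mutex_t *mtx)
{
    mtx->sem.count++;
}
```
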
    </section>

    <section>
      <title>Reader/writer locks</title>

      <indexterm>
        <primary>synchronization</primary>

        <secondary>- read/write lock</secondary>
      </indexterm>

      <para>Reader/writer locks, or rwlocks, are by far the most complicated
      synchronization primitive within the kernel. The goal of these locks is
      to improve the concurrency of applications in which threads need to
      synchronize access to a shared resource and that access can be
      partitioned into a read-only mode and a write mode. Reader/writer locks
      should make it possible for several, possibly many, readers to enter
      and remain in the critical section simultaneously, provided that no
      writer is currently in the critical section. Writers are allowed to
      enter the critical section only individually, provided that no reader
      is in the critical section already. Applications in which the majority
      of operations can be done in the read-only mode can benefit from the
      increased concurrency created by reader/writer locks.</para>

      <para>During reader/writer lock construction, a decision should be made
      whether readers will be preferred over writers or whether writers will
      be preferred over readers in cases when the lock is not currently held
      and both a reader and a writer want to gain the lock. Some operating
      systems prefer one group over the other, thus creating a possibility of
      starving the unpreferred group. In the HelenOS operating system, neither
      of the two groups is preferred. The lock is granted on a first come,
      first served basis with the additional note that readers are granted the
      lock in the biggest possible batch.</para>

      <para>With this policy and the timeout modes of operation, the direct
      hand-off becomes much more complicated. For instance, a writer leaving
      the critical section must wake up all leading readers in the rwlock's
      wait queue, or one leading writer, or no one if no thread is waiting.
      Similarly, the last reader leaving the critical section must wake up the
      sleeping writer if there are any sleeping threads left at all. As
      another example, if a writer at the beginning of the rwlock's wait queue
      times out and the lock is held by at least one reader, the writer which
      has timed out must first wake up all readers that follow it in the
      queue prior to signalling the timeout itself and giving up.</para>

      <para>Due to the issues mentioned in the previous paragraph, the
      reader/writer lock implementation needs to walk the rwlock wait queue's
      list of sleeping threads directly in order to find out the type of
      access that the queueing threads demand. This makes the code difficult
      to understand and dependent on the internal implementation of the wait
      queue. Nevertheless, it remains unclear to the authors of HelenOS
      whether a simpler but equivalently fair solution exists.</para>

      <para>The implementation of rwlocks, as already noted, makes use of a
      single wait queue for both readers and writers, thus avoiding any
      possibility of starvation. In fact, rwlocks use a mutex rather than a
      bare wait queue. This mutex is called <emphasis>exclusive</emphasis> and is
      used to synchronize writers. The writer's lock operation,
      <code>rwlock_write_lock_timeout()</code>, simply tries to acquire the
      exclusive mutex. If it succeeds, the writer is granted the rwlock.
      However, if the operation fails (e.g. times out), the writer must check
      for potential readers at the head of the list of sleeping threads
      associated with the mutex's wait queue and then proceed according to the
      procedure outlined above.</para>

      <para>The exclusive mutex plays an important role in reader
      synchronization as well. However, a reader doing the reader's lock
      operation, <code>rwlock_read_lock_timeout()</code>, may bypass this mutex
      when it detects that:</para>

      <orderedlist>
        <listitem>
          <para>there are other readers in the critical section and</para>
        </listitem>

        <listitem>
          <para>there are no sleeping threads waiting for the exclusive
          mutex.</para>
        </listitem>
      </orderedlist>

      <para>If both conditions are true, the reader will bypass the mutex,
      increment the number of readers in the critical section and then enter
      the critical section. Note that if there are any sleeping threads at the
      beginning of the wait queue, the first must be a writer. If the
      conditions are not fulfilled, the reader normally waits until the
      exclusive mutex is granted to it.</para>
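The reader's bypass decision can be modeled as follows. This is a single-threaded illustration of the two conditions with invented field names; it ignores the locking that the real <code>rwlock_read_lock_timeout()</code> needs around these checks.

```c
/* Illustrative rwlock state; field names are assumptions for the
 * purpose of this sketch. */
typedef struct {
    int readers;           /* readers currently in the critical section */
    int exclusive_locked;  /* is the exclusive mutex held? */
    int sleepers;          /* threads sleeping on the exclusive mutex */
} rwlock_t;

/* Returns 1 if the reader may enter via the fast path, bypassing the
 * exclusive mutex; 0 if it must queue on the mutex instead. */
int rwlock_reader_fastpath(rwlock_t *rwl)
{
    if (rwl->readers > 0 && rwl->sleepers == 0) {
        rwl->readers++;  /* bypass the mutex entirely */
        return 1;
    }
    return 0;
}
```
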
    </section>

    <section>
      <title>Condition variables</title>

      <indexterm>
        <primary>synchronization</primary>

        <secondary>- condition variable</secondary>
      </indexterm>

      <para>Condition variables can be used for waiting until a condition
      becomes true. In this respect, they are similar to wait queues. But
      contrary to wait queues, condition variables have different semantics
      that allow events to be lost when there is no thread waiting for them.
      In order to support this, condition variables don't use direct hand-off
      and operate in a way similar to the example below. A thread waiting for
      the condition to become true does the following:</para>

      <example>
        <title>Use of <code>condvar_wait_timeout()</code>.</title>

        <programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
while (!<varname>condition</varname>)
    <function>condvar_wait_timeout</function>(<varname>cv</varname>, <varname>mtx</varname>);
/* <remark>the condition is true, do something</remark> */
<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting>
      </example>

      <para>A thread that causes the condition to become true signals this
      event like this:</para>

      <example>
        <title>Use of <code>condvar_signal()</code>.</title>

        <programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
<varname>condition</varname> = <constant>true</constant>;
<function>condvar_signal</function>(<varname>cv</varname>);  /* <remark>condvar_broadcast(cv);</remark> */
<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting>
      </example>

      <para>The wait operation, <code>condvar_wait_timeout()</code>, always puts
      the calling thread to sleep. The thread then sleeps until another thread
      invokes <code>condvar_broadcast()</code> on the same condition variable or
      until it is woken up by <code>condvar_signal()</code>. The
      <code>condvar_signal()</code> operation unblocks the first thread blocking
      on the condition variable while the <code>condvar_broadcast()</code>
      operation unblocks all threads blocking there. If there are no blocking
      threads, these two operations have no effect.</para>

      <para>Note that the threads must synchronize over a dedicated mutex. To
      prevent a race condition between <code>condvar_wait_timeout()</code> and
      <code>condvar_signal()</code> or <code>condvar_broadcast()</code>, the mutex
      is passed to <code>condvar_wait_timeout()</code>, which then atomically
      puts the calling thread to sleep and unlocks the mutex. When the thread
      eventually wakes up, <code>condvar_wait_timeout()</code> reacquires the
      mutex and returns.</para>

      <para>Also note that there is no conditional operation for condition
      variables. Such an operation would make no sense since condition
      variables are defined to forget events for which there is no waiting
      thread and because <code>condvar_wait_timeout()</code> must always go to
      sleep. The operation with a timeout is supported as usual.</para>

      <para>In HelenOS, condition variables are based on wait queues. As
      mentioned above, wait queues remember missed events while condition
      variables must not do so. This is because condition variables are
      designed for scenarios in which an event might occur many times without
      being picked up by any waiting thread. On the other hand, wait queues
      would remember any event that had not been picked up by a call to
      <code>waitq_sleep_timeout()</code>. Therefore, if wait queues were used
      directly and without any changes to implement condition variables, the
      missed_wakeup counter would hurt performance of the implementation: the
      <code>while</code> loop in <code>condvar_wait_timeout()</code> would
      effectively do busy waiting until all missed wakeups were
      discarded.</para>

      <para>The requirement on the wait operation to atomically put the caller
      to sleep and release the mutex poses an interesting problem for
      <code>condvar_wait_timeout()</code>. More precisely, the thread should
      sleep in the condvar's wait queue prior to releasing the mutex, but it
      must not hold the mutex when it is sleeping.</para>

      <para>Problems described in the two previous paragraphs are addressed in
      HelenOS by dividing the <code>waitq_sleep_timeout()</code> function into
      three pieces:</para>

      <orderedlist>
        <listitem>
          <para><code>waitq_sleep_prepare()</code> prepares the thread to go to
          sleep by, among other things, locking the wait queue;</para>
        </listitem>

        <listitem>
          <para><code>waitq_sleep_timeout_unsafe()</code> implements the core
          blocking logic;</para>
        </listitem>

        <listitem>
          <para><code>waitq_sleep_finish()</code> performs cleanup after
          <code>waitq_sleep_timeout_unsafe()</code>; after this call, the wait
          queue spinlock is guaranteed to be unlocked by the caller.</para>
        </listitem>
      </orderedlist>

      <para>The stock <code>waitq_sleep_timeout()</code> is then a mere wrapper
      that calls these three functions. It is provided for convenience in
      cases where the caller doesn't require such low-level control. However,
      the implementation of <code>condvar_wait_timeout()</code> does need this
      finer-grained control because it has to interleave calls to these
      functions with other actions. It carries out its operations in the
      following order:</para>

      <orderedlist>
        <listitem>
          <para>calls <code>waitq_sleep_prepare()</code> in order to lock the
          condition variable's wait queue,</para>
        </listitem>

        <listitem>
          <para>releases the mutex,</para>
        </listitem>

        <listitem>
          <para>clears the counter of missed wakeups,</para>
        </listitem>

        <listitem>
          <para>calls <code>waitq_sleep_timeout_unsafe()</code>,</para>
        </listitem>

        <listitem>
          <para>retakes the mutex,</para>
        </listitem>

        <listitem>
          <para>calls <code>waitq_sleep_finish()</code>.</para>
        </listitem>
      </orderedlist>
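The sequence above can be sketched with stand-in functions that merely record the order of the calls the real <code>condvar_wait_timeout()</code> would make; everything here is a single-threaded model, not kernel code.

```c
#include <string.h>

/* Call-order model of condvar_wait_timeout(); the waitq/mutex calls
 * are stubs that only log the order of invocation. */
static const char *call_log[8];
static int call_count;

static void record(const char *what) { call_log[call_count++] = what; }

static void waitq_sleep_prepare(void)        { record("prepare"); }
static void mutex_unlock_stub(void)          { record("unlock"); }
static void clear_missed_wakeups(void)       { record("clear"); }
static void waitq_sleep_timeout_unsafe(void) { record("sleep"); }
static void mutex_lock_stub(void)            { record("lock"); }
static void waitq_sleep_finish(void)         { record("finish"); }

void condvar_wait_timeout_sketch(void)
{
    call_count = 0;
    waitq_sleep_prepare();        /* 1. lock the condvar's wait queue */
    mutex_unlock_stub();          /* 2. release the user-supplied mutex */
    clear_missed_wakeups();       /* 3. condvars must not remember events */
    waitq_sleep_timeout_unsafe(); /* 4. the core blocking logic */
    mutex_lock_stub();            /* 5. retake the mutex */
    waitq_sleep_finish();         /* 6. cleanup, wait queue unlocked */
}
```
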
    </section>
  </section>

  <section>
    <title>Userspace synchronization</title>

    <section>
      <title>Futexes</title>

      <indexterm>
        <primary>synchronization</primary>

        <secondary>- futex</secondary>
      </indexterm>

      <para>In a multithreaded environment, userspace threads need an
      efficient way to synchronize. HelenOS borrows an idea from Linux<xref
      linkend="futex" /> to implement a lightweight userspace synchronization
      and mutual exclusion primitive called a futex. The key idea behind
      futexes is that non-contended synchronization is very fast and takes
      place only in userspace without the kernel's intervention. When more
      threads contend for a futex, only one of them wins; the other threads go
      to sleep via a dedicated syscall.</para>

      <para>The userspace part of the futex is a mere integer variable, a
      counter, that can be atomically incremented or decremented. The kernel
      part is rather more complicated. For each userspace futex counter, there
      is a kernel structure describing the futex. This structure
      contains:</para>

      <itemizedlist>
        <listitem>
          <para>the number of references,</para>
        </listitem>

        <listitem>
          <para>the physical address of the userspace futex counter,</para>
        </listitem>

        <listitem>
          <para>a hash table link and</para>
        </listitem>

        <listitem>
          <para>a wait queue.</para>
        </listitem>
      </itemizedlist>

      <para>The reference count helps to find out when the futex is no longer
      needed and can be deallocated. The physical address is used as a key for
      the global futex hash table. Note that the kernel has to use the
      physical address to identify the futex because one futex can be used for
      synchronization among different address spaces and can have different
      virtual addresses in each of them. Finally, the kernel futex structure
      includes a wait queue. The wait queue is used to put threads that didn't
      win the futex to sleep until the winner wakes one of them up.</para>

      <para>A futex should be initialized by setting its userspace counter to
      one before it is used. When locking the futex via the userspace library
      function <code>futex_down_timeout()</code>, the library code atomically
      decrements the futex counter and tests if it dropped below zero. If it
      did, then the futex is locked by another thread and the library uses the
      <constant>SYS_FUTEX_SLEEP</constant> syscall to put the thread to sleep.
      If the counter decreased to 0, then there was no contention and the
      thread can enter the critical section protected by the futex. When the
      thread later leaves that critical section, it atomically increments the
      counter using the library function <code>futex_up()</code>. If the
      counter value increased to one, then there again was no contention and
      no action needs to be taken. However, if it increased to zero or even a
      smaller number, then there are sleeping threads waiting for the futex to
      become available. In that case, the library has to use the
      <constant>SYS_FUTEX_WAKEUP</constant> syscall to wake one sleeping
      thread.</para>
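The counter protocol can be sketched with C11 atomics, with the two syscalls replaced by counting stubs. This is a single-threaded illustration of the control flow, not the real library code.

```c
#include <stdatomic.h>

/* Userspace side of the futex protocol; syscalls are stubbed so the
 * contended/uncontended paths can be observed. */
static atomic_int futex_val;
static int sleep_calls, wakeup_calls;

static void sys_futex_sleep(void)  { sleep_calls++;  }  /* SYS_FUTEX_SLEEP */
static void sys_futex_wakeup(void) { wakeup_calls++; }  /* SYS_FUTEX_WAKEUP */

void futex_initialize(void)
{
    atomic_store(&futex_val, 1);  /* unlocked */
    sleep_calls = wakeup_calls = 0;
}

void futex_down(void)
{
    /* fetch_sub returns the previous value; previous <= 0 means the
     * counter dropped below zero: contended, go to sleep. */
    if (atomic_fetch_sub(&futex_val, 1) <= 0)
        sys_futex_sleep();
    /* otherwise the counter reached 0: uncontended, enter the CS */
}

void futex_up(void)
{
    /* previous < 0 means the counter rose only to zero or less:
     * sleepers exist, wake one of them. */
    if (atomic_fetch_add(&futex_val, 1) < 0)
        sys_futex_wakeup();
}
```
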

      <para>So far, futexes are very elegant in that they don't interfere with
      the kernel when there is no contention for them. Another nice aspect of
      futexes is that they don't need to be registered anywhere prior to the
      first kernel intervention.</para>

      <para>Both futex-related syscalls, <constant>SYS_FUTEX_SLEEP</constant>
      and <constant>SYS_FUTEX_WAKEUP</constant>, are mere wrappers for
      <code>waitq_sleep_timeout()</code> and <code>waitq_wakeup()</code>,
      respectively, with some housekeeping functionality added. Both syscalls
      need to translate the userspace virtual address of the futex counter to
      a physical address in order to support synchronization across shared
      memory. Once the physical address is known, the kernel checks whether
      the futex is already known to it by searching the global futex hash
      table for an item with the physical address of the futex counter as a
      key. When the search is successful, it returns an address of the kernel
      futex structure associated with the counter. If the hash table does not
      contain the key, the kernel creates a new futex structure and inserts it
      into the hash table. At the same time, the current task's B+tree of
      known futexes is searched in order to find out if the task already uses
      the futex. If it does, no action is taken. Otherwise the reference count
      of the futex is incremented, provided that the futex already
      existed.</para>
    </section>
  </section>
</chapter>