
<?xml version="1.0" encoding="UTF-8"?>
<chapter id="sync">
  <?dbhtml filename="sync.html"?>

  <title>Mutual exclusion and synchronization</title>

  <section>
    <title>Introduction</title>

    <para>The HelenOS operating system is designed to make use of the
    parallelism offered by hardware and to exploit the concurrency of both
    kernel and userspace tasks. This is achieved through multiprocessor
    support and several levels of multiprogramming (i.e. multitasking,
    multithreading and userspace pseudo threads). However, such a highly
    concurrent environment needs safe and efficient ways to handle mutual
    exclusion and synchronization of many execution flows.</para>
  </section>

  <section>
    <title>Active kernel primitives</title>

    <section>
      <title>Spinlocks</title>

      <para>The basic mutual exclusion primitive is the spinlock. The
      spinlock implements busy waiting for the availability of a memory lock
      (i.e. a simple variable) in a multiprocessor-safe manner. This safety
      is achieved through the use of a specialized, architecture-dependent,
      atomic test-and-set operation which either locks the spinlock (i.e.
      sets the variable) or, provided that it is already locked, leaves it
      unaltered. In any case, the test-and-set operation returns a value,
      thus signalling either success (i.e. zero return value) or failure
      (i.e. non-zero value) in acquiring the lock. Note that this is the
      fundamental difference between the naive algorithm, which does not use
      an atomic operation, and the spinlock algorithm. While the naive
      algorithm is prone to race conditions on SMP configurations and is
      thus completely SMP-unsafe, the spinlock algorithm eliminates the
      possibility of race conditions and is suitable for mutual exclusion
      use.</para>
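      <para>The test-and-set semantics described above can be sketched with
      C11 atomics; this is an illustrative model, not the kernel's
      architecture-specific implementation:</para>

```c
#include <stdatomic.h>

/* Illustrative model of a spinlock built on a test-and-set operation,
 * using C11 atomics in place of the architecture-specific instruction. */
typedef struct {
    atomic_flag flag;
} spinlock_t;

/* Atomically set the lock variable and report its previous value:
 * zero means the lock was free and has now been acquired; non-zero
 * means it was already locked and is left unaltered. */
int spinlock_test_and_set(spinlock_t *sl)
{
    return atomic_flag_test_and_set(&sl->flag) ? 1 : 0;
}

void spinlock_clear(spinlock_t *sl)
{
    atomic_flag_clear(&sl->flag);
}
```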

      <para>The semantics of the test-and-set operation is that the spinlock
      remains unavailable until this operation called on the respective
      spinlock returns zero. HelenOS builds two functions on top of the
      test-and-set operation. The first is the unconditional attempt to
      acquire the spinlock and is called <emphasis>spinlock_lock</emphasis>.
      It simply loops until the test-and-set returns zero. The other
      operation, <emphasis>spinlock_trylock</emphasis>, is the conditional
      lock operation and calls the test-and-set only once to find out
      whether it managed to acquire the spinlock or not. The conditional
      operation is useful in situations in which an algorithm cannot acquire
      more spinlocks in the proper order and a deadlock would otherwise be
      unavoidable. In such a case, the algorithm detects the danger and,
      instead of possibly deadlocking the system, releases some of the
      spinlocks it already holds and retries the whole operation with the
      hope that it will succeed next time. The unlock operation,
      <emphasis>spinlock_unlock</emphasis>, is quite simple: it merely
      clears the spinlock variable.</para>
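      <para>The release-and-retry pattern described above might look as
      follows; the spinlock itself is again modelled with C11 atomics, and
      <emphasis>lock_both</emphasis> is a hypothetical caller that cannot
      guarantee a global locking order:</para>

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal spinlock model; C11 atomics stand in for the real
 * architecture-specific test-and-set operation. */
typedef struct {
    atomic_flag flag;
} spinlock_t;

void spinlock_lock(spinlock_t *sl)
{
    while (atomic_flag_test_and_set(&sl->flag))
        ;  /* busy wait until the test-and-set returns zero */
}

bool spinlock_trylock(spinlock_t *sl)
{
    return !atomic_flag_test_and_set(&sl->flag);  /* one attempt only */
}

void spinlock_unlock(spinlock_t *sl)
{
    atomic_flag_clear(&sl->flag);  /* merely clear the lock variable */
}

/* Hypothetical deadlock-avoidance example: when the second lock cannot
 * be taken, release the first one and retry the whole operation. */
void lock_both(spinlock_t *a, spinlock_t *b)
{
    for (;;) {
        spinlock_lock(a);
        if (spinlock_trylock(b))
            return;          /* both locks held */
        spinlock_unlock(a);  /* back off to avoid deadlock, then retry */
    }
}
```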

      <para>Nevertheless, there is a special issue related to hardware
      optimizations that modern processors implement. Particularly
      problematic is the out-of-order execution of instructions within the
      critical section protected by a spinlock. Processors are always
      self-consistent, so that they can carry out speculatively executed
      instructions in the right order with regard to the dependencies among
      those instructions. However, the dependency between the instructions
      inside the critical section and those that implement locking and
      unlocking of the respective spinlock is not implicit on some processor
      architectures, and the processor needs to be explicitly told about
      each occurrence of such a dependency. Therefore, HelenOS adds
      architecture-specific hooks to <emphasis>spinlock_lock</emphasis>,
      <emphasis>spinlock_trylock</emphasis> and
      <emphasis>spinlock_unlock</emphasis> to prevent the instructions
      inside the critical section from bleeding out. On some architectures,
      these hooks can be a no-op because the dependencies are implicitly
      there due to the special properties of the locking and unlocking
      instructions. However, other architectures need to instrument these
      hooks with different memory barriers, depending on what operations
      could bleed out.</para>
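      <para>As an illustration, the hooks can be thought of as acquire and
      release fences around the critical section; the macro names below are
      only stand-ins for whatever a given architecture port defines:</para>

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the architecture-specific hooks. On a
 * strongly ordered architecture both may expand to nothing; a weakly
 * ordered one must emit real memory barriers so that loads and stores
 * cannot bleed out of the critical section. */
#define CS_ENTER_BARRIER()  atomic_thread_fence(memory_order_acquire)
#define CS_LEAVE_BARRIER()  atomic_thread_fence(memory_order_release)

typedef struct {
    atomic_flag flag;
} spinlock_t;

void spinlock_lock(spinlock_t *sl)
{
    while (atomic_flag_test_and_set(&sl->flag))
        ;
    CS_ENTER_BARRIER();  /* nothing from the section may move above this */
}

void spinlock_unlock(spinlock_t *sl)
{
    CS_LEAVE_BARRIER();  /* nothing from the section may move below this */
    atomic_flag_clear(&sl->flag);
}
```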

      <para>Spinlocks have one significant drawback: when held for longer
      time periods, they harm both parallelism and concurrency. A processor
      executing <emphasis>spinlock_lock</emphasis> does not do any fruitful
      work and is effectively halted until it can grab the lock and proceed.
      Similarly, other threads cannot execute on the processor that holds
      the spinlock because the kernel disables preemption on that processor
      when a spinlock is held. Preemption is disabled in order to avoid the
      priority inversion problem. For the same reason, threads are strongly
      discouraged from sleeping when they hold a spinlock.</para>

      <para>To summarize, spinlocks are a very simple and essential mutual
      exclusion primitive for SMP systems. On the other hand, spinlocks
      scale poorly because of the active loop they are based on. Therefore,
      spinlocks are used in HelenOS only for short-term mutual exclusion and
      in cases where the mutual exclusion is required out of thread context.
      Lastly, spinlocks are used in the construction of passive
      synchronization primitives.</para>
    </section>
  </section>

  <section>
    <title>Passive kernel synchronization</title>

    <section>
      <title>Wait queues</title>

      <para>A wait queue is the basic passive synchronization primitive on
      which all other passive synchronization primitives are built. Simply
      put, it allows a thread to sleep until an event associated with the
      particular wait queue occurs. Multiple threads are notified about
      incoming events in a first come, first served fashion. Moreover,
      should the event come before any thread waits for it, it is recorded
      in the wait queue as a missed wakeup and later forwarded to the first
      thread that decides to wait in the queue. The inner structures of the
      wait queue are protected by a spinlock.</para>

      <para>A thread that wants to wait for a wait queue event uses the
      <emphasis>waitq_sleep_timeout</emphasis> function. The algorithm first
      checks the wait queue's counter of missed wakeups and, if there are
      any missed wakeups, the call returns immediately. The call also
      returns immediately if only a conditional wait was requested.
      Otherwise the thread is enqueued in the wait queue's list of sleeping
      threads and its state is changed to <emphasis>Sleeping</emphasis>. It
      then sleeps until one of the following events happens:</para>

      <orderedlist>
        <listitem>
          <para>another thread calls <emphasis>waitq_wakeup</emphasis> and
          the thread is the first thread in the wait queue's list of
          sleeping threads</para>
        </listitem>

        <listitem>
          <para>another thread calls
          <emphasis>waitq_interrupt_sleep</emphasis> on the sleeping
          thread</para>
        </listitem>

        <listitem>
          <para>the sleep times out, provided that none of the previous
          events occurred within a specified time limit; the limit can be
          infinity</para>
        </listitem>
      </orderedlist>
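      <para>The bookkeeping described above can be modelled with a pair of
      counters. The following is a deliberately simplified, single-threaded
      sketch; the real code also manipulates thread states, the scheduler
      and the protecting spinlock, and the names here are only
      illustrative:</para>

```c
/* Simplified model of the wait queue's missed-wakeup bookkeeping. */
typedef struct {
    int missed_wakeups;  /* events that arrived with no thread waiting */
    int sleeping;        /* number of currently enqueued threads */
} waitq_t;

enum {
    WQ_OK_IMMEDIATE,  /* a recorded missed wakeup was consumed */
    WQ_WOULD_BLOCK,   /* conditional wait requested, no event pending */
    WQ_BLOCKED        /* the thread would now sleep until woken */
};

/* nonblocking != 0 models the conditional-wait request */
int waitq_sleep(waitq_t *wq, int nonblocking)
{
    if (wq->missed_wakeups > 0) {
        wq->missed_wakeups--;   /* consume a recorded event */
        return WQ_OK_IMMEDIATE;
    }
    if (nonblocking)
        return WQ_WOULD_BLOCK;  /* conditional wait: do not enqueue */
    wq->sleeping++;             /* enqueue and go to sleep */
    return WQ_BLOCKED;
}

void waitq_wakeup(waitq_t *wq)
{
    if (wq->sleeping > 0)
        wq->sleeping--;         /* wake the first sleeper (FIFO) */
    else
        wq->missed_wakeups++;   /* no one waiting: record a missed wakeup */
}
```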

      <para>All five possibilities (immediate return on success, immediate
      return on failure, wakeup after sleep, interruption and timeout) are
      distinguishable by the return value of
      <emphasis>waitq_sleep_timeout</emphasis>. The ability to interrupt a
      sleeping thread is essential for externally initiated thread
      termination, and the ability to wait only for a certain amount of time
      is used, for instance, to passively delay thread execution by several
      microseconds or even seconds in the <emphasis>thread_sleep</emphasis>
      function. Because all other passive kernel synchronization primitives
      are based on wait queues, they also have the option of being
      interrupted and, more importantly, can time out. All of them also
      implement the conditional operation. Furthermore, this very
      fundamental interface reaches up to the implementation of futexes, a
      userspace synchronization primitive, which makes it possible for a
      userspace thread to request a synchronization operation with a timeout
      or a conditional operation.</para>
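      <para>The five outcomes might be enumerated as follows; the
      identifier names are hypothetical, chosen only to mirror the
      description above, and differ from the actual HelenOS return
      codes:</para>

```c
/* Hypothetical names for the five distinguishable results of
 * waitq_sleep_timeout; the real HelenOS error codes differ. */
typedef enum {
    SLEEP_OK_IMMEDIATE,  /* a recorded missed wakeup was consumed */
    SLEEP_WOULD_BLOCK,   /* conditional wait requested, no event pending */
    SLEEP_WOKEN,         /* woken by waitq_wakeup after sleeping */
    SLEEP_INTERRUPTED,   /* woken by waitq_interrupt_sleep */
    SLEEP_TIMED_OUT      /* the time limit elapsed */
} sleep_result_t;
```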

      <para>From the description above, it should be apparent that when a
      sleeping thread is woken by <emphasis>waitq_wakeup</emphasis> or when
      <emphasis>waitq_sleep_timeout</emphasis> succeeds immediately, the
      thread can be sure that the event has come and it need not and should
      not verify this fact. This approach is called direct hand-off and is
      characteristic of all passive HelenOS synchronization primitives, with
      one exception described below.</para>
    </section>

    <section>
      <title>Semaphores</title>

      <para>The interesting point about wait queues is that the number of
      missed wakeups is equal to the number of threads that will not block
      in <emphasis>waitq_sleep_timeout</emphasis> and will immediately
      succeed instead. On the other hand, semaphores are synchronization
      primitives that let a predefined number of threads into their critical
      section and block any other threads above this count. However, these
      two cases are exactly the same. Semaphores in HelenOS are therefore
      implemented as wait queues with a single semantic change: their wait
      queue is initialized with as many missed wakeups as the number of
      threads that the semaphore intends to let into its critical section
      simultaneously.</para>

      <para>In the semaphore language, the wait queue operation
      <emphasis>waitq_sleep_timeout</emphasis> corresponds to the semaphore
      <emphasis>down</emphasis> operation, represented by the function
      <emphasis>semaphore_down_timeout</emphasis>, and, by way of
      similitude, the wait queue operation
      <emphasis>waitq_wakeup</emphasis> corresponds to the semaphore
      <emphasis>up</emphasis> operation, represented by the function
      <emphasis>semaphore_up</emphasis>. The conditional down operation is
      called <emphasis>semaphore_trydown</emphasis>.</para>
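      <para>The semaphore-to-wait-queue mapping can be sketched with the
      simplified counter model of a wait queue; the function names follow
      the text, but the bodies are illustrative rather than HelenOS
      code:</para>

```c
/* Simplified wait-queue model reused for the semaphore sketch. */
typedef struct {
    int missed_wakeups;  /* pre-loaded with the semaphore's capacity */
    int sleeping;        /* threads blocked on the semaphore */
} waitq_t;

typedef struct {
    waitq_t wq;
} semaphore_t;

void semaphore_initialize(semaphore_t *sem, int value)
{
    /* as many missed wakeups as threads allowed inside simultaneously */
    sem->wq.missed_wakeups = value;
    sem->wq.sleeping = 0;
}

/* Conditional down (semaphore_trydown ~ conditional waitq_sleep_timeout):
 * succeeds iff a missed wakeup, i.e. a free slot, is available.
 * Returns 1 on success, 0 otherwise. */
int semaphore_trydown(semaphore_t *sem)
{
    if (sem->wq.missed_wakeups > 0) {
        sem->wq.missed_wakeups--;
        return 1;
    }
    return 0;
}

/* Up (semaphore_up ~ waitq_wakeup): wake a sleeper or record the event. */
void semaphore_up(semaphore_t *sem)
{
    if (sem->wq.sleeping > 0)
        sem->wq.sleeping--;
    else
        sem->wq.missed_wakeups++;
}
```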
    </section>

    <section>
      <title>Mutexes</title>

      <para>Mutexes are sometimes referred to as binary semaphores. That
      means that mutexes are like semaphores that allow only one thread
      into their critical section. Indeed, mutexes in HelenOS are
      implemented exactly in this way: they are built atop semaphores. From
      another point of view, they can be viewed as spinlocks without busy
      waiting. Their semaphore heritage provides a good basis for both the
      conditional operation and the operation with a timeout. The locking
      operation is called <emphasis>mutex_lock</emphasis>, the conditional
      locking operation is called <emphasis>mutex_trylock</emphasis> and
      the unlocking operation is called
      <emphasis>mutex_unlock</emphasis>.</para>
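      <para>The "mutex as a binary semaphore" construction can be sketched
      as follows, again using a simplified counter model of the semaphore;
      the function names follow the text:</para>

```c
/* Simplified semaphore model: count is the number of free slots. */
typedef struct {
    int count;  /* 1 when unlocked, 0 when locked */
} semaphore_t;

/* A mutex is built atop a semaphore initialized to 1. */
typedef struct {
    semaphore_t sem;
} mutex_t;

void mutex_initialize(mutex_t *mtx)
{
    mtx->sem.count = 1;  /* binary: exactly one thread may enter */
}

/* mutex_trylock ~ conditional semaphore down; returns 1 on success */
int mutex_trylock(mutex_t *mtx)
{
    if (mtx->sem.count > 0) {
        mtx->sem.count--;
        return 1;
    }
    return 0;
}

/* mutex_unlock ~ semaphore up */
void mutex_unlock(mutex_t *mtx)
{
    mtx->sem.count++;
}
```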
    </section>

    <section>
      <title>Reader/writer locks</title>

      <para>Reader/writer locks, or rwlocks, are by far the most
      complicated synchronization primitive within the kernel. The goal of
      these locks is to improve the concurrency of applications in which
      threads need to synchronize access to a shared resource and that
      access can be partitioned into a read-only mode and a write mode.
      Reader/writer locks should make it possible for several, possibly
      many, readers to be in the critical section simultaneously, provided
      that no writer is currently in the critical section. Writers are
      allowed to enter the critical section only individually, provided
      that no reader is in the critical section already. Applications in
      which the majority of operations can be done in the read-only mode
      can benefit from the increased concurrency created by reader/writer
      locks.</para>

      <para>During reader/writer lock construction, a decision should be
      made whether readers will be preferred over writers or writers will
      be preferred over readers in cases when the lock is not currently
      held and both a reader and a writer want to gain the lock. Some
      operating systems prefer one group over the other, thus creating a
      possibility of starving the unpreferred group. In the HelenOS
      operating system, neither of the two groups is preferred. The lock is
      granted on a first come, first served basis, with the additional note
      that readers are granted the lock in the biggest possible
      batches.</para>

      <para>With this policy and the timeout modes of operation, the direct
      hand-off becomes much more complicated. For instance, a writer
      leaving the critical section must wake up all leading readers in the
      rwlock's wait queue, or one leading writer, or no one if no thread is
      waiting. Similarly, the last reader leaving the critical section must
      wake up the sleeping writer, if there are any sleeping threads at
      all. As another example, if a writer at the beginning of the rwlock's
      wait queue times out and the lock is held by at least one reader, the
      timed-out writer must first wake up all readers that follow it in the
      queue prior to signalling the timeout itself and giving up.</para>

      <para>Because of the issues mentioned in the previous paragraph, the
      reader/writer lock implementation needs to walk the rwlock wait
      queue's list of sleeping threads directly in order to find out the
      type of access that the queued threads demand. This makes the code
      difficult to understand and dependent on the internal implementation
      of the wait queue. Nevertheless, it remains unclear to the authors of
      HelenOS whether a simpler but equally fair solution exists.</para>

      <para>The implementation of rwlocks, as already described, makes use
      of a single wait queue for both readers and writers, thus avoiding
      any possibility of starvation. In fact, rwlocks use a mutex rather
      than a bare wait queue. This mutex is called
      <emphasis>exclusive</emphasis> and is used to synchronize writers.
      The writer's lock operation,
      <emphasis>rwlock_write_lock_timeout</emphasis>, simply tries to
      acquire the exclusive mutex. If it succeeds, the writer is granted
      the rwlock. However, if the operation fails, the writer must check
      for potential readers at the head of the list of sleeping threads
      associated with the mutex's wait queue and proceed according to the
      procedure outlined above.</para>

      <para>The exclusive mutex plays an important role in reader
      synchronization as well. However, a reader doing the reader's lock
      operation, <emphasis>rwlock_read_lock_timeout</emphasis>, may bypass
      this mutex when it detects that:</para>

      <orderedlist>
        <listitem>
          <para>there are other readers in the critical section</para>
        </listitem>

        <listitem>
          <para>there are no sleeping threads waiting for the exclusive
          mutex</para>
        </listitem>
      </orderedlist>

      <para>If both conditions are true, the reader will bypass the mutex,
      increment the number of readers in the critical section and enter the
      critical section. Note that if there are any sleeping threads at the
      beginning of the wait queue, the first of them must be a writer. If
      the conditions are not fulfilled, the reader normally waits until the
      exclusive mutex is granted to it.</para>
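      <para>The reader fast path and its two bypass conditions can be
      sketched as follows. The real
      <emphasis>rwlock_read_lock_timeout</emphasis> inspects the exclusive
      mutex's wait queue under a spinlock; here the two conditions are
      modelled with plain counters and the names are illustrative:</para>

```c
#include <stdbool.h>

/* Simplified model of the rwlock state relevant to the reader fast path. */
typedef struct {
    int readers;             /* readers currently in the critical section */
    int exclusive_sleepers;  /* threads sleeping on the exclusive mutex */
} rwlock_t;

/* True iff the reader may bypass the exclusive mutex: other readers are
 * already inside AND no thread sleeps on the exclusive mutex (if any
 * did, the first of them would have to be a writer). */
bool rwlock_reader_may_bypass(const rwlock_t *rwl)
{
    return rwl->readers > 0 && rwl->exclusive_sleepers == 0;
}

/* Reader fast path: enter directly when both conditions hold; otherwise
 * the caller must go through the exclusive mutex as usual. */
bool rwlock_read_trylock_fast(rwlock_t *rwl)
{
    if (rwlock_reader_may_bypass(rwl)) {
        rwl->readers++;  /* enter the critical section directly */
        return true;
    }
    return false;
}
```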
  286.     </section>
  287.  
  288.     <section>
  289.       <title>Condition variables</title>
  290.  
  291.       <para>Condvars explanation</para>
  292.     </section>
  293.   </section>
  294.  
  295.   <section>
  296.     <title>Userspace synchronization</title>
  297.  
  298.     <section>
  299.       <title>Futexes</title>
  300.  
  301.       <para></para>
  302.     </section>
  303.   </section>
  304. </chapter>