<?xml version="1.0" encoding="UTF-8"?>
<chapter id="sync">
  <?dbhtml filename="sync.html"?>

  <title>Mutual exclusion and synchronization</title>

  <section>
    <title>Introduction</title>

    <para>The HelenOS operating system is designed to make use of the
    parallelism offered by the hardware and to exploit the concurrency of
    both the kernel and userspace tasks. This is achieved through
    multiprocessor support and several levels of multiprogramming:
    multitasking, multithreading and userspace pseudo threads. Such a highly
    concurrent environment needs safe and efficient ways to handle mutual
    exclusion and synchronization of many execution flows.</para>
  </section>

  <section>
    <title>Active kernel primitives</title>

    <section>
      <title>Spinlocks</title>

      <para>The basic mutual exclusion primitive is the spinlock. The spinlock
      implements active waiting for the availability of a memory lock (i.e. a
      simple variable) in a multiprocessor-safe manner. This safety is
      achieved through the use of a specialized, architecture-dependent,
      atomic test-and-set operation which either locks the spinlock (i.e. sets
      the variable) or, provided that it is already locked, leaves it
      unaltered. In any case, the test-and-set operation returns a value, thus
      signalling either success (i.e. zero return value) or failure (i.e.
      non-zero value) in acquiring the lock. Note that this makes a
      fundamental difference between the naive algorithm that doesn't use the
      atomic operation and the spinlock algorithm. While the naive algorithm
      is prone to race conditions on SMP configurations and thus is completely
      SMP-unsafe, the spinlock algorithm eliminates the possibility of race
      conditions and is suitable for mutual exclusion use.</para>

      <para>The semantics of the test-and-set operation is that the spinlock
      remains unavailable until this operation called on the respective
      spinlock returns zero. HelenOS builds two functions on top of the
      test-and-set operation. The first function is the unconditional attempt
      to acquire the spinlock and is called
      <emphasis>spinlock_lock</emphasis>. It simply loops until the
      test-and-set returns a zero value. The other function,
      <emphasis>spinlock_trylock</emphasis>, is the conditional lock operation
      and calls the test-and-set only once to find out whether it managed to
      acquire the spinlock or not. The conditional operation is useful in
      situations in which an algorithm cannot acquire more spinlocks in the
      proper order and a deadlock could not otherwise be avoided. In such a
      case, the algorithm detects the danger and, instead of possibly
      deadlocking the system, releases some of the spinlocks it already holds
      and retries the whole operation with the hope that it will succeed next
      time. The unlock function, <emphasis>spinlock_unlock</emphasis>, is
      simple: it merely clears the spinlock variable.</para>
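      <para>As a sketch only, the three functions can be modelled on top of
      the C11 atomic test-and-set. This is not the HelenOS code, which uses
      its own architecture-dependent atomic operations, but the structure is
      the same:</para>

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative spinlock built on the C11 atomic_flag. Note the inverted
 * convention: atomic_flag_test_and_set() returns the previous value,
 * so false means the lock was free and has just been acquired. */
typedef struct {
    atomic_flag flag;
} spinlock_t;

void spinlock_initialize(spinlock_t *sl)
{
    atomic_flag_clear(&sl->flag);
}

/* Unconditional acquire: loop actively until the test-and-set succeeds. */
void spinlock_lock(spinlock_t *sl)
{
    while (atomic_flag_test_and_set(&sl->flag))
        ;  /* busy wait */
}

/* Conditional acquire: a single test-and-set, true on success. */
bool spinlock_trylock(spinlock_t *sl)
{
    return !atomic_flag_test_and_set(&sl->flag);
}

/* Release: merely clear the spinlock variable. */
void spinlock_unlock(spinlock_t *sl)
{
    atomic_flag_clear(&sl->flag);
}
```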

      <para>Nevertheless, there is a special issue related to hardware
      optimizations that modern processors implement. Particularly problematic
      is the out-of-order execution of instructions within the critical
      section protected by a spinlock. Processors are always self-consistent,
      so they carry out speculatively executed instructions in the right order
      with regard to the dependencies among those instructions. However, the
      dependency between instructions inside the critical section and those
      that implement locking and unlocking of the respective spinlock is not
      implicit on some processor architectures. As a result, the processor
      needs to be explicitly told about each occurrence of such a dependency.
      Therefore, HelenOS adds architecture-specific hooks to the
      <emphasis>spinlock_lock</emphasis>,
      <emphasis>spinlock_trylock</emphasis> and
      <emphasis>spinlock_unlock</emphasis> functions to prevent the
      instructions inside the critical section from permeating out. On some
      architectures, these hooks can be empty because the dependencies are
      implicitly enforced by the special properties of the locking and
      unlocking instructions. Other architectures, however, need to instrument
      these hooks with different memory barriers, depending on what operations
      could permeate out.</para>
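      <para>The placement of the hooks can be illustrated with C11 fences
      standing in for the architecture-specific barriers. The hook names below
      are invented for this sketch and do not appear in HelenOS:</para>

```c
#include <stdatomic.h>

/* Hypothetical names for the architecture-specific hooks, modelled here
 * with C11 fences. On architectures whose locking instructions already
 * imply this ordering, the hooks may compile to nothing. */
static inline void cs_enter_barrier(void)
{
    atomic_thread_fence(memory_order_acquire);
}

static inline void cs_leave_barrier(void)
{
    atomic_thread_fence(memory_order_release);
}

static atomic_flag lock = ATOMIC_FLAG_INIT;

void spinlock_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_relaxed))
        ;
    cs_enter_barrier();  /* critical-section loads/stores cannot move above */
}

void spinlock_unlock(void)
{
    cs_leave_barrier();  /* critical-section loads/stores cannot move below */
    atomic_flag_clear_explicit(&lock, memory_order_relaxed);
}
```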

      <para>Spinlocks have one significant drawback: when held for longer
      periods of time, they harm both parallelism and concurrency. The
      processor executing <emphasis>spinlock_lock</emphasis> does not do any
      fruitful work and is effectively halted until it can grab the lock and
      proceed. Similarly, other execution flows cannot execute on the
      processor that holds the spinlock because the kernel disables preemption
      on that processor when a spinlock is held. The reason behind disabling
      preemption is avoidance of the priority inversion problem. For the same
      reason, threads are strongly discouraged from sleeping while they hold a
      spinlock.</para>

      <para>To summarize, spinlocks represent a very simple and essential
      mutual exclusion primitive for SMP systems. On the other hand, spinlocks
      scale poorly because of the active loop they are based on. Therefore,
      spinlocks are used in HelenOS only for short-term mutual exclusion and
      in cases where the mutual exclusion is required out of thread context.
      Lastly, spinlocks are used in the construction of passive
      synchronization primitives.</para>
    </section>
  </section>

  <section>
    <title>Passive kernel synchronization</title>

    <section>
      <title>Wait queues</title>

      <para>A wait queue is the basic passive synchronization primitive on
      which all other passive synchronization primitives are built. Simply
      put, it allows a thread to sleep until an event associated with the
      particular wait queue occurs. Multiple threads are notified about
      incoming events in a first come, first served fashion. Moreover, should
      the event come before any thread waits for it, it is recorded in the
      wait queue as a missed wakeup and later forwarded to the first thread
      that decides to wait in the queue. The inner structures of the wait
      queue are protected by a spinlock.</para>

      <para>A thread that wants to wait for a wait queue event uses the
      <emphasis>waitq_sleep_timeout</emphasis> function. The algorithm first
      checks the wait queue's counter of missed wakeups and, if there are any
      missed wakeups, the call returns immediately. The call also returns
      immediately if only a conditional wait was requested. Otherwise the
      thread is enqueued in the wait queue's list of sleeping threads and its
      state is changed to <emphasis>Sleeping</emphasis>. It then sleeps until
      one of the following events happens:</para>

      <orderedlist>
        <listitem>
          <para>another thread calls <emphasis>waitq_wakeup</emphasis> and the
          thread is the first thread in the wait queue's list of sleeping
          threads;</para>
        </listitem>

        <listitem>
          <para>another thread calls
          <emphasis>waitq_interrupt_sleep</emphasis> on the sleeping
          thread;</para>
        </listitem>

        <listitem>
          <para>the sleep times out, provided that none of the previous
          events occurred within the specified time limit; the limit can be
          infinity.</para>
        </listitem>
      </orderedlist>
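      <para>The missed-wakeup bookkeeping can be captured in a short
      single-threaded model. This is only a sketch: the result names are
      loosely modelled on the kernel's return codes, and the real
      implementation protects these structures with a spinlock and handles the
      timeout and interruption paths omitted here:</para>

```c
#include <stdbool.h>

/* Simplified model of a wait queue: a missed-wakeup counter plus the
 * length of the list of sleeping threads. */
typedef struct {
    int missed_wakeups;
    int sleeping;
} waitq_t;

typedef enum {
    SLEEP_OK_ATOMIC,    /* a missed wakeup was consumed, no sleep needed */
    SLEEP_WOULD_BLOCK,  /* conditional wait requested, refused to block */
    SLEEP_BLOCKS        /* caller is enqueued and put to Sleeping state */
} sleep_result_t;

sleep_result_t waitq_sleep_model(waitq_t *wq, bool conditional)
{
    if (wq->missed_wakeups > 0) {
        wq->missed_wakeups--;        /* the event already happened */
        return SLEEP_OK_ATOMIC;
    }
    if (conditional)
        return SLEEP_WOULD_BLOCK;    /* conditional wait never sleeps */
    wq->sleeping++;                  /* enqueue and go to sleep */
    return SLEEP_BLOCKS;
}

void waitq_wakeup_model(waitq_t *wq)
{
    if (wq->sleeping > 0)
        wq->sleeping--;              /* wake the first sleeper */
    else
        wq->missed_wakeups++;        /* no sleeper: record a missed wakeup */
}
```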

      <para>All five possibilities (immediate return on success, immediate
      return on failure, wakeup after sleep, interruption and timeout) are
      distinguishable by the return value of
      <emphasis>waitq_sleep_timeout</emphasis>. Being able to interrupt a
      sleeping thread is essential for externally initiated thread
      termination. The ability to wait only for a certain amount of time is
      used, for instance, to passively delay thread execution by several
      microseconds or even seconds in the <emphasis>thread_sleep</emphasis>
      function. Because all other passive kernel synchronization primitives
      are based on wait queues, they also have the option of being
      interrupted and, more importantly, can time out. All of them also
      implement the conditional operation. Furthermore, this very fundamental
      interface reaches up to the implementation of futexes, a userspace
      synchronization primitive that makes it possible for a userspace
      thread to request a synchronization operation with a timeout or a
      conditional operation.</para>

      <para>From the description above, it should be apparent that when a
      sleeping thread is woken by <emphasis>waitq_wakeup</emphasis> or when
      <emphasis>waitq_sleep_timeout</emphasis> succeeds immediately, the
      thread can be sure that the event has occurred. The thread need not and
      should not verify this fact. This approach is called direct hand-off and
      is characteristic of all passive HelenOS synchronization primitives,
      with one exception described below.</para>
    </section>

    <section>
      <title>Semaphores</title>

      <para>The interesting point about wait queues is that the number of
      missed wakeups is equal to the number of threads that will not block in
      <emphasis>waitq_sleep_timeout</emphasis> and will immediately succeed
      instead. On the other hand, semaphores are synchronization primitives
      that let a predefined number of threads into their critical section and
      block any other threads above this count. These two concepts are,
      however, exactly the same. Semaphores in HelenOS are therefore
      implemented as wait queues with a single semantic change: their wait
      queue is initialized to have as many missed wakeups as the number of
      threads that the semaphore intends to let into its critical section
      simultaneously.</para>

      <para>In the semaphore language, the wait queue operation
      <emphasis>waitq_sleep_timeout</emphasis> corresponds to the semaphore
      <emphasis>down</emphasis> operation, represented by the function
      <emphasis>semaphore_down_timeout</emphasis>, and, by way of similitude,
      the wait queue operation <emphasis>waitq_wakeup</emphasis> corresponds
      to the semaphore <emphasis>up</emphasis> operation, represented by the
      function <emphasis>semaphore_up</emphasis>. The conditional down
      operation is called <emphasis>semaphore_trydown</emphasis>.</para>
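      <para>Under this correspondence, a semaphore reduces to a wait queue
      whose missed-wakeup counter is preset. A minimal single-threaded sketch,
      in which blocking is modelled by a failing trydown:</para>

```c
#include <stdbool.h>

/* Model: a semaphore is a wait queue whose missed-wakeup counter starts
 * at the number of threads allowed inside the critical section. */
typedef struct {
    int missed_wakeups;
} semaphore_t;

void semaphore_initialize(semaphore_t *s, int value)
{
    s->missed_wakeups = value;   /* 'value' preset missed wakeups */
}

/* Conditional down: succeeds while missed wakeups remain; a real
 * (unconditional) down would sleep in the wait queue instead. */
bool semaphore_trydown(semaphore_t *s)
{
    if (s->missed_wakeups > 0) {
        s->missed_wakeups--;
        return true;
    }
    return false;
}

/* Up: corresponds to waitq_wakeup. With no sleeper, the wakeup is
 * recorded as missed, raising the counter back. */
void semaphore_up(semaphore_t *s)
{
    s->missed_wakeups++;
}
```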
    </section>

    <section>
      <title>Mutexes</title>

      <para>Mutexes are sometimes referred to as binary semaphores. That means
      that mutexes are like semaphores that allow only one thread into their
      critical section. Indeed, mutexes in HelenOS are implemented exactly in
      this way: they are built on top of semaphores. From another point of
      view, they can be viewed as spinlocks without busy waiting. Their
      semaphore heritage provides a good basis for both the conditional
      operation and the operation with a timeout. The locking operation is
      called <emphasis>mutex_lock</emphasis>, the conditional locking
      operation is called <emphasis>mutex_trylock</emphasis> and the unlocking
      operation is called <emphasis>mutex_unlock</emphasis>.</para>
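      <para>The layering can be shown in a few lines. This sketch reduces the
      underlying semaphore to its counter; the real mutex delegates to the
      full semaphore operations:</para>

```c
#include <stdbool.h>

/* A mutex is a semaphore initialized to 1. */
typedef struct {
    int count;               /* stand-in for the semaphore's wait queue */
} semaphore_model_t;

typedef struct {
    semaphore_model_t sem;
} mutex_t;

void mutex_initialize(mutex_t *m)
{
    m->sem.count = 1;        /* exactly one thread may enter */
}

/* Corresponds to semaphore_trydown. */
bool mutex_trylock(mutex_t *m)
{
    if (m->sem.count > 0) {
        m->sem.count--;
        return true;
    }
    return false;            /* a real mutex_lock would sleep here */
}

/* Corresponds to semaphore_up. */
void mutex_unlock(mutex_t *m)
{
    m->sem.count++;
}
```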
    </section>

    <section>
      <title>Reader/writer locks</title>

      <para>Reader/writer locks, or rwlocks, are by far the most complicated
      synchronization primitive within the kernel. The goal of these locks is
      to improve concurrency in applications in which threads need to
      synchronize access to a shared resource and that access can be
      partitioned into a read-only mode and a write mode. Reader/writer locks
      should make it possible for several, possibly many, readers to be in the
      critical section simultaneously, provided that no writer is currently in
      the critical section. Writers are allowed to enter the critical section
      only individually, provided that no reader is in the critical section
      already. Applications in which the majority of operations can be done in
      the read-only mode can benefit from the increased concurrency offered by
      reader/writer locks.</para>

      <para>During reader/writer lock construction, a decision should be made
      whether readers will be preferred over writers or vice versa in cases
      when the lock is not currently held and both a reader and a writer want
      to gain the lock. Some operating systems prefer one group over the
      other, thus creating the possibility of starving the unpreferred group.
      In the HelenOS operating system, neither of the two groups is preferred.
      The lock is granted on a first come, first served basis, with the
      additional note that readers are granted the lock in the largest
      possible batch.</para>

      <para>With this policy and the timeout modes of operation, the direct
      hand-off becomes much more complicated. For instance, a writer leaving
      the critical section must wake up all leading readers in the rwlock's
      wait queue, or one leading writer, or no-one if no thread is waiting.
      Similarly, the last reader leaving the critical section must wake up the
      sleeping writer if there are any sleeping threads left at all. As
      another example, if a writer at the beginning of the rwlock's wait queue
      times out and the lock is held by at least one reader, the writer which
      has timed out must first wake up all readers that follow it in the
      queue prior to signalling the timeout itself and giving up.</para>
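      <para>The batching policy on release can be captured by a small helper
      over a model of the wait queue. The type and function names below are
      invented for this sketch:</para>

```c
/* Model of the rwlock hand-off policy: on release, wake either the
 * largest possible leading batch of readers, or a single leading
 * writer. The wait queue is modelled as an array of requests. */
typedef enum { READER, WRITER } req_t;

/* Returns how many threads at the head of the queue are granted the lock. */
int rwlock_wakeup_batch(const req_t *queue, int len)
{
    if (len == 0)
        return 0;                 /* no-one is waiting */
    if (queue[0] == WRITER)
        return 1;                 /* one leading writer, individually */
    int n = 0;
    while (n < len && queue[n] == READER)
        n++;                      /* all leading readers, in one batch */
    return n;
}
```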

      <para>Due to the issues mentioned in the previous paragraph, the
      reader/writer lock implementation needs to walk the rwlock wait queue's
      list of sleeping threads directly in order to find out the type of
      access that the queueing threads demand. This makes the code difficult
      to understand and dependent on the internal implementation of the wait
      queue. Nevertheless, it remains unclear to the authors of HelenOS
      whether a simpler but equally fair solution exists.</para>

      <para>The implementation of rwlocks, as already described, makes use of
      a single wait queue for both readers and writers, thus avoiding any
      possibility of starvation. In fact, rwlocks use a mutex rather than a
      bare wait queue. This mutex is called <emphasis>exclusive</emphasis>
      and is used to synchronize writers. The writer's lock operation,
      <emphasis>rwlock_write_lock_timeout</emphasis>, simply tries to acquire
      the exclusive mutex. If it succeeds, the writer is granted the rwlock.
      However, if the operation fails (e.g. times out), the writer must check
      for potential readers at the head of the list of sleeping threads
      associated with the mutex's wait queue and then proceed according to the
      procedure outlined above.</para>

      <para>The exclusive mutex plays an important role in reader
      synchronization as well. However, a reader doing the reader's lock
      operation, <emphasis>rwlock_read_lock_timeout</emphasis>, may bypass
      this mutex when it detects that:</para>

      <orderedlist>
        <listitem>
          <para>there are other readers in the critical section and</para>
        </listitem>

        <listitem>
          <para>there are no sleeping threads waiting for the exclusive
          mutex.</para>
        </listitem>
      </orderedlist>

      <para>If both conditions are true, the reader will bypass the mutex,
      increment the number of readers in the critical section and then enter
      the critical section. Note that if there are any sleeping threads at the
      beginning of the wait queue, the first must be a writer. If the
      conditions are not fulfilled, the reader normally waits until the
      exclusive mutex is granted to it.</para>
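      <para>The reader's fast path can be sketched as follows. The structure
      and field names are illustrative, not those used by HelenOS:</para>

```c
#include <stdbool.h>

/* Illustrative snapshot of the rwlock state relevant to the fast path. */
typedef struct {
    int readers_in;   /* readers currently inside the critical section */
    int sleepers;     /* threads sleeping on the exclusive mutex */
} rwlock_state_t;

/* Both conditions from the text must hold for the bypass. */
bool reader_may_bypass(const rwlock_state_t *rw)
{
    return rw->readers_in > 0 && rw->sleepers == 0;
}

/* Fast-path attempt; returns true if the reader entered directly. */
bool rwlock_read_trylock_fast(rwlock_state_t *rw)
{
    if (!reader_may_bypass(rw))
        return false;     /* must wait for the exclusive mutex instead */
    rw->readers_in++;     /* bypass: just bump the reader count */
    return true;
}
```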
    </section>

    <section>
      <title>Condition variables</title>

      <para>Condvars explanation</para>
    </section>
  </section>

  <section>
    <title>Userspace synchronization</title>

    <section>
      <title>Futexes</title>

      <para></para>
    </section>
  </section>
</chapter>