Subversion Repositories HelenOS-doc

Rev

Rev 48 | Rev 62 | Go to most recent revision | Details | Compare with Previous | Last modification | View Log | RSS feed

Rev Author Line No. Line
9 bondari 1
<?xml version="1.0" encoding="UTF-8"?>
41 jermar 2
<chapter id="sync">
3
  <?dbhtml filename="sync.html"?>
9 bondari 4
 
41 jermar 5
  <title>Mutual exclusion and synchronization</title>
9 bondari 6
 
41 jermar 7
  <section>
8
    <title>Introduction</title>
9 bondari 9
 
45 jermar 10
    <para>The HelenOS operating system is designed to make use of the
11
    parallelism offered by the hardware and to exploit concurrency of both the
12
    kernel and userspace tasks. This is achieved through multiprocessor
13
    support and several levels of multiprogramming such as multitasking,
14
    multithreading and also through userspace pseudo threads. However, such a
15
    highly concurrent environment needs safe and efficient ways to handle
16
    mutual exclusion and synchronization of many execution flows.</para>
41 jermar 17
  </section>
18
 
19
  <section>
20
    <title>Active kernel primitives</title>
21
 
9 bondari 22
    <section>
41 jermar 23
      <title>Spinlocks</title>
9 bondari 24
 
45 jermar 25
      <para>The basic mutual exclusion primitive is the spinlock. The spinlock
26
      implements active waiting for the availability of a memory lock (i.e.
41 jermar 27
      simple variable) in a multiprocessor-safe manner. This safety is
28
      achieved through the use of a specialized, architecture-dependent,
29
      atomic test-and-set operation which either locks the spinlock (i.e. sets
30
      the variable) or, provided that it is already locked, leaves it
31
      unaltered. In any case, the test-and-set operation returns a value, thus
32
      signalling either success (i.e. zero return value) or failure (i.e.
45 jermar 33
      non-zero value) in acquiring the lock. Note that this makes a
41 jermar 34
      fundamental difference between the naive algorithm that doesn't use the
35
      atomic operation and the spinlock algortihm. While the naive algorithm
45 jermar 36
      is prone to race conditions on SMP configurations and thus is completely
41 jermar 37
      SMP-unsafe, the spinlock algorithm eliminates the possibility of race
38
      conditions and is suitable for mutual exclusion use.</para>
9 bondari 39
 
41 jermar 40
      <para>The semantics of the test-and-set operation is that the spinlock
41
      remains unavailable until this operation called on the respective
45 jermar 42
      spinlock returns zero. HelenOS builds two functions on top of the
43
      test-and-set operation. The first function is the unconditional attempt
57 jermar 44
      to acquire the spinlock and is called <code>spinlock_lock</code>. It
45
      simply loops until the test-and-set returns a zero value. The other
46
      function, <code>spinlock_trylock</code>, is the conditional lock
47
      operation and calls the test-and-set only once to find out whether it
48
      managed to acquire the spinlock or not. The conditional operation is
49
      useful in situations in which an algorithm cannot acquire more spinlocks
50
      in the proper order and a deadlock cannot be avoided. In such a case,
51
      the algorithm would detect the danger and instead of possibly
52
      deadlocking the system it would simply release some spinlocks it already
53
      holds and retry the whole operation with the hope that it will succeed
54
      next time. The unlock function, <code>spinlock_unlock</code>, is quite
55
      easy - it merely clears the spinlock variable.</para>
9 bondari 56
 
41 jermar 57
      <para>Nevertheless, there is a special issue related to hardware
45 jermar 58
      optimizations that modern processors implement. Particularly problematic
59
      is the out-of-order execution of instructions within the critical
60
      section protected by a spinlock. The processors are always
41 jermar 61
      self-consistent so that they can carry out speculatively executed
62
      instructions in the right order with regard to dependencies among those
63
      instructions. However, the dependency between instructions inside the
64
      critical section and those that implement locking and unlocking of the
45 jermar 65
      respective spinlock is not implicit on some processor architectures. As
66
      a result, the processor needs to be explicitly told about each
67
      occurrence of such a dependency. Therefore, HelenOS adds
57 jermar 68
      architecture-specific hooks to all <code>spinlock_lock</code>,
69
      <code>spinlock_trylock</code> and <code>spinlock_unlock</code> functions
70
      to prevent the instructions inside the critical section from permeating
71
      out. On some architectures, these hooks can be void because the
72
      dependencies are implicitly there because of the special properties of
73
      locking and unlocking instructions. However, other architectures need to
74
      instrument these hooks with different memory barriers, depending on what
75
      operations could permeate out.</para>
9 bondari 76
 
41 jermar 77
      <para>Spinlocks have one significant drawback: when held for longer time
45 jermar 78
      periods, they harm both parallelism and concurrency. The processor
57 jermar 79
      executing <code>spinlock_lock</code> does not do any fruitful work and
80
      is effectively halted until it can grab the lock and proceed.
45 jermar 81
      Similarily, other execution flows cannot execute on the processor that
82
      holds the spinlock because the kernel disables preemption on that
83
      processor when a spinlock is held. The reason behind disabling
84
      preemption is priority inversion problem avoidance. For the same reason,
85
      threads are strongly discouraged from sleeping when they hold a
86
      spinlock.</para>
9 bondari 87
 
41 jermar 88
      <para>To summarize, spinlocks represent very simple and essential mutual
89
      exclusion primitive for SMP systems. On the other hand, spinlocks scale
90
      poorly because of the active loop they are based on. Therefore,
45 jermar 91
      spinlocks are used in HelenOS only for short-time mutual exclusion and
41 jermar 92
      in cases where the mutual exclusion is required out of thread context.
93
      Lastly, spinlocks are used in the construction of passive
94
      synchronization primitives.</para>
95
    </section>
96
  </section>
9 bondari 97
 
41 jermar 98
  <section>
99
    <title>Passive kernel synchronization</title>
9 bondari 100
 
41 jermar 101
    <section>
43 jermar 102
      <title>Wait queues</title>
9 bondari 103
 
43 jermar 104
      <para>A wait queue is the basic passive synchronization primitive on
45 jermar 105
      which all other passive synchronization primitives are built. Simply
106
      put, it allows a thread to sleep until an event associated with the
107
      particular wait queue occurs. Multiple threads are notified about
108
      incoming events in a first come, first served fashion. Moreover, should
109
      the event come before any thread waits for it, it is recorded in the
110
      wait queue as a missed wakeup and later forwarded to the first thread
111
      that decides to wait in the queue. The inner structures of the wait
112
      queue are protected by a spinlock.</para>
43 jermar 113
 
114
      <para>The thread that wants to wait for a wait queue event uses the
57 jermar 115
      <code>waitq_sleep_timeout</code> function. The algorithm then checks the
116
      wait queue's counter of missed wakeups and if there are any missed
117
      wakeups, the call returns immediately. The call also returns immediately
118
      if only a conditional wait was requested. Otherwise the thread is
119
      enqueued in the wait queue's list of sleeping threads and its state is
120
      changed to <constant>Sleeping</constant>. It then sleeps until one of
121
      the following events happens:</para>
43 jermar 122
 
123
      <orderedlist>
124
        <listitem>
57 jermar 125
          <para>another thread calls <code>waitq_wakeup</code> and the thread
126
          is the first thread in the wait queue's list of sleeping
45 jermar 127
          threads;</para>
43 jermar 128
        </listitem>
129
 
130
        <listitem>
57 jermar 131
          <para>another thread calls <code>waitq_interrupt_sleep</code> on the
132
          sleeping thread;</para>
43 jermar 133
        </listitem>
134
 
135
        <listitem>
45 jermar 136
          <para>the sleep times out provided that none of the previous
137
          occurred within a specified time limit; the limit can be
138
          infinity.</para>
43 jermar 139
        </listitem>
140
      </orderedlist>
141
 
142
      <para>All five possibilities (immediate return on success, immediate
143
      return on failure, wakeup after sleep, interruption and timeout) are
57 jermar 144
      distinguishable by the return value of <code>waitq_sleep_timeout</code>.
145
      Being able to interrupt a sleeping thread is essential for externally
146
      initiated thread termination. The ability to wait only for a certain
147
      amount of time is used, for instance, to passively delay thread
148
      execution by several microseconds or even seconds in
149
      <code>thread_sleep</code> function. Due to the fact that all other
150
      passive kernel synchronization primitives are based on wait queues, they
151
      also have the option of being interrutped and, more importantly, can
152
      timeout. All of them also implement the conditional operation.
153
      Furthemore, this very fundamental interface reaches up to the
154
      implementation of futexes - userspace synchronization primitive, which
155
      makes it possible for a userspace thread to request a synchronization
156
      operation with a timeout or a conditional operation.</para>
43 jermar 157
 
158
      <para>From the description above, it should be apparent, that when a
57 jermar 159
      sleeping thread is woken by <code>waitq_wakeup</code> or when
160
      <code>waitq_sleep_timeout</code> succeeds immediately, the thread can be
161
      sure that the event has occurred. The thread need not and should not
162
      verify this fact. This approach is called direct hand-off and is
163
      characteristic for all passive HelenOS synchronization primitives, with
164
      the exception as described below.</para>
41 jermar 165
    </section>
9 bondari 166
 
41 jermar 167
    <section>
168
      <title>Semaphores</title>
9 bondari 169
 
43 jermar 170
      <para>The interesting point about wait queues is that the number of
171
      missed wakeups is equal to the number of threads that will not block in
57 jermar 172
      <code>watiq_sleep_timeout</code> and would immediately succeed instead.
173
      On the other hand, semaphores are synchronization primitives that will
174
      let predefined amount of threads into their critical section and block
175
      any other threads above this count. However, these two cases are exactly
176
      the same. Semaphores in HelenOS are therefore implemented as wait queues
177
      with a single semantic change: their wait queue is initialized to have
178
      so many missed wakeups as is the number of threads that the semphore
179
      intends to let into its critical section simultaneously.</para>
43 jermar 180
 
181
      <para>In the semaphore language, the wait queue operation
57 jermar 182
      <code>waitq_sleep_timeout</code> corresponds to semaphore
183
      <code>down</code> operation, represented by the function
184
      <code>semaphore_down_timeout</code> and by way of similitude the wait
185
      queue operation waitq_wakeup corresponds to semaphore <code>up</code>
186
      operation, represented by the function <code>sempafore_up</code>. The
187
      conditional down operation is called
188
      <code>semaphore_trydown</code>.</para>
41 jermar 189
    </section>
9 bondari 190
 
41 jermar 191
    <section>
43 jermar 192
      <title>Mutexes</title>
9 bondari 193
 
45 jermar 194
      <para>Mutexes are sometimes referred to as binary sempahores. That means
195
      that mutexes are like semaphores that allow only one thread in its
43 jermar 196
      critical section. Indeed, mutexes in HelenOS are implemented exactly in
45 jermar 197
      this way: they are built on top of semaphores. From another point of
198
      view, they can be viewed as spinlocks without busy waiting. Their
199
      semaphore heritage provides good basics for both conditional operation
200
      and operation with timeout. The locking operation is called
57 jermar 201
      <code>mutex_lock</code>, the conditional locking operation is called
202
      <code>mutex_trylock</code> and the unlocking operation is called
203
      <code>mutex_unlock</code>.</para>
41 jermar 204
    </section>
9 bondari 205
 
41 jermar 206
    <section>
43 jermar 207
      <title>Reader/writer locks</title>
9 bondari 208
 
43 jermar 209
      <para>Reader/writer locks, or rwlocks, are by far the most complicated
210
      synchronization primitive within the kernel. The goal of these locks is
45 jermar 211
      to improve concurrency of applications, in which threads need to
212
      synchronize access to a shared resource, and that access can be
43 jermar 213
      partitioned into a read-only mode and a write mode. Reader/writer locks
214
      should make it possible for several, possibly many, readers to enter the
215
      critical section, provided that no writer is currently in the critical
216
      section, or to be in the critical section contemporarily. Writers are
217
      allowed to enter the critical section only individually, provided that
45 jermar 218
      no reader is in the critical section already. Applications, in which the
219
      majority of operations can be done in the read-only mode, can benefit
43 jermar 220
      from increased concurrency created by reader/writer locks.</para>
221
 
45 jermar 222
      <para>During reader/writer lock construction, a decision should be made
43 jermar 223
      whether readers will be prefered over writers or whether writers will be
224
      prefered over readers in cases when the lock is not currently held and
225
      both a reader and a writer want to gain the lock. Some operating systems
226
      prefer one group over the other, creating thus a possibility for
227
      starving the unprefered group. In the HelenOS operating system, none of
45 jermar 228
      the two groups is prefered. The lock is granted on a first come, first
43 jermar 229
      served basis with the additional note that readers are granted the lock
45 jermar 230
      in the biggest possible batch.</para>
43 jermar 231
 
232
      <para>With this policy and the timeout modes of operation, the direct
233
      hand-off becomes much more complicated. For instance, a writer leaving
234
      the critical section must wake up all leading readers in the rwlock's
235
      wait queue or one leading writer or no-one if no thread is waiting.
236
      Similarily, the last reader leaving the critical section must wakeup the
45 jermar 237
      sleeping writer if there are any sleeping threads left at all. As
238
      another example, if a writer at the beginning of the rwlock's wait queue
239
      times out and the lock is held by at least one reader, the writer which
240
      has timed out must first wake up all readers that follow him in the
241
      queue prior to signalling the timeout itself and giving up.</para>
43 jermar 242
 
45 jermar 243
      <para>Due to the issues mentioned in the previous paragraph, the
244
      reader/writer lock imlpementation needs to walk the rwlock wait queue's
245
      list of sleeping threads directly, in order to find out the type of
43 jermar 246
      access that the queueing threads demand. This makes the code difficult
247
      to understand and dependent on the internal implementation of the wait
248
      queue. Nevertheless, it remains unclear to the authors of HelenOS
249
      whether a simpler but equivalently fair solution exists.</para>
250
 
251
      <para>The implementation of rwlocks as it has been already put, makes
252
      use of one single wait queue for both readers and writers, thus avoiding
253
      any possibility of starvation. In fact, rwlocks use a mutex rather than
57 jermar 254
      a bare wait queue. This mutex is called <code>exclusive</code> and is
255
      used to synchronize writers. The writer's lock operation,
256
      <code>rwlock_write_lock_timeout</code>, simply tries to acquire the
257
      exclusive mutex. If it succeeds, the writer is granted the rwlock.
44 jermar 258
      However, if the operation fails (e.g. times out), the writer must check
259
      for potential readers at the head of the list of sleeping threads
45 jermar 260
      associated with the mutex's wait queue and then proceed according to the
44 jermar 261
      procedure outlined above.</para>
43 jermar 262
 
263
      <para>The exclusive mutex plays an important role in reader
264
      synchronization as well. However, a reader doing the reader's lock
57 jermar 265
      operation, <code>rwlock_read_lock_timeout</code>, may bypass this mutex
266
      when it detects that:</para>
43 jermar 267
 
268
      <orderedlist>
269
        <listitem>
45 jermar 270
          <para>there are other readers in the critical section and</para>
43 jermar 271
        </listitem>
272
 
273
        <listitem>
274
          <para>there are no sleeping threads waiting for the exclusive
45 jermar 275
          mutex.</para>
43 jermar 276
        </listitem>
277
      </orderedlist>
278
 
279
      <para>If both conditions are true, the reader will bypass the mutex,
45 jermar 280
      increment the number of readers in the critical section and then enter
281
      the critical section. Note that if there are any sleeping threads at the
282
      beginning of the wait queue, the first must be a writer. If the
43 jermar 283
      conditions are not fulfilled, the reader normally waits until the
284
      exclusive mutex is granted to it.</para>
41 jermar 285
    </section>
9 bondari 286
 
287
    <section>
41 jermar 288
      <title>Condition variables</title>
9 bondari 289
 
48 jermar 290
      <para>Condition variables can be used for waiting until a condition
291
      becomes true. In this respect, they are similar to wait queues. But
292
      contrary to wait queues, condition variables have different semantics
293
      that allows events to be lost when there is no thread waiting for them.
294
      In order to support this, condition variables don't use direct hand-off
295
      and operate in a way similar to the example below. A thread waiting for
296
      the condition becoming true does the following:</para>
297
 
298
      <para><programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
299
while (!<varname>condition</varname>)
300
        <function>condvar_wait_timeout</function>(<varname>cv</varname>, <varname>mtx</varname>);
301
/* <remark>the condition is true, do something</remark> */
302
<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></para>
303
 
304
      <para>A thread that causes the condition become true signals this event
305
      like this:</para>
306
 
307
      <para><programlisting><function>mutex_lock</function>(<varname>mtx</varname>);
308
<varname>condition</varname> = <constant>true</constant>;
309
<function>condvar_signal</function>(<varname>cv</varname>);  /* <remark>condvar_broadcast(cv);</remark> */
310
<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></para>
311
 
57 jermar 312
      <para>The wait operation, <code>condvar_wait_timeout</code>, always puts
313
      the calling thread to sleep. The thread then sleeps until another thread
314
      invokes <code>condvar_broadcast</code> on the same condition variable or
315
      until it is woken up by <code>condvar_signal</code>. The
316
      <code>condvar_signal</code> operation unblocks the first thread blocking
317
      on the condition variable while the <code>condvar_broadcast</code>
318
      operation unblocks all threads blocking there. If there are no blocking
319
      threads, these two operations have no efect.</para>
48 jermar 320
 
321
      <para>Note that the threads must synchronize over a dedicated mutex. To
57 jermar 322
      prevent race condition between <code>condvar_wait_timeout</code> and
323
      <code>condvar_signal</code> or <code>condvar_broadcast</code>, the mutex
324
      is passed to <code>condvar_wait_timeout</code> which then atomically
325
      puts the calling thread asleep and unlocks the mutex. When the thread
326
      eventually wakes up, <code>condvar_wait</code> regains the mutex and
48 jermar 327
      returns.</para>
328
 
329
      <para>Also note, that there is no conditional operation for condition
330
      variables. Such an operation would make no sence since condition
331
      variables are defined to forget events for which there is no waiting
57 jermar 332
      thread and because <code>condvar_wait</code> must always go to sleep.
333
      The operation with timeout is supported as usually.</para>
48 jermar 334
 
335
      <para>In HelenOS, condition variables are based on wait queues. As it is
336
      already mentioned above, wait queues remember missed events while
337
      condition variables must not do so. This is reasoned by the fact that
338
      condition variables are designed for scenarios in which an event might
339
      occur very many times without being picked up by any waiting thread. On
340
      the other hand, wait queues would remember any event that had not been
57 jermar 341
      picked up by a call to <code>waitq_sleep_timeout</code>. Therefore, if
342
      wait queues were used directly and without any changes to implement
343
      condition variables, the missed_wakeup counter would hurt performance of
344
      the implementation: the <code>while</code> loop in
345
      <code>condvar_wait_timeout</code> would effectively do busy waiting
346
      until all missed wakeups were discarded.</para>
48 jermar 347
 
348
      <para>The requirement on the wait operation to atomically put the caller
349
      to sleep and release the mutex poses an interesting problem on
57 jermar 350
      <code>condvar_wait_timeout</code>. More precisely, the thread should
351
      sleep in the condvar's wait queue prior to releasing the mutex, but it
352
      must not hold the mutex when it is sleeping.</para>
48 jermar 353
 
354
      <para>Problems described in the two previous paragraphs are addressed in
57 jermar 355
      HelenOS by dividing the <code>waitq_sleep_timeout</code> function into
356
      three pieces:</para>
48 jermar 357
 
358
      <orderedlist>
359
        <listitem>
57 jermar 360
          <para><code>waitq_sleep_prepare</code> prepares the thread to go to
361
          sleep by, among other things, locking the wait queue;</para>
48 jermar 362
        </listitem>
363
 
364
        <listitem>
57 jermar 365
          <para><code>waitq_sleep_timeout_unsafe</code> implements the core
366
          blocking logic;</para>
48 jermar 367
        </listitem>
368
 
369
        <listitem>
57 jermar 370
          <para><code>waitq_sleep_finish</code> performs cleanup after
371
          <code>waitq_sleep_timeout_unsafe</code>; after this call, the wait
372
          queue spinlock is guaranteed to be unlocked by the caller</para>
48 jermar 373
        </listitem>
374
      </orderedlist>
375
 
57 jermar 376
      <para>The stock <code>waitq_sleep_timeout</code> is then a mere wrapper
377
      that calls these three functions. It is provided for convenience in
378
      cases where the caller doesn't require such a low level control.
379
      However, the implementation of <code>condvar_wait_timeout</code> does
380
      need this finer-grained control because it has to interleave calls to
381
      these functions by other actions. It carries its operations out in the
382
      following order:</para>
48 jermar 383
 
384
      <orderedlist>
385
        <listitem>
57 jermar 386
          <para>calls <code>waitq_sleep_prepare</code> in order to lock the
387
          condition variable's wait queue,</para>
48 jermar 388
        </listitem>
389
 
390
        <listitem>
391
          <para>releases the mutex,</para>
392
        </listitem>
393
 
394
        <listitem>
395
          <para>clears the counter of missed wakeups,</para>
396
        </listitem>
397
 
398
        <listitem>
57 jermar 399
          <para>calls <code>waitq_sleep_timeout_unsafe</code>,</para>
48 jermar 400
        </listitem>
401
 
402
        <listitem>
403
          <para>retakes the mutex,</para>
404
        </listitem>
405
 
406
        <listitem>
57 jermar 407
          <para>calls <code>waitq_sleep_finish</code>.</para>
48 jermar 408
        </listitem>
409
      </orderedlist>
9 bondari 410
    </section>
41 jermar 411
  </section>
9 bondari 412
 
41 jermar 413
  <section>
414
    <title>Userspace synchronization</title>
9 bondari 415
 
41 jermar 416
    <section>
417
      <title>Futexes</title>
418
 
419
      <para></para>
420
    </section>
421
  </section>
422
</chapter>