<?xml version="1.0" encoding="UTF-8"?>
<chapter id="sync">
  <?dbhtml filename="sync.html"?>

  <title>Mutual exclusion and synchronization</title>

  <section>
    <title>Introduction</title>

    <para>The HelenOS operating system is designed to make use of the
    parallelism offered by the hardware and to exploit concurrency of both the
    kernel and userspace tasks. This is achieved through multiprocessor
    support and several levels of multiprogramming such as multitasking and
    multithreading, and also through userspace pseudo threads. However, such a
    highly concurrent environment needs safe and efficient ways to handle
    mutual exclusion and synchronization of many execution flows.</para>
  </section>

  <section>
    <title>Active kernel primitives</title>

    <section>
      <title>Spinlocks</title>

      <para>The basic mutual exclusion primitive is the spinlock. The spinlock
      implements active waiting for the availability of a memory lock (i.e.
      simple variable) in a multiprocessor-safe manner. This safety is
      achieved through the use of a specialized, architecture-dependent,
      atomic test-and-set operation which either locks the spinlock (i.e. sets
      the variable) or, provided that it is already locked, leaves it
      unaltered. In any case, the test-and-set operation returns a value, thus
      signalling either success (i.e. zero return value) or failure (i.e.
      non-zero value) in acquiring the lock. Note that this makes a
      fundamental difference between the naive algorithm that doesn't use the
      atomic operation and the spinlock algorithm. While the naive algorithm
      is prone to race conditions on SMP configurations and thus is completely
      SMP-unsafe, the spinlock algorithm eliminates the possibility of race
      conditions and is suitable for mutual exclusion use.</para>

      <para>The semantics of the test-and-set operation is that the spinlock
      remains unavailable until this operation called on the respective
      spinlock returns zero. HelenOS builds two functions on top of the
      test-and-set operation. The first function is the unconditional attempt
      to acquire the spinlock and is called
      <emphasis>spinlock_lock</emphasis>. It simply loops until the
      test-and-set returns a zero value. The other function,
      <emphasis>spinlock_trylock</emphasis>, is the conditional lock operation
      and calls the test-and-set only once to find out whether it managed to
      acquire the spinlock or not. The conditional operation is useful in
      situations in which an algorithm cannot acquire more spinlocks in the
      proper order and a deadlock cannot be avoided. In such a case, the
      algorithm would detect the danger and, instead of possibly deadlocking
      the system, would simply release some spinlocks it already holds and
      retry the whole operation with the hope that it will succeed next time.
      The unlock function, <emphasis>spinlock_unlock</emphasis>, is quite
      easy: it merely clears the spinlock variable.</para>
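
      <para>The relationship between the three functions and the test-and-set
      operation can be sketched as follows. This is a simplification rather
      than the actual HelenOS code; <emphasis>test_and_set</emphasis> stands
      for the architecture-specific atomic operation described above and the
      <emphasis>val</emphasis> field is an assumed name for the lock
      variable:</para>

      <para><programlisting language="C">void spinlock_lock(spinlock_t *sl)
{
        /* unconditional: spin until the atomic operation succeeds */
        while (test_and_set(sl) != 0)
                ;
}

int spinlock_trylock(spinlock_t *sl)
{
        /* conditional: a single attempt, result reported to the caller */
        return test_and_set(sl) == 0;
}

void spinlock_unlock(spinlock_t *sl)
{
        /* merely clear the spinlock variable */
        sl->val = 0;
}</programlisting></para>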

      <para>Nevertheless, there is a special issue related to hardware
      optimizations that modern processors implement. Particularly problematic
      is the out-of-order execution of instructions within the critical
      section protected by a spinlock. The processors are always
      self-consistent so that they can carry out speculatively executed
      instructions in the right order with regard to dependencies among those
      instructions. However, the dependency between instructions inside the
      critical section and those that implement locking and unlocking of the
      respective spinlock is not implicit on some processor architectures. As
      a result, the processor needs to be explicitly told about each
      occurrence of such a dependency. Therefore, HelenOS adds
      architecture-specific hooks to all <emphasis>spinlock_lock</emphasis>,
      <emphasis>spinlock_trylock</emphasis> and
      <emphasis>spinlock_unlock</emphasis> functions to prevent the
      instructions inside the critical section from permeating out. On some
      architectures, these hooks can be empty because the dependencies are
      enforced implicitly by the special properties of the locking and
      unlocking instructions. However, other architectures need to instrument
      these hooks with different memory barriers, depending on what operations
      could permeate out.</para>
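
      <para>In terms of the sketch above, the hooks would be placed as
      follows. The <emphasis>CS_ENTER_BARRIER</emphasis> and
      <emphasis>CS_LEAVE_BARRIER</emphasis> names are illustrative
      placeholders for the architecture-specific macros, which expand either
      to nothing or to the appropriate memory barrier instructions:</para>

      <para><programlisting language="C">void spinlock_lock(spinlock_t *sl)
{
        while (test_and_set(sl) != 0)
                ;
        /* keep critical section instructions from moving
         * before the lock acquisition */
        CS_ENTER_BARRIER();
}

void spinlock_unlock(spinlock_t *sl)
{
        /* keep critical section instructions from moving
         * after the lock release */
        CS_LEAVE_BARRIER();
        sl->val = 0;
}</programlisting></para>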

      <para>Spinlocks have one significant drawback: when held for longer time
      periods, they harm both parallelism and concurrency. The processor
      executing <emphasis>spinlock_lock</emphasis> does not do any fruitful
      work and is effectively halted until it can grab the lock and proceed.
      Similarly, other execution flows cannot execute on the processor that
      holds the spinlock because the kernel disables preemption on that
      processor when a spinlock is held. The reason behind disabling
      preemption is avoidance of the priority inversion problem. For the same
      reason, threads are strongly discouraged from sleeping when they hold a
      spinlock.</para>

      <para>To summarize, spinlocks represent a very simple yet essential
      mutual exclusion primitive for SMP systems. On the other hand, spinlocks
      scale poorly because of the active loop they are based on. Therefore,
      spinlocks are used in HelenOS only for short-time mutual exclusion and
      in cases where the mutual exclusion is required outside of thread
      context. Lastly, spinlocks are used in the construction of passive
      synchronization primitives.</para>
    </section>
  </section>

  <section>
    <title>Passive kernel synchronization</title>

    <section>
      <title>Wait queues</title>

      <para>A wait queue is the basic passive synchronization primitive on
      which all other passive synchronization primitives are built. Simply
      put, it allows a thread to sleep until an event associated with the
      particular wait queue occurs. Multiple threads are notified about
      incoming events in a first come, first served fashion. Moreover, should
      the event come before any thread waits for it, it is recorded in the
      wait queue as a missed wakeup and later forwarded to the first thread
      that decides to wait in the queue. The inner structures of the wait
      queue are protected by a spinlock.</para>
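
      <para>Conceptually, a wait queue thus combines three things: a spinlock,
      a counter of missed wakeups and a list of sleeping threads. The
      following structure is a simplified sketch, not the exact HelenOS
      definition:</para>

      <para><programlisting language="C">typedef struct {
        spinlock_t lock;        /* protects the inner structures */
        int missed_wakeups;     /* events that arrived with no waiter */
        link_t head;            /* list of sleeping threads */
} waitq_t;</programlisting></para>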

      <para>The thread that wants to wait for a wait queue event uses the
      <emphasis>waitq_sleep_timeout</emphasis> function. The algorithm first
      checks the wait queue's counter of missed wakeups; if there are any
      missed wakeups, the call returns immediately. The call also returns
      immediately if only a conditional wait was requested. Otherwise the
      thread is enqueued in the wait queue's list of sleeping threads and its
      state is changed to <emphasis>Sleeping</emphasis>. It then sleeps until
      one of the following events happens:</para>

      <orderedlist>
        <listitem>
          <para>another thread calls <emphasis>waitq_wakeup</emphasis> and the
          thread is the first thread in the wait queue's list of sleeping
          threads;</para>
        </listitem>

        <listitem>
          <para>another thread calls
          <emphasis>waitq_interrupt_sleep</emphasis> on the sleeping
          thread;</para>
        </listitem>

        <listitem>
          <para>the sleep times out, provided that none of the previous
          events occurred within a specified time limit; the limit can be
          infinity.</para>
        </listitem>
      </orderedlist>

      <para>All five possibilities (immediate return on success, immediate
      return on failure, wakeup after sleep, interruption and timeout) are
      distinguishable by the return value of
      <emphasis>waitq_sleep_timeout</emphasis>. Being able to interrupt a
      sleeping thread is essential for externally initiated thread
      termination. The ability to wait only for a certain amount of time is
      used, for instance, to passively delay thread execution by several
      microseconds or even seconds in the <emphasis>thread_sleep</emphasis>
      function. Due to the fact that all other passive kernel synchronization
      primitives are based on wait queues, they also have the option of being
      interrupted and, more importantly, can time out. All of them also
      implement the conditional operation. Furthermore, this very fundamental
      interface reaches up to the implementation of futexes, the userspace
      synchronization primitive, which makes it possible for a userspace
      thread to request a synchronization operation with a timeout or a
      conditional operation.</para>
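
      <para>A caller can thus distinguish all five outcomes by examining the
      return value, roughly as in the sketch below. The
      <emphasis>ESYNCH_</emphasis>-prefixed names and the exact signature are
      illustrative simplifications, not the precise HelenOS
      definitions:</para>

      <para><programlisting language="C">switch (waitq_sleep_timeout(wq, usec, flags)) {
case ESYNCH_OK_ATOMIC:          /* a missed wakeup was consumed, no sleep */
case ESYNCH_OK_BLOCKED:         /* woken up by waitq_wakeup() */
        /* success: the event has occurred */
        break;
case ESYNCH_WOULD_BLOCK:        /* conditional wait, no missed wakeup */
case ESYNCH_TIMEOUT:            /* the time limit expired */
case ESYNCH_INTERRUPTED:        /* woken by waitq_interrupt_sleep() */
        /* failure: the event has not occurred */
        break;
}</programlisting></para>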

      <para>From the description above, it should be apparent that when a
      sleeping thread is woken by <emphasis>waitq_wakeup</emphasis> or when
      <emphasis>waitq_sleep_timeout</emphasis> succeeds immediately, the
      thread can be sure that the event has occurred. The thread need not and
      should not verify this fact. This approach is called direct hand-off and
      is characteristic of all passive HelenOS synchronization primitives,
      with one exception described below.</para>
    </section>

    <section>
      <title>Semaphores</title>

      <para>The interesting point about wait queues is that the number of
      missed wakeups is equal to the number of threads that will not block in
      <emphasis>waitq_sleep_timeout</emphasis> and would immediately succeed
      instead. On the other hand, semaphores are synchronization primitives
      that will let a predefined number of threads into their critical section
      and block any other threads above this count. In fact, these two cases
      are exactly the same. Semaphores in HelenOS are therefore implemented as
      wait queues with a single semantic change: their wait queue is
      initialized to have as many missed wakeups as is the number of threads
      that the semaphore intends to let into its critical section
      simultaneously.</para>

      <para>In the semaphore language, the wait queue operation
      <emphasis>waitq_sleep_timeout</emphasis> corresponds to the semaphore
      <emphasis>down</emphasis> operation, represented by the function
      <emphasis>semaphore_down_timeout</emphasis>, and similarly the wait
      queue operation <emphasis>waitq_wakeup</emphasis> corresponds to the
      semaphore <emphasis>up</emphasis> operation, represented by the function
      <emphasis>semaphore_up</emphasis>. The conditional down operation is
      called <emphasis>semaphore_trydown</emphasis>.</para>
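
      <para>The mapping is direct enough to be captured in a few lines. The
      sketch below is a simplification; the <emphasis>sem_waitq</emphasis>
      accessor, the field names and the exact signatures are illustrative
      rather than the actual HelenOS definitions:</para>

      <para><programlisting language="C">void semaphore_initialize(semaphore_t *s, int val)
{
        /* preload the wait queue with val missed wakeups */
        waitq_initialize(sem_waitq(s));
        sem_waitq(s)->missed_wakeups = val;
}

int semaphore_down_timeout(semaphore_t *s, uint32_t usec)
{
        /* down: consume a missed wakeup or go to sleep */
        return waitq_sleep_timeout(sem_waitq(s), usec, SYNCH_FLAGS_NONE);
}

void semaphore_up(semaphore_t *s)
{
        /* up: wake the first sleeper or record a missed wakeup */
        waitq_wakeup(sem_waitq(s), WAKEUP_FIRST);
}</programlisting></para>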
    </section>

    <section>
      <title>Mutexes</title>

      <para>Mutexes are sometimes referred to as binary semaphores. That means
      that mutexes are like semaphores that allow only one thread into their
      critical section. Indeed, mutexes in HelenOS are implemented exactly in
      this way: they are built on top of semaphores. From another point of
      view, they can be viewed as spinlocks without busy waiting. Their
      semaphore heritage provides a good basis for both the conditional
      operation and the operation with timeout. The locking operation is
      called <emphasis>mutex_lock</emphasis>, the conditional locking
      operation is called <emphasis>mutex_trylock</emphasis> and the unlocking
      operation is called <emphasis>mutex_unlock</emphasis>.</para>
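
      <para>Reusing the semaphore interface, a mutex is then little more than
      a semaphore whose counter starts at one. Again a simplified sketch, with
      the <emphasis>mutex_sem</emphasis> accessor being illustrative:</para>

      <para><programlisting language="C">void mutex_initialize(mutex_t *mtx)
{
        /* exactly one thread may hold the mutex at a time */
        semaphore_initialize(mutex_sem(mtx), 1);
}

void mutex_lock(mutex_t *mtx)
{
        semaphore_down(mutex_sem(mtx));
}

int mutex_trylock(mutex_t *mtx)
{
        return semaphore_trydown(mutex_sem(mtx));
}

void mutex_unlock(mutex_t *mtx)
{
        semaphore_up(mutex_sem(mtx));
}</programlisting></para>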
    </section>

    <section>
      <title>Reader/writer locks</title>

      <para>Reader/writer locks, or rwlocks, are by far the most complicated
      synchronization primitive within the kernel. The goal of these locks is
      to improve concurrency of applications in which threads need to
      synchronize access to a shared resource and in which that access can be
      partitioned into a read-only mode and a write mode. Reader/writer locks
      should make it possible for several, possibly many, readers to be in the
      critical section simultaneously, provided that no writer is currently in
      the critical section. Writers are allowed to enter the critical section
      only individually, provided that no reader is in the critical section
      already. Applications in which the majority of operations can be done in
      the read-only mode can benefit from the increased concurrency created by
      reader/writer locks.</para>

      <para>During reader/writer lock construction, a decision should be made
      whether readers will be preferred over writers or whether writers will
      be preferred over readers in cases when the lock is not currently held
      and both a reader and a writer want to gain the lock. Some operating
      systems prefer one group over the other, thus creating a possibility for
      starving the unpreferred group. In the HelenOS operating system, neither
      of the two groups is preferred. The lock is granted on a first come,
      first served basis with the additional note that readers are granted the
      lock in the largest possible batches.</para>

      <para>With this policy and the timeout modes of operation, the direct
      hand-off becomes much more complicated. For instance, a writer leaving
      the critical section must wake up all leading readers in the rwlock's
      wait queue or one leading writer or no-one if no thread is waiting.
      Similarly, the last reader leaving the critical section must wake up the
      sleeping writer if there are any sleeping threads left at all. As
      another example, if a writer at the beginning of the rwlock's wait queue
      times out and the lock is held by at least one reader, the writer which
      has timed out must first wake up all readers that follow it in the
      queue prior to signalling the timeout itself and giving up.</para>

      <para>Due to the issues mentioned in the previous paragraph, the
      reader/writer lock implementation needs to walk the rwlock wait queue's
      list of sleeping threads directly in order to find out the type of
      access that the queued threads demand. This makes the code difficult
      to understand and dependent on the internal implementation of the wait
      queue. Nevertheless, it remains unclear to the authors of HelenOS
      whether a simpler but equivalently fair solution exists.</para>

      <para>The implementation of rwlocks, as already outlined, makes use of
      a single wait queue for both readers and writers, thus avoiding any
      possibility of starvation. In fact, rwlocks use a mutex rather than a
      bare wait queue. This mutex is called <emphasis>exclusive</emphasis>
      and is used to synchronize writers. The writer's lock operation,
      <emphasis>rwlock_write_lock_timeout</emphasis>, simply tries to acquire
      the exclusive mutex. If it succeeds, the writer is granted the rwlock.
      However, if the operation fails (e.g. times out), the writer must check
      for potential readers at the head of the list of sleeping threads
      associated with the mutex's wait queue and then proceed according to the
      procedure outlined above.</para>

      <para>The exclusive mutex plays an important role in reader
      synchronization as well. However, a reader doing the reader's lock
      operation, <emphasis>rwlock_read_lock_timeout</emphasis>, may bypass
      this mutex when it detects that:</para>

      <orderedlist>
        <listitem>
          <para>there are other readers in the critical section and</para>
        </listitem>

        <listitem>
          <para>there are no sleeping threads waiting for the exclusive
          mutex.</para>
        </listitem>
      </orderedlist>

      <para>If both conditions are true, the reader will bypass the mutex,
      increment the number of readers in the critical section and then enter
      the critical section. Note that if there are any sleeping threads at the
      beginning of the wait queue, the first must be a writer. If the
      conditions are not fulfilled, the reader normally waits until the
      exclusive mutex is granted to it.</para>
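
      <para>The reader's fast path can be sketched as follows; the helper
      names and the <emphasis>readers_in</emphasis> field are illustrative,
      not the actual HelenOS identifiers:</para>

      <para><programlisting language="C">int rwlock_read_try_bypass(rwlock_t *rwl)
{
        int bypassed = 0;

        spinlock_lock(rwl_lock(rwl));
        if (rwl->readers_in > 0) {
                if (!waitq_has_sleepers(exclusive_waitq(rwl))) {
                        /* other readers inside and nobody waiting for
                         * the exclusive mutex: enter right away */
                        rwl->readers_in++;
                        bypassed = 1;
                }
        }
        spinlock_unlock(rwl_lock(rwl));

        return bypassed;        /* zero means: wait for the mutex */
}</programlisting></para>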
    </section>

    <section>
      <title>Condition variables</title>

      <para>Condition variables can be used for waiting until a condition
      becomes true. In this respect, they are similar to wait queues. But
      contrary to wait queues, condition variables have different semantics
      that allow events to be lost when there is no thread waiting for them.
      In order to support this, condition variables don't use direct hand-off
      and operate in a way similar to the example below. A thread waiting for
      the condition to become true does the following:</para>

      <para><programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
while (!<varname>condition</varname>)
        <function>condvar_wait_timeout</function>(<varname>cv</varname>, <varname>mtx</varname>);
/* <remark>the condition is true, do something</remark> */
<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></para>

      <para>A thread that causes the condition to become true signals this
      event like this:</para>

      <para><programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
<varname>condition</varname> = <constant>true</constant>;
<function>condvar_signal</function>(<varname>cv</varname>);  /* <remark>condvar_broadcast(cv);</remark> */
<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></para>

      <para>The wait operation, <emphasis>condvar_wait_timeout</emphasis>,
      always puts the calling thread to sleep. The thread then sleeps until
      another thread invokes <emphasis>condvar_broadcast</emphasis> on the
      same condition variable or until it is woken up by
      <emphasis>condvar_signal</emphasis>. The
      <emphasis>condvar_signal</emphasis> operation unblocks the first thread
      blocking on the condition variable while the
      <emphasis>condvar_broadcast</emphasis> operation unblocks all threads
      blocking there. If there are no blocking threads, these two operations
      have no effect.</para>

      <para>Note that the threads must synchronize over a dedicated mutex. To
      prevent a race condition between
      <emphasis>condvar_wait_timeout</emphasis> and
      <emphasis>condvar_signal</emphasis> or
      <emphasis>condvar_broadcast</emphasis>, the mutex is passed to
      <emphasis>condvar_wait_timeout</emphasis> which then atomically puts the
      calling thread to sleep and unlocks the mutex. When the thread
      eventually wakes up, <emphasis>condvar_wait</emphasis> regains the mutex
      and returns.</para>

      <para>Also note that there is no conditional operation for condition
      variables. Such an operation would make no sense since condition
      variables are defined to forget events for which there is no waiting
      thread and because <emphasis>condvar_wait</emphasis> must always go to
      sleep. The operation with timeout is supported as usual.</para>

      <para>In HelenOS, condition variables are based on wait queues. As
      already mentioned above, wait queues remember missed events while
      condition variables must not do so. The reason is that condition
      variables are designed for scenarios in which an event might occur many
      times without being picked up by any waiting thread. On the other hand,
      wait queues would remember any event that had not been picked up by a
      call to <emphasis>waitq_sleep_timeout</emphasis>. Therefore, if wait
      queues were used directly and without any changes to implement condition
      variables, the missed_wakeup counter would hurt performance of the
      implementation: the <code>while</code> loop in
      <emphasis>condvar_wait_timeout</emphasis> would effectively do busy
      waiting until all missed wakeups were discarded.</para>

      <para>The requirement on the wait operation to atomically put the caller
      to sleep and release the mutex poses an interesting problem for
      <emphasis>condvar_wait_timeout</emphasis>. More precisely, the thread
      should sleep in the condvar's wait queue prior to releasing the mutex,
      but it must not hold the mutex when it is sleeping.</para>

      <para>Problems described in the two previous paragraphs are addressed in
      HelenOS by dividing the <emphasis>waitq_sleep_timeout</emphasis>
      function into three pieces:</para>

      <orderedlist>
        <listitem>
          <para><emphasis>waitq_sleep_prepare</emphasis> prepares the thread
          to go to sleep by, among other things, locking the wait
          queue;</para>
        </listitem>

        <listitem>
          <para><emphasis>waitq_sleep_timeout_unsafe</emphasis> implements the
          core blocking logic;</para>
        </listitem>

        <listitem>
          <para><emphasis>waitq_sleep_finish</emphasis> performs cleanup after
          <emphasis>waitq_sleep_timeout_unsafe</emphasis>; after this call,
          the wait queue spinlock is guaranteed to be unlocked by the
          caller.</para>
        </listitem>
      </orderedlist>

      <para>The stock <emphasis>waitq_sleep_timeout</emphasis> is then a mere
      wrapper that calls these three functions. It is provided for convenience
      in cases where the caller doesn't require such low-level control.
      However, the implementation of <emphasis>condvar_wait_timeout</emphasis>
      does need this finer-grained control because it has to interleave calls
      to these functions with other actions. It carries its operations out in
      the following order, as the sketch after the list illustrates:</para>

      <orderedlist>
        <listitem>
          <para>calls <emphasis>waitq_sleep_prepare</emphasis> in order to
          lock the condition variable's wait queue,</para>
        </listitem>

        <listitem>
          <para>releases the mutex,</para>
        </listitem>

        <listitem>
          <para>clears the counter of missed wakeups,</para>
        </listitem>

        <listitem>
          <para>calls <emphasis>waitq_sleep_timeout_unsafe</emphasis>,</para>
        </listitem>

        <listitem>
          <para>retakes the mutex,</para>
        </listitem>

        <listitem>
          <para>calls <emphasis>waitq_sleep_finish</emphasis>.</para>
        </listitem>
      </orderedlist>
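
      <para>Put together, the six steps translate into a sketch like the one
      below; the <emphasis>cv_waitq</emphasis> accessor and the exact
      signatures of the three pieces are simplified:</para>

      <para><programlisting language="C">int condvar_wait_timeout(condvar_t *cv, mutex_t *mtx, uint32_t usec)
{
        int rc;

        waitq_sleep_prepare(cv_waitq(cv));      /* 1: lock the wait queue */
        mutex_unlock(mtx);                      /* 2: release the mutex */
        cv_waitq(cv)->missed_wakeups = 0;       /* 3: forget stale events */
        rc = waitq_sleep_timeout_unsafe(cv_waitq(cv), usec);  /* 4: block */
        mutex_lock(mtx);                        /* 5: retake the mutex */
        waitq_sleep_finish(cv_waitq(cv), rc);   /* 6: cleanup */

        return rc;
}</programlisting></para>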
    </section>
  </section>

  <section>
    <title>Userspace synchronization</title>

    <section>
      <title>Futexes</title>

      <para></para>
    </section>
  </section>
</chapter>