 
      <para>The semantics of the test-and-set operation is that the spinlock
      remains unavailable until this operation, called on the respective
      spinlock, returns zero. HelenOS builds two functions on top of the
      test-and-set operation. The first function is the unconditional attempt
      to acquire the spinlock and is called <code>spinlock_lock()</code>. It
      simply loops until the test-and-set returns a zero value. The other
      function, <code>spinlock_trylock()</code>, is the conditional lock
      operation and calls the test-and-set only once to find out whether it
      managed to acquire the spinlock or not. The conditional operation is
      useful in situations in which an algorithm cannot acquire more spinlocks
      in the proper order and a deadlock cannot be avoided. In such a case,
      the algorithm would detect the danger and instead of possibly
      deadlocking the system it would simply release some spinlocks it already
      holds and retry the whole operation with the hope that it will succeed
      next time. The unlock function, <code>spinlock_unlock()</code>, is quite
      easy - it merely clears the spinlock variable.</para>
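      <para>The relationship between these three functions and the underlying
      test-and-set operation can be sketched in C as follows. This is an
      illustrative model built on C11 atomics, not the actual HelenOS
      implementation, which uses architecture-specific atomic
      operations.</para>

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag flag;   /* clear = unlocked, set = locked */
} spinlock_t;

/* Test-and-set: returns the previous value, so zero (false)
 * means the lock was free and the caller now owns it. */
static bool test_and_set(spinlock_t *lock)
{
    return atomic_flag_test_and_set(&lock->flag);
}

/* Unconditional acquire: loop until the test-and-set returns zero. */
void spinlock_lock(spinlock_t *lock)
{
    while (test_and_set(lock))
        ;  /* busy wait */
}

/* Conditional acquire: call the test-and-set only once.
 * Returns true if the spinlock was acquired. */
bool spinlock_trylock(spinlock_t *lock)
{
    return !test_and_set(lock);
}

/* Release: merely clear the spinlock variable. */
void spinlock_unlock(spinlock_t *lock)
{
    atomic_flag_clear(&lock->flag);
}
```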
 
      <para>Nevertheless, there is a special issue related to hardware
      optimizations that modern processors implement. Particularly problematic
      is the out-of-order execution of instructions within the critical
      … instructions. However, the dependency between instructions inside the
      critical section and those that implement locking and unlocking of the
      respective spinlock is not implicit on some processor architectures. As
      a result, the processor needs to be explicitly told about each
      occurrence of such a dependency. Therefore, HelenOS adds
      architecture-specific hooks to all <code>spinlock_lock()</code>,
      <code>spinlock_trylock()</code> and <code>spinlock_unlock()</code> functions
      to prevent the instructions inside the critical section from permeating
      out. On some architectures, these hooks can be void because the
      dependencies are implicit due to the special properties of the
      locking and unlocking instructions. However, other architectures need to
      instrument these hooks with different memory barriers, depending on what
      operations could permeate out.</para>
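      <para>As an illustration of what such instrumentation amounts to, the
      following sketch places an acquire barrier in the lock path and a
      release barrier in the unlock path, expressed with C11 fences; the
      actual HelenOS hooks are architecture-specific macros, and the names
      below are only illustrative.</para>

```c
#include <stdatomic.h>

typedef struct {
    atomic_flag flag;
} spinlock_t;

void spinlock_lock(spinlock_t *lock)
{
    while (atomic_flag_test_and_set_explicit(&lock->flag,
                                             memory_order_relaxed))
        ;  /* busy wait */
    /* Hook: the acquire barrier keeps loads and stores of the
     * critical section from being reordered before the lock
     * acquisition. On some architectures this hook is void. */
    atomic_thread_fence(memory_order_acquire);
}

void spinlock_unlock(spinlock_t *lock)
{
    /* Hook: the release barrier keeps loads and stores of the
     * critical section from being reordered after the release. */
    atomic_thread_fence(memory_order_release);
    atomic_flag_clear_explicit(&lock->flag, memory_order_relaxed);
}
```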
 
      <para>Spinlocks have one significant drawback: when held for longer time
      periods, they harm both parallelism and concurrency. The processor
      executing <code>spinlock_lock()</code> does not do any fruitful work and
      is effectively halted until it can grab the lock and proceed.
      Similarly, other execution flows cannot execute on the processor that
      holds the spinlock because the kernel disables preemption on that
      processor when a spinlock is held. The reason behind disabling
      preemption is avoidance of the priority inversion problem. For the same reason,
      …</para>

      <para>… wait queue as a missed wakeup and later forwarded to the first thread
      that decides to wait in the queue. The inner structures of the wait
      queue are protected by a spinlock.</para>
 
      <para>The thread that wants to wait for a wait queue event uses the
      <code>waitq_sleep_timeout()</code> function. The algorithm then checks the
      wait queue's counter of missed wakeups and if there are any missed
      wakeups, the call returns immediately. The call also returns immediately
      if only a conditional wait was requested. Otherwise the thread is
      enqueued in the wait queue's list of sleeping threads and its state is
      changed to <constant>Sleeping</constant>. It then sleeps until one of
      the following events happens:</para>
 
      <orderedlist>
        <listitem>
          <para>another thread calls <code>waitq_wakeup()</code> and the thread
          is the first thread in the wait queue's list of sleeping
          threads;</para>
        </listitem>

        <listitem>
          <para>another thread calls <code>waitq_interrupt_sleep()</code> on the
          sleeping thread;</para>
        </listitem>

        <listitem>
          <para>the sleep times out provided that none of the previous
          …</para>
        </listitem>
      </orderedlist>
 
      <para>All five possibilities (immediate return on success, immediate
      return on failure, wakeup after sleep, interruption and timeout) are
      distinguishable by the return value of <code>waitq_sleep_timeout()</code>.
      Being able to interrupt a sleeping thread is essential for externally
      initiated thread termination. The ability to wait only for a certain
      amount of time is used, for instance, to passively delay thread
      execution by several microseconds or even seconds in the
      <code>thread_sleep()</code> function. Due to the fact that all other
      passive kernel synchronization primitives are based on wait queues, they
      also have the option of being interrupted and, more importantly, can
      timeout. All of them also implement the conditional operation.
      Furthermore, this very fundamental interface reaches up to the
      implementation of futexes, a userspace synchronization primitive, which
      makes it possible for a userspace thread to request a synchronization
      operation with a timeout or a conditional operation.</para>
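      <para>A caller therefore typically dispatches on the return value. The
      outcome names in the sketch below are hypothetical stand-ins for the
      actual HelenOS return codes, which spell the same five cases
      differently.</para>

```c
#include <stdbool.h>

/* Hypothetical outcome codes standing in for the values returned
 * by waitq_sleep_timeout(); the real HelenOS constants differ in
 * naming but cover the same five cases. */
typedef enum {
    SLEEP_OK_ATOMIC,     /* immediate success: a missed wakeup was consumed */
    SLEEP_WOULD_BLOCK,   /* immediate failure: only a conditional wait was requested */
    SLEEP_OK_BLOCKED,    /* woken up by waitq_wakeup() after sleeping */
    SLEEP_INTERRUPTED,   /* woken up by waitq_interrupt_sleep() */
    SLEEP_TIMED_OUT      /* the timeout expired first */
} sleep_result_t;

/* Returns true iff the caller now owns the awaited event.
 * Thanks to direct hand-off, no re-checking is needed. */
bool event_obtained(sleep_result_t rc)
{
    switch (rc) {
    case SLEEP_OK_ATOMIC:
    case SLEEP_OK_BLOCKED:
        return true;
    case SLEEP_WOULD_BLOCK:
    case SLEEP_INTERRUPTED:
    case SLEEP_TIMED_OUT:
        return false;
    }
    return false;
}
```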
 
      <para>From the description above, it should be apparent that when a
      sleeping thread is woken by <code>waitq_wakeup()</code> or when
      <code>waitq_sleep_timeout()</code> succeeds immediately, the thread can be
      sure that the event has occurred. The thread need not and should not
      verify this fact. This approach is called direct hand-off and is
      characteristic of all passive HelenOS synchronization primitives, with
      the exception described below.</para>
    </section>

    <section>
      <title>Semaphores</title>

      <para>The interesting point about wait queues is that the number of
      missed wakeups is equal to the number of threads that will not block in
      <code>waitq_sleep_timeout()</code> and will immediately succeed instead.
      On the other hand, semaphores are synchronization primitives that will
      let a predefined number of threads into their critical section and block
      any other threads above this count. However, these two cases are exactly
      the same. Semaphores in HelenOS are therefore implemented as wait queues
      with a single semantic change: their wait queue is initialized to have
      as many missed wakeups as the number of threads that the semaphore
      intends to let into its critical section simultaneously.</para>
 
      <para>In the semaphore language, the wait queue operation
      <code>waitq_sleep_timeout()</code> corresponds to the semaphore
      <code>down</code> operation, represented by the function
      <code>semaphore_down_timeout()</code>, and, analogously, the wait
      queue operation <code>waitq_wakeup()</code> corresponds to the semaphore <code>up</code>
      operation, represented by the function <code>semaphore_up()</code>. The
      conditional down operation is called
      <code>semaphore_trydown()</code>.</para>
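      <para>The correspondence can be illustrated by a minimal single-threaded
      model in which the wait queue is reduced to its missed-wakeup counter.
      The names mirror the HelenOS interface, but the bodies are
      simplifications, not the kernel code.</para>

```c
#include <stdbool.h>

/* A wait queue reduced to its missed-wakeup counter; the list of
 * sleeping threads is omitted in this single-threaded model. */
typedef struct {
    int missed_wakeups;
} waitq_t;

typedef struct {
    waitq_t wq;
} semaphore_t;

/* Initialize the semaphore's wait queue with as many missed
 * wakeups as threads allowed inside the critical section. */
void semaphore_initialize(semaphore_t *s, int count)
{
    s->wq.missed_wakeups = count;
}

/* Conditional down: succeeds iff a missed wakeup is available. */
bool semaphore_trydown(semaphore_t *s)
{
    if (s->wq.missed_wakeups > 0) {
        s->wq.missed_wakeups--;  /* consume one missed wakeup */
        return true;
    }
    return false;  /* a real thread would block here */
}

/* Up: record one more wakeup (waitq_wakeup() would instead wake
 * a sleeping thread if one existed). */
void semaphore_up(semaphore_t *s)
{
    s->wq.missed_wakeups++;
}
```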
    </section>

    <section>
      <title>Mutexes</title>

      <para>… critical section. Indeed, mutexes in HelenOS are implemented exactly in
      this way: they are built on top of semaphores. From another point of
      view, they can be viewed as spinlocks without busy waiting. Their
      semaphore heritage provides a good basis for both the conditional operation
      and the operation with a timeout. The locking operation is called
      <code>mutex_lock()</code>, the conditional locking operation is called
      <code>mutex_trylock()</code> and the unlocking operation is called
      <code>mutex_unlock()</code>.</para>
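      <para>Under this scheme, the mutex operations reduce to operations on a
      semaphore whose counter is initialized to one. The sketch below is a
      single-threaded model following the previous section's simplified
      semaphore, not the HelenOS code, which additionally supports blocking
      and timeouts.</para>

```c
#include <stdbool.h>

/* Minimal semaphore model: just the counter. */
typedef struct {
    int count;
} semaphore_t;

typedef struct {
    semaphore_t sem;
} mutex_t;

/* A mutex is a semaphore that admits exactly one thread. */
void mutex_initialize(mutex_t *m)
{
    m->sem.count = 1;
}

/* Conditional lock: corresponds to semaphore_trydown(). */
bool mutex_trylock(mutex_t *m)
{
    if (m->sem.count > 0) {
        m->sem.count--;
        return true;
    }
    return false;   /* mutex_lock() would block here instead */
}

/* Unlock: corresponds to semaphore_up(). */
void mutex_unlock(mutex_t *m)
{
    m->sem.count++;
}
```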
    </section>

    <section>
      <title>Reader/writer locks</title>

      <para>… whether a simpler but equivalently fair solution exists.</para>

      <para>The implementation of rwlocks, as already described, makes
      use of one single wait queue for both readers and writers, thus avoiding
      any possibility of starvation. In fact, rwlocks use a mutex rather than
      a bare wait queue. This mutex is called <emphasis>exclusive</emphasis> and is
      used to synchronize writers. The writer's lock operation,
      <code>rwlock_write_lock_timeout()</code>, simply tries to acquire the
      exclusive mutex. If it succeeds, the writer is granted the rwlock.
      However, if the operation fails (e.g. times out), the writer must check
      for potential readers at the head of the list of sleeping threads
      associated with the mutex's wait queue and then proceed according to the
      procedure outlined above.</para>

      <para>The exclusive mutex plays an important role in reader
      synchronization as well. However, a reader doing the reader's lock
      operation, <code>rwlock_read_lock_timeout()</code>, may bypass this mutex
      when it detects that:</para>

      <orderedlist>
        <listitem>
          <para>there are other readers in the critical section and</para>
        </listitem>
      </orderedlist>

      …
    </section>

    <section>
      <title>Condition variables</title>

      <para>… In order to support this, condition variables don't use direct hand-off
      and operate in a way similar to the example below. A thread waiting for
      the condition to become true does the following:</para>

      <example>
        <title>Use of <code>condvar_wait_timeout()</code>.</title>

        <programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
while (!<varname>condition</varname>)
        <function>condvar_wait_timeout</function>(<varname>cv</varname>, <varname>mtx</varname>);
/* <remark>the condition is true, do something</remark> */
…
<varname>condition</varname> = <constant>true</constant>;
<function>condvar_signal</function>(<varname>cv</varname>);  /* <remark>condvar_broadcast(cv);</remark> */
<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting>
      </example>

      <para>The wait operation, <code>condvar_wait_timeout()</code>, always puts
      the calling thread to sleep. The thread then sleeps until another thread
      invokes <code>condvar_broadcast()</code> on the same condition variable or
      until it is woken up by <code>condvar_signal()</code>. The
      <code>condvar_signal()</code> operation unblocks the first thread blocking
      on the condition variable while the <code>condvar_broadcast()</code>
      operation unblocks all threads blocking there. If there are no blocking
      threads, these two operations have no effect.</para>

      <para>Note that the threads must synchronize over a dedicated mutex. To
      prevent a race condition between <code>condvar_wait_timeout()</code> and
      <code>condvar_signal()</code> or <code>condvar_broadcast()</code>, the mutex
      is passed to <code>condvar_wait_timeout()</code>, which then atomically
      puts the calling thread to sleep and unlocks the mutex. When the thread
      eventually wakes up, <code>condvar_wait()</code> regains the mutex and
      returns.</para>

      <para>Also note that there is no conditional operation for condition
      variables. Such an operation would make no sense since condition
      variables are defined to forget events for which there is no waiting
      thread and because <code>condvar_wait()</code> must always go to sleep.
      The operation with a timeout is supported as usual.</para>

      <para>In HelenOS, condition variables are based on wait queues. As
      mentioned above, wait queues remember missed events while
      condition variables must not do so. The reason is that
      condition variables are designed for scenarios in which an event might
      occur many times without being picked up by any waiting thread. On
      the other hand, wait queues would remember any event that had not been
      picked up by a call to <code>waitq_sleep_timeout()</code>. Therefore, if
      wait queues were used directly and without any changes to implement
      condition variables, the missed_wakeup counter would hurt the performance of
      the implementation: the <code>while</code> loop in
      <code>condvar_wait_timeout()</code> would effectively do busy waiting
      until all missed wakeups were discarded.</para>

      <para>The requirement on the wait operation to atomically put the caller
      to sleep and release the mutex poses an interesting problem for
      <code>condvar_wait_timeout()</code>. More precisely, the thread should
      sleep in the condvar's wait queue prior to releasing the mutex, but it
      must not hold the mutex while it is sleeping.</para>

      <para>The problems described in the two previous paragraphs are addressed in
      HelenOS by dividing the <code>waitq_sleep_timeout()</code> function into
      three pieces:</para>

      <orderedlist>
        <listitem>
          <para><code>waitq_sleep_prepare()</code> prepares the thread to go to
          sleep by, among other things, locking the wait queue;</para>
        </listitem>

        <listitem>
          <para><code>waitq_sleep_timeout_unsafe()</code> implements the core
          blocking logic;</para>
        </listitem>

        <listitem>
          <para><code>waitq_sleep_finish()</code> performs cleanup after
          <code>waitq_sleep_timeout_unsafe()</code>; after this call, the wait
          queue spinlock is guaranteed to be unlocked by the caller.</para>
        </listitem>
      </orderedlist>

      <para>The stock <code>waitq_sleep_timeout()</code> is then a mere wrapper
      that calls these three functions. It is provided for convenience in
      cases where the caller does not require such low-level control.
      However, the implementation of <code>condvar_wait_timeout()</code> does
      need this finer-grained control because it has to interleave calls to
      these functions with other actions. It carries its operations out in the
      following order:</para>

      <orderedlist>
        <listitem>
          <para>calls <code>waitq_sleep_prepare()</code> in order to lock the
          condition variable's wait queue,</para>
        </listitem>

        <listitem>
          <para>releases the mutex,</para>
        </listitem>

        <listitem>
          <para>clears the counter of missed wakeups,</para>
        </listitem>

        <listitem>
          <para>calls <code>waitq_sleep_timeout_unsafe()</code>,</para>
        </listitem>

        <listitem>
          <para>retakes the mutex,</para>
        </listitem>

        <listitem>
          <para>calls <code>waitq_sleep_finish()</code>.</para>
        </listitem>
      </orderedlist>
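      <para>The interleaving above can be sketched as follows. The primitives
      are stubbed out so that only the ordering, which is the point of the
      decomposition, remains; the stub bodies merely record each step and are
      not the HelenOS implementations.</para>

```c
#include <string.h>

/* Ordered log of the steps taken, so the interleaving is visible. */
static const char *steps[8];
static int n_steps = 0;
static void step(const char *s) { steps[n_steps++] = s; }

/* Stubs standing in for the real primitives. */
static void waitq_sleep_prepare(void)        { step("waitq_sleep_prepare"); }
static void mutex_unlock_stub(void)          { step("mutex_unlock"); }
static void clear_missed_wakeups(void)       { step("clear_missed_wakeups"); }
static void waitq_sleep_timeout_unsafe(void) { step("waitq_sleep_timeout_unsafe"); }
static void mutex_lock_stub(void)            { step("mutex_lock"); }
static void waitq_sleep_finish(void)         { step("waitq_sleep_finish"); }

/* Skeleton of condvar_wait_timeout(): the three pieces of
 * waitq_sleep_timeout() interleaved with mutex operations. */
void condvar_wait_timeout_sketch(void)
{
    waitq_sleep_prepare();         /* 1: lock the condvar's wait queue */
    mutex_unlock_stub();           /* 2: release the caller's mutex    */
    clear_missed_wakeups();        /* 3: condvars must forget events   */
    waitq_sleep_timeout_unsafe();  /* 4: core blocking logic           */
    mutex_lock_stub();             /* 5: retake the mutex              */
    waitq_sleep_finish();          /* 6: cleanup; queue unlocked again */
}
```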
    </section>
  </section>

Line 508... Line 508...
508
      includes a wait queue. The wait queue is used to put threads that didn't
      win the futex to sleep until the winner wakes one of them up.</para>

      <para>A futex should be initialized by setting its userspace counter to
      one before it is used. When locking the futex via the userspace library
      function <code>futex_down_timeout()</code>, the library code atomically
      decrements the futex counter and tests whether it dropped below zero.
      If it did, then the futex is locked by another thread and the library
      uses the <constant>SYS_FUTEX_SLEEP</constant> syscall to put the thread
      to sleep. If the counter decreased to 0, then there was no contention
      and the thread can enter the critical section protected by the futex.
      When the thread later leaves that critical section, it atomically
      increments the counter using the library function
      <code>futex_up()</code>. If the counter value increased to one, then
      there again was no contention and no action needs to be taken. However,
      if it increased to zero or even a smaller number, then there are
      sleeping threads waiting for the futex to become available. In that
      case, the library has to use the
      <constant>SYS_FUTEX_WAKEUP</constant> syscall to wake one sleeping
      futexes is that they don't need to be registered anywhere prior to the
      first kernel intervention.</para>

      <para>Both futex-related syscalls, <constant>SYS_FUTEX_SLEEP</constant>
      and <constant>SYS_FUTEX_WAKEUP</constant>, are mere wrappers for
      <code>waitq_sleep_timeout()</code> and
      <code>waitq_sleep_wakeup()</code>, respectively, with some housekeeping
      functionality added. Both syscalls need to translate the userspace
      virtual address of the futex counter to a physical address in order to
      support synchronization across shared memory. Once the physical address
      is known, the kernel checks whether the futexes are already known to it
      by searching the global futex hash table for an item with the physical