<?xml version="1.0" encoding="UTF-8"?>
<chapter id="sync">
  <?dbhtml filename="sync.html"?>

  <title>Mutual exclusion and synchronization</title>

  <section>
    <title>Introduction</title>

    <para>The HelenOS operating system is designed to make use of the
    parallelism offered by the hardware and to exploit concurrency of both
    the kernel and userspace tasks. This is achieved through multiprocessor
    support and several levels of multiprogramming such as multitasking,
    multithreading and userspace pseudo threads. However, such a highly
    concurrent environment needs safe and efficient ways to handle mutual
    exclusion and synchronization of many execution flows.</para>
  </section>

  <section>
    <title>Active kernel primitives</title>

    <section>
      <title>Spinlocks</title>

      <para>The basic mutual exclusion primitive is the spinlock. The spinlock
      implements active waiting for the availability of a memory lock (i.e. a
      simple variable) in a multiprocessor-safe manner. This safety is
      achieved through the use of a specialized, architecture-dependent,
      atomic test-and-set operation which either locks the spinlock (i.e. sets
      the variable) or, provided that it is already locked, leaves it
      unaltered. In any case, the test-and-set operation returns a value, thus
      signalling either success (i.e. a zero return value) or failure (i.e. a
      non-zero return value) in acquiring the lock. Note that this makes a
      fundamental difference between the naive algorithm that doesn't use the
      atomic operation and the spinlock algorithm. While the naive algorithm
      is prone to race conditions on SMP configurations and thus is completely
      SMP-unsafe, the spinlock algorithm eliminates the possibility of race
      conditions and is suitable for mutual exclusion use.</para>

      <para>The semantics of the test-and-set operation is that the spinlock
      remains unavailable until this operation called on the respective
      spinlock returns zero. HelenOS builds two functions on top of the
      test-and-set operation. The first function is the unconditional attempt
      to acquire the spinlock and is called
      <emphasis>spinlock_lock</emphasis>. It simply loops until the
      test-and-set returns a zero value. The other function,
      <emphasis>spinlock_trylock</emphasis>, is the conditional lock operation
      and calls the test-and-set only once to find out whether it managed to
      acquire the spinlock or not. The conditional operation is useful in
      situations in which an algorithm cannot acquire more spinlocks in the
      proper order and a deadlock cannot be avoided otherwise. In such a case,
      the algorithm detects the danger and, instead of possibly deadlocking
      the system, releases some spinlocks it already holds and retries the
      whole operation with the hope that it will succeed next time. The
      unlock function, <emphasis>spinlock_unlock</emphasis>, is quite simple:
      it merely clears the spinlock variable.</para>
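      <para>As an illustration, the interface described above can be sketched
      in C. This is a hypothetical model, not the actual HelenOS code: the C11
      <emphasis>atomic_flag</emphasis> type stands in for the
      architecture-specific test-and-set instruction, and details such as
      preemption disabling and memory barrier hooks are omitted.</para>

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative spinlock model; C11 atomic_flag stands in for the
 * architecture-dependent atomic test-and-set instruction. */
typedef struct {
    atomic_flag flag;
} spinlock_t;

static inline void spinlock_initialize(spinlock_t *lock)
{
    atomic_flag_clear(&lock->flag);
}

/* Unconditional acquire: loop until the test-and-set reports success. */
static inline void spinlock_lock(spinlock_t *lock)
{
    while (atomic_flag_test_and_set(&lock->flag))
        ;  /* busy wait */
}

/* Conditional acquire: a single test-and-set attempt; true on success. */
static inline bool spinlock_trylock(spinlock_t *lock)
{
    return !atomic_flag_test_and_set(&lock->flag);
}

/* Release: merely clear the spinlock variable. */
static inline void spinlock_unlock(spinlock_t *lock)
{
    atomic_flag_clear(&lock->flag);
}
```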

      <para>Nevertheless, there is a special issue related to hardware
      optimizations that modern processors implement. Particularly problematic
      is the out-of-order execution of instructions within the critical
      section protected by a spinlock. Processors are always self-consistent,
      so they can carry out speculatively executed instructions in the right
      order with regard to dependencies among those instructions. However, on
      some processor architectures, the dependency between the instructions
      inside the critical section and those that implement locking and
      unlocking of the respective spinlock is not implicit. As a result, the
      processor needs to be explicitly told about each occurrence of such a
      dependency. Therefore, HelenOS adds architecture-specific hooks to the
      <emphasis>spinlock_lock</emphasis>,
      <emphasis>spinlock_trylock</emphasis> and
      <emphasis>spinlock_unlock</emphasis> functions to prevent the
      instructions inside the critical section from permeating out. On some
      architectures, these hooks can be empty because the dependencies are
      implicitly enforced by the special properties of the locking and
      unlocking instructions. Other architectures, however, need to instrument
      these hooks with different memory barriers, depending on what operations
      could permeate out.</para>
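      <para>The placement of these hooks can be sketched as follows. The
      barrier macros below are illustrative stand-ins (C11 acquire and
      release fences), not the actual architecture-specific HelenOS hooks,
      and the lock itself is again modelled with C11 atomics.</para>

```c
#include <stdatomic.h>

/* Illustrative stand-ins for the architecture-specific hooks; real
 * implementations may be empty or use different barrier instructions. */
#define CS_ENTER_BARRIER()  atomic_thread_fence(memory_order_acquire)
#define CS_LEAVE_BARRIER()  atomic_thread_fence(memory_order_release)

typedef struct {
    atomic_flag flag;
} spinlock_t;

static inline void spinlock_lock(spinlock_t *lock)
{
    while (atomic_flag_test_and_set(&lock->flag))
        ;  /* busy wait */
    CS_ENTER_BARRIER();  /* keep critical-section accesses from moving up */
}

static inline void spinlock_unlock(spinlock_t *lock)
{
    CS_LEAVE_BARRIER();  /* complete critical-section accesses first */
    atomic_flag_clear(&lock->flag);
}
```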

      <para>Spinlocks have one significant drawback: when held for longer
      periods of time, they harm both parallelism and concurrency. The
      processor executing <emphasis>spinlock_lock</emphasis> does not do any
      fruitful work and is effectively halted until it can grab the lock and
      proceed. Similarly, other execution flows cannot execute on the
      processor that holds the spinlock because the kernel disables preemption
      on that processor when a spinlock is held. The reason behind disabling
      preemption is to avoid the priority inversion problem. For the same
      reason, threads are strongly discouraged from sleeping when they hold a
      spinlock.</para>

      <para>To summarize, spinlocks represent a very simple and essential
      mutual exclusion primitive for SMP systems. On the other hand, spinlocks
      scale poorly because of the active loop they are based on. Therefore,
      spinlocks are used in HelenOS only for short-time mutual exclusion and
      in cases where the mutual exclusion is required out of thread context.
      Lastly, spinlocks are used in the construction of passive
      synchronization primitives.</para>
    </section>
  </section>

  <section>
    <title>Passive kernel synchronization</title>

    <section>
      <title>Wait queues</title>

      <para>A wait queue is the basic passive synchronization primitive on
      which all other passive synchronization primitives are built. Simply
      put, it allows a thread to sleep until an event associated with the
      particular wait queue occurs. Multiple threads are notified about
      incoming events in a first come, first served fashion. Moreover, should
      the event come before any thread waits for it, it is recorded in the
      wait queue as a missed wakeup and later forwarded to the first thread
      that decides to wait in the queue. The inner structures of the wait
      queue are protected by a spinlock.</para>

      <para>A thread that wants to wait for a wait queue event uses the
      <emphasis>waitq_sleep_timeout</emphasis> function. The algorithm first
      checks the wait queue's counter of missed wakeups and if there are any
      missed wakeups, the call returns immediately. The call also returns
      immediately if only a conditional wait was requested. Otherwise the
      thread is enqueued in the wait queue's list of sleeping threads and its
      state is changed to <emphasis>Sleeping</emphasis>. It then sleeps until
      one of the following events happens:</para>

      <orderedlist>
        <listitem>
          <para>another thread calls <emphasis>waitq_wakeup</emphasis> and the
          thread is the first thread in the wait queue's list of sleeping
          threads;</para>
        </listitem>

        <listitem>
          <para>another thread calls
          <emphasis>waitq_interrupt_sleep</emphasis> on the sleeping
          thread;</para>
        </listitem>

        <listitem>
          <para>the sleep times out, provided that none of the previous
          events occurred within the specified time limit; the limit can be
          infinity.</para>
        </listitem>
      </orderedlist>

      <para>All five possibilities (immediate return on success, immediate
      return on failure, wakeup after sleep, interruption and timeout) are
      distinguishable by the return value of
      <emphasis>waitq_sleep_timeout</emphasis>. Being able to interrupt a
      sleeping thread is essential for externally initiated thread
      termination. The ability to wait only for a certain amount of time is
      used, for instance, to passively delay thread execution by several
      microseconds or even seconds in the <emphasis>thread_sleep</emphasis>
      function. Because all other passive kernel synchronization primitives
      are based on wait queues, they also have the option of being
      interrupted and, more importantly, can time out. All of them also
      implement the conditional operation. Furthermore, this very fundamental
      interface reaches up to the implementation of futexes, the userspace
      synchronization primitive, which makes it possible for a userspace
      thread to request a synchronization operation with a timeout or a
      conditional operation.</para>
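      <para>The missed-wakeup bookkeeping behind the immediate-return paths
      can be modelled as follows. This is a single-threaded toy sketch: the
      enum names are illustrative rather than the actual HelenOS return
      codes, and the blocking, interruption and timeout paths are
      omitted.</para>

```c
#include <stdbool.h>

/* Illustrative names for the five outcomes distinguished by the return
 * value of waitq_sleep_timeout; not the actual HelenOS codes. */
typedef enum {
    OUTCOME_OK_ATOMIC,    /* immediate success: a missed wakeup consumed */
    OUTCOME_WOULD_BLOCK,  /* immediate failure of a conditional wait */
    OUTCOME_OK_BLOCKED,   /* woken by waitq_wakeup after sleeping */
    OUTCOME_INTERRUPTED,  /* woken by waitq_interrupt_sleep */
    OUTCOME_TIMEOUT       /* the time limit expired */
} sleep_outcome_t;

typedef struct {
    int missed_wakeups;   /* wakeups recorded before any thread slept */
    int sleepers;         /* length of the list of sleeping threads */
} waitq_t;

/* Single-threaded model of the conditional-wait path: consume a missed
 * wakeup if one is recorded, otherwise fail rather than block. */
sleep_outcome_t waitq_sleep_conditional(waitq_t *wq)
{
    if (wq->missed_wakeups > 0) {
        wq->missed_wakeups--;
        return OUTCOME_OK_ATOMIC;
    }
    return OUTCOME_WOULD_BLOCK;
}

/* Wake the first sleeper, or record a missed wakeup if nobody waits. */
void waitq_wakeup_model(waitq_t *wq)
{
    if (wq->sleepers > 0)
        wq->sleepers--;
    else
        wq->missed_wakeups++;
}
```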

      <para>From the description above, it should be apparent that when a
      sleeping thread is woken by <emphasis>waitq_wakeup</emphasis> or when
      <emphasis>waitq_sleep_timeout</emphasis> succeeds immediately, the
      thread can be sure that the event has occurred. The thread need not and
      should not verify this fact. This approach is called direct hand-off and
      is characteristic of all passive HelenOS synchronization primitives,
      with the exception described below.</para>
    </section>

    <section>
      <title>Semaphores</title>

      <para>The interesting point about wait queues is that the number of
      missed wakeups is equal to the number of threads that will not block in
      <emphasis>waitq_sleep_timeout</emphasis> and will immediately succeed
      instead. On the other hand, semaphores are synchronization primitives
      that admit a predefined number of threads into their critical section
      and block any other threads above this count. However, these two cases
      are exactly the same. Semaphores in HelenOS are therefore implemented as
      wait queues with a single semantic change: their wait queue is
      initialized to have as many missed wakeups as is the number of threads
      that the semaphore intends to let into its critical section
      simultaneously.</para>

      <para>In the semaphore language, the wait queue operation
      <emphasis>waitq_sleep_timeout</emphasis> corresponds to the semaphore
      <emphasis>down</emphasis> operation, represented by the function
      <emphasis>semaphore_down_timeout</emphasis>, and, similarly, the wait
      queue operation <emphasis>waitq_wakeup</emphasis> corresponds to the
      semaphore <emphasis>up</emphasis> operation, represented by the function
      <emphasis>semaphore_up</emphasis>. The conditional down operation is
      called <emphasis>semaphore_trydown</emphasis>.</para>
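      <para>The correspondence can be modelled with the same toy
      bookkeeping: the semaphore's wait queue starts with as many missed
      wakeups as the semaphore's initial value. The sketch below is a
      nonblocking, single-threaded illustration, not the actual HelenOS
      implementation.</para>

```c
#include <stdbool.h>

/* Toy model: a semaphore is a wait queue whose missed-wakeup counter
 * starts at the semaphore's initial value. */
typedef struct {
    int missed_wakeups;   /* number of threads still admitted */
} semaphore_t;

void semaphore_initialize(semaphore_t *s, int value)
{
    s->missed_wakeups = value;  /* pre-recorded wakeups */
}

/* Conditional down (semaphore_trydown): succeed only without blocking. */
bool semaphore_trydown(semaphore_t *s)
{
    if (s->missed_wakeups > 0) {
        s->missed_wakeups--;
        return true;
    }
    return false;
}

/* Up (semaphore_up): the counterpart of waitq_wakeup when no thread is
 * sleeping, so the wakeup is simply recorded. */
void semaphore_up(semaphore_t *s)
{
    s->missed_wakeups++;
}
```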
    </section>

    <section>
      <title>Mutexes</title>

      <para>Mutexes are sometimes referred to as binary semaphores. That means
      that mutexes are like semaphores that allow only one thread into their
      critical section. Indeed, mutexes in HelenOS are implemented exactly in
      this way: they are built on top of semaphores. From another point of
      view, they can be viewed as spinlocks without busy waiting. Their
      semaphore heritage provides a good basis for both the conditional
      operation and the operation with a timeout. The locking operation is
      called <emphasis>mutex_lock</emphasis>, the conditional locking
      operation is called <emphasis>mutex_trylock</emphasis> and the unlocking
      operation is called <emphasis>mutex_unlock</emphasis>.</para>
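      <para>The layering can be sketched as follows, again as a nonblocking
      toy model rather than the actual HelenOS code: a mutex is simply a
      semaphore initialized to admit exactly one thread.</para>

```c
#include <stdbool.h>

/* Minimal nonblocking semaphore model reused from the previous sketch. */
typedef struct {
    int count;
} semaphore_t;

static bool semaphore_trydown(semaphore_t *s)
{
    if (s->count > 0) {
        s->count--;
        return true;
    }
    return false;
}

static void semaphore_up(semaphore_t *s)
{
    s->count++;
}

/* A mutex is just a binary semaphore. */
typedef struct {
    semaphore_t sem;
} mutex_t;

void mutex_initialize(mutex_t *mtx)
{
    mtx->sem.count = 1;   /* exactly one thread may enter */
}

bool mutex_trylock(mutex_t *mtx)
{
    return semaphore_trydown(&mtx->sem);
}

void mutex_unlock(mutex_t *mtx)
{
    semaphore_up(&mtx->sem);
}
```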
    </section>

    <section>
      <title>Reader/writer locks</title>

      <para>Reader/writer locks, or rwlocks, are by far the most complicated
      synchronization primitive within the kernel. The goal of these locks is
      to improve concurrency of applications in which threads need to
      synchronize access to a shared resource and that access can be
      partitioned into a read-only mode and a write mode. Reader/writer locks
      should make it possible for several, possibly many, readers to be in
      the critical section simultaneously, provided that no writer is
      currently in the critical section. Writers are allowed to enter the
      critical section only individually, provided that no reader is in the
      critical section already. Applications in which the majority of
      operations can be done in the read-only mode can benefit from the
      increased concurrency offered by reader/writer locks.</para>

      <para>During reader/writer lock construction, a decision should be made
      whether readers will be preferred over writers or vice versa in cases
      when the lock is not currently held and both a reader and a writer want
      to gain the lock. Some operating systems prefer one group over the
      other, thus creating a possibility of starving the unpreferred group.
      In the HelenOS operating system, neither of the two groups is
      preferred. The lock is granted on a first come, first served basis with
      the additional note that readers are granted the lock in the biggest
      possible batch.</para>
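      <para>The grant policy can be modelled with a short sketch. The
      function below is purely illustrative and not part of HelenOS: the
      wait queue is encoded head-first as a string of 'R' (reader) and 'W'
      (writer) characters, and the function returns how many waiting threads
      are granted the lock when it becomes free.</para>

```c
/* Toy model of the grant policy: first come, first served, with the
 * leading run of readers granted the lock in one batch. */
int grant_batch(const char *queue)
{
    if (queue[0] == '\0')
        return 0;            /* nobody is waiting */
    if (queue[0] == 'W')
        return 1;            /* writers enter only individually */
    int granted = 0;
    while (queue[granted] == 'R')
        granted++;           /* the whole leading run of readers enters */
    return granted;
}
```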

      <para>With this policy and the timeout modes of operation, the direct
      hand-off becomes much more complicated. For instance, a writer leaving
      the critical section must wake up all leading readers in the rwlock's
      wait queue or one leading writer or no one if no thread is waiting.
      Similarly, the last reader leaving the critical section must wake up
      the sleeping writer if there are any sleeping threads left at all. As
      another example, if a writer at the beginning of the rwlock's wait
      queue times out and the lock is held by at least one reader, the writer
      which has timed out must first wake up all readers that follow it in
      the queue prior to signalling the timeout itself and giving up.</para>

      <para>Due to the issues mentioned in the previous paragraph, the
      reader/writer lock implementation needs to walk the rwlock wait queue's
      list of sleeping threads directly, in order to find out the type of
      access that the queued threads demand. This makes the code difficult
      to understand and dependent on the internal implementation of the wait
      queue. Nevertheless, it remains unclear to the authors of HelenOS
      whether a simpler but equally fair solution exists.</para>

      <para>The implementation of rwlocks, as already outlined, makes use of
      a single wait queue for both readers and writers, thus avoiding any
      possibility of starvation. In fact, rwlocks use a mutex rather than
      a bare wait queue. This mutex is called <emphasis>exclusive</emphasis>
      and is used to synchronize writers. The writer's lock operation,
      <emphasis>rwlock_write_lock_timeout</emphasis>, simply tries to acquire
      the exclusive mutex. If it succeeds, the writer is granted the rwlock.
      However, if the operation fails (e.g. times out), the writer must check
      for potential readers at the head of the list of sleeping threads
      associated with the mutex's wait queue and then proceed according to
      the procedure outlined above.</para>

      <para>The exclusive mutex plays an important role in reader
      synchronization as well. However, a reader performing the reader's lock
      operation, <emphasis>rwlock_read_lock_timeout</emphasis>, may bypass
      this mutex when it detects that:</para>

      <orderedlist>
        <listitem>
          <para>there are other readers in the critical section and</para>
        </listitem>

        <listitem>
          <para>there are no sleeping threads waiting for the exclusive
          mutex.</para>
        </listitem>
      </orderedlist>

      <para>If both conditions are true, the reader will bypass the mutex,
      increment the number of readers in the critical section and then enter
      the critical section. Note that if there are any sleeping threads at
      the beginning of the wait queue, the first must be a writer. If the
      conditions are not fulfilled, the reader waits until the exclusive
      mutex is granted to it.</para>
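
      <para>The reader fast path can be illustrated with a minimal,
      single-threaded model. This is only a sketch: the structure and helper
      names below are simplified assumptions for illustration and do not
      mirror the actual HelenOS data structures.</para>

      <para><programlisting language="C"><![CDATA[#include <assert.h>
#include <stdbool.h>

/* Toy model of the rwlock state (hypothetical, for illustration only). */
typedef struct {
        bool exclusive_locked;  /* models the exclusive mutex */
        int sleeping_threads;   /* threads queued on the exclusive mutex */
        int readers;            /* readers inside the critical section */
} toy_rwlock_t;

/* True when a reader may bypass the exclusive mutex: there already are
 * readers inside and nobody sleeps on the mutex's wait queue. */
static bool reader_can_bypass(toy_rwlock_t *rw)
{
        return rw->readers > 0 && rw->sleeping_threads == 0;
}

static bool toy_read_lock(toy_rwlock_t *rw)
{
        if (reader_can_bypass(rw)) {
                rw->readers++;  /* fast path: skip the mutex entirely */
                return true;
        }
        /* Slow path: the first reader takes the exclusive mutex. */
        if (!rw->exclusive_locked) {
                rw->exclusive_locked = true;
                rw->readers++;
                return true;
        }
        return false;           /* would sleep (or time out) */
}]]></programlisting></para>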
    </section>

    <section>
      <title>Condition variables</title>

      <para>Condition variables can be used for waiting until a condition
      becomes true. In this respect, they are similar to wait queues. But
      contrary to wait queues, condition variables have different semantics
      that allow events to be lost when there is no thread waiting for them.
      In order to support this, condition variables don't use direct hand-off
      and operate in a way similar to the example below. A thread waiting for
      the condition to become true does the following:</para>

      <para><programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
while (!<varname>condition</varname>)
        <function>condvar_wait_timeout</function>(<varname>cv</varname>, <varname>mtx</varname>);
/* <remark>the condition is true, do something</remark> */
<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></para>

      <para>A thread that causes the condition to become true signals this
      event like this:</para>

      <para><programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
<varname>condition</varname> = <constant>true</constant>;
<function>condvar_signal</function>(<varname>cv</varname>);  /* <remark>condvar_broadcast(cv);</remark> */
<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></para>

      <para>The wait operation, <emphasis>condvar_wait_timeout</emphasis>,
      always puts the calling thread to sleep. The thread then sleeps until
      another thread invokes <emphasis>condvar_broadcast</emphasis> on the
      same condition variable or until it is woken up by
      <emphasis>condvar_signal</emphasis>. The
      <emphasis>condvar_signal</emphasis> operation unblocks the first thread
      blocking on the condition variable while the
      <emphasis>condvar_broadcast</emphasis> operation unblocks all threads
      blocking there. If there are no blocking threads, these two operations
      have no effect.</para>

      <para>Note that the threads must synchronize over a dedicated mutex. To
      prevent a race condition between
      <emphasis>condvar_wait_timeout</emphasis> and
      <emphasis>condvar_signal</emphasis> or
      <emphasis>condvar_broadcast</emphasis>, the mutex is passed to
      <emphasis>condvar_wait_timeout</emphasis>, which then atomically puts
      the calling thread to sleep and unlocks the mutex. When the thread
      eventually wakes up, <emphasis>condvar_wait</emphasis> regains the
      mutex and returns.</para>

      <para>Also note that there is no conditional operation for condition
      variables. Such an operation would make no sense since condition
      variables are defined to forget events for which there is no waiting
      thread and because <emphasis>condvar_wait</emphasis> must always go to
      sleep. The operation with a timeout is supported as usual.</para>

      <para>In HelenOS, condition variables are based on wait queues. As
      already mentioned above, wait queues remember missed events while
      condition variables must not do so. The reason is that condition
      variables are designed for scenarios in which an event might occur many
      times without being picked up by any waiting thread. On the other hand,
      wait queues would remember any event that had not been picked up by a
      call to <emphasis>waitq_sleep_timeout</emphasis>. Therefore, if wait
      queues were used directly and without any changes to implement
      condition variables, the <code>missed_wakeup</code> counter would hurt
      the performance of the implementation: the <code>while</code> loop in
      <emphasis>condvar_wait_timeout</emphasis> would effectively busy wait
      until all missed wakeups were discarded.</para>

      <para>The requirement on the wait operation to atomically put the
      caller to sleep and release the mutex poses an interesting problem for
      <emphasis>condvar_wait_timeout</emphasis>. More precisely, the thread
      should sleep in the condvar's wait queue prior to releasing the mutex,
      but it must not hold the mutex while it is sleeping.</para>

      <para>The problems described in the two previous paragraphs are
      addressed in HelenOS by dividing the
      <emphasis>waitq_sleep_timeout</emphasis> function into three
      pieces:</para>

      <orderedlist>
        <listitem>
          <para><emphasis>waitq_sleep_prepare</emphasis> prepares the thread
          to go to sleep by, among other things, locking the wait
          queue;</para>
        </listitem>

        <listitem>
          <para><emphasis>waitq_sleep_timeout_unsafe</emphasis> implements
          the core blocking logic;</para>
        </listitem>

        <listitem>
          <para><emphasis>waitq_sleep_finish</emphasis> performs cleanup
          after <emphasis>waitq_sleep_timeout_unsafe</emphasis>; after this
          call, the wait queue spinlock is guaranteed to be unlocked by the
          caller.</para>
        </listitem>
      </orderedlist>
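
      <para>As a preview of how the pieces fit together, a wrapper chaining
      them might look as follows. This is only a sketch: the signatures, the
      <code>ipl_t</code> bookkeeping and the stub bodies that merely record
      the call order are assumptions for illustration, not the real kernel
      code.</para>

      <para><programlisting language="C"><![CDATA[#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types (illustration only). */
typedef int ipl_t;
typedef struct {
        char trace[64];         /* records the call order */
} waitq_t;

static ipl_t waitq_sleep_prepare(waitq_t *wq)
{
        strcat(wq->trace, "prepare;");  /* would disable interrupts, lock wq */
        return 0;
}

static int waitq_sleep_timeout_unsafe(waitq_t *wq, unsigned usec, int flags)
{
        (void) usec;
        (void) flags;
        strcat(wq->trace, "sleep;");    /* core blocking logic */
        return 0;                       /* 0 models a successful wakeup */
}

static void waitq_sleep_finish(waitq_t *wq, int rc, ipl_t ipl)
{
        (void) rc;
        (void) ipl;
        strcat(wq->trace, "finish;");   /* unlock wq, restore interrupts */
}

/* The convenience wrapper: just the three pieces in sequence. */
static int waitq_sleep_timeout(waitq_t *wq, unsigned usec, int flags)
{
        ipl_t ipl = waitq_sleep_prepare(wq);
        int rc = waitq_sleep_timeout_unsafe(wq, usec, flags);
        waitq_sleep_finish(wq, rc, ipl);
        return rc;
}]]></programlisting></para>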

      <para>The stock <emphasis>waitq_sleep_timeout</emphasis> is then a mere
      wrapper that calls these three functions. It is provided for
      convenience in cases where the caller doesn't require such low-level
      control. However, the implementation of
      <emphasis>condvar_wait_timeout</emphasis> does need this finer-grained
      control because it has to interleave calls to these functions with
      other actions. It carries out its operations in the following
      order:</para>

      <orderedlist>
        <listitem>
          <para>calls <emphasis>waitq_sleep_prepare</emphasis> in order to
          lock the condition variable's wait queue,</para>
        </listitem>

        <listitem>
          <para>releases the mutex,</para>
        </listitem>

        <listitem>
          <para>clears the counter of missed wakeups,</para>
        </listitem>

        <listitem>
          <para>calls <emphasis>waitq_sleep_timeout_unsafe</emphasis>,</para>
        </listitem>

        <listitem>
          <para>retakes the mutex,</para>
        </listitem>

        <listitem>
          <para>calls <emphasis>waitq_sleep_finish</emphasis>.</para>
        </listitem>
      </orderedlist>
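
      <para>The interleaving can be modelled with stubs that record the call
      order. Again, this is only a sketch: all types, signatures and stub
      bodies below are simplified assumptions, not the actual HelenOS
      implementation.</para>

      <para><programlisting language="C"><![CDATA[#include <assert.h>
#include <string.h>

/* Toy context that records the call order (illustration only). */
typedef struct {
        char trace[96];
} ctx_t;

static void waitq_sleep_prepare(ctx_t *c)  { strcat(c->trace, "prepare;"); }
static void mutex_unlock(ctx_t *c)         { strcat(c->trace, "unlock;"); }
static void clear_missed_wakeups(ctx_t *c) { strcat(c->trace, "clear;"); }
static void mutex_lock(ctx_t *c)           { strcat(c->trace, "lock;"); }
static void waitq_sleep_finish(ctx_t *c)   { strcat(c->trace, "finish;"); }

static int waitq_sleep_timeout_unsafe(ctx_t *c)
{
        strcat(c->trace, "sleep;");              /* core blocking logic */
        return 0;
}

/* The six steps of condvar_wait_timeout, in order. */
static int condvar_wait_timeout(ctx_t *c)
{
        waitq_sleep_prepare(c);                  /* 1. lock the condvar's wait queue */
        mutex_unlock(c);                         /* 2. release the caller's mutex */
        clear_missed_wakeups(c);                 /* 3. forget any missed events */
        int rc = waitq_sleep_timeout_unsafe(c);  /* 4. block */
        mutex_lock(c);                           /* 5. retake the mutex */
        waitq_sleep_finish(c);                   /* 6. cleanup, unlock the wait queue */
        return rc;
}]]></programlisting></para>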
    </section>
  </section>

  <section>
    <title>Userspace synchronization</title>

    <section>
      <title>Futexes</title>

      <para></para>
    </section>
  </section>
</chapter>