Rev 45 | Rev 57 | Go to most recent revision | Only display areas with differences | Ignore whitespace | Details | Blame | Last modification | View Log | RSS feed
Rev 45 | Rev 48 | ||
---|---|---|---|
1 | <?xml version="1.0" encoding="UTF-8"?> |
1 | <?xml version="1.0" encoding="UTF-8"?> |
2 | <chapter id="sync"> |
2 | <chapter id="sync"> |
3 | <?dbhtml filename="sync.html"?> |
3 | <?dbhtml filename="sync.html"?> |
4 | 4 | ||
5 | <title>Mutual exclusion and synchronization</title> |
5 | <title>Mutual exclusion and synchronization</title> |
6 | 6 | ||
7 | <section> |
7 | <section> |
8 | <title>Introduction</title> |
8 | <title>Introduction</title> |
9 | 9 | ||
10 | <para>The HelenOS operating system is designed to make use of the |
10 | <para>The HelenOS operating system is designed to make use of the |
11 | parallelism offered by the hardware and to exploit concurrency of both the |
11 | parallelism offered by the hardware and to exploit concurrency of both the |
12 | kernel and userspace tasks. This is achieved through multiprocessor |
12 | kernel and userspace tasks. This is achieved through multiprocessor |
13 | support and several levels of multiprogramming such as multitasking, |
13 | support and several levels of multiprogramming such as multitasking, |
14 | multithreading and also through userspace pseudo threads. However, such a |
14 | multithreading and also through userspace pseudo threads. However, such a |
15 | highly concurrent environment needs safe and efficient ways to handle |
15 | highly concurrent environment needs safe and efficient ways to handle |
16 | mutual exclusion and synchronization of many execution flows.</para> |
16 | mutual exclusion and synchronization of many execution flows.</para> |
17 | </section> |
17 | </section> |
18 | 18 | ||
19 | <section> |
19 | <section> |
20 | <title>Active kernel primitives</title> |
20 | <title>Active kernel primitives</title> |
21 | 21 | ||
22 | <section> |
22 | <section> |
23 | <title>Spinlocks</title> |
23 | <title>Spinlocks</title> |
24 | 24 | ||
25 | <para>The basic mutual exclusion primitive is the spinlock. The spinlock |
25 | <para>The basic mutual exclusion primitive is the spinlock. The spinlock |
26 | implements active waiting for the availability of a memory lock (i.e. |
26 | implements active waiting for the availability of a memory lock (i.e. |
27 | simple variable) in a multiprocessor-safe manner. This safety is |
27 | simple variable) in a multiprocessor-safe manner. This safety is |
28 | achieved through the use of a specialized, architecture-dependent, |
28 | achieved through the use of a specialized, architecture-dependent, |
29 | atomic test-and-set operation which either locks the spinlock (i.e. sets |
29 | atomic test-and-set operation which either locks the spinlock (i.e. sets |
30 | the variable) or, provided that it is already locked, leaves it |
30 | the variable) or, provided that it is already locked, leaves it |
31 | unaltered. In any case, the test-and-set operation returns a value, thus |
31 | unaltered. In any case, the test-and-set operation returns a value, thus |
32 | signalling either success (i.e. zero return value) or failure (i.e. |
32 | signalling either success (i.e. zero return value) or failure (i.e. |
33 | non-zero value) in acquiring the lock. Note that this makes a |
33 | non-zero value) in acquiring the lock. Note that this makes a |
34 | fundamental difference between the naive algorithm that doesn't use the |
34 | fundamental difference between the naive algorithm that doesn't use the |
35 | atomic operation and the spinlock algortihm. While the naive algorithm |
35 | atomic operation and the spinlock algortihm. While the naive algorithm |
36 | is prone to race conditions on SMP configurations and thus is completely |
36 | is prone to race conditions on SMP configurations and thus is completely |
37 | SMP-unsafe, the spinlock algorithm eliminates the possibility of race |
37 | SMP-unsafe, the spinlock algorithm eliminates the possibility of race |
38 | conditions and is suitable for mutual exclusion use.</para> |
38 | conditions and is suitable for mutual exclusion use.</para> |
39 | 39 | ||
40 | <para>The semantics of the test-and-set operation is that the spinlock |
40 | <para>The semantics of the test-and-set operation is that the spinlock |
41 | remains unavailable until this operation called on the respective |
41 | remains unavailable until this operation called on the respective |
42 | spinlock returns zero. HelenOS builds two functions on top of the |
42 | spinlock returns zero. HelenOS builds two functions on top of the |
43 | test-and-set operation. The first function is the unconditional attempt |
43 | test-and-set operation. The first function is the unconditional attempt |
44 | to acquire the spinlock and is called |
44 | to acquire the spinlock and is called |
45 | <emphasis>spinlock_lock</emphasis>. It simply loops until the |
45 | <emphasis>spinlock_lock</emphasis>. It simply loops until the |
46 | test-and-set returns a zero value. The other function, |
46 | test-and-set returns a zero value. The other function, |
47 | <emphasis>spinlock_trylock</emphasis>, is the conditional lock operation |
47 | <emphasis>spinlock_trylock</emphasis>, is the conditional lock operation |
48 | and calls the test-and-set only once to find out whether it managed to |
48 | and calls the test-and-set only once to find out whether it managed to |
49 | acquire the spinlock or not. The conditional operation is useful in |
49 | acquire the spinlock or not. The conditional operation is useful in |
50 | situations in which an algorithm cannot acquire more spinlocks in the |
50 | situations in which an algorithm cannot acquire more spinlocks in the |
51 | proper order and a deadlock cannot be avoided. In such a case, the |
51 | proper order and a deadlock cannot be avoided. In such a case, the |
52 | algorithm would detect the danger and instead of possibly deadlocking |
52 | algorithm would detect the danger and instead of possibly deadlocking |
53 | the system it would simply release some spinlocks it already holds and |
53 | the system it would simply release some spinlocks it already holds and |
54 | retry the whole operation with the hope that it will succeed next time. |
54 | retry the whole operation with the hope that it will succeed next time. |
55 | The unlock function, <emphasis>spinlock_unlock</emphasis>, is quite easy |
55 | The unlock function, <emphasis>spinlock_unlock</emphasis>, is quite easy |
56 | - it merely clears the spinlock variable.</para> |
56 | - it merely clears the spinlock variable.</para> |
57 | 57 | ||
58 | <para>Nevertheless, there is a special issue related to hardware |
58 | <para>Nevertheless, there is a special issue related to hardware |
59 | optimizations that modern processors implement. Particularly problematic |
59 | optimizations that modern processors implement. Particularly problematic |
60 | is the out-of-order execution of instructions within the critical |
60 | is the out-of-order execution of instructions within the critical |
61 | section protected by a spinlock. The processors are always |
61 | section protected by a spinlock. The processors are always |
62 | self-consistent so that they can carry out speculatively executed |
62 | self-consistent so that they can carry out speculatively executed |
63 | instructions in the right order with regard to dependencies among those |
63 | instructions in the right order with regard to dependencies among those |
64 | instructions. However, the dependency between instructions inside the |
64 | instructions. However, the dependency between instructions inside the |
65 | critical section and those that implement locking and unlocking of the |
65 | critical section and those that implement locking and unlocking of the |
66 | respective spinlock is not implicit on some processor architectures. As |
66 | respective spinlock is not implicit on some processor architectures. As |
67 | a result, the processor needs to be explicitly told about each |
67 | a result, the processor needs to be explicitly told about each |
68 | occurrence of such a dependency. Therefore, HelenOS adds |
68 | occurrence of such a dependency. Therefore, HelenOS adds |
69 | architecture-specific hooks to all <emphasis>spinlock_lock</emphasis>, |
69 | architecture-specific hooks to all <emphasis>spinlock_lock</emphasis>, |
70 | <emphasis>spinlock_trylock</emphasis> and |
70 | <emphasis>spinlock_trylock</emphasis> and |
71 | <emphasis>spinlock_unlock</emphasis> functions to prevent the |
71 | <emphasis>spinlock_unlock</emphasis> functions to prevent the |
72 | instructions inside the critical section from permeating out. On some |
72 | instructions inside the critical section from permeating out. On some |
73 | architectures, these hooks can be void because the dependencies are |
73 | architectures, these hooks can be void because the dependencies are |
74 | implicitly there because of the special properties of locking and |
74 | implicitly there because of the special properties of locking and |
75 | unlocking instructions. However, other architectures need to instrument |
75 | unlocking instructions. However, other architectures need to instrument |
76 | these hooks with different memory barriers, depending on what operations |
76 | these hooks with different memory barriers, depending on what operations |
77 | could permeate out.</para> |
77 | could permeate out.</para> |
78 | 78 | ||
79 | <para>Spinlocks have one significant drawback: when held for longer time |
79 | <para>Spinlocks have one significant drawback: when held for longer time |
80 | periods, they harm both parallelism and concurrency. The processor |
80 | periods, they harm both parallelism and concurrency. The processor |
81 | executing <emphasis>spinlock_lock</emphasis> does not do any fruitful |
81 | executing <emphasis>spinlock_lock</emphasis> does not do any fruitful |
82 | work and is effectively halted until it can grab the lock and proceed. |
82 | work and is effectively halted until it can grab the lock and proceed. |
83 | Similarily, other execution flows cannot execute on the processor that |
83 | Similarily, other execution flows cannot execute on the processor that |
84 | holds the spinlock because the kernel disables preemption on that |
84 | holds the spinlock because the kernel disables preemption on that |
85 | processor when a spinlock is held. The reason behind disabling |
85 | processor when a spinlock is held. The reason behind disabling |
86 | preemption is priority inversion problem avoidance. For the same reason, |
86 | preemption is priority inversion problem avoidance. For the same reason, |
87 | threads are strongly discouraged from sleeping when they hold a |
87 | threads are strongly discouraged from sleeping when they hold a |
88 | spinlock.</para> |
88 | spinlock.</para> |
89 | 89 | ||
90 | <para>To summarize, spinlocks represent very simple and essential mutual |
90 | <para>To summarize, spinlocks represent very simple and essential mutual |
91 | exclusion primitive for SMP systems. On the other hand, spinlocks scale |
91 | exclusion primitive for SMP systems. On the other hand, spinlocks scale |
92 | poorly because of the active loop they are based on. Therefore, |
92 | poorly because of the active loop they are based on. Therefore, |
93 | spinlocks are used in HelenOS only for short-time mutual exclusion and |
93 | spinlocks are used in HelenOS only for short-time mutual exclusion and |
94 | in cases where the mutual exclusion is required out of thread context. |
94 | in cases where the mutual exclusion is required out of thread context. |
95 | Lastly, spinlocks are used in the construction of passive |
95 | Lastly, spinlocks are used in the construction of passive |
96 | synchronization primitives.</para> |
96 | synchronization primitives.</para> |
97 | </section> |
97 | </section> |
98 | </section> |
98 | </section> |
99 | 99 | ||
100 | <section> |
100 | <section> |
101 | <title>Passive kernel synchronization</title> |
101 | <title>Passive kernel synchronization</title> |
102 | 102 | ||
103 | <section> |
103 | <section> |
104 | <title>Wait queues</title> |
104 | <title>Wait queues</title> |
105 | 105 | ||
106 | <para>A wait queue is the basic passive synchronization primitive on |
106 | <para>A wait queue is the basic passive synchronization primitive on |
107 | which all other passive synchronization primitives are built. Simply |
107 | which all other passive synchronization primitives are built. Simply |
108 | put, it allows a thread to sleep until an event associated with the |
108 | put, it allows a thread to sleep until an event associated with the |
109 | particular wait queue occurs. Multiple threads are notified about |
109 | particular wait queue occurs. Multiple threads are notified about |
110 | incoming events in a first come, first served fashion. Moreover, should |
110 | incoming events in a first come, first served fashion. Moreover, should |
111 | the event come before any thread waits for it, it is recorded in the |
111 | the event come before any thread waits for it, it is recorded in the |
112 | wait queue as a missed wakeup and later forwarded to the first thread |
112 | wait queue as a missed wakeup and later forwarded to the first thread |
113 | that decides to wait in the queue. The inner structures of the wait |
113 | that decides to wait in the queue. The inner structures of the wait |
114 | queue are protected by a spinlock.</para> |
114 | queue are protected by a spinlock.</para> |
115 | 115 | ||
116 | <para>The thread that wants to wait for a wait queue event uses the |
116 | <para>The thread that wants to wait for a wait queue event uses the |
117 | <emphasis>waitq_sleep_timeout</emphasis> function. The algorithm then |
117 | <emphasis>waitq_sleep_timeout</emphasis> function. The algorithm then |
118 | checks the wait queue's counter of missed wakeups and if there are any |
118 | checks the wait queue's counter of missed wakeups and if there are any |
119 | missed wakeups, the call returns immediately. The call also returns |
119 | missed wakeups, the call returns immediately. The call also returns |
120 | immediately if only a conditional wait was requested. Otherwise the |
120 | immediately if only a conditional wait was requested. Otherwise the |
121 | thread is enqueued in the wait queue's list of sleeping threads and its |
121 | thread is enqueued in the wait queue's list of sleeping threads and its |
122 | state is changed to <emphasis>Sleeping</emphasis>. It then sleeps until |
122 | state is changed to <emphasis>Sleeping</emphasis>. It then sleeps until |
123 | one of the following events happens:</para> |
123 | one of the following events happens:</para> |
124 | 124 | ||
125 | <orderedlist> |
125 | <orderedlist> |
126 | <listitem> |
126 | <listitem> |
127 | <para>another thread calls <emphasis>waitq_wakeup</emphasis> and the |
127 | <para>another thread calls <emphasis>waitq_wakeup</emphasis> and the |
128 | thread is the first thread in the wait queue's list of sleeping |
128 | thread is the first thread in the wait queue's list of sleeping |
129 | threads;</para> |
129 | threads;</para> |
130 | </listitem> |
130 | </listitem> |
131 | 131 | ||
132 | <listitem> |
132 | <listitem> |
133 | <para>another thread calls |
133 | <para>another thread calls |
134 | <emphasis>waitq_interrupt_sleep</emphasis> on the sleeping |
134 | <emphasis>waitq_interrupt_sleep</emphasis> on the sleeping |
135 | thread;</para> |
135 | thread;</para> |
136 | </listitem> |
136 | </listitem> |
137 | 137 | ||
138 | <listitem> |
138 | <listitem> |
139 | <para>the sleep times out provided that none of the previous |
139 | <para>the sleep times out provided that none of the previous |
140 | occurred within a specified time limit; the limit can be |
140 | occurred within a specified time limit; the limit can be |
141 | infinity.</para> |
141 | infinity.</para> |
142 | </listitem> |
142 | </listitem> |
143 | </orderedlist> |
143 | </orderedlist> |
144 | 144 | ||
145 | <para>All five possibilities (immediate return on success, immediate |
145 | <para>All five possibilities (immediate return on success, immediate |
146 | return on failure, wakeup after sleep, interruption and timeout) are |
146 | return on failure, wakeup after sleep, interruption and timeout) are |
147 | distinguishable by the return value of |
147 | distinguishable by the return value of |
148 | <emphasis>waitq_sleep_timeout</emphasis>. Being able to interrupt a |
148 | <emphasis>waitq_sleep_timeout</emphasis>. Being able to interrupt a |
149 | sleeping thread is essential for externally initiated thread |
149 | sleeping thread is essential for externally initiated thread |
150 | termination. The ability to wait only for a certain amount of time is |
150 | termination. The ability to wait only for a certain amount of time is |
151 | used, for instance, to passively delay thread execution by several |
151 | used, for instance, to passively delay thread execution by several |
152 | microseconds or even seconds in <emphasis>thread_sleep</emphasis> |
152 | microseconds or even seconds in <emphasis>thread_sleep</emphasis> |
153 | function. Due to the fact that all other passive kernel synchronization |
153 | function. Due to the fact that all other passive kernel synchronization |
154 | primitives are based on wait queues, they also have the option of being |
154 | primitives are based on wait queues, they also have the option of being |
155 | interrutped and, more importantly, can timeout. All of them also |
155 | interrutped and, more importantly, can timeout. All of them also |
156 | implement the conditional operation. Furthemore, this very fundamental |
156 | implement the conditional operation. Furthemore, this very fundamental |
157 | interface reaches up to the implementation of futexes - userspace |
157 | interface reaches up to the implementation of futexes - userspace |
158 | synchronization primitive, which makes it possible for a userspace |
158 | synchronization primitive, which makes it possible for a userspace |
159 | thread to request a synchronization operation with a timeout or a |
159 | thread to request a synchronization operation with a timeout or a |
160 | conditional operation.</para> |
160 | conditional operation.</para> |
161 | 161 | ||
162 | <para>From the description above, it should be apparent, that when a |
162 | <para>From the description above, it should be apparent, that when a |
163 | sleeping thread is woken by <emphasis>waitq_wakeup</emphasis> or when |
163 | sleeping thread is woken by <emphasis>waitq_wakeup</emphasis> or when |
164 | <emphasis>waitq_sleep_timeout</emphasis> succeeds immediately, the |
164 | <emphasis>waitq_sleep_timeout</emphasis> succeeds immediately, the |
165 | thread can be sure that the event has occurred. The thread need not and |
165 | thread can be sure that the event has occurred. The thread need not and |
166 | should not verify this fact. This approach is called direct hand-off and |
166 | should not verify this fact. This approach is called direct hand-off and |
167 | is characteristic for all passive HelenOS synchronization primitives, |
167 | is characteristic for all passive HelenOS synchronization primitives, |
168 | with the exception as described below.</para> |
168 | with the exception as described below.</para> |
169 | </section> |
169 | </section> |
170 | 170 | ||
171 | <section> |
171 | <section> |
172 | <title>Semaphores</title> |
172 | <title>Semaphores</title> |
173 | 173 | ||
174 | <para>The interesting point about wait queues is that the number of |
174 | <para>The interesting point about wait queues is that the number of |
175 | missed wakeups is equal to the number of threads that will not block in |
175 | missed wakeups is equal to the number of threads that will not block in |
176 | <emphasis>watiq_sleep_timeout</emphasis> and would immediately succeed |
176 | <emphasis>watiq_sleep_timeout</emphasis> and would immediately succeed |
177 | instead. On the other hand, semaphores are synchronization primitives |
177 | instead. On the other hand, semaphores are synchronization primitives |
178 | that will let predefined amount of threads into its critical section and |
178 | that will let predefined amount of threads into their critical section |
179 | block any other threads above this count. However, these two cases are |
179 | and block any other threads above this count. However, these two cases |
180 | exactly the same. Semaphores in HelenOS are therefore implemented as |
180 | are exactly the same. Semaphores in HelenOS are therefore implemented as |
181 | wait queues with a single semantic change: their wait queue is |
181 | wait queues with a single semantic change: their wait queue is |
182 | initialized to have so many missed wakeups as is the number of threads |
182 | initialized to have so many missed wakeups as is the number of threads |
183 | that the semphore intends to let into its critical section |
183 | that the semphore intends to let into its critical section |
184 | simultaneously.</para> |
184 | simultaneously.</para> |
185 | 185 | ||
186 | <para>In the semaphore language, the wait queue operation |
186 | <para>In the semaphore language, the wait queue operation |
187 | <emphasis>waitq_sleep_timeout</emphasis> corresponds to |
187 | <emphasis>waitq_sleep_timeout</emphasis> corresponds to |
188 | <emphasis><emphasis>semaphore</emphasis> down</emphasis> operation, |
188 | <emphasis><emphasis>semaphore</emphasis> down</emphasis> operation, |
189 | represented by the function <emphasis>semaphore_down_timeout</emphasis> |
189 | represented by the function <emphasis>semaphore_down_timeout</emphasis> |
190 | and by way of similitude the wait queue operation waitq_wakeup |
190 | and by way of similitude the wait queue operation waitq_wakeup |
191 | corresponds to semaphore <emphasis>up</emphasis> operation, represented |
191 | corresponds to semaphore <emphasis>up</emphasis> operation, represented |
192 | by the function <emphasis>sempafore_up</emphasis>. The conditional down |
192 | by the function <emphasis>sempafore_up</emphasis>. The conditional down |
193 | operation is called <emphasis>semaphore_trydown</emphasis>.</para> |
193 | operation is called <emphasis>semaphore_trydown</emphasis>.</para> |
194 | </section> |
194 | </section> |
195 | 195 | ||
196 | <section> |
196 | <section> |
197 | <title>Mutexes</title> |
197 | <title>Mutexes</title> |
198 | 198 | ||
199 | <para>Mutexes are sometimes referred to as binary sempahores. That means |
199 | <para>Mutexes are sometimes referred to as binary sempahores. That means |
200 | that mutexes are like semaphores that allow only one thread in its |
200 | that mutexes are like semaphores that allow only one thread in its |
201 | critical section. Indeed, mutexes in HelenOS are implemented exactly in |
201 | critical section. Indeed, mutexes in HelenOS are implemented exactly in |
202 | this way: they are built on top of semaphores. From another point of |
202 | this way: they are built on top of semaphores. From another point of |
203 | view, they can be viewed as spinlocks without busy waiting. Their |
203 | view, they can be viewed as spinlocks without busy waiting. Their |
204 | semaphore heritage provides good basics for both conditional operation |
204 | semaphore heritage provides good basics for both conditional operation |
205 | and operation with timeout. The locking operation is called |
205 | and operation with timeout. The locking operation is called |
206 | <emphasis>mutex_lock</emphasis>, the conditional locking operation is |
206 | <emphasis>mutex_lock</emphasis>, the conditional locking operation is |
207 | called <emphasis>mutex_trylock</emphasis> and the unlocking operation is |
207 | called <emphasis>mutex_trylock</emphasis> and the unlocking operation is |
208 | called <emphasis>mutex_unlock</emphasis>.</para> |
208 | called <emphasis>mutex_unlock</emphasis>.</para> |
209 | </section> |
209 | </section> |
210 | 210 | ||
211 | <section> |
211 | <section> |
212 | <title>Reader/writer locks</title> |
212 | <title>Reader/writer locks</title> |
213 | 213 | ||
214 | <para>Reader/writer locks, or rwlocks, are by far the most complicated |
214 | <para>Reader/writer locks, or rwlocks, are by far the most complicated |
215 | synchronization primitive within the kernel. The goal of these locks is |
215 | synchronization primitive within the kernel. The goal of these locks is |
216 | to improve concurrency of applications, in which threads need to |
216 | to improve concurrency of applications, in which threads need to |
217 | synchronize access to a shared resource, and that access can be |
217 | synchronize access to a shared resource, and that access can be |
218 | partitioned into a read-only mode and a write mode. Reader/writer locks |
218 | partitioned into a read-only mode and a write mode. Reader/writer locks |
219 | should make it possible for several, possibly many, readers to enter the |
219 | should make it possible for several, possibly many, readers to enter the |
220 | critical section, provided that no writer is currently in the critical |
220 | critical section, provided that no writer is currently in the critical |
221 | section, or to be in the critical section contemporarily. Writers are |
221 | section, or to be in the critical section contemporarily. Writers are |
222 | allowed to enter the critical section only individually, provided that |
222 | allowed to enter the critical section only individually, provided that |
223 | no reader is in the critical section already. Applications, in which the |
223 | no reader is in the critical section already. Applications, in which the |
224 | majority of operations can be done in the read-only mode, can benefit |
224 | majority of operations can be done in the read-only mode, can benefit |
225 | from increased concurrency created by reader/writer locks.</para> |
225 | from increased concurrency created by reader/writer locks.</para> |
226 | 226 | ||
227 | <para>During reader/writer lock construction, a decision should be made |
227 | <para>During reader/writer lock construction, a decision should be made |
228 | whether readers will be prefered over writers or whether writers will be |
228 | whether readers will be prefered over writers or whether writers will be |
229 | prefered over readers in cases when the lock is not currently held and |
229 | prefered over readers in cases when the lock is not currently held and |
230 | both a reader and a writer want to gain the lock. Some operating systems |
230 | both a reader and a writer want to gain the lock. Some operating systems |
231 | prefer one group over the other, creating thus a possibility for |
231 | prefer one group over the other, creating thus a possibility for |
232 | starving the unprefered group. In the HelenOS operating system, none of |
232 | starving the unprefered group. In the HelenOS operating system, none of |
233 | the two groups is prefered. The lock is granted on a first come, first |
233 | the two groups is prefered. The lock is granted on a first come, first |
234 | served basis with the additional note that readers are granted the lock |
234 | served basis with the additional note that readers are granted the lock |
235 | in the biggest possible batch.</para> |
235 | in the biggest possible batch.</para> |
236 | 236 | ||
237 | <para>With this policy and the timeout modes of operation, the direct |
237 | <para>With this policy and the timeout modes of operation, the direct |
238 | hand-off becomes much more complicated. For instance, a writer leaving |
238 | hand-off becomes much more complicated. For instance, a writer leaving |
239 | the critical section must wake up all leading readers in the rwlock's |
239 | the critical section must wake up all leading readers in the rwlock's |
240 | wait queue or one leading writer or no-one if no thread is waiting. |
240 | wait queue or one leading writer or no-one if no thread is waiting. |
241 | Similarily, the last reader leaving the critical section must wakeup the |
241 | Similarily, the last reader leaving the critical section must wakeup the |
242 | sleeping writer if there are any sleeping threads left at all. As |
242 | sleeping writer if there are any sleeping threads left at all. As |
243 | another example, if a writer at the beginning of the rwlock's wait queue |
243 | another example, if a writer at the beginning of the rwlock's wait queue |
244 | times out and the lock is held by at least one reader, the writer which |
244 | times out and the lock is held by at least one reader, the writer which |
245 | has timed out must first wake up all readers that follow him in the |
245 | has timed out must first wake up all readers that follow him in the |
246 | queue prior to signalling the timeout itself and giving up.</para> |
246 | queue prior to signalling the timeout itself and giving up.</para> |
247 | 247 | ||
248 | <para>Due to the issues mentioned in the previous paragraph, the |
248 | <para>Due to the issues mentioned in the previous paragraph, the |
249 | reader/writer lock imlpementation needs to walk the rwlock wait queue's |
249 | reader/writer lock imlpementation needs to walk the rwlock wait queue's |
250 | list of sleeping threads directly, in order to find out the type of |
250 | list of sleeping threads directly, in order to find out the type of |
251 | access that the queueing threads demand. This makes the code difficult |
251 | access that the queueing threads demand. This makes the code difficult |
252 | to understand and dependent on the internal implementation of the wait |
252 | to understand and dependent on the internal implementation of the wait |
253 | queue. Nevertheless, it remains unclear to the authors of HelenOS |
253 | queue. Nevertheless, it remains unclear to the authors of HelenOS |
254 | whether a simpler but equivalently fair solution exists.</para> |
254 | whether a simpler but equivalently fair solution exists.</para> |
255 | 255 | ||
256 | <para>The implementation of rwlocks as it has been already put, makes |
256 | <para>The implementation of rwlocks as it has been already put, makes |
257 | use of one single wait queue for both readers and writers, thus avoiding |
257 | use of one single wait queue for both readers and writers, thus avoiding |
258 | any possibility of starvation. In fact, rwlocks use a mutex rather than |
258 | any possibility of starvation. In fact, rwlocks use a mutex rather than |
259 | a bare wait queue. This mutex is called <emphasis>exclusive</emphasis> |
259 | a bare wait queue. This mutex is called <emphasis>exclusive</emphasis> |
260 | and is used to synchronize writers. The writer's lock operation, |
260 | and is used to synchronize writers. The writer's lock operation, |
261 | <emphasis>rwlock_write_lock_timeout</emphasis>, simply tries to acquire |
261 | <emphasis>rwlock_write_lock_timeout</emphasis>, simply tries to acquire |
262 | the exclusive mutex. If it succeeds, the writer is granted the rwlock. |
262 | the exclusive mutex. If it succeeds, the writer is granted the rwlock. |
263 | However, if the operation fails (e.g. times out), the writer must check |
263 | However, if the operation fails (e.g. times out), the writer must check |
264 | for potential readers at the head of the list of sleeping threads |
264 | for potential readers at the head of the list of sleeping threads |
265 | associated with the mutex's wait queue and then proceed according to the |
265 | associated with the mutex's wait queue and then proceed according to the |
266 | procedure outlined above.</para> |
266 | procedure outlined above.</para> |
267 | 267 | ||
268 | <para>The exclusive mutex plays an important role in reader |
268 | <para>The exclusive mutex plays an important role in reader |
269 | synchronization as well. However, a reader doing the reader's lock |
269 | synchronization as well. However, a reader doing the reader's lock |
270 | operation, <emphasis>rwlock_read_lock_timeout</emphasis>, may bypass |
270 | operation, <emphasis>rwlock_read_lock_timeout</emphasis>, may bypass |
271 | this mutex when it detects that:</para> |
271 | this mutex when it detects that:</para> |
272 | 272 | ||
273 | <orderedlist> |
273 | <orderedlist> |
274 | <listitem> |
274 | <listitem> |
275 | <para>there are other readers in the critical section and</para> |
275 | <para>there are other readers in the critical section and</para> |
276 | </listitem> |
276 | </listitem> |
277 | 277 | ||
278 | <listitem> |
278 | <listitem> |
279 | <para>there are no sleeping threads waiting for the exclusive |
279 | <para>there are no sleeping threads waiting for the exclusive |
280 | mutex.</para> |
280 | mutex.</para> |
281 | </listitem> |
281 | </listitem> |
282 | </orderedlist> |
282 | </orderedlist> |
283 | 283 | ||
284 | <para>If both conditions are true, the reader will bypass the mutex, |
284 | <para>If both conditions are true, the reader will bypass the mutex, |
285 | increment the number of readers in the critical section and then enter |
285 | increment the number of readers in the critical section and then enter |
286 | the critical section. Note that if there are any sleeping threads at the |
286 | the critical section. Note that if there are any sleeping threads at the |
287 | beginning of the wait queue, the first must be a writer. If the |
287 | beginning of the wait queue, the first must be a writer. If the |
288 | conditions are not fulfilled, the reader normally waits until the |
288 | conditions are not fulfilled, the reader normally waits until the |
289 | exclusive mutex is granted to it.</para> |
289 | exclusive mutex is granted to it.</para> |
290 | </section> |
290 | </section> |
291 | 291 | ||
292 | <section> |
292 | <section> |
293 | <title>Condition variables</title> |
293 | <title>Condition variables</title> |
294 | 294 | ||
- | 295 | <para>Condition variables can be used for waiting until a condition |
|
- | 296 | becomes true. In this respect, they are similar to wait queues. But |
|
- | 297 | contrary to wait queues, condition variables have different semantics |
|
- | 298 | that allows events to be lost when there is no thread waiting for them. |
|
- | 299 | In order to support this, condition variables don't use direct hand-off |
|
- | 300 | and operate in a way similar to the example below. A thread waiting for |
|
- | 301 | the condition becoming true does the following:</para> |
|
- | 302 | ||
- | 303 | <para><programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>); |
|
- | 304 | while (!<varname>condition</varname>) |
|
- | 305 | <function>condvar_wait_timeout</function>(<varname>cv</varname>, <varname>mtx</varname>); |
|
- | 306 | /* <remark>the condition is true, do something</remark> */ |
|
- | 307 | <function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></para> |
|
- | 308 | ||
- | 309 | <para>A thread that causes the condition become true signals this event |
|
- | 310 | like this:</para> |
|
- | 311 | ||
- | 312 | <para><programlisting><function>mutex_lock</function>(<varname>mtx</varname>); |
|
- | 313 | <varname>condition</varname> = <constant>true</constant>; |
|
- | 314 | <function>condvar_signal</function>(<varname>cv</varname>); /* <remark>condvar_broadcast(cv);</remark> */ |
|
- | 315 | <function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></para> |
|
- | 316 | ||
- | 317 | <para>The wait operation, <emphasis>condvar_wait_timeout</emphasis>, |
|
- | 318 | always puts the calling thread to sleep. The thread then sleeps until |
|
- | 319 | another thread invokes <emphasis>condvar_broadcast</emphasis> on the |
|
- | 320 | same condition variable or until it is woken up by |
|
- | 321 | <emphasis>condvar_signal</emphasis>. The |
|
- | 322 | <emphasis>condvar_signal</emphasis> operation unblocks the first thread |
|
- | 323 | blocking on the condition variable while the |
|
- | 324 | <emphasis>condvar_broadcast</emphasis> operation unblocks all threads |
|
- | 325 | blocking there. If there are no blocking threads, these two operations |
|
- | 326 | have no efect.</para> |
|
- | 327 | ||
- | 328 | <para>Note that the threads must synchronize over a dedicated mutex. To |
|
- | 329 | prevent race condition between <emphasis>condvar_wait_timeout</emphasis> |
|
- | 330 | and <emphasis>condvar_signal</emphasis> or |
|
- | 331 | <emphasis>condvar_broadcast</emphasis>, the mutex is passed to |
|
- | 332 | <emphasis>condvar_wait_timeout</emphasis> which then atomically puts the |
|
- | 333 | calling thread asleep and unlocks the mutex. When the thread eventually |
|
- | 334 | wakes up, <emphasis>condvar_wait</emphasis> regains the mutex and |
|
- | 335 | returns.</para> |
|
- | 336 | ||
- | 337 | <para>Also note, that there is no conditional operation for condition |
|
- | 338 | variables. Such an operation would make no sence since condition |
|
- | 339 | variables are defined to forget events for which there is no waiting |
|
- | 340 | thread and because <emphasis>condvar_wait</emphasis> must always go to |
|
- | 341 | sleep. The operation with timeout is supported as usually.</para> |
|
- | 342 | ||
- | 343 | <para>In HelenOS, condition variables are based on wait queues. As it is |
|
- | 344 | already mentioned above, wait queues remember missed events while |
|
- | 345 | condition variables must not do so. This is reasoned by the fact that |
|
- | 346 | condition variables are designed for scenarios in which an event might |
|
- | 347 | occur very many times without being picked up by any waiting thread. On |
|
- | 348 | the other hand, wait queues would remember any event that had not been |
|
- | 349 | picked up by a call to <emphasis>waitq_sleep_timeout</emphasis>. |
|
- | 350 | Therefore, if wait queues were used directly and without any changes to |
|
- | 351 | implement condition variables, the missed_wakeup counter would hurt |
|
- | 352 | performance of the implementation: the <code>while</code> loop in |
|
- | 353 | <emphasis>condvar_wait_timeout</emphasis> would effectively do busy |
|
- | 354 | waiting until all missed wakeups were discarded.</para> |
|
- | 355 | ||
- | 356 | <para>The requirement on the wait operation to atomically put the caller |
|
- | 357 | to sleep and release the mutex poses an interesting problem on |
|
- | 358 | <emphasis>condvar_wait_timeout</emphasis>. More precisely, the thread |
|
- | 359 | should sleep in the condvar's wait queue prior to releasing the mutex, |
|
- | 360 | but it must not hold the mutex when it is sleeping.</para> |
|
- | 361 | ||
- | 362 | <para>Problems described in the two previous paragraphs are addressed in |
|
- | 363 | HelenOS by dividing the <emphasis>waitq_sleep_timeout</emphasis> |
|
- | 364 | function into three pieces:</para> |
|
- | 365 | ||
- | 366 | <orderedlist> |
|
- | 367 | <listitem> |
|
- | 368 | <para><emphasis>waitq_sleep_prepare</emphasis> prepares the thread |
|
- | 369 | to go to sleep by, among other things, locking the wait |
|
- | 370 | queue;</para> |
|
- | 371 | </listitem> |
|
- | 372 | ||
- | 373 | <listitem> |
|
- | 374 | <para><emphasis>waitq_sleep_timeout_unsafe</emphasis> implements the |
|
- | 375 | core blocking logic;</para> |
|
- | 376 | </listitem> |
|
- | 377 | ||
- | 378 | <listitem> |
|
- | 379 | <para><emphasis>waitq_sleep_finish</emphasis> performs cleanup after |
|
- | 380 | <emphasis>waitq_sleep_timeout_unsafe</emphasis>; after this call, |
|
- | 381 | the wait queue spinlock is guaranteed to be unlocked by the |
|
- | 382 | caller</para> |
|
- | 383 | </listitem> |
|
- | 384 | </orderedlist> |
|
- | 385 | ||
- | 386 | <para>The stock <emphasis>waitq_sleep_timeout</emphasis> is then a mere |
|
- | 387 | wrapper that calls these three functions. It is provided for convenience |
|
- | 388 | in cases where the caller doesn't require such a low level control. |
|
- | 389 | However, the implementation of <emphasis>condvar_wait_timeout</emphasis> |
|
- | 390 | does need this finer-grained control because it has to interleave calls |
|
- | 391 | to these functions by other actions. It carries its operations out in |
|
- | 392 | the following order:</para> |
|
- | 393 | ||
- | 394 | <orderedlist> |
|
- | 395 | <listitem> |
|
- | 396 | <para>calls <emphasis>waitq_sleep_prepare</emphasis> in order to |
|
- | 397 | lock the condition variable's wait queue,</para> |
|
- | 398 | </listitem> |
|
- | 399 | ||
- | 400 | <listitem> |
|
295 | <para>Condvars explanation</para> |
401 | <para>releases the mutex,</para> |
- | 402 | </listitem> |
|
- | 403 | ||
- | 404 | <listitem> |
|
- | 405 | <para>clears the counter of missed wakeups,</para> |
|
- | 406 | </listitem> |
|
- | 407 | ||
- | 408 | <listitem> |
|
- | 409 | <para>calls <emphasis>waitq_sleep_timeout_unsafe</emphasis>,</para> |
|
- | 410 | </listitem> |
|
- | 411 | ||
- | 412 | <listitem> |
|
- | 413 | <para>retakes the mutex,</para> |
|
- | 414 | </listitem> |
|
- | 415 | ||
- | 416 | <listitem> |
|
- | 417 | <para>calls <emphasis>waitq_sleep_finish</emphasis>.</para> |
|
- | 418 | </listitem> |
|
- | 419 | </orderedlist> |
|
296 | </section> |
420 | </section> |
297 | </section> |
421 | </section> |
298 | 422 | ||
299 | <section> |
423 | <section> |
300 | <title>Userspace synchronization</title> |
424 | <title>Userspace synchronization</title> |
301 | 425 | ||
302 | <section> |
426 | <section> |
303 | <title>Futexes</title> |
427 | <title>Futexes</title> |
304 | 428 | ||
305 | <para></para> |
429 | <para></para> |
306 | </section> |
430 | </section> |
307 | </section> |
431 | </section> |
308 | </chapter> |
432 | </chapter> |