Rev 57 | Rev 72 | Go to most recent revision | Only display areas with differences | Ignore whitespace | Details | Blame | Last modification | View Log | RSS feed
Rev 57 | Rev 62 | ||
---|---|---|---|
1 | <?xml version="1.0" encoding="UTF-8"?> |
1 | <?xml version="1.0" encoding="UTF-8"?> |
2 | <chapter id="sync"> |
2 | <chapter id="sync"> |
3 | <?dbhtml filename="sync.html"?> |
3 | <?dbhtml filename="sync.html"?> |
4 | 4 | ||
5 | <title>Mutual exclusion and synchronization</title> |
5 | <title>Mutual exclusion and synchronization</title> |
6 | 6 | ||
7 | <section> |
7 | <section> |
8 | <title>Introduction</title> |
8 | <title>Introduction</title> |
9 | 9 | ||
10 | <para>The HelenOS operating system is designed to make use of the |
10 | <para>The HelenOS operating system is designed to make use of the |
11 | parallelism offered by the hardware and to exploit concurrency of both the |
11 | parallelism offered by the hardware and to exploit concurrency of both the |
12 | kernel and userspace tasks. This is achieved through multiprocessor |
12 | kernel and userspace tasks. This is achieved through multiprocessor |
13 | support and several levels of multiprogramming such as multitasking, |
13 | support and several levels of multiprogramming such as multitasking, |
14 | multithreading and also through userspace pseudo threads. However, such a |
14 | multithreading and also through userspace pseudo threads. However, such a |
15 | highly concurrent environment needs safe and efficient ways to handle |
15 | highly concurrent environment needs safe and efficient ways to handle |
16 | mutual exclusion and synchronization of many execution flows.</para> |
16 | mutual exclusion and synchronization of many execution flows.</para> |
17 | </section> |
17 | </section> |
18 | 18 | ||
19 | <section> |
19 | <section> |
20 | <title>Active kernel primitives</title> |
20 | <title>Active kernel primitives</title> |
21 | 21 | ||
22 | <section> |
22 | <section> |
23 | <title>Spinlocks</title> |
23 | <title>Spinlocks</title> |
24 | 24 | ||
25 | <para>The basic mutual exclusion primitive is the spinlock. The spinlock |
25 | <para>The basic mutual exclusion primitive is the spinlock. The spinlock |
26 | implements active waiting for the availability of a memory lock (i.e. |
26 | implements active waiting for the availability of a memory lock (i.e. |
27 | simple variable) in a multiprocessor-safe manner. This safety is |
27 | simple variable) in a multiprocessor-safe manner. This safety is |
28 | achieved through the use of a specialized, architecture-dependent, |
28 | achieved through the use of a specialized, architecture-dependent, |
29 | atomic test-and-set operation which either locks the spinlock (i.e. sets |
29 | atomic test-and-set operation which either locks the spinlock (i.e. sets |
30 | the variable) or, provided that it is already locked, leaves it |
30 | the variable) or, provided that it is already locked, leaves it |
31 | unaltered. In any case, the test-and-set operation returns a value, thus |
31 | unaltered. In any case, the test-and-set operation returns a value, thus |
32 | signalling either success (i.e. zero return value) or failure (i.e. |
32 | signalling either success (i.e. zero return value) or failure (i.e. |
33 | non-zero value) in acquiring the lock. Note that this makes a |
33 | non-zero value) in acquiring the lock. Note that this makes a |
34 | fundamental difference between the naive algorithm that doesn't use the |
34 | fundamental difference between the naive algorithm that doesn't use the |
35 | atomic operation and the spinlock algortihm. While the naive algorithm |
35 | atomic operation and the spinlock algortihm. While the naive algorithm |
36 | is prone to race conditions on SMP configurations and thus is completely |
36 | is prone to race conditions on SMP configurations and thus is completely |
37 | SMP-unsafe, the spinlock algorithm eliminates the possibility of race |
37 | SMP-unsafe, the spinlock algorithm eliminates the possibility of race |
38 | conditions and is suitable for mutual exclusion use.</para> |
38 | conditions and is suitable for mutual exclusion use.</para> |
39 | 39 | ||
40 | <para>The semantics of the test-and-set operation is that the spinlock |
40 | <para>The semantics of the test-and-set operation is that the spinlock |
41 | remains unavailable until this operation called on the respective |
41 | remains unavailable until this operation called on the respective |
42 | spinlock returns zero. HelenOS builds two functions on top of the |
42 | spinlock returns zero. HelenOS builds two functions on top of the |
43 | test-and-set operation. The first function is the unconditional attempt |
43 | test-and-set operation. The first function is the unconditional attempt |
44 | to acquire the spinlock and is called <code>spinlock_lock</code>. It |
44 | to acquire the spinlock and is called <code>spinlock_lock</code>. It |
45 | simply loops until the test-and-set returns a zero value. The other |
45 | simply loops until the test-and-set returns a zero value. The other |
46 | function, <code>spinlock_trylock</code>, is the conditional lock |
46 | function, <code>spinlock_trylock</code>, is the conditional lock |
47 | operation and calls the test-and-set only once to find out whether it |
47 | operation and calls the test-and-set only once to find out whether it |
48 | managed to acquire the spinlock or not. The conditional operation is |
48 | managed to acquire the spinlock or not. The conditional operation is |
49 | useful in situations in which an algorithm cannot acquire more spinlocks |
49 | useful in situations in which an algorithm cannot acquire more spinlocks |
50 | in the proper order and a deadlock cannot be avoided. In such a case, |
50 | in the proper order and a deadlock cannot be avoided. In such a case, |
51 | the algorithm would detect the danger and instead of possibly |
51 | the algorithm would detect the danger and instead of possibly |
52 | deadlocking the system it would simply release some spinlocks it already |
52 | deadlocking the system it would simply release some spinlocks it already |
53 | holds and retry the whole operation with the hope that it will succeed |
53 | holds and retry the whole operation with the hope that it will succeed |
54 | next time. The unlock function, <code>spinlock_unlock</code>, is quite |
54 | next time. The unlock function, <code>spinlock_unlock</code>, is quite |
55 | easy - it merely clears the spinlock variable.</para> |
55 | easy - it merely clears the spinlock variable.</para> |
56 | 56 | ||
57 | <para>Nevertheless, there is a special issue related to hardware |
57 | <para>Nevertheless, there is a special issue related to hardware |
58 | optimizations that modern processors implement. Particularly problematic |
58 | optimizations that modern processors implement. Particularly problematic |
59 | is the out-of-order execution of instructions within the critical |
59 | is the out-of-order execution of instructions within the critical |
60 | section protected by a spinlock. The processors are always |
60 | section protected by a spinlock. The processors are always |
61 | self-consistent so that they can carry out speculatively executed |
61 | self-consistent so that they can carry out speculatively executed |
62 | instructions in the right order with regard to dependencies among those |
62 | instructions in the right order with regard to dependencies among those |
63 | instructions. However, the dependency between instructions inside the |
63 | instructions. However, the dependency between instructions inside the |
64 | critical section and those that implement locking and unlocking of the |
64 | critical section and those that implement locking and unlocking of the |
65 | respective spinlock is not implicit on some processor architectures. As |
65 | respective spinlock is not implicit on some processor architectures. As |
66 | a result, the processor needs to be explicitly told about each |
66 | a result, the processor needs to be explicitly told about each |
67 | occurrence of such a dependency. Therefore, HelenOS adds |
67 | occurrence of such a dependency. Therefore, HelenOS adds |
68 | architecture-specific hooks to all <code>spinlock_lock</code>, |
68 | architecture-specific hooks to all <code>spinlock_lock</code>, |
69 | <code>spinlock_trylock</code> and <code>spinlock_unlock</code> functions |
69 | <code>spinlock_trylock</code> and <code>spinlock_unlock</code> functions |
70 | to prevent the instructions inside the critical section from permeating |
70 | to prevent the instructions inside the critical section from permeating |
71 | out. On some architectures, these hooks can be void because the |
71 | out. On some architectures, these hooks can be void because the |
72 | dependencies are implicitly there because of the special properties of |
72 | dependencies are implicitly there because of the special properties of |
73 | locking and unlocking instructions. However, other architectures need to |
73 | locking and unlocking instructions. However, other architectures need to |
74 | instrument these hooks with different memory barriers, depending on what |
74 | instrument these hooks with different memory barriers, depending on what |
75 | operations could permeate out.</para> |
75 | operations could permeate out.</para> |
76 | 76 | ||
77 | <para>Spinlocks have one significant drawback: when held for longer time |
77 | <para>Spinlocks have one significant drawback: when held for longer time |
78 | periods, they harm both parallelism and concurrency. The processor |
78 | periods, they harm both parallelism and concurrency. The processor |
79 | executing <code>spinlock_lock</code> does not do any fruitful work and |
79 | executing <code>spinlock_lock</code> does not do any fruitful work and |
80 | is effectively halted until it can grab the lock and proceed. |
80 | is effectively halted until it can grab the lock and proceed. |
81 | Similarily, other execution flows cannot execute on the processor that |
81 | Similarily, other execution flows cannot execute on the processor that |
82 | holds the spinlock because the kernel disables preemption on that |
82 | holds the spinlock because the kernel disables preemption on that |
83 | processor when a spinlock is held. The reason behind disabling |
83 | processor when a spinlock is held. The reason behind disabling |
84 | preemption is priority inversion problem avoidance. For the same reason, |
84 | preemption is priority inversion problem avoidance. For the same reason, |
85 | threads are strongly discouraged from sleeping when they hold a |
85 | threads are strongly discouraged from sleeping when they hold a |
86 | spinlock.</para> |
86 | spinlock.</para> |
87 | 87 | ||
88 | <para>To summarize, spinlocks represent very simple and essential mutual |
88 | <para>To summarize, spinlocks represent very simple and essential mutual |
89 | exclusion primitive for SMP systems. On the other hand, spinlocks scale |
89 | exclusion primitive for SMP systems. On the other hand, spinlocks scale |
90 | poorly because of the active loop they are based on. Therefore, |
90 | poorly because of the active loop they are based on. Therefore, |
91 | spinlocks are used in HelenOS only for short-time mutual exclusion and |
91 | spinlocks are used in HelenOS only for short-time mutual exclusion and |
92 | in cases where the mutual exclusion is required out of thread context. |
92 | in cases where the mutual exclusion is required out of thread context. |
93 | Lastly, spinlocks are used in the construction of passive |
93 | Lastly, spinlocks are used in the construction of passive |
94 | synchronization primitives.</para> |
94 | synchronization primitives.</para> |
95 | </section> |
95 | </section> |
96 | </section> |
96 | </section> |
97 | 97 | ||
98 | <section> |
98 | <section> |
99 | <title>Passive kernel synchronization</title> |
99 | <title>Passive kernel synchronization</title> |
100 | 100 | ||
101 | <section> |
101 | <section> |
102 | <title>Wait queues</title> |
102 | <title>Wait queues</title> |
103 | 103 | ||
104 | <para>A wait queue is the basic passive synchronization primitive on |
104 | <para>A wait queue is the basic passive synchronization primitive on |
105 | which all other passive synchronization primitives are built. Simply |
105 | which all other passive synchronization primitives are built. Simply |
106 | put, it allows a thread to sleep until an event associated with the |
106 | put, it allows a thread to sleep until an event associated with the |
107 | particular wait queue occurs. Multiple threads are notified about |
107 | particular wait queue occurs. Multiple threads are notified about |
108 | incoming events in a first come, first served fashion. Moreover, should |
108 | incoming events in a first come, first served fashion. Moreover, should |
109 | the event come before any thread waits for it, it is recorded in the |
109 | the event come before any thread waits for it, it is recorded in the |
110 | wait queue as a missed wakeup and later forwarded to the first thread |
110 | wait queue as a missed wakeup and later forwarded to the first thread |
111 | that decides to wait in the queue. The inner structures of the wait |
111 | that decides to wait in the queue. The inner structures of the wait |
112 | queue are protected by a spinlock.</para> |
112 | queue are protected by a spinlock.</para> |
113 | 113 | ||
114 | <para>The thread that wants to wait for a wait queue event uses the |
114 | <para>The thread that wants to wait for a wait queue event uses the |
115 | <code>waitq_sleep_timeout</code> function. The algorithm then checks the |
115 | <code>waitq_sleep_timeout</code> function. The algorithm then checks the |
116 | wait queue's counter of missed wakeups and if there are any missed |
116 | wait queue's counter of missed wakeups and if there are any missed |
117 | wakeups, the call returns immediately. The call also returns immediately |
117 | wakeups, the call returns immediately. The call also returns immediately |
118 | if only a conditional wait was requested. Otherwise the thread is |
118 | if only a conditional wait was requested. Otherwise the thread is |
119 | enqueued in the wait queue's list of sleeping threads and its state is |
119 | enqueued in the wait queue's list of sleeping threads and its state is |
120 | changed to <constant>Sleeping</constant>. It then sleeps until one of |
120 | changed to <constant>Sleeping</constant>. It then sleeps until one of |
121 | the following events happens:</para> |
121 | the following events happens:</para> |
122 | 122 | ||
123 | <orderedlist> |
123 | <orderedlist> |
124 | <listitem> |
124 | <listitem> |
125 | <para>another thread calls <code>waitq_wakeup</code> and the thread |
125 | <para>another thread calls <code>waitq_wakeup</code> and the thread |
126 | is the first thread in the wait queue's list of sleeping |
126 | is the first thread in the wait queue's list of sleeping |
127 | threads;</para> |
127 | threads;</para> |
128 | </listitem> |
128 | </listitem> |
129 | 129 | ||
130 | <listitem> |
130 | <listitem> |
131 | <para>another thread calls <code>waitq_interrupt_sleep</code> on the |
131 | <para>another thread calls <code>waitq_interrupt_sleep</code> on the |
132 | sleeping thread;</para> |
132 | sleeping thread;</para> |
133 | </listitem> |
133 | </listitem> |
134 | 134 | ||
135 | <listitem> |
135 | <listitem> |
136 | <para>the sleep times out provided that none of the previous |
136 | <para>the sleep times out provided that none of the previous |
137 | occurred within a specified time limit; the limit can be |
137 | occurred within a specified time limit; the limit can be |
138 | infinity.</para> |
138 | infinity.</para> |
139 | </listitem> |
139 | </listitem> |
140 | </orderedlist> |
140 | </orderedlist> |
141 | 141 | ||
142 | <para>All five possibilities (immediate return on success, immediate |
142 | <para>All five possibilities (immediate return on success, immediate |
143 | return on failure, wakeup after sleep, interruption and timeout) are |
143 | return on failure, wakeup after sleep, interruption and timeout) are |
144 | distinguishable by the return value of <code>waitq_sleep_timeout</code>. |
144 | distinguishable by the return value of <code>waitq_sleep_timeout</code>. |
145 | Being able to interrupt a sleeping thread is essential for externally |
145 | Being able to interrupt a sleeping thread is essential for externally |
146 | initiated thread termination. The ability to wait only for a certain |
146 | initiated thread termination. The ability to wait only for a certain |
147 | amount of time is used, for instance, to passively delay thread |
147 | amount of time is used, for instance, to passively delay thread |
148 | execution by several microseconds or even seconds in |
148 | execution by several microseconds or even seconds in |
149 | <code>thread_sleep</code> function. Due to the fact that all other |
149 | <code>thread_sleep</code> function. Due to the fact that all other |
150 | passive kernel synchronization primitives are based on wait queues, they |
150 | passive kernel synchronization primitives are based on wait queues, they |
151 | also have the option of being interrutped and, more importantly, can |
151 | also have the option of being interrutped and, more importantly, can |
152 | timeout. All of them also implement the conditional operation. |
152 | timeout. All of them also implement the conditional operation. |
153 | Furthemore, this very fundamental interface reaches up to the |
153 | Furthemore, this very fundamental interface reaches up to the |
154 | implementation of futexes - userspace synchronization primitive, which |
154 | implementation of futexes - userspace synchronization primitive, which |
155 | makes it possible for a userspace thread to request a synchronization |
155 | makes it possible for a userspace thread to request a synchronization |
156 | operation with a timeout or a conditional operation.</para> |
156 | operation with a timeout or a conditional operation.</para> |
157 | 157 | ||
158 | <para>From the description above, it should be apparent, that when a |
158 | <para>From the description above, it should be apparent, that when a |
159 | sleeping thread is woken by <code>waitq_wakeup</code> or when |
159 | sleeping thread is woken by <code>waitq_wakeup</code> or when |
160 | <code>waitq_sleep_timeout</code> succeeds immediately, the thread can be |
160 | <code>waitq_sleep_timeout</code> succeeds immediately, the thread can be |
161 | sure that the event has occurred. The thread need not and should not |
161 | sure that the event has occurred. The thread need not and should not |
162 | verify this fact. This approach is called direct hand-off and is |
162 | verify this fact. This approach is called direct hand-off and is |
163 | characteristic for all passive HelenOS synchronization primitives, with |
163 | characteristic for all passive HelenOS synchronization primitives, with |
164 | the exception as described below.</para> |
164 | the exception as described below.</para> |
165 | </section> |
165 | </section> |
166 | 166 | ||
167 | <section> |
167 | <section> |
168 | <title>Semaphores</title> |
168 | <title>Semaphores</title> |
169 | 169 | ||
170 | <para>The interesting point about wait queues is that the number of |
170 | <para>The interesting point about wait queues is that the number of |
171 | missed wakeups is equal to the number of threads that will not block in |
171 | missed wakeups is equal to the number of threads that will not block in |
172 | <code>watiq_sleep_timeout</code> and would immediately succeed instead. |
172 | <code>watiq_sleep_timeout</code> and would immediately succeed instead. |
173 | On the other hand, semaphores are synchronization primitives that will |
173 | On the other hand, semaphores are synchronization primitives that will |
174 | let predefined amount of threads into their critical section and block |
174 | let predefined amount of threads into their critical section and block |
175 | any other threads above this count. However, these two cases are exactly |
175 | any other threads above this count. However, these two cases are exactly |
176 | the same. Semaphores in HelenOS are therefore implemented as wait queues |
176 | the same. Semaphores in HelenOS are therefore implemented as wait queues |
177 | with a single semantic change: their wait queue is initialized to have |
177 | with a single semantic change: their wait queue is initialized to have |
178 | so many missed wakeups as is the number of threads that the semphore |
178 | so many missed wakeups as is the number of threads that the semphore |
179 | intends to let into its critical section simultaneously.</para> |
179 | intends to let into its critical section simultaneously.</para> |
180 | 180 | ||
181 | <para>In the semaphore language, the wait queue operation |
181 | <para>In the semaphore language, the wait queue operation |
182 | <code>waitq_sleep_timeout</code> corresponds to semaphore |
182 | <code>waitq_sleep_timeout</code> corresponds to semaphore |
183 | <code>down</code> operation, represented by the function |
183 | <code>down</code> operation, represented by the function |
184 | <code>semaphore_down_timeout</code> and by way of similitude the wait |
184 | <code>semaphore_down_timeout</code> and by way of similitude the wait |
185 | queue operation waitq_wakeup corresponds to semaphore <code>up</code> |
185 | queue operation waitq_wakeup corresponds to semaphore <code>up</code> |
186 | operation, represented by the function <code>sempafore_up</code>. The |
186 | operation, represented by the function <code>sempafore_up</code>. The |
187 | conditional down operation is called |
187 | conditional down operation is called |
188 | <code>semaphore_trydown</code>.</para> |
188 | <code>semaphore_trydown</code>.</para> |
189 | </section> |
189 | </section> |
190 | 190 | ||
191 | <section> |
191 | <section> |
192 | <title>Mutexes</title> |
192 | <title>Mutexes</title> |
193 | 193 | ||
194 | <para>Mutexes are sometimes referred to as binary sempahores. That means |
194 | <para>Mutexes are sometimes referred to as binary sempahores. That means |
195 | that mutexes are like semaphores that allow only one thread in its |
195 | that mutexes are like semaphores that allow only one thread in its |
196 | critical section. Indeed, mutexes in HelenOS are implemented exactly in |
196 | critical section. Indeed, mutexes in HelenOS are implemented exactly in |
197 | this way: they are built on top of semaphores. From another point of |
197 | this way: they are built on top of semaphores. From another point of |
198 | view, they can be viewed as spinlocks without busy waiting. Their |
198 | view, they can be viewed as spinlocks without busy waiting. Their |
199 | semaphore heritage provides good basics for both conditional operation |
199 | semaphore heritage provides good basics for both conditional operation |
200 | and operation with timeout. The locking operation is called |
200 | and operation with timeout. The locking operation is called |
201 | <code>mutex_lock</code>, the conditional locking operation is called |
201 | <code>mutex_lock</code>, the conditional locking operation is called |
202 | <code>mutex_trylock</code> and the unlocking operation is called |
202 | <code>mutex_trylock</code> and the unlocking operation is called |
203 | <code>mutex_unlock</code>.</para> |
203 | <code>mutex_unlock</code>.</para> |
204 | </section> |
204 | </section> |
205 | 205 | ||
206 | <section> |
206 | <section> |
207 | <title>Reader/writer locks</title> |
207 | <title>Reader/writer locks</title> |
208 | 208 | ||
209 | <para>Reader/writer locks, or rwlocks, are by far the most complicated |
209 | <para>Reader/writer locks, or rwlocks, are by far the most complicated |
210 | synchronization primitive within the kernel. The goal of these locks is |
210 | synchronization primitive within the kernel. The goal of these locks is |
211 | to improve concurrency of applications, in which threads need to |
211 | to improve concurrency of applications, in which threads need to |
212 | synchronize access to a shared resource, and that access can be |
212 | synchronize access to a shared resource, and that access can be |
213 | partitioned into a read-only mode and a write mode. Reader/writer locks |
213 | partitioned into a read-only mode and a write mode. Reader/writer locks |
214 | should make it possible for several, possibly many, readers to enter the |
214 | should make it possible for several, possibly many, readers to enter the |
215 | critical section, provided that no writer is currently in the critical |
215 | critical section, provided that no writer is currently in the critical |
216 | section, or to be in the critical section contemporarily. Writers are |
216 | section, or to be in the critical section contemporarily. Writers are |
217 | allowed to enter the critical section only individually, provided that |
217 | allowed to enter the critical section only individually, provided that |
218 | no reader is in the critical section already. Applications, in which the |
218 | no reader is in the critical section already. Applications, in which the |
219 | majority of operations can be done in the read-only mode, can benefit |
219 | majority of operations can be done in the read-only mode, can benefit |
220 | from increased concurrency created by reader/writer locks.</para> |
220 | from increased concurrency created by reader/writer locks.</para> |
221 | 221 | ||
222 | <para>During reader/writer lock construction, a decision should be made |
222 | <para>During reader/writer lock construction, a decision should be made |
223 | whether readers will be prefered over writers or whether writers will be |
223 | whether readers will be prefered over writers or whether writers will be |
224 | prefered over readers in cases when the lock is not currently held and |
224 | prefered over readers in cases when the lock is not currently held and |
225 | both a reader and a writer want to gain the lock. Some operating systems |
225 | both a reader and a writer want to gain the lock. Some operating systems |
226 | prefer one group over the other, creating thus a possibility for |
226 | prefer one group over the other, creating thus a possibility for |
227 | starving the unprefered group. In the HelenOS operating system, none of |
227 | starving the unprefered group. In the HelenOS operating system, none of |
228 | the two groups is prefered. The lock is granted on a first come, first |
228 | the two groups is prefered. The lock is granted on a first come, first |
229 | served basis with the additional note that readers are granted the lock |
229 | served basis with the additional note that readers are granted the lock |
230 | in the biggest possible batch.</para> |
230 | in the biggest possible batch.</para> |
231 | 231 | ||
232 | <para>With this policy and the timeout modes of operation, the direct |
232 | <para>With this policy and the timeout modes of operation, the direct |
233 | hand-off becomes much more complicated. For instance, a writer leaving |
233 | hand-off becomes much more complicated. For instance, a writer leaving |
234 | the critical section must wake up all leading readers in the rwlock's |
234 | the critical section must wake up all leading readers in the rwlock's |
235 | wait queue or one leading writer or no-one if no thread is waiting. |
235 | wait queue or one leading writer or no-one if no thread is waiting. |
236 | Similarily, the last reader leaving the critical section must wakeup the |
236 | Similarily, the last reader leaving the critical section must wakeup the |
237 | sleeping writer if there are any sleeping threads left at all. As |
237 | sleeping writer if there are any sleeping threads left at all. As |
238 | another example, if a writer at the beginning of the rwlock's wait queue |
238 | another example, if a writer at the beginning of the rwlock's wait queue |
239 | times out and the lock is held by at least one reader, the writer which |
239 | times out and the lock is held by at least one reader, the writer which |
240 | has timed out must first wake up all readers that follow him in the |
240 | has timed out must first wake up all readers that follow him in the |
241 | queue prior to signalling the timeout itself and giving up.</para> |
241 | queue prior to signalling the timeout itself and giving up.</para> |
242 | 242 | ||
243 | <para>Due to the issues mentioned in the previous paragraph, the |
243 | <para>Due to the issues mentioned in the previous paragraph, the |
244 | reader/writer lock imlpementation needs to walk the rwlock wait queue's |
244 | reader/writer lock imlpementation needs to walk the rwlock wait queue's |
245 | list of sleeping threads directly, in order to find out the type of |
245 | list of sleeping threads directly, in order to find out the type of |
246 | access that the queueing threads demand. This makes the code difficult |
246 | access that the queueing threads demand. This makes the code difficult |
247 | to understand and dependent on the internal implementation of the wait |
247 | to understand and dependent on the internal implementation of the wait |
248 | queue. Nevertheless, it remains unclear to the authors of HelenOS |
248 | queue. Nevertheless, it remains unclear to the authors of HelenOS |
249 | whether a simpler but equivalently fair solution exists.</para> |
249 | whether a simpler but equivalently fair solution exists.</para> |
250 | 250 | ||
251 | <para>The implementation of rwlocks as it has been already put, makes |
251 | <para>The implementation of rwlocks as it has been already put, makes |
252 | use of one single wait queue for both readers and writers, thus avoiding |
252 | use of one single wait queue for both readers and writers, thus avoiding |
253 | any possibility of starvation. In fact, rwlocks use a mutex rather than |
253 | any possibility of starvation. In fact, rwlocks use a mutex rather than |
254 | a bare wait queue. This mutex is called <code>exclusive</code> and is |
254 | a bare wait queue. This mutex is called <code>exclusive</code> and is |
255 | used to synchronize writers. The writer's lock operation, |
255 | used to synchronize writers. The writer's lock operation, |
256 | <code>rwlock_write_lock_timeout</code>, simply tries to acquire the |
256 | <code>rwlock_write_lock_timeout</code>, simply tries to acquire the |
257 | exclusive mutex. If it succeeds, the writer is granted the rwlock. |
257 | exclusive mutex. If it succeeds, the writer is granted the rwlock. |
258 | However, if the operation fails (e.g. times out), the writer must check |
258 | However, if the operation fails (e.g. times out), the writer must check |
259 | for potential readers at the head of the list of sleeping threads |
259 | for potential readers at the head of the list of sleeping threads |
260 | associated with the mutex's wait queue and then proceed according to the |
260 | associated with the mutex's wait queue and then proceed according to the |
261 | procedure outlined above.</para> |
261 | procedure outlined above.</para> |
262 | 262 | ||
263 | <para>The exclusive mutex plays an important role in reader |
263 | <para>The exclusive mutex plays an important role in reader |
264 | synchronization as well. However, a reader doing the reader's lock |
264 | synchronization as well. However, a reader doing the reader's lock |
265 | operation, <code>rwlock_read_lock_timeout</code>, may bypass this mutex |
265 | operation, <code>rwlock_read_lock_timeout</code>, may bypass this mutex |
266 | when it detects that:</para> |
266 | when it detects that:</para> |
267 | 267 | ||
268 | <orderedlist> |
268 | <orderedlist> |
269 | <listitem> |
269 | <listitem> |
270 | <para>there are other readers in the critical section and</para> |
270 | <para>there are other readers in the critical section and</para> |
271 | </listitem> |
271 | </listitem> |
272 | 272 | ||
273 | <listitem> |
273 | <listitem> |
274 | <para>there are no sleeping threads waiting for the exclusive |
274 | <para>there are no sleeping threads waiting for the exclusive |
275 | mutex.</para> |
275 | mutex.</para> |
276 | </listitem> |
276 | </listitem> |
277 | </orderedlist> |
277 | </orderedlist> |
278 | 278 | ||
279 | <para>If both conditions are true, the reader will bypass the mutex, |
279 | <para>If both conditions are true, the reader will bypass the mutex, |
280 | increment the number of readers in the critical section and then enter |
280 | increment the number of readers in the critical section and then enter |
281 | the critical section. Note that if there are any sleeping threads at the |
281 | the critical section. Note that if there are any sleeping threads at the |
282 | beginning of the wait queue, the first must be a writer. If the |
282 | beginning of the wait queue, the first must be a writer. If the |
283 | conditions are not fulfilled, the reader normally waits until the |
283 | conditions are not fulfilled, the reader normally waits until the |
284 | exclusive mutex is granted to it.</para> |
284 | exclusive mutex is granted to it.</para> |
285 | </section> |
285 | </section> |
286 | 286 | ||
287 | <section> |
287 | <section> |
288 | <title>Condition variables</title> |
288 | <title>Condition variables</title> |
289 | 289 | ||
290 | <para>Condition variables can be used for waiting until a condition |
290 | <para>Condition variables can be used for waiting until a condition |
291 | becomes true. In this respect, they are similar to wait queues. But |
291 | becomes true. In this respect, they are similar to wait queues. But |
292 | contrary to wait queues, condition variables have different semantics |
292 | contrary to wait queues, condition variables have different semantics |
293 | that allows events to be lost when there is no thread waiting for them. |
293 | that allows events to be lost when there is no thread waiting for them. |
294 | In order to support this, condition variables don't use direct hand-off |
294 | In order to support this, condition variables don't use direct hand-off |
295 | and operate in a way similar to the example below. A thread waiting for |
295 | and operate in a way similar to the example below. A thread waiting for |
296 | the condition becoming true does the following:</para> |
296 | the condition becoming true does the following:</para> |
297 | 297 | ||
- | 298 | <example> |
|
- | 299 | <title>Use of <code>condvar_wait_timeout</code>.</title> |
|
298 | <para><programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>); |
300 | <programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>); |
299 | while (!<varname>condition</varname>) |
301 | while (!<varname>condition</varname>) |
300 | <function>condvar_wait_timeout</function>(<varname>cv</varname>, <varname>mtx</varname>); |
302 | <function>condvar_wait_timeout</function>(<varname>cv</varname>, <varname>mtx</varname>); |
301 | /* <remark>the condition is true, do something</remark> */ |
303 | /* <remark>the condition is true, do something</remark> */ |
302 | <function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></para> |
304 | <function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting> |
- | 305 | </example> |
|
303 | 306 | ||
304 | <para>A thread that causes the condition become true signals this event |
307 | <para>A thread that causes the condition become true signals this event |
305 | like this:</para> |
308 | like this:</para> |
306 | 309 | ||
- | 310 | <example> |
|
- | 311 | <title>Use of <code>condvar_signal</code>.</title> |
|
307 | <para><programlisting><function>mutex_lock</function>(<varname>mtx</varname>); |
312 | <programlisting><function>mutex_lock</function>(<varname>mtx</varname>); |
308 | <varname>condition</varname> = <constant>true</constant>; |
313 | <varname>condition</varname> = <constant>true</constant>; |
309 | <function>condvar_signal</function>(<varname>cv</varname>); /* <remark>condvar_broadcast(cv);</remark> */ |
314 | <function>condvar_signal</function>(<varname>cv</varname>); /* <remark>condvar_broadcast(cv);</remark> */ |
310 | <function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></para> |
315 | <function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></example> |
311 | 316 | ||
312 | <para>The wait operation, <code>condvar_wait_timeout</code>, always puts |
317 | <para>The wait operation, <code>condvar_wait_timeout</code>, always puts |
313 | the calling thread to sleep. The thread then sleeps until another thread |
318 | the calling thread to sleep. The thread then sleeps until another thread |
314 | invokes <code>condvar_broadcast</code> on the same condition variable or |
319 | invokes <code>condvar_broadcast</code> on the same condition variable or |
315 | until it is woken up by <code>condvar_signal</code>. The |
320 | until it is woken up by <code>condvar_signal</code>. The |
316 | <code>condvar_signal</code> operation unblocks the first thread blocking |
321 | <code>condvar_signal</code> operation unblocks the first thread blocking |
317 | on the condition variable while the <code>condvar_broadcast</code> |
322 | on the condition variable while the <code>condvar_broadcast</code> |
318 | operation unblocks all threads blocking there. If there are no blocking |
323 | operation unblocks all threads blocking there. If there are no blocking |
319 | threads, these two operations have no efect.</para> |
324 | threads, these two operations have no efect.</para> |
320 | 325 | ||
321 | <para>Note that the threads must synchronize over a dedicated mutex. To |
326 | <para>Note that the threads must synchronize over a dedicated mutex. To |
322 | prevent race condition between <code>condvar_wait_timeout</code> and |
327 | prevent race condition between <code>condvar_wait_timeout</code> and |
323 | <code>condvar_signal</code> or <code>condvar_broadcast</code>, the mutex |
328 | <code>condvar_signal</code> or <code>condvar_broadcast</code>, the mutex |
324 | is passed to <code>condvar_wait_timeout</code> which then atomically |
329 | is passed to <code>condvar_wait_timeout</code> which then atomically |
325 | puts the calling thread asleep and unlocks the mutex. When the thread |
330 | puts the calling thread asleep and unlocks the mutex. When the thread |
326 | eventually wakes up, <code>condvar_wait</code> regains the mutex and |
331 | eventually wakes up, <code>condvar_wait</code> regains the mutex and |
327 | returns.</para> |
332 | returns.</para> |
328 | 333 | ||
329 | <para>Also note, that there is no conditional operation for condition |
334 | <para>Also note, that there is no conditional operation for condition |
330 | variables. Such an operation would make no sence since condition |
335 | variables. Such an operation would make no sence since condition |
331 | variables are defined to forget events for which there is no waiting |
336 | variables are defined to forget events for which there is no waiting |
332 | thread and because <code>condvar_wait</code> must always go to sleep. |
337 | thread and because <code>condvar_wait</code> must always go to sleep. |
333 | The operation with timeout is supported as usually.</para> |
338 | The operation with timeout is supported as usually.</para> |
334 | 339 | ||
335 | <para>In HelenOS, condition variables are based on wait queues. As it is |
340 | <para>In HelenOS, condition variables are based on wait queues. As it is |
336 | already mentioned above, wait queues remember missed events while |
341 | already mentioned above, wait queues remember missed events while |
337 | condition variables must not do so. This is reasoned by the fact that |
342 | condition variables must not do so. This is reasoned by the fact that |
338 | condition variables are designed for scenarios in which an event might |
343 | condition variables are designed for scenarios in which an event might |
339 | occur very many times without being picked up by any waiting thread. On |
344 | occur very many times without being picked up by any waiting thread. On |
340 | the other hand, wait queues would remember any event that had not been |
345 | the other hand, wait queues would remember any event that had not been |
341 | picked up by a call to <code>waitq_sleep_timeout</code>. Therefore, if |
346 | picked up by a call to <code>waitq_sleep_timeout</code>. Therefore, if |
342 | wait queues were used directly and without any changes to implement |
347 | wait queues were used directly and without any changes to implement |
343 | condition variables, the missed_wakeup counter would hurt performance of |
348 | condition variables, the missed_wakeup counter would hurt performance of |
344 | the implementation: the <code>while</code> loop in |
349 | the implementation: the <code>while</code> loop in |
345 | <code>condvar_wait_timeout</code> would effectively do busy waiting |
350 | <code>condvar_wait_timeout</code> would effectively do busy waiting |
346 | until all missed wakeups were discarded.</para> |
351 | until all missed wakeups were discarded.</para> |
347 | 352 | ||
348 | <para>The requirement on the wait operation to atomically put the caller |
353 | <para>The requirement on the wait operation to atomically put the caller |
349 | to sleep and release the mutex poses an interesting problem on |
354 | to sleep and release the mutex poses an interesting problem on |
350 | <code>condvar_wait_timeout</code>. More precisely, the thread should |
355 | <code>condvar_wait_timeout</code>. More precisely, the thread should |
351 | sleep in the condvar's wait queue prior to releasing the mutex, but it |
356 | sleep in the condvar's wait queue prior to releasing the mutex, but it |
352 | must not hold the mutex when it is sleeping.</para> |
357 | must not hold the mutex when it is sleeping.</para> |
353 | 358 | ||
354 | <para>Problems described in the two previous paragraphs are addressed in |
359 | <para>Problems described in the two previous paragraphs are addressed in |
355 | HelenOS by dividing the <code>waitq_sleep_timeout</code> function into |
360 | HelenOS by dividing the <code>waitq_sleep_timeout</code> function into |
356 | three pieces:</para> |
361 | three pieces:</para> |
357 | 362 | ||
358 | <orderedlist> |
363 | <orderedlist> |
359 | <listitem> |
364 | <listitem> |
360 | <para><code>waitq_sleep_prepare</code> prepares the thread to go to |
365 | <para><code>waitq_sleep_prepare</code> prepares the thread to go to |
361 | sleep by, among other things, locking the wait queue;</para> |
366 | sleep by, among other things, locking the wait queue;</para> |
362 | </listitem> |
367 | </listitem> |
363 | 368 | ||
364 | <listitem> |
369 | <listitem> |
365 | <para><code>waitq_sleep_timeout_unsafe</code> implements the core |
370 | <para><code>waitq_sleep_timeout_unsafe</code> implements the core |
366 | blocking logic;</para> |
371 | blocking logic;</para> |
367 | </listitem> |
372 | </listitem> |
368 | 373 | ||
369 | <listitem> |
374 | <listitem> |
370 | <para><code>waitq_sleep_finish</code> performs cleanup after |
375 | <para><code>waitq_sleep_finish</code> performs cleanup after |
371 | <code>waitq_sleep_timeout_unsafe</code>; after this call, the wait |
376 | <code>waitq_sleep_timeout_unsafe</code>; after this call, the wait |
372 | queue spinlock is guaranteed to be unlocked by the caller</para> |
377 | queue spinlock is guaranteed to be unlocked by the caller</para> |
373 | </listitem> |
378 | </listitem> |
374 | </orderedlist> |
379 | </orderedlist> |
375 | 380 | ||
376 | <para>The stock <code>waitq_sleep_timeout</code> is then a mere wrapper |
381 | <para>The stock <code>waitq_sleep_timeout</code> is then a mere wrapper |
377 | that calls these three functions. It is provided for convenience in |
382 | that calls these three functions. It is provided for convenience in |
378 | cases where the caller doesn't require such a low level control. |
383 | cases where the caller doesn't require such a low level control. |
379 | However, the implementation of <code>condvar_wait_timeout</code> does |
384 | However, the implementation of <code>condvar_wait_timeout</code> does |
380 | need this finer-grained control because it has to interleave calls to |
385 | need this finer-grained control because it has to interleave calls to |
381 | these functions by other actions. It carries its operations out in the |
386 | these functions by other actions. It carries its operations out in the |
382 | following order:</para> |
387 | following order:</para> |
383 | 388 | ||
384 | <orderedlist> |
389 | <orderedlist> |
385 | <listitem> |
390 | <listitem> |
386 | <para>calls <code>waitq_sleep_prepare</code> in order to lock the |
391 | <para>calls <code>waitq_sleep_prepare</code> in order to lock the |
387 | condition variable's wait queue,</para> |
392 | condition variable's wait queue,</para> |
388 | </listitem> |
393 | </listitem> |
389 | 394 | ||
390 | <listitem> |
395 | <listitem> |
391 | <para>releases the mutex,</para> |
396 | <para>releases the mutex,</para> |
392 | </listitem> |
397 | </listitem> |
393 | 398 | ||
394 | <listitem> |
399 | <listitem> |
395 | <para>clears the counter of missed wakeups,</para> |
400 | <para>clears the counter of missed wakeups,</para> |
396 | </listitem> |
401 | </listitem> |
397 | 402 | ||
398 | <listitem> |
403 | <listitem> |
399 | <para>calls <code>waitq_sleep_timeout_unsafe</code>,</para> |
404 | <para>calls <code>waitq_sleep_timeout_unsafe</code>,</para> |
400 | </listitem> |
405 | </listitem> |
401 | 406 | ||
402 | <listitem> |
407 | <listitem> |
403 | <para>retakes the mutex,</para> |
408 | <para>retakes the mutex,</para> |
404 | </listitem> |
409 | </listitem> |
405 | 410 | ||
406 | <listitem> |
411 | <listitem> |
407 | <para>calls <code>waitq_sleep_finish</code>.</para> |
412 | <para>calls <code>waitq_sleep_finish</code>.</para> |
408 | </listitem> |
413 | </listitem> |
409 | </orderedlist> |
414 | </orderedlist> |
410 | </section> |
415 | </section> |
411 | </section> |
416 | </section> |
412 | 417 | ||
413 | <section> |
418 | <section> |
414 | <title>Userspace synchronization</title> |
419 | <title>Userspace synchronization</title> |
415 | 420 | ||
416 | <section> |
421 | <section> |
417 | <title>Futexes</title> |
422 | <title>Futexes</title> |
418 | 423 | ||
419 | <para></para> |
424 | <para></para> |
420 | </section> |
425 | </section> |
421 | </section> |
426 | </section> |
422 | </chapter> |
427 | </chapter> |