<?xml version="1.0" encoding="UTF-8"?>
<chapter id="sync">
  <?dbhtml filename="sync.html"?>

  <title>Mutual exclusion and synchronization</title>

  <section>
    <title>Introduction</title>
    <para>The HelenOS operating system is designed to make use of the
    parallelism offered by the hardware and to exploit concurrency of both
    the kernel and userspace tasks. This is achieved through multiprocessor
    support and several levels of multiprogramming such as multitasking,
    multithreading and also through userspace pseudo threads. However, such
    a highly concurrent environment needs safe and efficient ways to handle
    mutual exclusion and synchronization of many execution flows.</para>
  </section>

  <section>
    <title>Active kernel primitives</title>

    <section>
      <title>Spinlocks</title>
      <para>The basic mutual exclusion primitive is the spinlock. The
      spinlock implements active waiting for the availability of a memory
      lock (i.e. a simple variable) in a multiprocessor-safe manner. This
      safety is achieved through the use of a specialized,
      architecture-dependent, atomic test-and-set operation which either
      locks the spinlock (i.e. sets the variable) or, provided that it is
      already locked, leaves it unaltered. In either case, the test-and-set
      operation returns a value, thus signalling either success (i.e. zero
      return value) or failure (i.e. non-zero value) in acquiring the lock.
      Note that this is the fundamental difference between the naive
      algorithm, which doesn't use the atomic operation, and the spinlock
      algorithm. While the naive algorithm is prone to race conditions on
      SMP configurations and thus is completely SMP-unsafe, the spinlock
      algorithm eliminates the possibility of race conditions and is
      suitable for mutual exclusion use.</para>

      <para>The semantics of the test-and-set operation is that the
      spinlock remains unavailable until this operation called on the
      respective spinlock returns zero. HelenOS builds two functions on top
      of the test-and-set operation. The first function is the
      unconditional attempt to acquire the spinlock and is called
      <emphasis>spinlock_lock</emphasis>. It simply loops until the
      test-and-set returns a zero value. The other function,
      <emphasis>spinlock_trylock</emphasis>, is the conditional lock
      operation and calls the test-and-set only once to find out whether it
      managed to acquire the spinlock or not. The conditional operation is
      useful in situations in which an algorithm cannot acquire more
      spinlocks in the proper order and a deadlock could not otherwise be
      avoided. In such a case, the algorithm detects the danger and,
      instead of possibly deadlocking the system, releases some of the
      spinlocks it already holds and retries the whole operation, hoping
      that it will succeed next time. The unlock function,
      <emphasis>spinlock_unlock</emphasis>, is quite simple: it merely
      clears the spinlock variable.</para>
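
      <para>The three functions can be sketched in C as follows. This is an
      illustrative sketch, not the actual HelenOS code: the
      architecture-dependent test-and-set is stood in for by the GCC
      <emphasis>__atomic_test_and_set</emphasis> builtin, and the structure
      layout is invented for the example.</para>

```c
#include <stdbool.h>

/* Illustrative spinlock sketch; the real HelenOS test-and-set is an
 * architecture-specific atomic instruction. */
typedef struct {
    volatile char locked;    /* 0 = free, non-zero = held */
} spinlock_t;

void spinlock_lock(spinlock_t *sl)
{
    /* Active waiting: loop until test-and-set reports the variable was
     * previously clear (i.e. the lock was successfully taken). */
    while (__atomic_test_and_set(&sl->locked, __ATOMIC_ACQUIRE))
        ;
}

bool spinlock_trylock(spinlock_t *sl)
{
    /* Conditional variant: a single test-and-set attempt. */
    return !__atomic_test_and_set(&sl->locked, __ATOMIC_ACQUIRE);
}

void spinlock_unlock(spinlock_t *sl)
{
    /* Merely clear the spinlock variable. */
    __atomic_clear(&sl->locked, __ATOMIC_RELEASE);
}
```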

      <para>Nevertheless, there is a special issue related to hardware
      optimizations that modern processors implement. Particularly
      problematic is the out-of-order execution of instructions within the
      critical section protected by a spinlock. The processors are always
      self-consistent, so they can carry out speculatively executed
      instructions in the right order with regard to dependencies among
      those instructions. However, the dependency between instructions
      inside the critical section and those that implement locking and
      unlocking of the respective spinlock is not implicit on some
      processor architectures. As a result, the processor needs to be
      explicitly told about each occurrence of such a dependency.
      Therefore, HelenOS adds architecture-specific hooks to all
      <emphasis>spinlock_lock</emphasis>,
      <emphasis>spinlock_trylock</emphasis> and
      <emphasis>spinlock_unlock</emphasis> functions to prevent the
      instructions inside the critical section from permeating out. On some
      architectures, these hooks can be empty because the dependencies are
      implicitly present due to the special properties of the locking and
      unlocking instructions. However, other architectures need to
      instrument these hooks with different memory barriers, depending on
      what operations could permeate out.</para>
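
      <para>The placement of such hooks can be sketched as follows. The
      hook names are invented for this example; in HelenOS they are
      architecture-specific macros, shown here with GCC atomic fences
      standing in for the real barrier instructions.</para>

```c
/* Invented hook names; in HelenOS these are architecture-specific
 * macros that expand either to nothing or to a memory barrier. */
#define CS_ENTER_BARRIER()  __atomic_thread_fence(__ATOMIC_ACQUIRE)
#define CS_LEAVE_BARRIER()  __atomic_thread_fence(__ATOMIC_RELEASE)

void spinlock_lock_hooked(volatile char *lock)
{
    while (__atomic_test_and_set(lock, __ATOMIC_RELAXED))
        ;
    /* Keep critical-section accesses from being moved before the
     * moment the lock is actually held. */
    CS_ENTER_BARRIER();
}

void spinlock_unlock_hooked(volatile char *lock)
{
    /* Keep critical-section accesses from leaking past the unlock. */
    CS_LEAVE_BARRIER();
    __atomic_clear(lock, __ATOMIC_RELAXED);
}
```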

      <para>Spinlocks have one significant drawback: when held for longer
      time periods, they harm both parallelism and concurrency. The
      processor executing <emphasis>spinlock_lock</emphasis> does not do
      any fruitful work and is effectively halted until it can grab the
      lock and proceed. Similarly, other execution flows cannot execute on
      the processor that holds the spinlock because the kernel disables
      preemption on that processor when a spinlock is held. The reason
      behind disabling preemption is the avoidance of the priority
      inversion problem. For the same reason, threads are strongly
      discouraged from sleeping when they hold a spinlock.</para>

      <para>To summarize, spinlocks represent a very simple yet essential
      mutual exclusion primitive for SMP systems. On the other hand,
      spinlocks scale poorly because of the active loop they are based on.
      Therefore, spinlocks are used in HelenOS only for short-term mutual
      exclusion and in cases where the mutual exclusion is required out of
      thread context. Lastly, spinlocks are used in the construction of
      passive synchronization primitives.</para>
    </section>
  </section>

  <section>
    <title>Passive kernel synchronization</title>

    <section>
      <title>Wait queues</title>

      <para>A wait queue is the basic passive synchronization primitive on
      which all other passive synchronization primitives are built. Simply
      put, it allows a thread to sleep until an event associated with the
      particular wait queue occurs. Multiple threads are notified about
      incoming events in a first come, first served fashion. Moreover,
      should the event come before any thread waits for it, it is recorded
      in the wait queue as a missed wakeup and later forwarded to the first
      thread that decides to wait in the queue. The inner structures of the
      wait queue are protected by a spinlock.</para>

      <para>The thread that wants to wait for a wait queue event uses the
      <emphasis>waitq_sleep_timeout</emphasis> function. The algorithm
      first checks the wait queue's counter of missed wakeups and, if there
      are any missed wakeups, the call returns immediately. The call also
      returns immediately if only a conditional wait was requested.
      Otherwise the thread is enqueued in the wait queue's list of sleeping
      threads and its state is changed to <emphasis>Sleeping</emphasis>. It
      then sleeps until one of the following events happens:</para>
124 | |||
125 | <orderedlist> |
||
126 | <listitem> |
||
127 | <para>another thread calls <emphasis>waitq_wakeup</emphasis> and the |
||
128 | thread is the first thread in the wait queue's list of sleeping |
||
45 | jermar | 129 | threads;</para> |
43 | jermar | 130 | </listitem> |
131 | |||
132 | <listitem> |
||
133 | <para>another thread calls |
||
134 | <emphasis>waitq_interrupt_sleep</emphasis> on the sleeping |
||
45 | jermar | 135 | thread;</para> |
43 | jermar | 136 | </listitem> |
137 | |||
138 | <listitem> |
||
45 | jermar | 139 | <para>the sleep times out provided that none of the previous |
140 | occurred within a specified time limit; the limit can be |
||
141 | infinity.</para> |
||
43 | jermar | 142 | </listitem> |
143 | </orderedlist> |
||
144 | |||
      <para>All five possibilities (immediate return on success, immediate
      return on failure, wakeup after sleep, interruption and timeout) are
      distinguishable by the return value of
      <emphasis>waitq_sleep_timeout</emphasis>. Being able to interrupt a
      sleeping thread is essential for externally initiated thread
      termination. The ability to wait only for a certain amount of time is
      used, for instance, to passively delay thread execution by several
      microseconds or even seconds in the <emphasis>thread_sleep</emphasis>
      function. Because all other passive kernel synchronization primitives
      are based on wait queues, they also have the option of being
      interrupted and, more importantly, can time out. All of them also
      implement the conditional operation. Furthermore, this fundamental
      interface reaches up to the implementation of futexes, the userspace
      synchronization primitive, which makes it possible for a userspace
      thread to request a synchronization operation with a timeout or a
      conditional operation.</para>

      <para>From the description above, it should be apparent that when a
      sleeping thread is woken by <emphasis>waitq_wakeup</emphasis> or when
      <emphasis>waitq_sleep_timeout</emphasis> succeeds immediately, the
      thread can be sure that the event has occurred. The thread need not
      and should not verify this fact. This approach is called direct
      hand-off and is characteristic of all passive HelenOS synchronization
      primitives, with the exception described below.</para>
    </section>

    <section>
      <title>Semaphores</title>

      <para>The interesting point about wait queues is that the number of
      missed wakeups is equal to the number of threads that will not block
      in <emphasis>waitq_sleep_timeout</emphasis> and will instead succeed
      immediately. On the other hand, semaphores are synchronization
      primitives that admit a predefined number of threads into their
      critical section and block any other threads above this count. These
      two cases are, however, exactly the same. Semaphores in HelenOS are
      therefore implemented as wait queues with a single semantic change:
      their wait queue is initialized to have as many missed wakeups as the
      number of threads that the semaphore intends to let into its critical
      section simultaneously.</para>

      <para>In the semaphore language, the wait queue operation
      <emphasis>waitq_sleep_timeout</emphasis> corresponds to the semaphore
      <emphasis>down</emphasis> operation, represented by the function
      <emphasis>semaphore_down_timeout</emphasis>, and, by analogy, the
      wait queue operation <emphasis>waitq_wakeup</emphasis> corresponds to
      the semaphore <emphasis>up</emphasis> operation, represented by the
      function <emphasis>semaphore_up</emphasis>. The conditional down
      operation is called <emphasis>semaphore_trydown</emphasis>.</para>
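
      <para>The mapping of semaphores onto wait queues can be sketched as
      follows. The wait queue is reduced to its missed-wakeup counter and
      the function bodies are invented for the example; the real operations
      delegate to the wait queue functions described earlier.</para>

```c
#include <stdbool.h>

/* Toy model: the wait queue is reduced to its missed-wakeup counter. */
typedef struct {
    int missed_wakeups;
} waitq_t;

typedef struct {
    waitq_t wq;
} semaphore_t;

void semaphore_initialize(semaphore_t *sem, int value)
{
    /* The single semantic change: start with as many missed wakeups as
     * the number of threads admitted simultaneously. */
    sem->wq.missed_wakeups = value;
}

bool semaphore_trydown(semaphore_t *sem)
{
    /* Conditional down == conditional waitq sleep: succeed only if a
     * missed wakeup is pending. */
    if (sem->wq.missed_wakeups > 0) {
        sem->wq.missed_wakeups--;
        return true;
    }
    return false;
}

void semaphore_up(semaphore_t *sem)
{
    /* up == waitq_wakeup(): wake a sleeper or record a missed wakeup. */
    sem->wq.missed_wakeups++;
}
```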
    </section>

    <section>
      <title>Mutexes</title>

      <para>Mutexes are sometimes referred to as binary semaphores. That
      means that mutexes are like semaphores that allow only one thread
      into their critical section. Indeed, mutexes in HelenOS are
      implemented exactly in this way: they are built on top of semaphores.
      From another point of view, they can be viewed as spinlocks without
      busy waiting. Their semaphore heritage provides a good basis for both
      the conditional operation and the operation with a timeout. The
      locking operation is called <emphasis>mutex_lock</emphasis>, the
      conditional locking operation is called
      <emphasis>mutex_trylock</emphasis> and the unlocking operation is
      called <emphasis>mutex_unlock</emphasis>.</para>
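
      <para>The construction of mutexes on top of semaphores can be
      sketched as follows, using a toy counting semaphore reduced to a bare
      counter. The names mirror the functions mentioned above, but the
      bodies are invented for the example.</para>

```c
#include <stdbool.h>

/* Toy semaphore: the counter is the number of threads still admitted. */
typedef struct {
    int count;
} semaphore_t;

typedef struct {
    semaphore_t sem;      /* a mutex is a semaphore initialized to 1 */
} mutex_t;

void mutex_initialize(mutex_t *mtx)
{
    mtx->sem.count = 1;   /* binary: at most one thread inside */
}

bool mutex_trylock(mutex_t *mtx)
{
    /* Delegates to the conditional semaphore down operation. */
    if (mtx->sem.count > 0) {
        mtx->sem.count--;
        return true;
    }
    return false;
}

void mutex_unlock(mutex_t *mtx)
{
    /* Delegates to the semaphore up operation. */
    mtx->sem.count++;
}
```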
    </section>

    <section>
      <title>Reader/writer locks</title>

      <para>Reader/writer locks, or rwlocks, are by far the most
      complicated synchronization primitive within the kernel. The goal of
      these locks is to improve the concurrency of applications in which
      threads need to synchronize access to a shared resource and that
      access can be partitioned into a read-only mode and a write mode.
      Reader/writer locks should make it possible for several, possibly
      many, readers to enter the critical section, provided that no writer
      is currently in the critical section, or to be in the critical
      section concurrently. Writers are allowed to enter the critical
      section only individually, provided that no reader is in the critical
      section already. Applications in which the majority of operations can
      be done in the read-only mode can benefit from the increased
      concurrency provided by reader/writer locks.</para>

      <para>During reader/writer lock construction, a decision should be
      made whether readers will be preferred over writers or writers over
      readers in cases when the lock is not currently held and both a
      reader and a writer want to gain the lock. Some operating systems
      prefer one group over the other, thus creating the possibility of
      starving the unpreferred group. In the HelenOS operating system,
      neither of the two groups is preferred. The lock is granted on a
      first come, first served basis, with the additional note that readers
      are granted the lock in the largest possible batch.</para>

      <para>With this policy and the timeout modes of operation, the direct
      hand-off becomes much more complicated. For instance, a writer
      leaving the critical section must wake up all leading readers in the
      rwlock's wait queue, or one leading writer, or no one if no thread is
      waiting. Similarly, the last reader leaving the critical section must
      wake up the sleeping writer if there are any sleeping threads left at
      all. As another example, if a writer at the beginning of the rwlock's
      wait queue times out and the lock is held by at least one reader, the
      writer which has timed out must first wake up all readers that follow
      it in the queue prior to signalling the timeout itself and giving
      up.</para>

      <para>Due to the issues mentioned in the previous paragraph, the
      reader/writer lock implementation needs to walk the rwlock wait
      queue's list of sleeping threads directly in order to find out the
      type of access that the queueing threads demand. This makes the code
      difficult to understand and dependent on the internal implementation
      of the wait queue. Nevertheless, it remains unclear to the authors of
      HelenOS whether a simpler but equally fair solution exists.</para>

      <para>The implementation of rwlocks, as already outlined, makes use
      of one single wait queue for both readers and writers, thus avoiding
      any possibility of starvation. In fact, rwlocks use a mutex rather
      than a bare wait queue. This mutex is called
      <emphasis>exclusive</emphasis> and is used to synchronize writers.
      The writer's lock operation,
      <emphasis>rwlock_write_lock_timeout</emphasis>, simply tries to
      acquire the exclusive mutex. If it succeeds, the writer is granted
      the rwlock. However, if the operation fails (e.g. times out), the
      writer must check for potential readers at the head of the list of
      sleeping threads associated with the mutex's wait queue and then
      proceed according to the procedure outlined above.</para>

      <para>The exclusive mutex plays an important role in reader
      synchronization as well. However, a reader doing the reader's lock
      operation, <emphasis>rwlock_read_lock_timeout</emphasis>, may bypass
      this mutex when it detects that:</para>

      <orderedlist>
        <listitem>
          <para>there are other readers in the critical section and</para>
        </listitem>

        <listitem>
          <para>there are no sleeping threads waiting for the exclusive
          mutex.</para>
        </listitem>
      </orderedlist>
283 | |||
284 | <para>If both conditions are true, the reader will bypass the mutex, |
||
45 | jermar | 285 | increment the number of readers in the critical section and then enter |
286 | the critical section. Note that if there are any sleeping threads at the |
||
287 | beginning of the wait queue, the first must be a writer. If the |
||
43 | jermar | 288 | conditions are not fulfilled, the reader normally waits until the |
289 | exclusive mutex is granted to it.</para> |
||
    </section>

    <section>
      <title>Condition variables</title>

      <para>Condvars explanation</para>
    </section>
  </section>

  <section>
    <title>Userspace synchronization</title>

    <section>
      <title>Futexes</title>

      <para></para>
    </section>
  </section>
</chapter>