<?xml version="1.0" encoding="UTF-8"?>
<chapter id="sync">
<?dbhtml filename="sync.html"?>

<title>Mutual exclusion and synchronization</title>

<section>
<title>Introduction</title>

<para>The HelenOS operating system is designed to make use of the
parallelism offered by the hardware and to exploit concurrency of both the
kernel and userspace tasks. This is achieved through multiprocessor
support and several levels of multiprogramming such as multitasking,
multithreading and also through userspace pseudo threads. However, such a
highly concurrent environment needs safe and efficient ways to handle
mutual exclusion and synchronization of many execution flows.</para>
</section>

<section>
<title>Active kernel primitives</title>

<section>
<title>Spinlocks</title>

<para>The basic mutual exclusion primitive is the spinlock. The spinlock
implements active waiting for the availability of a memory lock (i.e. a
simple variable) in a multiprocessor-safe manner. This safety is
achieved through the use of a specialized, architecture-dependent,
atomic test-and-set operation which either locks the spinlock (i.e. sets
the variable) or, provided that it is already locked, leaves it
unaltered. In any case, the test-and-set operation returns a value, thus
signalling either success (i.e. zero return value) or failure (i.e.
non-zero value) in acquiring the lock. Note that this makes a
fundamental difference between the naive algorithm that doesn't use the
atomic operation and the spinlock algorithm. The naive algorithm tests
the variable and sets it in two separate steps, so two processors can
both find the lock free and both enter the critical section. It is
therefore prone to race conditions on SMP configurations and completely
SMP-unsafe, whereas the spinlock algorithm eliminates the possibility of
race conditions and is suitable for mutual exclusion use.</para>

<para>The semantics of the test-and-set operation is that the spinlock
remains unavailable until this operation, called on the respective
spinlock, returns zero. HelenOS builds two functions on top of the
test-and-set operation. The first function is the unconditional attempt
to acquire the spinlock and is called <code>spinlock_lock</code>. It
simply loops until the test-and-set returns a zero value. The other
function, <code>spinlock_trylock</code>, is the conditional lock
operation and calls the test-and-set only once to find out whether it
managed to acquire the spinlock or not. The conditional operation is
useful in situations in which an algorithm cannot acquire several
spinlocks in the proper order and therefore cannot rule out a deadlock.
In such a case, the algorithm would detect the danger and, instead of
possibly deadlocking the system, it would simply release some spinlocks
it already holds and retry the whole operation in the hope that it will
succeed next time. The unlock function, <code>spinlock_unlock</code>, is
quite simple: it merely clears the spinlock variable.</para>
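<para>The following is a minimal sketch in C11, not the HelenOS kernel
code. It assumes the C11 <code>atomic_flag</code> type as a stand-in for
the architecture-dependent test-and-set operation, and the names
<code>model_spinlock_t</code>, <code>model_spinlock_lock</code>,
<code>model_spinlock_trylock</code>, <code>model_spinlock_unlock</code>
and <code>model_lock_pair</code> are hypothetical. It shows the
unconditional loop, the single-attempt conditional lock and the
release-and-retry pattern described above.</para>

<example>
<title>Sketch: a test-and-set spinlock (hypothetical names)</title>
<programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model type, not the HelenOS kernel's own spinlock. */
typedef struct {
	atomic_flag flag; /* clear = unlocked, set = locked */
} model_spinlock_t;

static void model_spinlock_init(model_spinlock_t *sl)
{
	atomic_flag_clear(&sl->flag);
}

/* Test-and-set: atomically sets the flag and returns its previous value,
 * so false (the "zero" result) means this call acquired the lock. */
static bool model_test_and_set(model_spinlock_t *sl)
{
	return atomic_flag_test_and_set(&sl->flag);
}

/* Unconditional lock: loop until test-and-set reports success. */
static void model_spinlock_lock(model_spinlock_t *sl)
{
	while (model_test_and_set(sl))
		; /* active waiting */
}

/* Conditional lock: exactly one test-and-set attempt. */
static bool model_spinlock_trylock(model_spinlock_t *sl)
{
	return !model_test_and_set(sl);
}

/* Unlock: merely clear the variable. */
static void model_spinlock_unlock(model_spinlock_t *sl)
{
	atomic_flag_clear(&sl->flag);
}

/* Back-off when the proper lock order cannot be followed: hold 'a',
 * only try 'b', and on failure drop 'a' and retry from scratch. */
static void model_lock_pair(model_spinlock_t *a, model_spinlock_t *b)
{
	for (;;) {
		model_spinlock_lock(a);
		if (model_spinlock_trylock(b))
			return; /* both locks are now held */
		model_spinlock_unlock(a);
	}
}
```
]]></programlisting>
</example>

<para><code>atomic_flag_test_and_set</code> returns the previous value of
the flag, so a <code>false</code> result plays the role of the zero
return value mentioned above.</para>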

<para>Nevertheless, there is a special issue related to hardware
optimizations that modern processors implement. Particularly problematic
is the out-of-order execution of instructions within the critical
section protected by a spinlock. The processors are always
self-consistent so that they can carry out speculatively executed
instructions in the right order with regard to dependencies among those
instructions. However, the dependency between instructions inside the
critical section and those that implement locking and unlocking of the
respective spinlock is not implicit on some processor architectures. As
a result, the processor needs to be explicitly told about each
occurrence of such a dependency. Therefore, HelenOS adds
architecture-specific hooks to all <code>spinlock_lock</code>,
<code>spinlock_trylock</code> and <code>spinlock_unlock</code> functions
to prevent the instructions inside the critical section from leaking out
of it. On some architectures, these hooks can be empty because the
special properties of the locking and unlocking instructions already
imply the dependencies. Other architectures, however, need to instrument
these hooks with different memory barriers, depending on which
operations could leak out.</para>
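<para>To illustrate the hooks, the sketch below (again hypothetical, not
HelenOS code) performs the test-and-set with
<code>memory_order_relaxed</code>, which by itself orders nothing, and
supplies the ordering through two assumed hook functions,
<code>cs_enter_barrier</code> and <code>cs_leave_barrier</code>, built on
the C11 <code>atomic_thread_fence</code> function. The C11 fences stand in
for whatever barrier instructions a given architecture needs; where the
locking instructions already imply the ordering, such hooks compile to
nothing.</para>

<example>
<title>Sketch: spinlock with explicit barrier hooks (hypothetical names)</title>
<programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical hooks: C11 fences stand in for architecture barriers. */
static inline void cs_enter_barrier(void)
{
	/* Later accesses may not move above the preceding lock operation. */
	atomic_thread_fence(memory_order_acquire);
}

static inline void cs_leave_barrier(void)
{
	/* Earlier accesses may not move below the following unlock store. */
	atomic_thread_fence(memory_order_release);
}

typedef struct {
	atomic_flag flag;
} fenced_spinlock_t;

static bool fenced_trylock(fenced_spinlock_t *sl)
{
	/* The test-and-set itself is relaxed: it orders nothing on its own. */
	if (atomic_flag_test_and_set_explicit(&sl->flag, memory_order_relaxed))
		return false;
	cs_enter_barrier(); /* lock hook, only needed on success */
	return true;
}

static void fenced_lock(fenced_spinlock_t *sl)
{
	while (!fenced_trylock(sl))
		; /* active waiting */
}

static void fenced_unlock(fenced_spinlock_t *sl)
{
	cs_leave_barrier(); /* unlock hook */
	atomic_flag_clear_explicit(&sl->flag, memory_order_relaxed);
}
```
]]></programlisting>
</example>

<para>The release fence before the relaxed clear pairs with the acquire
fence after a later successful relaxed test-and-set, which is the C11
way of saying that everything done in one critical section is visible in
the next.</para>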

<para>Spinlocks have one significant drawback: when held for longer
periods of time, they harm both parallelism and concurrency. The
processor executing <code>spinlock_lock</code> does not do any fruitful
work and is effectively halted until it can grab the lock and proceed.
Similarly, other execution flows cannot execute on the processor that
holds the spinlock because the kernel disables preemption on that
processor when a spinlock is held. Preemption is disabled in order to
avoid the priority inversion problem. For the same reason, threads are
strongly discouraged from sleeping when they hold a spinlock.</para>
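<para>A single-processor model of this rule is sketched below. It
assumes a hypothetical nesting counter of preemption-disabling requests:
taking a spinlock increments it before the lock is acquired and
releasing the spinlock decrements it afterwards, so preemption becomes
possible again only once the outermost spinlock is released. The model
is not atomic and does not spin; all names are illustrative.</para>

<example>
<title>Sketch: spinlocks disabling preemption (hypothetical names)</title>
<programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-processor state of a single-processor model. */
static int preemption_disabled; /* nesting depth of disable requests */
static int spinlocks_held;

static void model_preemption_disable(void)
{
	preemption_disabled++;
}

static void model_preemption_enable(void)
{
	assert(preemption_disabled > 0);
	preemption_disabled--;
}

/* Hypothetical scheduler check: may the current thread be switched out? */
static bool model_may_reschedule(void)
{
	return preemption_disabled == 0;
}

/* Non-atomic stand-in: with one processor there is nobody to race with. */
typedef struct {
	bool locked;
} toy_spinlock_t;

static void toy_spinlock_lock(toy_spinlock_t *sl)
{
	model_preemption_disable(); /* stop preemption first */
	assert(!sl->locked);        /* on SMP, a real lock would spin here */
	sl->locked = true;
	spinlocks_held++;
}

static void toy_spinlock_unlock(toy_spinlock_t *sl)
{
	assert(sl->locked);
	sl->locked = false;
	spinlocks_held--;
	model_preemption_enable();  /* re-enable only after the release */
}

/* Sleeping with a spinlock held is discouraged; the model treats it as a bug. */
static void model_thread_sleep(void)
{
	assert(spinlocks_held == 0);
}
```
]]></programlisting>
</example>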

<para>To summarize, spinlocks represent a very simple and essential
mutual exclusion primitive for SMP systems. On the other hand, spinlocks
scale poorly because of the active loop they are based on. Therefore,
spinlocks are used in HelenOS only for short-time mutual exclusion and
in cases where mutual exclusion is required outside of thread context.
Lastly, spinlocks are used in the construction of passive
synchronization primitives.</para>
</section>
</section>

<section>
<title>Passive kernel synchronization</title>

<section>
<title>Wait queues</title>

<para>A wait queue is the basic passive synchronization primitive on
which all other passive synchronization primitives are built. Simply
put, it allows a thread to sleep until an event associated with the
particular wait queue occurs. Multiple threads are notified about
incoming events in a first come, first served fashion. Moreover, should
the event come before any thread waits for it, it is recorded in the
wait queue as a missed wakeup and later forwarded to the first thread
that decides to wait in the queue. The inner structures of the wait
queue are protected by a spinlock.</para>
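<para>The data structure described above can be modelled as follows.
This is a single-threaded sketch under stated assumptions: sleeping
threads are represented by non-negative integer identifiers kept in a
fixed-size FIFO ring buffer, and the protecting spinlock is omitted. The
names <code>model_waitq_t</code>, <code>model_waitq_sleep</code> and
<code>model_waitq_wakeup</code> are hypothetical.</para>

<example>
<title>Sketch: wait queue with missed wakeups (hypothetical names)</title>
<programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLEEPERS 8 /* arbitrary bound chosen for the sketch */

/* In the kernel these fields would be protected by the wait queue's
 * spinlock; this single-threaded model leaves the lock out. */
typedef struct {
	int missed_wakeups;
	int sleepers[MAX_SLEEPERS]; /* FIFO of sleeping thread ids */
	size_t head, count;
} model_waitq_t;

static void model_waitq_init(model_waitq_t *wq)
{
	wq->missed_wakeups = 0;
	wq->head = wq->count = 0;
}

/* Waiter side: consume a missed wakeup if one is recorded, otherwise
 * join the tail of the queue. Returns true if the caller must block. */
static bool model_waitq_sleep(model_waitq_t *wq, int tid)
{
	if (wq->missed_wakeups > 0) {
		wq->missed_wakeups--;
		return false;
	}
	assert(wq->count < MAX_SLEEPERS);
	wq->sleepers[(wq->head + wq->count) % MAX_SLEEPERS] = tid;
	wq->count++;
	return true;
}

/* Waker side: wake the longest-sleeping thread (first come, first served)
 * or, if nobody sleeps, record a missed wakeup. Returns the woken id or -1. */
static int model_waitq_wakeup(model_waitq_t *wq)
{
	if (wq->count == 0) {
		wq->missed_wakeups++;
		return -1;
	}
	int tid = wq->sleepers[wq->head];
	wq->head = (wq->head + 1) % MAX_SLEEPERS;
	wq->count--;
	return tid;
}
```
]]></programlisting>
</example>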

<para>The thread that wants to wait for a wait queue event uses the
<code>waitq_sleep_timeout</code> function. The algorithm first checks
the wait queue's counter of missed wakeups and, if there are any missed
wakeups, the call consumes one of them and returns immediately. If there
are none, the call also returns immediately, this time unsuccessfully,
provided that only a conditional wait was requested. Otherwise the
thread is enqueued in the wait queue's list of sleeping threads and its
state is changed to <constant>Sleeping</constant>. It then sleeps until
one of the following events happens:</para>

<orderedlist>
<listitem>
<para>another thread calls <code>waitq_wakeup</code> and the thread
is the first thread in the wait queue's list of sleeping
threads;</para>
</listitem>

<listitem>
<para>another thread calls <code>waitq_interrupt_sleep</code> on the
sleeping thread;</para>
</listitem>

<listitem>
<para>the sleep times out, provided that none of the above occurred
within a specified time limit; the limit can be infinity.</para>
</listitem>
</orderedlist>

<para>All five possibilities (immediate return on success, immediate
return on failure, wakeup after sleep, interruption and timeout) are
distinguishable by the return value of <code>waitq_sleep_timeout</code>.
Being able to interrupt a sleeping thread is essential for externally
initiated thread termination. The ability to wait only for a certain
amount of time is used, for instance, to passively delay thread
execution by several microseconds or even seconds in the
<code>thread_sleep</code> function. Because all other passive kernel
synchronization primitives are based on wait queues, they also have the
option of being interrupted and, more importantly, can time out. All of
them also implement the conditional operation. Furthermore, this very
fundamental interface reaches up to the implementation of futexes, the
userspace synchronization primitive, which makes it possible for a
userspace thread to request a synchronization operation with a timeout
or a conditional operation.</para>
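<para>The decision logic behind the five outcomes can be sketched as
below. The result codes, the <code>TIMEOUT_INFINITE</code> constant and
the <code>sleep_end_t</code> parameter, which lets the caller state what
ended a simulated sleep, are hypothetical devices of this model; the real
HelenOS return codes and the blocking itself are not reproduced.</para>

<example>
<title>Sketch: the five outcomes of a timed sleep (hypothetical names)</title>
<programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes, one per outcome named in the text. */
typedef enum {
	SLEEP_OK_ATOMIC,   /* immediate return on success */
	SLEEP_WOULD_BLOCK, /* immediate return on failure (conditional wait) */
	SLEEP_OK_BLOCKED,  /* woken up by a wakeup after sleeping */
	SLEEP_INTERRUPTED, /* woken up by an interrupt request */
	SLEEP_TIMEOUT      /* the time limit expired first */
} sleep_result_t;

/* What ended the simulated sleep; chosen by the caller of the model. */
typedef enum {
	ENDED_BY_WAKEUP,
	ENDED_BY_INTERRUPT,
	ENDED_BY_TIMEOUT
} sleep_end_t;

#define TIMEOUT_INFINITE 0u /* hypothetical: 0 means "no time limit" */

static sleep_result_t model_sleep_timeout(int *missed_wakeups,
    bool conditional, unsigned timeout_us, sleep_end_t what_happened)
{
	if (*missed_wakeups > 0) {
		(*missed_wakeups)--;
		return SLEEP_OK_ATOMIC;
	}
	if (conditional)
		return SLEEP_WOULD_BLOCK;

	/* From here on, the thread would be enqueued and put to sleep. */
	switch (what_happened) {
	case ENDED_BY_WAKEUP:
		return SLEEP_OK_BLOCKED;
	case ENDED_BY_INTERRUPT:
		return SLEEP_INTERRUPTED;
	case ENDED_BY_TIMEOUT:
	default:
		/* A sleep without a time limit cannot time out. */
		assert(timeout_us != TIMEOUT_INFINITE);
		return SLEEP_TIMEOUT;
	}
}
```
]]></programlisting>
</example>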

<para>From the description above, it should be apparent that when a
sleeping thread is woken by <code>waitq_wakeup</code> or when
<code>waitq_sleep_timeout</code> succeeds immediately, the thread can be
sure that the event has occurred. The thread need not and should not
verify this fact. This approach is called direct hand-off and is
characteristic of all passive HelenOS synchronization primitives, with
the exception of condition variables, described below.</para>
</section>

<section>
<title>Semaphores</title>

<para>The interesting point about wait queues is that the number of
missed wakeups is equal to the number of threads that will not block in
<code>waitq_sleep_timeout</code> and would immediately succeed instead.
Semaphores, in turn, are synchronization primitives that let a
predefined number of threads into their critical section and block any
other threads above this count. These two cases are, in fact, exactly
the same. Semaphores in HelenOS are therefore implemented as wait queues
with a single semantic change: their wait queue is initialized to have
as many missed wakeups as the number of threads that the semaphore
intends to let into its critical section simultaneously.</para>
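<para>The following hypothetical sketch shows how a semaphore falls out
of a wait queue once its missed wakeup counter is pre-loaded. The wait
queue is reduced to two counters, blocking is only recorded rather than
performed, and the names are illustrative rather than HelenOS's
own.</para>

<example>
<title>Sketch: a semaphore as a pre-loaded wait queue (hypothetical names)</title>
<programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

/* Minimal wait queue model: only the counters matter here. */
typedef struct {
	int missed_wakeups;
	int sleepers; /* number of blocked threads (identities omitted) */
} model_waitq_t;

typedef struct {
	model_waitq_t wq;
} model_semaphore_t;

/* The single semantic change: pre-load the missed wakeup counter. */
static void model_semaphore_init(model_semaphore_t *s, int count)
{
	s->wq.missed_wakeups = count;
	s->wq.sleepers = 0;
}

/* Conditional down: succeeds only by consuming a missed wakeup. */
static bool model_semaphore_trydown(model_semaphore_t *s)
{
	if (s->wq.missed_wakeups == 0)
		return false;
	s->wq.missed_wakeups--;
	return true;
}

/* Blocking down in the model: either enter at once or become a sleeper.
 * Returns true if the caller entered without blocking. */
static bool model_semaphore_down(model_semaphore_t *s)
{
	if (model_semaphore_trydown(s))
		return true;
	s->wq.sleepers++; /* a real thread would block here */
	return false;
}

/* Up: hand the slot directly to a sleeper, or remember it. */
static void model_semaphore_up(model_semaphore_t *s)
{
	if (s->wq.sleepers > 0)
		s->wq.sleepers--; /* direct hand-off to the first sleeper */
	else
		s->wq.missed_wakeups++;
}
```
]]></programlisting>
</example>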

<para>In the semaphore language, the wait queue operation
<code>waitq_sleep_timeout</code> corresponds to the semaphore
<code>down</code> operation, represented by the function
<code>semaphore_down_timeout</code>. Similarly, the wait queue operation
<code>waitq_wakeup</code> corresponds to the semaphore <code>up</code>
operation, represented by the function <code>semaphore_up</code>. The
conditional down operation is called
<code>semaphore_trydown</code>.</para>
</section>

<section>
<title>Mutexes</title>

<para>Mutexes are sometimes referred to as binary semaphores. That means
that mutexes are like semaphores that allow only one thread in their
critical section. Indeed, mutexes in HelenOS are implemented exactly in
this way: they are built on top of semaphores. From another point of
view, they can be viewed as spinlocks without busy waiting. Their
semaphore heritage provides a good basis for both the conditional
operation and the operation with timeout. The locking operation is
called <code>mutex_lock</code>, the conditional locking operation is
called <code>mutex_trylock</code> and the unlocking operation is called
<code>mutex_unlock</code>.</para>
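<para>Building a mutex on a semaphore can be sketched in a few lines. The
tiny semaphore model below keeps only the count of missed wakeups and
ignores sleeping threads, so only the conditional lock and the unlock
are shown; all names are hypothetical.</para>

<example>
<title>Sketch: a mutex as a one-slot semaphore (hypothetical names)</title>
<programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

/* Tiny semaphore model: count = missed wakeups of the wait queue. */
typedef struct {
	int count;
} model_semaphore_t;

static bool model_semaphore_trydown(model_semaphore_t *s)
{
	if (s->count == 0)
		return false;
	s->count--;
	return true;
}

static void model_semaphore_up(model_semaphore_t *s)
{
	s->count++; /* sleepers are ignored in this model */
}

/* A mutex is a semaphore that admits exactly one thread. */
typedef struct {
	model_semaphore_t sem;
} model_mutex_t;

static void model_mutex_init(model_mutex_t *m)
{
	m->sem.count = 1;
}

static bool model_mutex_trylock(model_mutex_t *m)
{
	return model_semaphore_trydown(&m->sem);
}

static void model_mutex_unlock(model_mutex_t *m)
{
	assert(m->sem.count == 0); /* unlocking an unlocked mutex is a bug */
	model_semaphore_up(&m->sem);
}
```
]]></programlisting>
</example>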
</section>

<section>
<title>Reader/writer locks</title>

<para>Reader/writer locks, or rwlocks, are by far the most complicated
synchronization primitive within the kernel. The goal of these locks is
to improve concurrency of applications in which threads need to
synchronize access to a shared resource and that access can be
partitioned into a read-only mode and a write mode. Reader/writer locks
should make it possible for several, possibly many, readers to be in the
critical section at the same time, provided that no writer is currently
in the critical section. Writers are allowed to enter the critical
section only individually, provided that no reader is in the critical
section already. Applications in which the majority of operations can be
done in the read-only mode can benefit from increased concurrency
created by reader/writer locks.</para>

<para>During reader/writer lock construction, a decision should be made
whether readers will be preferred over writers or whether writers will
be preferred over readers in cases when the lock is not currently held
and both a reader and a writer want to gain the lock. Some operating
systems prefer one group over the other, thus creating a possibility of
starving the non-preferred group. In the HelenOS operating system,
neither group is preferred. The lock is granted on a first come, first
served basis with the additional note that readers are granted the lock
in the largest possible batches.</para>

<para>With this policy and the timeout modes of operation, the direct
hand-off becomes much more complicated. For instance, a writer leaving
the critical section must wake up all leading readers in the rwlock's
wait queue, or one leading writer, or no one if no thread is waiting.
Similarly, the last reader leaving the critical section must wake up the
sleeping writer if there are any sleeping threads left at all. As
another example, if a writer at the beginning of the rwlock's wait queue
times out and the lock is held by at least one reader, the writer which
has timed out must first wake up all readers that follow it in the
queue prior to signalling the timeout itself and giving up.</para>
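<para>The three hand-off rules in the previous paragraph can be expressed
over a hypothetical encoding of the rwlock's wait queue as a string, with
<code>'R'</code> for a sleeping reader and <code>'W'</code> for a
sleeping writer, head first. Each function returns how many threads
should be woken; the actual walk over the list of sleeping threads in
HelenOS is not reproduced.</para>

<example>
<title>Sketch: rwlock hand-off decisions (hypothetical encoding)</title>
<programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Queue encoding (an assumption of this sketch): 'R' = sleeping reader,
 * 'W' = sleeping writer, head first, NUL-terminated. */

/* Writer leaving the critical section: wake every leading reader, or
 * exactly one leading writer, or nobody. */
static size_t handoff_after_writer(const char *queue)
{
	if (queue[0] == 'W')
		return 1;
	size_t n = 0;
	while (queue[n] == 'R')
		n++;
	return n;
}

/* Last reader leaving: the head, if any, must be a writer; wake it. */
static size_t handoff_after_last_reader(const char *queue)
{
	if (queue[0] == '\0')
		return 0;
	assert(queue[0] == 'W');
	return 1;
}

/* Writer at the head timed out while readers hold the lock: before
 * giving up, it wakes the readers queued directly behind it. */
static size_t handoff_after_writer_timeout(const char *queue)
{
	assert(queue[0] == 'W');
	size_t n = 0;
	while (queue[1 + n] == 'R')
		n++;
	return n;
}
```
]]></programlisting>
</example>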

<para>Due to the issues mentioned in the previous paragraph, the
reader/writer lock implementation needs to walk the rwlock wait queue's
list of sleeping threads directly in order to find out the type of
access that the queueing threads demand. This makes the code difficult
to understand and dependent on the internal implementation of the wait
queue. Nevertheless, it remains unclear to the authors of HelenOS
whether a simpler but equivalently fair solution exists.</para>

<para>As already mentioned, the implementation of rwlocks makes use of
one single wait queue for both readers and writers, thus avoiding any
possibility of starvation. In fact, rwlocks use a mutex rather than a
bare wait queue. This mutex is called <code>exclusive</code> and is
used to synchronize writers. The writer's lock operation,
<code>rwlock_write_lock_timeout</code>, simply tries to acquire the
exclusive mutex. If it succeeds, the writer is granted the rwlock.
However, if the operation fails (e.g. times out), the writer must check
for potential readers at the head of the list of sleeping threads
associated with the mutex's wait queue and then proceed according to the
procedure outlined above.</para>

<para>The exclusive mutex plays an important role in reader
synchronization as well. However, a reader performing the reader's lock
operation, <code>rwlock_read_lock_timeout</code>, may bypass this mutex
when it detects that:</para>

<orderedlist>
<listitem>
<para>there are other readers in the critical section and</para>
</listitem>

<listitem>
<para>there are no sleeping threads waiting for the exclusive
mutex.</para>
</listitem>
</orderedlist>

<para>If both conditions are true, the reader will bypass the mutex,
increment the number of readers in the critical section and then enter
the critical section. Note that if there are any sleeping threads at the
beginning of the wait queue, the first must be a writer. If the
conditions are not fulfilled, the reader normally waits until the
exclusive mutex is granted to it.</para>
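<para>The reader's fast path can be sketched as below, assuming a
hypothetical state record holding the number of readers inside the
critical section and the number of threads sleeping on the exclusive
mutex. The second condition appears to be what keeps a queued writer from
being overtaken by newly arriving readers, in line with the first come,
first served policy; the names are illustrative only.</para>

<example>
<title>Sketch: the reader bypass check (hypothetical names)</title>
<programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical rwlock state used only by this sketch. */
typedef struct {
	int readers_in;         /* readers currently in the critical section */
	int exclusive_sleepers; /* threads sleeping on the exclusive mutex */
} model_rwlock_t;

/* Reader fast path: enter without touching the exclusive mutex only if
 * (1) other readers are already inside and (2) nobody sleeps on the mutex.
 * Returns false when the reader must take the slow path and wait. */
static bool model_read_try_bypass(model_rwlock_t *rw)
{
	if (rw->readers_in > 0 && rw->exclusive_sleepers == 0) {
		rw->readers_in++;
		return true;
	}
	return false;
}

/* Reader exit: returns true when the caller was the last reader and a
 * sleeping thread (which must then be a writer) has to be woken. */
static bool model_read_unlock(model_rwlock_t *rw)
{
	assert(rw->readers_in > 0);
	rw->readers_in--;
	return rw->readers_in == 0 && rw->exclusive_sleepers > 0;
}
```
]]></programlisting>
</example>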
</section>

<section>
<title>Condition variables</title>

<para>Condition variables can be used for waiting until a condition
becomes true. In this respect, they are similar to wait queues. Unlike
wait queues, however, condition variables have semantics that allow
events to be lost when there is no thread waiting for them. In order to
support this, condition variables don't use direct hand-off and operate
in a way similar to the example below. A thread waiting for the
condition to become true does the following:</para>

<example>
<title>Use of <code>condvar_wait_timeout</code>.</title>
<programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
while (!<varname>condition</varname>)
	<function>condvar_wait_timeout</function>(<varname>cv</varname>, <varname>mtx</varname>);
/* <remark>the condition is true, do something</remark> */
<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting>
</example>

<para>A thread that causes the condition to become true signals this
event like this:</para>

<example>
<title>Use of <code>condvar_signal</code>.</title>
<programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
<varname>condition</varname> = <constant>true</constant>;
<function>condvar_signal</function>(<varname>cv</varname>); /* <remark>or condvar_broadcast(cv);</remark> */
<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></example>

<para>The wait operation, <code>condvar_wait_timeout</code>, always puts
the calling thread to sleep. The thread then sleeps until another thread
invokes <code>condvar_broadcast</code> on the same condition variable,
until it is woken up by <code>condvar_signal</code> or until the time
limit expires. The <code>condvar_signal</code> operation unblocks the
first thread blocking on the condition variable while the
<code>condvar_broadcast</code> operation unblocks all threads blocking
there. If there are no blocking threads, these two operations have no
effect.</para>

<para>Note that the threads must synchronize over a dedicated mutex. To
prevent a race condition between <code>condvar_wait_timeout</code> and
<code>condvar_signal</code> or <code>condvar_broadcast</code>, in which a
signal issued after the waiter releases the mutex but before it falls
asleep would be lost, the mutex is passed to
<code>condvar_wait_timeout</code>, which then atomically puts the
calling thread to sleep and unlocks the mutex. When the thread
eventually wakes up, <code>condvar_wait</code> regains the mutex and
returns.</para>

<para>Also note that there is no conditional operation for condition
variables. Such an operation would make no sense since condition
variables are defined to forget events for which there is no waiting
thread and because <code>condvar_wait</code> must always go to sleep.
The operation with timeout is supported as usual.</para>

<para>In HelenOS, condition variables are based on wait queues. As
mentioned above, wait queues remember missed events while condition
variables must not do so. The reason is that condition variables are
designed for scenarios in which an event might occur many times without
being picked up by any waiting thread. Wait queues, on the other hand,
would remember any event that had not been picked up by a call to
<code>waitq_sleep_timeout</code>. Therefore, if wait queues were used
directly and without any changes to implement condition variables, the
counter of missed wakeups would hurt the performance of the
implementation: the <code>while</code> loop around
<code>condvar_wait_timeout</code> in the first example would effectively
do busy waiting until all missed wakeups were discarded.</para>

<para>The requirement on the wait operation to atomically put the caller
to sleep and release the mutex poses an interesting problem for
<code>condvar_wait_timeout</code>. More precisely, the thread should
sleep in the condvar's wait queue prior to releasing the mutex, but it
must not hold the mutex when it is sleeping.</para>

<para>The problems described in the two previous paragraphs are
addressed in HelenOS by dividing the <code>waitq_sleep_timeout</code>
function into three pieces:</para>

<orderedlist>
<listitem>
<para><code>waitq_sleep_prepare</code> prepares the thread to go to
sleep by, among other things, locking the wait queue;</para>
</listitem>

<listitem>
<para><code>waitq_sleep_timeout_unsafe</code> implements the core
blocking logic;</para>
</listitem>

<listitem>
<para><code>waitq_sleep_finish</code> performs cleanup after
<code>waitq_sleep_timeout_unsafe</code>; after this call, the caller is
guaranteed that the wait queue spinlock is unlocked.</para>
</listitem>
</orderedlist>

<para>The stock <code>waitq_sleep_timeout</code> is then a mere wrapper
that calls these three functions. It is provided for convenience in
cases where the caller does not require such low-level control.
However, the implementation of <code>condvar_wait_timeout</code> does
need this finer-grained control because it has to interleave calls to
these functions with other actions. It carries its operations out in the
following order:</para>

<orderedlist>
<listitem>
<para>calls <code>waitq_sleep_prepare</code> in order to lock the
condition variable's wait queue,</para>
</listitem>

<listitem>
<para>releases the mutex,</para>
</listitem>

<listitem>
<para>clears the counter of missed wakeups,</para>
</listitem>

<listitem>
<para>calls <code>waitq_sleep_timeout_unsafe</code>,</para>
</listitem>

<listitem>
<para>retakes the mutex,</para>
</listitem>

<listitem>
<para>calls <code>waitq_sleep_finish</code>.</para>
</listitem>
</orderedlist>
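<para>The six steps can be traced in a single-threaded model. In this
sketch, which uses illustrative names throughout and only records
blocking instead of performing it, it is assumed that
<code>sleep_timeout_unsafe</code> releases the wait queue lock when it
blocks and <code>sleep_finish</code> releases it otherwise, so the lock
is not held while the mutex is retaken. A signal sent while nobody waits
is recorded by the wait queue but discarded in step 3.</para>

<example>
<title>Sketch: splitting the sleep and interleaving it with the mutex (hypothetical names)</title>
<programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded models; every name here is illustrative. */
typedef struct { bool locked; } model_mutex_t;

typedef struct {
	bool locked;        /* stands in for the wait queue spinlock */
	int missed_wakeups;
	int sleepers;       /* blocked threads (identities omitted) */
} model_waitq_t;

typedef struct { model_waitq_t wq; } model_condvar_t;

static void mtx_lock(model_mutex_t *m)   { assert(!m->locked); m->locked = true; }
static void mtx_unlock(model_mutex_t *m) { assert(m->locked); m->locked = false; }

/* Piece 1: lock the wait queue. */
static void sleep_prepare(model_waitq_t *wq) { assert(!wq->locked); wq->locked = true; }

/* Piece 2: consume a missed wakeup (lock stays held) or block, releasing
 * the lock on the way to sleep. Returns true if it blocked. */
static bool sleep_timeout_unsafe(model_waitq_t *wq)
{
	assert(wq->locked);
	if (wq->missed_wakeups > 0) {
		wq->missed_wakeups--;
		return false;
	}
	wq->sleepers++;     /* a real thread would now sleep until woken */
	wq->locked = false;
	return true;
}

/* Piece 3: cleanup; afterwards the wait queue lock is released. */
static void sleep_finish(model_waitq_t *wq, bool blocked)
{
	if (!blocked)
		wq->locked = false;
	assert(!wq->locked);
}

/* Stock sleep: the three pieces back to back. */
static bool waitq_sleep_model(model_waitq_t *wq)
{
	sleep_prepare(wq);
	bool blocked = sleep_timeout_unsafe(wq);
	sleep_finish(wq, blocked);
	return blocked;
}

/* Signal: wake one sleeper or, like any wait queue, remember the event. */
static void condvar_signal_model(model_condvar_t *cv)
{
	if (cv->wq.sleepers > 0)
		cv->wq.sleepers--;
	else
		cv->wq.missed_wakeups++;
}

/* Wait: the pieces interleaved with mutex operations. */
static bool condvar_wait_model(model_condvar_t *cv, model_mutex_t *m)
{
	sleep_prepare(&cv->wq);                       /* 1. lock the wait queue */
	mtx_unlock(m);                                /* 2. release the mutex */
	cv->wq.missed_wakeups = 0;                    /* 3. forget unobserved events */
	bool blocked = sleep_timeout_unsafe(&cv->wq); /* 4. block */
	mtx_lock(m);                                  /* 5. retake the mutex */
	sleep_finish(&cv->wq, blocked);               /* 6. cleanup */
	return blocked;
}
```
]]></programlisting>
</example>

<para>In this model, a plain wait queue that has recorded a wakeup lets
<code>waitq_sleep_model</code> return at once, while
<code>condvar_wait_model</code> blocks regardless, which is the
difference in semantics the text describes.</para>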
</section>
</section>

<section>
<title>Userspace synchronization</title>

<section>
<title>Futexes</title>

<para></para>
</section>
</section>
</chapter>

<!-- Subversion repository HelenOS-doc: (root)/design/trunk/src/ch_synchronization.xml @ 171, Rev 62 -->