<?xml version="1.0" encoding="UTF-8"?>
<chapter id="sync">
  <?dbhtml filename="sync.html"?>

  <title>Mutual exclusion and synchronization</title>

  <section>
    <title>Introduction</title>

    <para>The HelenOS operating system is designed to make use of the
    parallelism offered by the hardware and to exploit concurrency of both the
    kernel and userspace tasks. This is achieved through multiprocessor
    support and several levels of multiprogramming such as multitasking and
    multithreading, and also through userspace pseudo threads. However, such a
    highly concurrent environment needs safe and efficient ways to handle
    mutual exclusion and synchronization of many execution flows.</para>
  </section>

  <section>
    <title>Active kernel primitives</title>

    <section>
      <title>Spinlocks</title>

      <para>The basic mutual exclusion primitive is the spinlock. The spinlock
      implements active waiting for the availability of a memory lock (i.e.
      simple variable) in a multiprocessor-safe manner. This safety is
      achieved through the use of a specialized, architecture-dependent,
      atomic test-and-set operation which either locks the spinlock (i.e. sets
      the variable) or, provided that it is already locked, leaves it
      unaltered. In any case, the test-and-set operation returns a value, thus
      signalling either success (i.e. zero return value) or failure (i.e.
      non-zero value) in acquiring the lock. Note that this makes a
      fundamental difference between the naive algorithm that doesn't use the
      atomic operation and the spinlock algorithm. While the naive algorithm
      is prone to race conditions on SMP configurations and thus is completely
      SMP-unsafe, the spinlock algorithm eliminates the possibility of race
      conditions and is suitable for mutual exclusion use.</para>

      <para>The semantics of the test-and-set operation is that the spinlock
      remains unavailable until this operation called on the respective
      spinlock returns zero. HelenOS builds two functions on top of the
      test-and-set operation. The first function is the unconditional attempt
      to acquire the spinlock and is called
      <emphasis>spinlock_lock</emphasis>. It simply loops until the
      test-and-set returns a zero value. The other function,
      <emphasis>spinlock_trylock</emphasis>, is the conditional lock operation
      and calls the test-and-set only once to find out whether it managed to
      acquire the spinlock or not. The conditional operation is useful in
      situations in which an algorithm cannot acquire more spinlocks in the
      proper order and a deadlock cannot be avoided. In such a case, the
      algorithm would detect the danger and, instead of possibly deadlocking
      the system, would simply release some spinlocks it already holds and
      retry the whole operation with the hope that it will succeed next time.
      The unlock function, <emphasis>spinlock_unlock</emphasis>, is quite
      easy: it merely clears the spinlock variable.</para>
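
      <para>The relationship between the three functions and the test-and-set
      operation can be sketched as follows. This is a simplification rather
      than the actual HelenOS code; <emphasis>test_and_set</emphasis> stands
      for the architecture-specific atomic operation described above and the
      <emphasis>val</emphasis> field is an assumed name for the lock
      variable:</para>

      <para><programlisting language="C">void spinlock_lock(spinlock_t *sl)
{
        /* unconditional: spin until the atomic operation succeeds */
        while (test_and_set(sl) != 0)
                ;
}

int spinlock_trylock(spinlock_t *sl)
{
        /* conditional: a single attempt, result reported to the caller */
        return test_and_set(sl) == 0;
}

void spinlock_unlock(spinlock_t *sl)
{
        /* merely clear the spinlock variable */
        sl->val = 0;
}</programlisting></para>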

      <para>Nevertheless, there is a special issue related to hardware
      optimizations that modern processors implement. Particularly problematic
      is the out-of-order execution of instructions within the critical
      section protected by a spinlock. The processors are always
      self-consistent so that they can carry out speculatively executed
      instructions in the right order with regard to dependencies among those
      instructions. However, the dependency between instructions inside the
      critical section and those that implement locking and unlocking of the
      respective spinlock is not implicit on some processor architectures. As
      a result, the processor needs to be explicitly told about each
      occurrence of such a dependency. Therefore, HelenOS adds
      architecture-specific hooks to all <emphasis>spinlock_lock</emphasis>,
      <emphasis>spinlock_trylock</emphasis> and
      <emphasis>spinlock_unlock</emphasis> functions to prevent the
      instructions inside the critical section from permeating out. On some
      architectures, these hooks can be empty because the dependencies are
      enforced implicitly by the special properties of the locking and
      unlocking instructions. However, other architectures need to instrument
      these hooks with different memory barriers, depending on what operations
      could permeate out.</para>
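
      <para>In terms of the sketch above, the hooks would be placed as
      follows. The <emphasis>CS_ENTER_BARRIER</emphasis> and
      <emphasis>CS_LEAVE_BARRIER</emphasis> names are illustrative
      placeholders for the architecture-specific macros, which expand either
      to nothing or to the appropriate memory barrier instructions:</para>

      <para><programlisting language="C">void spinlock_lock(spinlock_t *sl)
{
        while (test_and_set(sl) != 0)
                ;
        /* keep critical section instructions from moving
         * before the lock acquisition */
        CS_ENTER_BARRIER();
}

void spinlock_unlock(spinlock_t *sl)
{
        /* keep critical section instructions from moving
         * after the lock release */
        CS_LEAVE_BARRIER();
        sl->val = 0;
}</programlisting></para>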

      <para>Spinlocks have one significant drawback: when held for longer time
      periods, they harm both parallelism and concurrency. The processor
      executing <emphasis>spinlock_lock</emphasis> does not do any fruitful
      work and is effectively halted until it can grab the lock and proceed.
      Similarly, other execution flows cannot execute on the processor that
      holds the spinlock because the kernel disables preemption on that
      processor when a spinlock is held. The reason behind disabling
      preemption is avoidance of the priority inversion problem. For the same
      reason, threads are strongly discouraged from sleeping when they hold a
      spinlock.</para>

      <para>To summarize, spinlocks represent a very simple yet essential
      mutual exclusion primitive for SMP systems. On the other hand, spinlocks
      scale poorly because of the active loop they are based on. Therefore,
      spinlocks are used in HelenOS only for short-time mutual exclusion and
      in cases where the mutual exclusion is required outside of thread
      context. Lastly, spinlocks are used in the construction of passive
      synchronization primitives.</para>
    </section>
  </section>

  <section>
    <title>Passive kernel synchronization</title>

    <section>
      <title>Wait queues</title>

      <para>A wait queue is the basic passive synchronization primitive on
      which all other passive synchronization primitives are built. Simply
      put, it allows a thread to sleep until an event associated with the
      particular wait queue occurs. Multiple threads are notified about
      incoming events in a first come, first served fashion. Moreover, should
      the event come before any thread waits for it, it is recorded in the
      wait queue as a missed wakeup and later forwarded to the first thread
      that decides to wait in the queue. The inner structures of the wait
      queue are protected by a spinlock.</para>
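
      <para>Conceptually, a wait queue thus combines three things: a spinlock,
      a counter of missed wakeups and a list of sleeping threads. The
      following structure is a simplified sketch, not the exact HelenOS
      definition:</para>

      <para><programlisting language="C">typedef struct {
        spinlock_t lock;        /* protects the inner structures */
        int missed_wakeups;     /* events that arrived with no waiter */
        link_t head;            /* list of sleeping threads */
} waitq_t;</programlisting></para>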

      <para>The thread that wants to wait for a wait queue event uses the
      <emphasis>waitq_sleep_timeout</emphasis> function. The algorithm first
      checks the wait queue's counter of missed wakeups; if there are any
      missed wakeups, the call returns immediately. The call also returns
      immediately if only a conditional wait was requested. Otherwise the
      thread is enqueued in the wait queue's list of sleeping threads and its
      state is changed to <emphasis>Sleeping</emphasis>. It then sleeps until
      one of the following events happens:</para>

      <orderedlist>
        <listitem>
          <para>another thread calls <emphasis>waitq_wakeup</emphasis> and the
          thread is the first thread in the wait queue's list of sleeping
          threads;</para>
        </listitem>

        <listitem>
          <para>another thread calls
          <emphasis>waitq_interrupt_sleep</emphasis> on the sleeping
          thread;</para>
        </listitem>

        <listitem>
          <para>the sleep times out, provided that none of the previous
          events occurred within a specified time limit; the limit can be
          infinity.</para>
        </listitem>
      </orderedlist>

      <para>All five possibilities (immediate return on success, immediate
      return on failure, wakeup after sleep, interruption and timeout) are
      distinguishable by the return value of
      <emphasis>waitq_sleep_timeout</emphasis>. Being able to interrupt a
      sleeping thread is essential for externally initiated thread
      termination. The ability to wait only for a certain amount of time is
      used, for instance, to passively delay thread execution by several
      microseconds or even seconds in the <emphasis>thread_sleep</emphasis>
      function. Due to the fact that all other passive kernel synchronization
      primitives are based on wait queues, they also have the option of being
      interrupted and, more importantly, can time out. All of them also
      implement the conditional operation. Furthermore, this very fundamental
      interface reaches up to the implementation of futexes, the userspace
      synchronization primitive, which makes it possible for a userspace
      thread to request a synchronization operation with a timeout or a
      conditional operation.</para>
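
      <para>A caller can thus distinguish all five outcomes by examining the
      return value, roughly as in the sketch below. The
      <emphasis>ESYNCH_</emphasis>-prefixed names and the exact signature are
      illustrative simplifications, not the precise HelenOS
      definitions:</para>

      <para><programlisting language="C">switch (waitq_sleep_timeout(wq, usec, flags)) {
case ESYNCH_OK_ATOMIC:          /* a missed wakeup was consumed, no sleep */
case ESYNCH_OK_BLOCKED:         /* woken up by waitq_wakeup() */
        /* success: the event has occurred */
        break;
case ESYNCH_WOULD_BLOCK:        /* conditional wait, no missed wakeup */
case ESYNCH_TIMEOUT:            /* the time limit expired */
case ESYNCH_INTERRUPTED:        /* woken by waitq_interrupt_sleep() */
        /* failure: the event has not occurred */
        break;
}</programlisting></para>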

      <para>From the description above, it should be apparent that when a
      sleeping thread is woken by <emphasis>waitq_wakeup</emphasis> or when
      <emphasis>waitq_sleep_timeout</emphasis> succeeds immediately, the
      thread can be sure that the event has occurred. The thread need not and
      should not verify this fact. This approach is called direct hand-off and
      is characteristic of all passive HelenOS synchronization primitives,
      with one exception described below.</para>
    </section>

    <section>
      <title>Semaphores</title>

      <para>The interesting point about wait queues is that the number of
      missed wakeups is equal to the number of threads that will not block in
      <emphasis>waitq_sleep_timeout</emphasis> and would immediately succeed
      instead. On the other hand, semaphores are synchronization primitives
      that will let a predefined number of threads into their critical section
      and block any other threads above this count. In fact, these two cases
      are exactly the same. Semaphores in HelenOS are therefore implemented as
      wait queues with a single semantic change: their wait queue is
      initialized to have as many missed wakeups as is the number of threads
      that the semaphore intends to let into its critical section
      simultaneously.</para>

      <para>In the semaphore language, the wait queue operation
      <emphasis>waitq_sleep_timeout</emphasis> corresponds to the semaphore
      <emphasis>down</emphasis> operation, represented by the function
      <emphasis>semaphore_down_timeout</emphasis>, and similarly the wait
      queue operation <emphasis>waitq_wakeup</emphasis> corresponds to the
      semaphore <emphasis>up</emphasis> operation, represented by the function
      <emphasis>semaphore_up</emphasis>. The conditional down operation is
      called <emphasis>semaphore_trydown</emphasis>.</para>
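
      <para>The mapping is direct enough to be captured in a few lines. The
      sketch below is a simplification; the <emphasis>sem_waitq</emphasis>
      accessor, the field names and the exact signatures are illustrative
      rather than the actual HelenOS definitions:</para>

      <para><programlisting language="C">void semaphore_initialize(semaphore_t *s, int val)
{
        /* preload the wait queue with val missed wakeups */
        waitq_initialize(sem_waitq(s));
        sem_waitq(s)->missed_wakeups = val;
}

int semaphore_down_timeout(semaphore_t *s, uint32_t usec)
{
        /* down: consume a missed wakeup or go to sleep */
        return waitq_sleep_timeout(sem_waitq(s), usec, SYNCH_FLAGS_NONE);
}

void semaphore_up(semaphore_t *s)
{
        /* up: wake the first sleeper or record a missed wakeup */
        waitq_wakeup(sem_waitq(s), WAKEUP_FIRST);
}</programlisting></para>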
    </section>

    <section>
      <title>Mutexes</title>

      <para>Mutexes are sometimes referred to as binary semaphores. That means
      that mutexes are like semaphores that allow only one thread into their
      critical section. Indeed, mutexes in HelenOS are implemented exactly in
      this way: they are built on top of semaphores. From another point of
      view, they can be viewed as spinlocks without busy waiting. Their
      semaphore heritage provides a good basis for both the conditional
      operation and the operation with timeout. The locking operation is
      called <emphasis>mutex_lock</emphasis>, the conditional locking
      operation is called <emphasis>mutex_trylock</emphasis> and the unlocking
      operation is called <emphasis>mutex_unlock</emphasis>.</para>
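
      <para>Reusing the semaphore interface, a mutex is then little more than
      a semaphore whose counter starts at one. Again a simplified sketch, with
      the <emphasis>mutex_sem</emphasis> accessor being illustrative:</para>

      <para><programlisting language="C">void mutex_initialize(mutex_t *mtx)
{
        /* exactly one thread may hold the mutex at a time */
        semaphore_initialize(mutex_sem(mtx), 1);
}

void mutex_lock(mutex_t *mtx)
{
        semaphore_down(mutex_sem(mtx));
}

int mutex_trylock(mutex_t *mtx)
{
        return semaphore_trydown(mutex_sem(mtx));
}

void mutex_unlock(mutex_t *mtx)
{
        semaphore_up(mutex_sem(mtx));
}</programlisting></para>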
    </section>

    <section>
      <title>Reader/writer locks</title>

      <para>Reader/writer locks, or rwlocks, are by far the most complicated
      synchronization primitive within the kernel. The goal of these locks is
      to improve concurrency of applications in which threads need to
      synchronize access to a shared resource and in which that access can be
      partitioned into a read-only mode and a write mode. Reader/writer locks
      should make it possible for several, possibly many, readers to be in the
      critical section simultaneously, provided that no writer is currently in
      the critical section. Writers are allowed to enter the critical section
      only individually, provided that no reader is in the critical section
      already. Applications in which the majority of operations can be done in
      the read-only mode can benefit from the increased concurrency created by
      reader/writer locks.</para>

      <para>During reader/writer lock construction, a decision should be made
      whether readers will be preferred over writers or whether writers will
      be preferred over readers in cases when the lock is not currently held
      and both a reader and a writer want to gain the lock. Some operating
      systems prefer one group over the other, thus creating a possibility for
      starving the unpreferred group. In the HelenOS operating system, neither
      of the two groups is preferred. The lock is granted on a first come,
      first served basis with the additional note that readers are granted the
      lock in the largest possible batches.</para>

      <para>With this policy and the timeout modes of operation, the direct
      hand-off becomes much more complicated. For instance, a writer leaving
      the critical section must wake up all leading readers in the rwlock's
      wait queue or one leading writer or no-one if no thread is waiting.
      Similarly, the last reader leaving the critical section must wake up the
      sleeping writer if there are any sleeping threads left at all. As
      another example, if a writer at the beginning of the rwlock's wait queue
      times out and the lock is held by at least one reader, the writer which
      has timed out must first wake up all readers that follow it in the
      queue prior to signalling the timeout itself and giving up.</para>

      <para>Due to the issues mentioned in the previous paragraph, the
      reader/writer lock implementation needs to walk the rwlock wait queue's
      list of sleeping threads directly in order to find out the type of
      access that the queued threads demand. This makes the code difficult
      to understand and dependent on the internal implementation of the wait
      queue. Nevertheless, it remains unclear to the authors of HelenOS
      whether a simpler but equivalently fair solution exists.</para>

      <para>The implementation of rwlocks, as already outlined, makes use of
      a single wait queue for both readers and writers, thus avoiding any
      possibility of starvation. In fact, rwlocks use a mutex rather than a
      bare wait queue. This mutex is called <emphasis>exclusive</emphasis>
      and is used to synchronize writers. The writer's lock operation,
      <emphasis>rwlock_write_lock_timeout</emphasis>, simply tries to acquire
      the exclusive mutex. If it succeeds, the writer is granted the rwlock.
      However, if the operation fails (e.g. times out), the writer must check
      for potential readers at the head of the list of sleeping threads
      associated with the mutex's wait queue and then proceed according to the
      procedure outlined above.</para>

      <para>The exclusive mutex plays an important role in reader
      synchronization as well. However, a reader doing the reader's lock
      operation, <emphasis>rwlock_read_lock_timeout</emphasis>, may bypass
      this mutex when it detects that:</para>

      <orderedlist>
        <listitem>
          <para>there are other readers in the critical section and</para>
        </listitem>

        <listitem>
          <para>there are no sleeping threads waiting for the exclusive
          mutex.</para>
        </listitem>
      </orderedlist>

      <para>If both conditions are true, the reader will bypass the mutex,
      increment the number of readers in the critical section and then enter
      the critical section. Note that if there are any sleeping threads at the
      beginning of the wait queue, the first must be a writer. If the
      conditions are not fulfilled, the reader normally waits until the
      exclusive mutex is granted to it.</para>
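
      <para>The reader's fast path can be sketched as follows; the helper
      names and the <emphasis>readers_in</emphasis> field are illustrative,
      not the actual HelenOS identifiers:</para>

      <para><programlisting language="C">int rwlock_read_try_bypass(rwlock_t *rwl)
{
        int bypassed = 0;

        spinlock_lock(rwl_lock(rwl));
        if (rwl->readers_in > 0) {
                if (!waitq_has_sleepers(exclusive_waitq(rwl))) {
                        /* other readers inside and nobody waiting for
                         * the exclusive mutex: enter right away */
                        rwl->readers_in++;
                        bypassed = 1;
                }
        }
        spinlock_unlock(rwl_lock(rwl));

        return bypassed;        /* zero means: wait for the mutex */
}</programlisting></para>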
    </section>

    <section>
      <title>Condition variables</title>

      <para>Condition variables can be used for waiting until a condition
      becomes true. In this respect, they are similar to wait queues. But
      contrary to wait queues, condition variables have different semantics
      that allow events to be lost when there is no thread waiting for them.
      In order to support this, condition variables don't use direct hand-off
      and operate in a way similar to the example below. A thread waiting for
      the condition to become true does the following:</para>

      <para><programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
while (!<varname>condition</varname>)
        <function>condvar_wait_timeout</function>(<varname>cv</varname>, <varname>mtx</varname>);
/* <remark>the condition is true, do something</remark> */
<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></para>

      <para>A thread that causes the condition to become true signals this
      event like this:</para>

      <para><programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
<varname>condition</varname> = <constant>true</constant>;
<function>condvar_signal</function>(<varname>cv</varname>);  /* <remark>condvar_broadcast(cv);</remark> */
<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting></para>

      <para>The wait operation, <emphasis>condvar_wait_timeout</emphasis>,
      always puts the calling thread to sleep. The thread then sleeps until
      another thread invokes <emphasis>condvar_broadcast</emphasis> on the
      same condition variable or until it is woken up by
      <emphasis>condvar_signal</emphasis>. The
      <emphasis>condvar_signal</emphasis> operation unblocks the first thread
      blocking on the condition variable while the
      <emphasis>condvar_broadcast</emphasis> operation unblocks all threads
      blocking there. If there are no blocking threads, these two operations
      have no effect.</para>

      <para>Note that the threads must synchronize over a dedicated mutex. To
      prevent a race condition between
      <emphasis>condvar_wait_timeout</emphasis> and
      <emphasis>condvar_signal</emphasis> or
      <emphasis>condvar_broadcast</emphasis>, the mutex is passed to
      <emphasis>condvar_wait_timeout</emphasis> which then atomically puts the
      calling thread to sleep and unlocks the mutex. When the thread
      eventually wakes up, <emphasis>condvar_wait</emphasis> regains the mutex
      and returns.</para>

      <para>Also note that there is no conditional operation for condition
      variables. Such an operation would make no sense since condition
      variables are defined to forget events for which there is no waiting
      thread and because <emphasis>condvar_wait</emphasis> must always go to
      sleep. The operation with timeout is supported as usual.</para>

      <para>In HelenOS, condition variables are based on wait queues. As
      already mentioned above, wait queues remember missed events while
      condition variables must not do so. The reason is that condition
      variables are designed for scenarios in which an event might occur many
      times without being picked up by any waiting thread. On the other hand,
      wait queues would remember any event that had not been picked up by a
      call to <emphasis>waitq_sleep_timeout</emphasis>. Therefore, if wait
      queues were used directly and without any changes to implement condition
      variables, the missed_wakeup counter would hurt performance of the
      implementation: the <code>while</code> loop in
      <emphasis>condvar_wait_timeout</emphasis> would effectively do busy
      waiting until all missed wakeups were discarded.</para>

      <para>The requirement on the wait operation to atomically put the caller
      to sleep and release the mutex poses an interesting problem for
      <emphasis>condvar_wait_timeout</emphasis>. More precisely, the thread
      should sleep in the condvar's wait queue prior to releasing the mutex,
      but it must not hold the mutex when it is sleeping.</para>

      <para>Problems described in the two previous paragraphs are addressed in
      HelenOS by dividing the <emphasis>waitq_sleep_timeout</emphasis>
      function into three pieces:</para>

      <orderedlist>
        <listitem>
          <para><emphasis>waitq_sleep_prepare</emphasis> prepares the thread
          to go to sleep by, among other things, locking the wait
          queue;</para>
        </listitem>

        <listitem>
          <para><emphasis>waitq_sleep_timeout_unsafe</emphasis> implements the
          core blocking logic;</para>
        </listitem>

        <listitem>
          <para><emphasis>waitq_sleep_finish</emphasis> performs cleanup after
          <emphasis>waitq_sleep_timeout_unsafe</emphasis>; after this call,
          the wait queue spinlock is guaranteed to be unlocked by the
          caller.</para>
        </listitem>
      </orderedlist>

      <para>The stock <emphasis>waitq_sleep_timeout</emphasis> is then a mere
      wrapper that calls these three functions. It is provided for convenience
      in cases where the caller doesn't require such low-level control.
      However, the implementation of <emphasis>condvar_wait_timeout</emphasis>
      does need this finer-grained control because it has to interleave calls
      to these functions with other actions. It carries its operations out in
      the following order, as the sketch after the list illustrates:</para>

      <orderedlist>
        <listitem>
          <para>calls <emphasis>waitq_sleep_prepare</emphasis> in order to
          lock the condition variable's wait queue,</para>
        </listitem>

        <listitem>
          <para>releases the mutex,</para>
        </listitem>

        <listitem>
          <para>clears the counter of missed wakeups,</para>
        </listitem>

        <listitem>
          <para>calls <emphasis>waitq_sleep_timeout_unsafe</emphasis>,</para>
        </listitem>

        <listitem>
          <para>retakes the mutex,</para>
        </listitem>

        <listitem>
          <para>calls <emphasis>waitq_sleep_finish</emphasis>.</para>
        </listitem>
      </orderedlist>
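
      <para>Put together, the six steps translate into a sketch like the one
      below; the <emphasis>cv_waitq</emphasis> accessor and the exact
      signatures of the three pieces are simplified:</para>

      <para><programlisting language="C">int condvar_wait_timeout(condvar_t *cv, mutex_t *mtx, uint32_t usec)
{
        int rc;

        waitq_sleep_prepare(cv_waitq(cv));      /* 1: lock the wait queue */
        mutex_unlock(mtx);                      /* 2: release the mutex */
        cv_waitq(cv)->missed_wakeups = 0;       /* 3: forget stale events */
        rc = waitq_sleep_timeout_unsafe(cv_waitq(cv), usec);  /* 4: block */
        mutex_lock(mtx);                        /* 5: retake the mutex */
        waitq_sleep_finish(cv_waitq(cv), rc);   /* 6: cleanup */

        return rc;
}</programlisting></para>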
    </section>
  </section>

  <section>
    <title>Userspace synchronization</title>

    <section>
      <title>Futexes</title>

      <para></para>
    </section>
  </section>
</chapter>