WebSVN – HelenOS-doc – Blame – /design/trunk/src/ch_synchronization.xml

Rev	Author	Line No.	Line
9	bondari	1	<?xml version="1.0" encoding="UTF-8"?>
41	jermar	2	<chapter id="sync">
		3	<?dbhtml filename="sync.html"?>
9	bondari	4
82	jermar	5	<title>Synchronization</title>
9	bondari	6
41	jermar	7	<section>
		8	<title>Introduction</title>
9	bondari	9
45	jermar	10	<para>The HelenOS operating system is designed to make use of the
		11	parallelism offered by the hardware and to exploit concurrency of both the
		12	kernel and userspace tasks. This is achieved through multiprocessor
		13	support and several levels of multiprogramming such as multitasking,
		14	multithreading and also through userspace pseudo threads. However, such a
		15	highly concurrent environment needs safe and efficient ways to handle
		16	mutual exclusion and synchronization of many execution flows.</para>
41	jermar	17	</section>
		18
		19	<section>
		20	<title>Active kernel primitives</title>
		21
9	bondari	22	<section>
72	bondari	23	<indexterm>
		24	<primary>synchronization</primary>
		25
		26	<secondary>- spinlock</secondary>
		27	</indexterm>
		28
41	jermar	29	<title>Spinlocks</title>
9	bondari	30
45	jermar	31	<para>The basic mutual exclusion primitive is the spinlock. The spinlock
		32	implements active waiting for the availability of a memory lock (i.e.
41	jermar	33	a simple variable) in a multiprocessor-safe manner. This safety is
		34	achieved through the use of a specialized, architecture-dependent,
		35	atomic test-and-set operation which either locks the spinlock (i.e. sets
		36	the variable) or, provided that it is already locked, leaves it
		37	unaltered. In any case, the test-and-set operation returns a value, thus
		38	signalling either success (i.e. zero return value) or failure (i.e.
45	jermar	39	non-zero value) in acquiring the lock. Note that this makes a
41	jermar	40	fundamental difference between the naive algorithm that doesn't use the
		41	atomic operation and the spinlock algorithm. While the naive algorithm
45	jermar	42	is prone to race conditions on SMP configurations and thus is completely
41	jermar	43	SMP-unsafe, the spinlock algorithm eliminates the possibility of race
		44	conditions and is suitable for mutual exclusion use.</para>
9	bondari	45
41	jermar	46	<para>The semantics of the test-and-set operation is that the spinlock
		47	remains unavailable until this operation called on the respective
45	jermar	48	spinlock returns zero. HelenOS builds two functions on top of the
		49	test-and-set operation. The first function is the unconditional attempt
86	bondari	50	to acquire the spinlock and is called <code>spinlock_lock()</code>. It
57	jermar	51	simply loops until the test-and-set returns a zero value. The other
86	bondari	52	function, <code>spinlock_trylock()</code>, is the conditional lock
57	jermar	53	operation and calls the test-and-set only once to find out whether it
		54	managed to acquire the spinlock or not. The conditional operation is
		55	useful in situations in which an algorithm cannot acquire further spinlocks
		56	in the proper order and a deadlock could otherwise occur. In such a case,
		57	the algorithm would detect the danger and instead of possibly
		58	deadlocking the system it would simply release some spinlocks it already
		59	holds and retry the whole operation with the hope that it will succeed
86	bondari	60	next time. The unlock function, <code>spinlock_unlock()</code>, is quite
57	jermar	61	easy - it merely clears the spinlock variable.</para>
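<para>The three operations described above can be sketched in portable C11.
This is only an illustrative model, not the HelenOS implementation: the
<code>toy_</code> names are hypothetical, the architecture-dependent
test-and-set is stood in for by an atomic exchange, and the acquire and
release orderings play the role of the architecture-specific hooks that keep
critical-section instructions from permeating out.</para>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model type; this is NOT the HelenOS spinlock structure. */
typedef struct {
    atomic_bool locked; /* false = free, true = held */
} toy_spinlock_t;

void toy_spinlock_init(toy_spinlock_t *sl)
{
    atomic_init(&sl->locked, false);
}

/*
 * Stand-in for the architecture-dependent test-and-set: atomically set the
 * variable and return its previous value. Zero means the lock was free and
 * is now ours; non-zero means it was already locked and is left unaltered.
 */
int toy_test_and_set(atomic_bool *v)
{
    return atomic_exchange_explicit(v, true, memory_order_acquire) ? 1 : 0;
}

/* Conditional lock: exactly one test-and-set attempt. */
bool toy_spinlock_trylock(toy_spinlock_t *sl)
{
    return toy_test_and_set(&sl->locked) == 0;
}

/* Unconditional lock: loop until the test-and-set returns zero. */
void toy_spinlock_lock(toy_spinlock_t *sl)
{
    while (toy_test_and_set(&sl->locked) != 0) {
        /* busy wait */
    }
}

/* Unlock: merely clear the variable, with release ordering. */
void toy_spinlock_unlock(toy_spinlock_t *sl)
{
    /* Debug check: unlocking a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&sl->locked, memory_order_relaxed));
    atomic_store_explicit(&sl->locked, false, memory_order_release);
}
```

<para>A caller that cannot take a second lock in the proper order would call
the conditional variant, and on failure release the lock it already holds and
retry the whole operation.</para>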
9	bondari	62
41	jermar	63	<para>Nevertheless, there is a special issue related to hardware
45	jermar	64	optimizations that modern processors implement. Particularly problematic
		65	is the out-of-order execution of instructions within the critical
		66	section protected by a spinlock. The processors are always
41	jermar	67	self-consistent so that they can carry out speculatively executed
		68	instructions in the right order with regard to dependencies among those
		69	instructions. However, the dependency between instructions inside the
		70	critical section and those that implement locking and unlocking of the
45	jermar	71	respective spinlock is not implicit on some processor architectures. As
		72	a result, the processor needs to be explicitly told about each
		73	occurrence of such a dependency. Therefore, HelenOS adds
86	bondari	74	architecture-specific hooks to all <code>spinlock_lock()</code>,
		75	<code>spinlock_trylock()</code> and <code>spinlock_unlock()</code> functions
57	jermar	76	to prevent the instructions inside the critical section from permeating
		77	out. On some architectures, these hooks can be void because the
		78	dependencies are implicitly there because of the special properties of
		79	locking and unlocking instructions. However, other architectures need to
		80	instrument these hooks with different memory barriers, depending on what
		81	operations could permeate out.</para>
9	bondari	82
41	jermar	83	<para>Spinlocks have one significant drawback: when held for longer time
45	jermar	84	periods, they harm both parallelism and concurrency. The processor
86	bondari	85	executing <code>spinlock_lock()</code> does not do any fruitful work and
57	jermar	86	is effectively halted until it can grab the lock and proceed.
45	jermar	87	Similarly, other execution flows cannot execute on the processor that
		88	holds the spinlock because the kernel disables preemption on that
		89	processor when a spinlock is held. Preemption is disabled in order to
		90	avoid the priority inversion problem. For the same reason,
		91	threads are strongly discouraged from sleeping when they hold a
		92	spinlock.</para>
9	bondari	93
41	jermar	94	<para>To summarize, spinlocks represent a very simple and essential mutual
		95	exclusion primitive for SMP systems. On the other hand, spinlocks scale
		96	poorly because of the active loop they are based on. Therefore,
45	jermar	97	spinlocks are used in HelenOS only for short-time mutual exclusion and
41	jermar	98	in cases where the mutual exclusion is required out of thread context.
		99	Lastly, spinlocks are used in the construction of passive
		100	synchronization primitives.</para>
		101	</section>
		102	</section>
9	bondari	103
41	jermar	104	<section>
		105	<title>Passive kernel synchronization</title>
9	bondari	106
41	jermar	107	<section>
72	bondari	108	<indexterm>
		109	<primary>synchronization</primary>
		110
73	bondari	111	<secondary>- wait queue</secondary>
72	bondari	112	</indexterm>
		113
43	jermar	114	<title>Wait queues</title>
9	bondari	115
43	jermar	116	<para>A wait queue is the basic passive synchronization primitive on
45	jermar	117	which all other passive synchronization primitives are built. Simply
		118	put, it allows a thread to sleep until an event associated with the
		119	particular wait queue occurs. Multiple threads are notified about
		120	incoming events in a first come, first served fashion. Moreover, should
		121	the event come before any thread waits for it, it is recorded in the
		122	wait queue as a missed wakeup and later forwarded to the first thread
		123	that decides to wait in the queue. The inner structures of the wait
		124	queue are protected by a spinlock.</para>
43	jermar	125
		126	<para>The thread that wants to wait for a wait queue event uses the
86	bondari	127	<code>waitq_sleep_timeout()</code> function. The algorithm then checks the
57	jermar	128	wait queue's counter of missed wakeups and if there are any missed
		129	wakeups, the call returns immediately. The call also returns immediately
		130	if only a conditional wait was requested. Otherwise the thread is
		131	enqueued in the wait queue's list of sleeping threads and its state is
		132	changed to <constant>Sleeping</constant>. It then sleeps until one of
		133	the following events happens:</para>
43	jermar	134
		135	<orderedlist>
		136	<listitem>
86	bondari	137	<para>another thread calls <code>waitq_wakeup()</code> and the thread
57	jermar	138	is the first thread in the wait queue's list of sleeping
45	jermar	139	threads;</para>
43	jermar	140	</listitem>
		141
		142	<listitem>
86	bondari	143	<para>another thread calls <code>waitq_interrupt_sleep()</code> on the
57	jermar	144	sleeping thread;</para>
43	jermar	145	</listitem>
		146
		147	<listitem>
45	jermar	148	<para>the sleep times out provided that none of the previous
		149	occurred within a specified time limit; the limit can be
		150	infinity.</para>
43	jermar	151	</listitem>
		152	</orderedlist>
		153
		154	<para>All five possibilities (immediate return on success, immediate
		155	return on failure, wakeup after sleep, interruption and timeout) are
86	bondari	156	distinguishable by the return value of <code>waitq_sleep_timeout()</code>.
57	jermar	157	Being able to interrupt a sleeping thread is essential for externally
		158	initiated thread termination. The ability to wait only for a certain
		159	amount of time is used, for instance, to passively delay thread
		160	execution by several microseconds or even seconds in
86	bondari	161	the <code>thread_sleep()</code> function. Due to the fact that all other
57	jermar	162	passive kernel synchronization primitives are based on wait queues, they
		163	also have the option of being interrupted and, more importantly, can
		164	time out. All of them also implement the conditional operation.
		165	Furthermore, this very fundamental interface reaches up to the
		166	implementation of futexes, a userspace synchronization primitive, which
		167	makes it possible for a userspace thread to request a synchronization
		168	operation with a timeout or a conditional operation.</para>
43	jermar	169
		170	<para>From the description above, it should be apparent that when a
86	bondari	171	sleeping thread is woken by <code>waitq_wakeup()</code> or when
		172	<code>waitq_sleep_timeout()</code> succeeds immediately, the thread can be
57	jermar	173	sure that the event has occurred. The thread need not and should not
		174	verify this fact. This approach is called direct hand-off and is
		175	characteristic for all passive HelenOS synchronization primitives, with
		176	the exception described below.</para>
41	jermar	177	</section>
9	bondari	178
41	jermar	179	<section>
72	bondari	180	<indexterm>
		181	<primary>synchronization</primary>
		182
73	bondari	183	<secondary>- semaphore</secondary>
72	bondari	184	</indexterm>
		185
41	jermar	186	<title>Semaphores</title>
9	bondari	187
43	jermar	188	<para>The interesting point about wait queues is that the number of
		189	missed wakeups is equal to the number of threads that will not block in
86	bondari	190	<code>waitq_sleep_timeout()</code> and would immediately succeed instead.
57	jermar	191	On the other hand, semaphores are synchronization primitives that will
		192	let a predefined number of threads into their critical section and block
		193	any other threads above this count. However, these two cases are exactly
		194	the same. Semaphores in HelenOS are therefore implemented as wait queues
		195	with a single semantic change: their wait queue is initialized to have
		196	as many missed wakeups as the number of threads that the semaphore
		197	intends to let into its critical section simultaneously.</para>
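<para>The equivalence between missed wakeups and semaphore capacity can be
illustrated with a small single-threaded model. It is a sketch under stated
assumptions: all <code>toy_</code> names are hypothetical, the queue of
sleeping threads is reduced to a counter, and the spinlock, timeouts and
interruption are left out entirely.</para>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of the modelled sleep operation. */
typedef enum {
    TOY_OK_IMMEDIATE, /* a missed wakeup was consumed, no blocking */
    TOY_WOULD_BLOCK,  /* conditional request and nothing to consume */
    TOY_BLOCKED       /* the caller was enqueued as a sleeper */
} toy_sleep_result_t;

/* Hypothetical model of a wait queue; NOT the HelenOS structure. */
typedef struct {
    unsigned missed_wakeups; /* events that arrived with nobody waiting */
    unsigned sleepers;       /* length of the list of sleeping threads */
} toy_waitq_t;

void toy_waitq_init(toy_waitq_t *wq, unsigned missed_wakeups)
{
    wq->missed_wakeups = missed_wakeups;
    wq->sleepers = 0;
}

toy_sleep_result_t toy_waitq_sleep(toy_waitq_t *wq, bool conditional)
{
    if (wq->missed_wakeups > 0) {
        wq->missed_wakeups--;
        return TOY_OK_IMMEDIATE;
    }
    if (conditional)
        return TOY_WOULD_BLOCK;
    wq->sleepers++;
    return TOY_BLOCKED;
}

void toy_waitq_wakeup(toy_waitq_t *wq)
{
    /* Invariant: sleepers and missed wakeups are never both non-zero. */
    assert(wq->sleepers == 0 || wq->missed_wakeups == 0);
    if (wq->sleepers > 0)
        wq->sleepers--;       /* hand the event to the first sleeper */
    else
        wq->missed_wakeups++; /* remember the event for a later caller */
}

/* A semaphore is the same queue, pre-loaded with 'capacity' wakeups. */
void toy_semaphore_init(toy_waitq_t *sem, unsigned capacity)
{
    toy_waitq_init(sem, capacity);
}
```

<para>With a capacity of two, the first two down operations in this model
succeed immediately and the third one blocks, which is exactly the behaviour
expected of a semaphore.</para>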
43	jermar	198
		199	<para>In the semaphore language, the wait queue operation
86	bondari	200	<code>waitq_sleep_timeout()</code> corresponds to the semaphore
57	jermar	201	<code>down</code> operation, represented by the function
86	bondari	202	<code>semaphore_down_timeout()</code>. Similarly, the wait queue
57	jermar	203	operation <code>waitq_wakeup()</code> corresponds to the semaphore <code>up</code>
86	bondari	204	operation, represented by the function <code>semaphore_up()</code>. The
57	jermar	205	conditional down operation is called
86	bondari	206	<code>semaphore_trydown()</code>.</para>
41	jermar	207	</section>
9	bondari	208
41	jermar	209	<section>
43	jermar	210	<title>Mutexes</title>
9	bondari	211
72	bondari	212	<indexterm>
		213	<primary>synchronization</primary>
		214
		215	<secondary>- mutex</secondary>
		216	</indexterm>
		217
45	jermar	218	<para>Mutexes are sometimes referred to as binary semaphores. That means
		219	that mutexes are like semaphores that allow only one thread in their
43	jermar	220	critical section. Indeed, mutexes in HelenOS are implemented exactly in
45	jermar	221	this way: they are built on top of semaphores. From another point of
		222	view, they can be viewed as spinlocks without busy waiting. Their
		223	semaphore heritage provides a good basis for both conditional operation
		224	and operation with timeout. The locking operation is called
86	bondari	225	<code>mutex_lock()</code>, the conditional locking operation is called
		226	<code>mutex_trylock()</code> and the unlocking operation is called
		227	<code>mutex_unlock()</code>.</para>
41	jermar	228	</section>
9	bondari	229
41	jermar	230	<section>
43	jermar	231	<title>Reader/writer locks</title>
9	bondari	232
72	bondari	233	<indexterm>
		234	<primary>synchronization</primary>
		235
		236	<secondary>- read/write lock</secondary>
		237	</indexterm>
		238
43	jermar	239	<para>Reader/writer locks, or rwlocks, are by far the most complicated
		240	synchronization primitive within the kernel. The goal of these locks is
45	jermar	241	to improve concurrency of applications in which threads need to
		242	synchronize access to a shared resource and in which that access can be
43	jermar	243	partitioned into a read-only mode and a write mode. Reader/writer locks
		244	should make it possible for several, possibly many, readers to be in the
		245	critical section simultaneously, provided that no writer is currently in
		246	the critical section. Writers are
		247	allowed to enter the critical section only individually, provided that
45	jermar	248	no reader is in the critical section already. Applications in which the
		249	majority of operations can be done in the read-only mode can benefit
43	jermar	250	from increased concurrency created by reader/writer locks.</para>
		251
45	jermar	252	<para>During reader/writer lock construction, a decision should be made
43	jermar	253	whether readers will be preferred over writers or whether writers will be
		254	preferred over readers in cases when the lock is not currently held and
		255	both a reader and a writer want to gain the lock. Some operating systems
		256	prefer one group over the other, thus creating a possibility of
		257	starving the non-preferred group. In the HelenOS operating system, neither of
45	jermar	258	the two groups is preferred. The lock is granted on a first come, first
43	jermar	259	served basis with the additional note that readers are granted the lock
45	jermar	260	in the biggest possible batch.</para>
43	jermar	261
		262	<para>With this policy and the timeout modes of operation, the direct
		263	hand-off becomes much more complicated. For instance, a writer leaving
		264	the critical section must wake up all leading readers in the rwlock's
		265	wait queue or one leading writer or no-one if no thread is waiting.
		266	Similarly, the last reader leaving the critical section must wake up the
45	jermar	267	sleeping writer if there are any sleeping threads left at all. As
		268	another example, if a writer at the beginning of the rwlock's wait queue
		269	times out and the lock is held by at least one reader, the writer which
		270	has timed out must first wake up all readers that follow it in the
		271	queue prior to signalling the timeout itself and giving up.</para>
43	jermar	272
45	jermar	273	<para>Due to the issues mentioned in the previous paragraph, the
		274	reader/writer lock implementation needs to walk the rwlock wait queue's
		275	list of sleeping threads directly, in order to find out the type of
43	jermar	276	access that the queueing threads demand. This makes the code difficult
		277	to understand and dependent on the internal implementation of the wait
		278	queue. Nevertheless, it remains unclear to the authors of HelenOS
		279	whether a simpler but equivalently fair solution exists.</para>
		280
		281	<para>The implementation of rwlocks, as has already been said, makes
		282	use of one single wait queue for both readers and writers, thus avoiding
		283	any possibility of starvation. In fact, rwlocks use a mutex rather than
86	bondari	284	a bare wait queue. This mutex is called <emphasis>exclusive</emphasis> and is
57	jermar	285	used to synchronize writers. The writer's lock operation,
86	bondari	286	<code>rwlock_write_lock_timeout()</code>, simply tries to acquire the
57	jermar	287	exclusive mutex. If it succeeds, the writer is granted the rwlock.
44	jermar	288	However, if the operation fails (e.g. times out), the writer must check
		289	for potential readers at the head of the list of sleeping threads
45	jermar	290	associated with the mutex's wait queue and then proceed according to the
44	jermar	291	procedure outlined above.</para>
43	jermar	292
		293	<para>The exclusive mutex plays an important role in reader
		294	synchronization as well. However, a reader doing the reader's lock
86	bondari	295	operation, <code>rwlock_read_lock_timeout()</code>, may bypass this mutex
57	jermar	296	when it detects that:</para>
43	jermar	297
		298	<orderedlist>
		299	<listitem>
45	jermar	300	<para>there are other readers in the critical section and</para>
43	jermar	301	</listitem>
		302
		303	<listitem>
		304	<para>there are no sleeping threads waiting for the exclusive
45	jermar	305	mutex.</para>
43	jermar	306	</listitem>
		307	</orderedlist>
		308
		309	<para>If both conditions are true, the reader will bypass the mutex,
45	jermar	310	increment the number of readers in the critical section and then enter
		311	the critical section. Note that if there are any sleeping threads at the
		312	beginning of the wait queue, the first must be a writer. If the
43	jermar	313	conditions are not fulfilled, the reader normally waits until the
		314	exclusive mutex is granted to it.</para>
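<para>The reader's bypass decision can be written down as a simple predicate.
The following is a minimal sketch only: the structure and the
<code>toy_</code> names are hypothetical, the waiters of the exclusive mutex
are reduced to a counter, and the locking that must protect these fields in a
real kernel is omitted.</para>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the rwlock state; NOT the HelenOS structure. */
typedef struct {
    unsigned readers_in; /* readers currently in the critical section */
    unsigned sleepers;   /* threads sleeping on the exclusive mutex */
} toy_rwlock_t;

/*
 * A reader may bypass the exclusive mutex only if other readers are
 * already inside and nobody is sleeping on the mutex. On success the
 * reader is counted in; on failure it has to wait for the mutex.
 */
bool toy_reader_try_bypass(toy_rwlock_t *rw)
{
    if (rw->readers_in > 0 && rw->sleepers == 0) {
        rw->readers_in++;
        return true;
    }
    return false;
}

/* A reader leaving the critical section is counted out again. */
void toy_reader_leave(toy_rwlock_t *rw)
{
    assert(rw->readers_in > 0);
    rw->readers_in--;
}
```

<para>Refusing the bypass as soon as any thread sleeps on the mutex is what
keeps a waiting writer from being overtaken by newly arriving readers.</para>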
41	jermar	315	</section>
9	bondari	316
		317	<section>
41	jermar	318	<title>Condition variables</title>
9	bondari	319
72	bondari	320	<indexterm>
		321	<primary>synchronization</primary>
		322
		323	<secondary>- condition variable</secondary>
		324	</indexterm>
		325
48	jermar	326	<para>Condition variables can be used for waiting until a condition
		327	becomes true. In this respect, they are similar to wait queues. But
		328	contrary to wait queues, condition variables have different semantics
		329	that allows events to be lost when there is no thread waiting for them.
		330	In order to support this, condition variables don't use direct hand-off
		331	and operate in a way similar to the example below. A thread waiting for
		332	the condition to become true does the following:</para>
		333
62	jermar	334	<example>
86	bondari	335	<title>Use of <code>condvar_wait_timeout()</code>.</title>
72	bondari	336
		337	<programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
48	jermar	338	while (!<varname>condition</varname>)
		339	<function>condvar_wait_timeout</function>(<varname>cv</varname>, <varname>mtx</varname>);
		340	/* <remark>the condition is true, do something</remark> */
62	jermar	341	<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting>
72	bondari	342	</example>
48	jermar	343
		344	<para>A thread that causes the condition to become true signals this event
		345	like this:</para>
		346
62	jermar	347	<example>
72	bondari	348	<title>Use of <code>condvar_signal()</code>.</title>
		349
		350	<programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
48	jermar	351	<varname>condition</varname> = <constant>true</constant>;
		352	<function>condvar_signal</function>(<varname>cv</varname>); /* <remark>condvar_broadcast(cv);</remark> */
72	bondari	353	<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting>
		354	</example>
48	jermar	355
86	bondari	356	<para>The wait operation, <code>condvar_wait_timeout()</code>, always puts
57	jermar	357	the calling thread to sleep. The thread then sleeps until another thread
86	bondari	358	invokes <code>condvar_broadcast()</code> on the same condition variable or
		359	until it is woken up by <code>condvar_signal()</code>. The
		360	<code>condvar_signal()</code> operation unblocks the first thread blocking
		361	on the condition variable while the <code>condvar_broadcast()</code>
57	jermar	362	operation unblocks all threads blocking there. If there are no blocking
		363	threads, these two operations have no effect.</para>
48	jermar	364
		365	<para>Note that the threads must synchronize over a dedicated mutex. To
86	bondari	366	prevent a race condition between <code>condvar_wait_timeout()</code> and
		367	<code>condvar_signal()</code> or <code>condvar_broadcast()</code>, the mutex
		368	is passed to <code>condvar_wait_timeout()</code> which then atomically
57	jermar	369	puts the calling thread asleep and unlocks the mutex. When the thread
86	bondari	370	eventually wakes up, <code>condvar_wait_timeout()</code> regains the mutex and
48	jermar	371	returns.</para>
		372
		373	<para>Also note that there is no conditional operation for condition
		374	variables. Such an operation would make no sense since condition
		375	variables are defined to forget events for which there is no waiting
86	bondari	376	thread and because <code>condvar_wait_timeout()</code> must always go to sleep.
57	jermar	377	The operation with timeout is supported as usual.</para>
48	jermar	378
		379	<para>In HelenOS, condition variables are based on wait queues. As
		380	already mentioned above, wait queues remember missed events while
		381	condition variables must not do so. The reason is that
		382	condition variables are designed for scenarios in which an event might
		383	occur very many times without being picked up by any waiting thread. On
		384	the other hand, wait queues would remember any event that had not been
86	bondari	385	picked up by a call to <code>waitq_sleep_timeout()</code>. Therefore, if
57	jermar	386	wait queues were used directly and without any changes to implement
		387	condition variables, the counter of missed wakeups would hurt performance of
		388	the implementation: the <code>while</code> loop around
86	bondari	389	<code>condvar_wait_timeout()</code> would effectively do busy waiting
57	jermar	390	until all missed wakeups were discarded.</para>
48	jermar	391
		392	<para>The requirement on the wait operation to atomically put the caller
		393	to sleep and release the mutex poses an interesting problem for
86	bondari	394	<code>condvar_wait_timeout()</code>. More precisely, the thread should
57	jermar	395	sleep in the condvar's wait queue prior to releasing the mutex, but it
		396	must not hold the mutex when it is sleeping.</para>
48	jermar	397
		398	<para>Problems described in the two previous paragraphs are addressed in
86	bondari	399	HelenOS by dividing the <code>waitq_sleep_timeout()</code> function into
57	jermar	400	three pieces:</para>
48	jermar	401
		402	<orderedlist>
		403	<listitem>
86	bondari	404	<para><code>waitq_sleep_prepare()</code> prepares the thread to go to
57	jermar	405	sleep by, among other things, locking the wait queue;</para>
48	jermar	406	</listitem>
		407
		408	<listitem>
86	bondari	409	<para><code>waitq_sleep_timeout_unsafe()</code> implements the core
57	jermar	410	blocking logic;</para>
48	jermar	411	</listitem>
		412
		413	<listitem>
86	bondari	414	<para><code>waitq_sleep_finish()</code> performs cleanup after
		415	<code>waitq_sleep_timeout_unsafe()</code>; after this call, the wait
57	jermar	416	queue spinlock is guaranteed to be unlocked by the caller.</para>
48	jermar	417	</listitem>
		418	</orderedlist>
		419
86	bondari	420	<para>The stock <code>waitq_sleep_timeout()</code> is then a mere wrapper
57	jermar	421	that calls these three functions. It is provided for convenience in
		422	cases where the caller doesn't require such low-level control.
86	bondari	423	However, the implementation of <code>condvar_wait_timeout()</code> does
57	jermar	424	need this finer-grained control because it has to interleave calls to
		425	these functions with other actions. It carries its operations out in the
		426	following order:</para>
48	jermar	427
		428	<orderedlist>
		429	<listitem>
86	bondari	430	<para>calls <code>waitq_sleep_prepare()</code> in order to lock the
57	jermar	431	condition variable's wait queue,</para>
48	jermar	432	</listitem>
		433
		434	<listitem>
		435	<para>releases the mutex,</para>
		436	</listitem>
		437
		438	<listitem>
		439	<para>clears the counter of missed wakeups,</para>
		440	</listitem>
		441
		442	<listitem>
86	bondari	443	<para>calls <code>waitq_sleep_timeout_unsafe()</code>,</para>
48	jermar	444	</listitem>
		445
		446	<listitem>
		447	<para>retakes the mutex,</para>
		448	</listitem>
		449
		450	<listitem>
86	bondari	451	<para>calls <code>waitq_sleep_finish()</code>.</para>
48	jermar	452	</listitem>
		453	</orderedlist>
9	bondari	454	</section>
41	jermar	455	</section>
9	bondari	456
41	jermar	457	<section>
		458	<title>Userspace synchronization</title>
9	bondari	459
41	jermar	460	<section>
		461	<title>Futexes</title>
		462
72	bondari	463	<indexterm>
		464	<primary>synchronization</primary>
		465
		466	<secondary>- futex</secondary>
		467	</indexterm>
		468
81	jermar	469	<para>In a multithreaded environment, userspace threads need an
		470	efficient way to synchronize. HelenOS borrows an idea from Linux<xref
		471	linkend="futex" /> to implement a lightweight userspace synchronization
		472	and mutual exclusion primitive called a futex. The key idea behind futexes
		473	is that non-contended synchronization is very fast and takes place only
		474	in userspace without the kernel's intervention. When more threads contend
		475	for a futex, only one of them wins; other threads go to sleep via a
		476	dedicated syscall.</para>
		477
		478	<para>The userspace part of the futex is a mere integer variable, a
		479	counter, that can be atomically incremented or decremented. The kernel
		480	part is rather more complicated. For each userspace futex counter, there
		481	is a kernel structure describing the futex. This structure
		482	contains:</para>
		483
		484	<itemizedlist>
		485	<listitem>
		486	<para>number of references,</para>
		487	</listitem>
		488
		489	<listitem>
		490	<para>physical address of the userspace futex counter,</para>
		491	</listitem>
		492
		493	<listitem>
		494	<para>hash table link and</para>
		495	</listitem>
		496
		497	<listitem>
		498	<para>a wait queue.</para>
		499	</listitem>
		500	</itemizedlist>
		501
		502	<para>The reference count helps to find out when the futex is no longer
		503	needed and can be deallocated. The physical address is used as a key for
		504	the global futex hash table. Note that the kernel has to use the physical
		505	address to identify the futex because one futex can be used for
		506	synchronization among different address spaces and can have different
		507	virtual addresses in each of them. Finally, the kernel futex structure
		508	includes a wait queue. The wait queue is used to put threads that didn't
		509	win the futex to sleep until the winner wakes one of them up.</para>
		510
		511	<para>A futex should be initialized by setting its userspace counter to
		512	one before it is used. When locking the futex via the userspace library
86	bondari	513	function <code>futex_down_timeout()</code>, the library code atomically
82	jermar	514	decrements the futex counter and tests if it dropped below zero. If it
81	jermar	515	did, then the futex is locked by another thread and the library uses the
		516	<constant>SYS_FUTEX_SLEEP</constant> syscall to put the thread to sleep.
		517	If the counter decreased to 0, then there was no contention and the
		518	thread can enter the critical section protected by the futex. When the
		519	thread later leaves that critical section, it uses the library function
86	bondari	520	<code>futex_up()</code> to atomically increment the counter. If the counter
81	jermar	521	value increased to one, then there again was no contention and no action
		522	needs to be done. However, if it increased to zero or even a smaller
		523	number, then there are sleeping threads waiting for the futex to become
		524	available. In that case, the library has to use the
		525	<constant>SYS_FUTEX_WAKEUP</constant> syscall to wake one sleeping
		526	thread.</para>
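<para>The userspace fast path described above can be sketched with C11
atomics. This is an illustration under stated assumptions rather than the
HelenOS library code: the <code>toy_</code> names are hypothetical, timeouts
are ignored, and the two syscalls are replaced by stubs that merely count how
often the kernel would have been entered.</para>

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of a futex; NOT the HelenOS futex type. */
typedef struct {
    atomic_int counter;    /* the userspace futex counter */
    unsigned sleep_calls;  /* how often the sleep stub was entered */
    unsigned wakeup_calls; /* how often the wakeup stub was entered */
} toy_futex_t;

void toy_futex_init(toy_futex_t *f)
{
    atomic_init(&f->counter, 1); /* one means unlocked */
    f->sleep_calls = 0;
    f->wakeup_calls = 0;
}

/* Stub standing in for the SYS_FUTEX_SLEEP syscall. */
void toy_sys_futex_sleep(toy_futex_t *f)
{
    f->sleep_calls++;
}

/* Stub standing in for the SYS_FUTEX_WAKEUP syscall. */
void toy_sys_futex_wakeup(toy_futex_t *f)
{
    f->wakeup_calls++;
}

void toy_futex_down(toy_futex_t *f)
{
    /* atomic_fetch_sub returns the old value; compute the new one. */
    int new_val = atomic_fetch_sub(&f->counter, 1) - 1;
    if (new_val < 0)
        toy_sys_futex_sleep(f); /* contended: sleep in the kernel */
    /* new_val == 0: no contention, the kernel is never entered */
}

void toy_futex_up(toy_futex_t *f)
{
    int new_val = atomic_fetch_add(&f->counter, 1) + 1;
    /* With matched down/up pairs the counter never exceeds one. */
    assert(new_val <= 1);
    if (new_val <= 0)
        toy_sys_futex_wakeup(f); /* somebody is sleeping: wake one */
    /* new_val == 1: no contention, nothing else to do */
}
```

<para>In this model, an uncontended down followed by an up never touches
either stub, which mirrors the claim that non-contended synchronization takes
place purely in userspace.</para>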
		527
		528	<para>So far, futexes are very elegant in that they don't interfere with
		529	the kernel when there is no contention for them. Another nice aspect of
		530	futexes is that they don't need to be registered anywhere prior to the
		531	first kernel intervention.</para>
		532
		533	<para>The two futex-related syscalls, <constant>SYS_FUTEX_SLEEP</constant>
		534	and <constant>SYS_FUTEX_WAKEUP</constant>, are mere
86	bondari	535	wrappers for <code>waitq_sleep_timeout()</code> and
		536	<code>waitq_wakeup()</code>, respectively, with some housekeeping
81	jermar	537	functionality added. Both syscalls need to translate the userspace
		538	virtual address of the futex counter to a physical address in order to
		539	support synchronization across shared memory. Once the physical address
		540	is known, the kernel checks whether the futex is already known to it
		541	by searching the global futex hash table for an item with the physical
		542	address of the futex counter as a key. When the search is successful, it
		543	returns an address of the kernel futex structure associated with the
		544	counter. If the hash table does not contain the key, the kernel creates
		545	the futex structure and inserts it into the hash table. At the same time, the current
		546	task's B+tree of known futexes is searched in order to find out if the
		547	task already uses the futex. If it does, no action is taken. Otherwise
		548	the reference count of the futex is incremented, provided that the futex
		549	already existed.</para>
41	jermar	550	</section>
		551	</section>
		552	</chapter>


(root)/design/trunk/src/ch_synchronization.xml – Rev 86