<?xml version="1.0" encoding="UTF-8"?>
<chapter id="sync">
  <?dbhtml filename="sync.html"?>

  <title>Synchronization</title>

  <section>
    <title>Introduction</title>

    <para>The HelenOS operating system is designed to make use of the
    parallelism offered by the hardware and to exploit the concurrency of both
    the kernel and userspace tasks. This is achieved through multiprocessor
    support and several levels of multiprogramming: multitasking,
    multithreading and userspace pseudo threads. However, such a highly
    concurrent environment needs safe and efficient ways to handle mutual
    exclusion and synchronization of many execution flows.</para>
  </section>

  <section>
    <title>Active kernel primitives</title>

    <section>
      <indexterm>
        <primary>synchronization</primary>

        <secondary>- spinlock</secondary>
      </indexterm>

      <title>Spinlocks</title>

      <para>The basic mutual exclusion primitive is the spinlock. The spinlock
      implements active waiting for the availability of a memory lock (i.e. a
      simple variable) in a multiprocessor-safe manner. This safety is
      achieved through the use of a specialized, architecture-dependent,
      atomic test-and-set operation which either locks the spinlock (i.e. sets
      the variable) or, provided that it is already locked, leaves it
      unaltered. In any case, the test-and-set operation returns a value, thus
      signalling either success (i.e. zero return value) or failure (i.e.
      non-zero value) in acquiring the lock. This is the fundamental
      difference between the naive algorithm that doesn't use the atomic
      operation and the spinlock algorithm. While the naive algorithm is prone
      to race conditions on SMP configurations and thus is completely
      SMP-unsafe, the spinlock algorithm eliminates the possibility of race
      conditions and is suitable for mutual exclusion use.</para>

      <para>The semantics of the test-and-set operation is that the spinlock
      remains unavailable until this operation called on the respective
      spinlock returns zero. HelenOS builds two functions on top of the
      test-and-set operation. The first function is the unconditional attempt
      to acquire the spinlock and is called <code>spinlock_lock()</code>. It
      simply loops until the test-and-set returns a zero value. The other
      function, <code>spinlock_trylock()</code>, is the conditional lock
      operation and calls the test-and-set only once to find out whether it
      managed to acquire the spinlock or not. The conditional operation is
      useful in situations in which an algorithm cannot acquire more spinlocks
      in the proper order and a deadlock cannot be avoided. In such a case,
      the algorithm would detect the danger and instead of possibly
      deadlocking the system it would simply release some spinlocks it already
      holds and retry the whole operation with the hope that it will succeed
      next time. The unlock function, <code>spinlock_unlock()</code>, is quite
      easy - it merely clears the spinlock variable.</para>
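      <para>As an illustration, the algorithm described above can be sketched
      in portable C, with C11 atomics standing in for the architecture-specific
      test-and-set instruction. This is a hypothetical model, not the HelenOS
      kernel code; only the function names mirror the kernel API.</para>

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative spinlock model: atomic_flag_test_and_set() returns the
 * previous value, so "false" (i.e. zero) signals a successful acquisition,
 * matching the test-and-set semantics described in the text. */
typedef struct {
    atomic_flag flag;   /* clear = unlocked, set = locked */
} spinlock_t;

void spinlock_initialize(spinlock_t *sl)
{
    atomic_flag_clear(&sl->flag);
}

/* Unconditional acquire: actively loop until test-and-set succeeds. */
void spinlock_lock(spinlock_t *sl)
{
    while (atomic_flag_test_and_set(&sl->flag))
        ;  /* busy waiting */
}

/* Conditional acquire: a single test-and-set attempt; true on success. */
bool spinlock_trylock(spinlock_t *sl)
{
    return !atomic_flag_test_and_set(&sl->flag);
}

/* Release: merely clear the spinlock variable. */
void spinlock_unlock(spinlock_t *sl)
{
    atomic_flag_clear(&sl->flag);
}
```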

      <para>Nevertheless, there is a special issue related to hardware
      optimizations that modern processors implement. Particularly problematic
      is the out-of-order execution of instructions within the critical
      section protected by a spinlock. The processors are always
      self-consistent, so that they can carry out speculatively executed
      instructions in the right order with regard to dependencies among those
      instructions. However, the dependency between instructions inside the
      critical section and those that implement locking and unlocking of the
      respective spinlock is not implicit on some processor architectures. As
      a result, the processor needs to be explicitly told about each
      occurrence of such a dependency. Therefore, HelenOS adds
      architecture-specific hooks to the <code>spinlock_lock()</code>,
      <code>spinlock_trylock()</code> and <code>spinlock_unlock()</code>
      functions to prevent the instructions inside the critical section from
      permeating out. On some architectures, these hooks can be void because
      the dependencies are implicit, thanks to the special properties of the
      locking and unlocking instructions. However, other architectures need to
      instrument these hooks with different memory barriers, depending on what
      operations could permeate out.</para>

      <para>Spinlocks have one significant drawback: when held for longer
      periods of time, they harm both parallelism and concurrency. The
      processor executing <code>spinlock_lock()</code> does not do any
      fruitful work and is effectively halted until it can grab the lock and
      proceed. Similarly, other execution flows cannot execute on the
      processor that holds the spinlock because the kernel disables preemption
      on that processor when a spinlock is held. The reason behind disabling
      preemption is avoidance of the priority inversion problem. For the same
      reason, threads are strongly discouraged from sleeping when they hold a
      spinlock.</para>

      <para>To summarize, spinlocks represent a very simple and essential
      mutual exclusion primitive for SMP systems. On the other hand, spinlocks
      scale poorly because of the active loop they are based on. Therefore,
      spinlocks are used in HelenOS only for short-term mutual exclusion and
      in cases where the mutual exclusion is required out of thread context.
      Lastly, spinlocks are used in the construction of passive
      synchronization primitives.</para>
    </section>
  </section>

  <section>
    <title>Passive kernel synchronization</title>

    <section>
      <indexterm>
        <primary>synchronization</primary>

        <secondary>- wait queue</secondary>
      </indexterm>

      <title>Wait queues</title>

      <para>A wait queue is the basic passive synchronization primitive on
      which all other passive synchronization primitives are built. Simply
      put, it allows a thread to sleep until an event associated with the
      particular wait queue occurs. Multiple threads are notified about
      incoming events in a first come, first served fashion. Moreover, should
      the event come before any thread waits for it, it is recorded in the
      wait queue as a missed wakeup and later forwarded to the first thread
      that decides to wait in the queue. The inner structures of the wait
      queue are protected by a spinlock.</para>

      <para>The thread that wants to wait for a wait queue event uses the
      <code>waitq_sleep_timeout()</code> function. The algorithm then checks
      the wait queue's counter of missed wakeups and if there are any missed
      wakeups, the call returns immediately. The call also returns immediately
      if only a conditional wait was requested. Otherwise the thread is
      enqueued in the wait queue's list of sleeping threads and its state is
      changed to <constant>Sleeping</constant>. It then sleeps until one of
      the following events happens:</para>

      <orderedlist>
        <listitem>
          <para>another thread calls <code>waitq_wakeup()</code> and the
          thread is the first thread in the wait queue's list of sleeping
          threads;</para>
        </listitem>

        <listitem>
          <para>another thread calls <code>waitq_interrupt_sleep()</code> on
          the sleeping thread;</para>
        </listitem>

        <listitem>
          <para>the sleep times out, provided that none of the previous
          events occurred within the specified time limit; the limit can be
          infinity.</para>
        </listitem>
      </orderedlist>
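      <para>The entry logic of the sleep operation described above can be
      sketched as a simple decision function. The outcome names and the
      single-threaded model below are illustrative assumptions, not the
      kernel's actual return codes or data structures.</para>

```c
#include <stdbool.h>

/* Hypothetical sketch of the entry decision of waitq_sleep_timeout():
 * consume a missed wakeup if one is recorded, fail immediately for a
 * conditional wait, otherwise the caller would block. */
typedef enum {
    OUTCOME_OK_ATOMIC,    /* missed wakeup consumed, immediate success */
    OUTCOME_WOULD_BLOCK,  /* conditional wait requested, no event yet */
    OUTCOME_BLOCKED       /* caller would be enqueued and put to sleep */
} sleep_outcome_t;

typedef struct {
    int missed_wakeups;   /* events recorded before any thread waited */
} waitq_t;

sleep_outcome_t waitq_sleep_entry(waitq_t *wq, bool conditional)
{
    if (wq->missed_wakeups > 0) {
        wq->missed_wakeups--;       /* forward a missed wakeup */
        return OUTCOME_OK_ATOMIC;
    }
    if (conditional)
        return OUTCOME_WOULD_BLOCK;
    /* real kernel: enqueue in the list of sleeping threads, change the
     * state to Sleeping and wait for wakeup, interruption or timeout */
    return OUTCOME_BLOCKED;
}
```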

      <para>All five possibilities (immediate return on success, immediate
      return on failure, wakeup after sleep, interruption and timeout) are
      distinguishable by the return value of
      <code>waitq_sleep_timeout()</code>. Being able to interrupt a sleeping
      thread is essential for externally initiated thread termination. The
      ability to wait only for a certain amount of time is used, for instance,
      to passively delay thread execution by several microseconds or even
      seconds in the <code>thread_sleep()</code> function. Because all other
      passive kernel synchronization primitives are based on wait queues, they
      also have the option of being interrupted and, more importantly, can
      time out. All of them also implement the conditional operation.
      Furthermore, this very fundamental interface reaches up to the
      implementation of futexes, the userspace synchronization primitive,
      which makes it possible for a userspace thread to request a
      synchronization operation with a timeout or a conditional
      operation.</para>

      <para>From the description above, it should be apparent that when a
      sleeping thread is woken by <code>waitq_wakeup()</code> or when
      <code>waitq_sleep_timeout()</code> succeeds immediately, the thread can
      be sure that the event has occurred. The thread need not and should not
      verify this fact. This approach is called direct hand-off and is
      characteristic of all passive HelenOS synchronization primitives, with
      the exception described below.</para>
    </section>

    <section>
      <indexterm>
        <primary>synchronization</primary>

        <secondary>- semaphore</secondary>
      </indexterm>

      <title>Semaphores</title>

      <para>The interesting point about wait queues is that the number of
      missed wakeups is equal to the number of threads that will not block in
      <code>waitq_sleep_timeout()</code> and will immediately succeed instead.
      On the other hand, semaphores are synchronization primitives that admit
      a predefined number of threads into their critical section and block
      any other threads above this count. These two cases are exactly the
      same. Semaphores in HelenOS are therefore implemented as wait queues
      with a single semantic change: their wait queue is initialized to have
      as many missed wakeups as is the number of threads that the semaphore
      intends to let into its critical section simultaneously.</para>

      <para>In the semaphore language, the wait queue operation
      <code>waitq_sleep_timeout()</code> corresponds to the semaphore
      <code>down</code> operation, represented by the function
      <code>semaphore_down_timeout()</code>, and the wait queue operation
      <code>waitq_wakeup()</code> corresponds to the semaphore
      <code>up</code> operation, represented by the function
      <code>semaphore_up()</code>. The conditional down operation is called
      <code>semaphore_trydown()</code>.</para>
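      <para>The semaphore-as-wait-queue idea can be sketched as follows:
      initializing the wait queue with <varname>n</varname> missed wakeups
      lets the first <varname>n</varname> down operations succeed without
      blocking. The code is an illustrative single-threaded model, not the
      kernel implementation.</para>

```c
#include <stdbool.h>

/* Simplified wait queue: only the missed-wakeup counter is modeled. */
typedef struct {
    int missed_wakeups;
} waitq_t;

typedef struct {
    waitq_t wq;
} semaphore_t;

void semaphore_initialize(semaphore_t *s, int value)
{
    /* as many missed wakeups as threads allowed in simultaneously */
    s->wq.missed_wakeups = value;
}

/* Conditional down: succeeds iff a missed wakeup is available. */
bool semaphore_trydown(semaphore_t *s)
{
    if (s->wq.missed_wakeups > 0) {
        s->wq.missed_wakeups--;
        return true;
    }
    return false;
}

/* Up: records a wakeup (the real kernel would wake a sleeper first). */
void semaphore_up(semaphore_t *s)
{
    s->wq.missed_wakeups++;
}
```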
    </section>

    <section>
      <title>Mutexes</title>

      <indexterm>
        <primary>synchronization</primary>

        <secondary>- mutex</secondary>
      </indexterm>

      <para>Mutexes are sometimes referred to as binary semaphores. That means
      that mutexes are like semaphores that allow only one thread into their
      critical section. Indeed, mutexes in HelenOS are implemented exactly in
      this way: they are built on top of semaphores. From another point of
      view, they can be viewed as spinlocks without busy waiting. Their
      semaphore heritage provides a good basis for both the conditional
      operation and the operation with timeout. The locking operation is
      called <code>mutex_lock()</code>, the conditional locking operation is
      called <code>mutex_trylock()</code> and the unlocking operation is
      called <code>mutex_unlock()</code>.</para>
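      <para>The mutex-on-top-of-semaphore construction can be sketched in a
      few lines; the semaphore stand-in below is a plain counter, an
      assumption made so the sketch stays self-contained.</para>

```c
#include <stdbool.h>

/* Simplified semaphore stand-in (counter only, no sleeping). */
typedef struct { int count; } semaphore_t;

/* A mutex is just a semaphore initialized to 1: at most one thread
 * may be inside the critical section. */
typedef struct { semaphore_t sem; } mutex_t;

void mutex_initialize(mutex_t *m)
{
    m->sem.count = 1;   /* binary semaphore */
}

bool mutex_trylock(mutex_t *m)   /* conditional semaphore down */
{
    if (m->sem.count > 0) {
        m->sem.count--;
        return true;
    }
    return false;
}

void mutex_unlock(mutex_t *m)    /* semaphore up */
{
    m->sem.count++;
}
```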
    </section>

    <section>
      <title>Reader/writer locks</title>

      <indexterm>
        <primary>synchronization</primary>

        <secondary>- read/write lock</secondary>
      </indexterm>

      <para>Reader/writer locks, or rwlocks, are by far the most complicated
      synchronization primitive within the kernel. The goal of these locks is
      to improve the concurrency of applications in which threads need to
      synchronize access to a shared resource and that access can be
      partitioned into a read-only mode and a write mode. Reader/writer locks
      should make it possible for several, possibly many, readers to be in
      the critical section simultaneously, provided that no writer is
      currently in the critical section. Writers are allowed to enter the
      critical section only individually, provided that no reader is in the
      critical section already. Applications in which the majority of
      operations can be done in the read-only mode can benefit from the
      increased concurrency created by reader/writer locks.</para>

      <para>During reader/writer lock construction, a decision should be made
      whether readers will be preferred over writers or whether writers will
      be preferred over readers in cases when the lock is not currently held
      and both a reader and a writer want to gain the lock. Some operating
      systems prefer one group over the other, thus creating a possibility of
      starving the unpreferred group. In the HelenOS operating system, neither
      of the two groups is preferred. The lock is granted on a first come,
      first served basis with the additional note that readers are granted
      the lock in the biggest possible batch.</para>

      <para>With this policy and the timeout modes of operation, the direct
      hand-off becomes much more complicated. For instance, a writer leaving
      the critical section must wake up all leading readers in the rwlock's
      wait queue or one leading writer or no-one if no thread is waiting.
      Similarly, the last reader leaving the critical section must wake up
      the sleeping writer if there are any sleeping threads left at all. As
      another example, if a writer at the beginning of the rwlock's wait
      queue times out and the lock is held by at least one reader, the writer
      which has timed out must first wake up all readers that follow it in
      the queue prior to signalling the timeout itself and giving up.</para>

      <para>Due to the issues mentioned in the previous paragraph, the
      reader/writer lock implementation needs to walk the rwlock wait queue's
      list of sleeping threads directly in order to find out the type of
      access that the queueing threads demand. This makes the code difficult
      to understand and dependent on the internal implementation of the wait
      queue. Nevertheless, it remains unclear to the authors of HelenOS
      whether a simpler but equivalently fair solution exists.</para>

      <para>The implementation of rwlocks, as already outlined, makes use of
      a single wait queue for both readers and writers, thus avoiding any
      possibility of starvation. In fact, rwlocks use a mutex rather than a
      bare wait queue. This mutex is called <emphasis>exclusive</emphasis>
      and is used to synchronize writers. The writer's lock operation,
      <code>rwlock_write_lock_timeout()</code>, simply tries to acquire the
      exclusive mutex. If it succeeds, the writer is granted the rwlock.
      However, if the operation fails (e.g. times out), the writer must check
      for potential readers at the head of the list of sleeping threads
      associated with the mutex's wait queue and then proceed according to
      the procedure outlined above.</para>

      <para>The exclusive mutex plays an important role in reader
      synchronization as well. However, a reader doing the reader's lock
      operation, <code>rwlock_read_lock_timeout()</code>, may bypass this
      mutex when it detects that:</para>

      <orderedlist>
        <listitem>
          <para>there are other readers in the critical section and</para>
        </listitem>

        <listitem>
          <para>there are no sleeping threads waiting for the exclusive
          mutex.</para>
        </listitem>
      </orderedlist>

      <para>If both conditions are true, the reader will bypass the mutex,
      increment the number of readers in the critical section and then enter
      the critical section. Note that if there are any sleeping threads at
      the beginning of the wait queue, the first must be a writer. If the
      conditions are not fulfilled, the reader normally waits until the
      exclusive mutex is granted to it.</para>
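      <para>The reader fast path decision above reduces to a two-condition
      predicate. The structure and field names below are illustrative
      assumptions, not the kernel's actual rwlock layout.</para>

```c
#include <stdbool.h>

/* Hypothetical rwlock state: only the two fields the bypass decision
 * needs are modeled. */
typedef struct {
    int readers_in;        /* readers currently in the critical section */
    int sleeping_threads;  /* threads queued for the exclusive mutex */
} rwlock_t;

/* A reader may bypass the exclusive mutex only when other readers are
 * already inside AND no thread (the first of which would be a writer)
 * is sleeping in the mutex's wait queue. */
bool rwlock_reader_can_bypass(const rwlock_t *rw)
{
    return (rw->readers_in > 0) && (rw->sleeping_threads == 0);
}
```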
    </section>

    <section>
      <title>Condition variables</title>

      <indexterm>
        <primary>synchronization</primary>

        <secondary>- condition variable</secondary>
      </indexterm>

      <para>Condition variables can be used for waiting until a condition
      becomes true. In this respect, they are similar to wait queues. But
      contrary to wait queues, condition variables have different semantics
      that allow events to be lost when there is no thread waiting for them.
      In order to support this, condition variables don't use direct hand-off
      and operate in a way similar to the example below. A thread waiting for
      the condition to become true does the following:</para>

      <example>
        <title>Use of <code>condvar_wait_timeout()</code>.</title>

        <programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
while (!<varname>condition</varname>)
        <function>condvar_wait_timeout</function>(<varname>cv</varname>, <varname>mtx</varname>);
/* <remark>the condition is true, do something</remark> */
<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting>
      </example>

      <para>A thread that causes the condition to become true signals this
      event like this:</para>

      <example>
        <title>Use of <code>condvar_signal()</code>.</title>

        <programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
<varname>condition</varname> = <constant>true</constant>;
<function>condvar_signal</function>(<varname>cv</varname>);  /* <remark>condvar_broadcast(cv);</remark> */
<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting>
      </example>

      <para>The wait operation, <code>condvar_wait_timeout()</code>, always
      puts the calling thread to sleep. The thread then sleeps until another
      thread invokes <code>condvar_broadcast()</code> on the same condition
      variable or until it is woken up by <code>condvar_signal()</code>. The
      <code>condvar_signal()</code> operation unblocks the first thread
      blocking on the condition variable while the
      <code>condvar_broadcast()</code> operation unblocks all threads
      blocking there. If there are no blocking threads, these two operations
      have no effect.</para>

      <para>Note that the threads must synchronize over a dedicated mutex. To
      prevent a race condition between <code>condvar_wait_timeout()</code>
      and <code>condvar_signal()</code> or <code>condvar_broadcast()</code>,
      the mutex is passed to <code>condvar_wait_timeout()</code>, which then
      atomically puts the calling thread to sleep and unlocks the mutex. When
      the thread eventually wakes up, <code>condvar_wait()</code> regains the
      mutex and returns.</para>

      <para>Also note that there is no conditional operation for condition
      variables. Such an operation would make no sense since condition
      variables are defined to forget events for which there is no waiting
      thread and because <code>condvar_wait()</code> must always go to sleep.
      The operation with timeout is supported as usual.</para>

      <para>In HelenOS, condition variables are based on wait queues. As
      mentioned above, wait queues remember missed events while condition
      variables must not do so. The reason is that condition variables are
      designed for scenarios in which an event might occur very many times
      without being picked up by any waiting thread. On the other hand, wait
      queues would remember any event that had not been picked up by a call
      to <code>waitq_sleep_timeout()</code>. Therefore, if wait queues were
      used directly and without any changes to implement condition variables,
      the missed_wakeup counter would hurt performance of the implementation:
      the <code>while</code> loop in <code>condvar_wait_timeout()</code>
      would effectively do busy waiting until all missed wakeups were
      discarded.</para>

      <para>The requirement on the wait operation to atomically put the
      caller to sleep and release the mutex poses an interesting problem for
      <code>condvar_wait_timeout()</code>. More precisely, the thread should
      be put to sleep in the condvar's wait queue prior to releasing the
      mutex, but it must not hold the mutex while it is sleeping.</para>

      <para>Problems described in the two previous paragraphs are addressed
      in HelenOS by dividing the <code>waitq_sleep_timeout()</code> function
      into three pieces:</para>

      <orderedlist>
        <listitem>
          <para><code>waitq_sleep_prepare()</code> prepares the thread to go
          to sleep by, among other things, locking the wait queue;</para>
        </listitem>

        <listitem>
          <para><code>waitq_sleep_timeout_unsafe()</code> implements the core
          blocking logic;</para>
        </listitem>

        <listitem>
          <para><code>waitq_sleep_finish()</code> performs cleanup after
          <code>waitq_sleep_timeout_unsafe()</code>; after this call, the
          wait queue spinlock is guaranteed to be unlocked by the
          caller.</para>
        </listitem>
      </orderedlist>

      <para>The stock <code>waitq_sleep_timeout()</code> is then a mere
      wrapper that calls these three functions. It is provided for
      convenience in cases where the caller doesn't require such low-level
      control. However, the implementation of
      <code>condvar_wait_timeout()</code> does need this finer-grained
      control because it has to interleave calls to these functions with
      other actions. It carries its operations out in the following
      order:</para>

      <orderedlist>
        <listitem>
          <para>calls <code>waitq_sleep_prepare()</code> in order to lock the
          condition variable's wait queue,</para>
        </listitem>

        <listitem>
          <para>releases the mutex,</para>
        </listitem>

        <listitem>
          <para>clears the counter of missed wakeups,</para>
        </listitem>

        <listitem>
          <para>calls <code>waitq_sleep_timeout_unsafe()</code>,</para>
        </listitem>

        <listitem>
          <para>retakes the mutex,</para>
        </listitem>

        <listitem>
          <para>calls <code>waitq_sleep_finish()</code>.</para>
        </listitem>
      </orderedlist>
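      <para>The step order above can be sketched as a skeleton in which the
      kernel functions are replaced by stubs that record the call order; the
      real functions live in the HelenOS kernel and are not reproduced
      here.</para>

```c
#include <string.h>

/* Order-of-operations sketch of condvar_wait_timeout(). Each stub
 * appends its name to a log so the sequence can be inspected. */
static char order[64];

static void step(const char *s)
{
    strcat(order, s);
    strcat(order, ";");
}

static void waitq_sleep_prepare(void)        { step("prepare"); }
static void mutex_unlock_stub(void)          { step("unlock"); }
static void clear_missed_wakeups(void)       { step("clear"); }
static void waitq_sleep_timeout_unsafe(void) { step("sleep"); }
static void mutex_lock_stub(void)            { step("lock"); }
static void waitq_sleep_finish(void)         { step("finish"); }

void condvar_wait_timeout_sketch(void)
{
    waitq_sleep_prepare();        /* 1. lock the condvar's wait queue */
    mutex_unlock_stub();          /* 2. release the mutex */
    clear_missed_wakeups();       /* 3. forget recorded events */
    waitq_sleep_timeout_unsafe(); /* 4. core blocking logic */
    mutex_lock_stub();            /* 5. retake the mutex */
    waitq_sleep_finish();         /* 6. cleanup, unlock the wait queue */
}
```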
    </section>
  </section>

  <section>
    <title>Userspace synchronization</title>

    <section>
      <title>Futexes</title>

      <indexterm>
        <primary>synchronization</primary>

        <secondary>- futex</secondary>
      </indexterm>

      <para>In a multithreaded environment, userspace threads need an
      efficient way to synchronize. HelenOS borrows an idea from Linux<xref
      linkend="futex" /> to implement a lightweight userspace synchronization
      and mutual exclusion primitive called a futex. The key idea behind
      futexes is that non-contended synchronization is very fast and takes
      place only in userspace without the kernel's intervention. When several
      threads contend for a futex, only one of them wins; the other threads
      go to sleep via a dedicated syscall.</para>

      <para>The userspace part of the futex is a mere integer variable, a
      counter, that can be atomically incremented or decremented. The kernel
      part is rather more complicated. For each userspace futex counter,
      there is a kernel structure describing the futex. This structure
      contains:</para>

      <itemizedlist>
        <listitem>
          <para>the number of references,</para>
        </listitem>

        <listitem>
          <para>the physical address of the userspace futex counter,</para>
        </listitem>

        <listitem>
          <para>a hash table link and</para>
        </listitem>

        <listitem>
          <para>a wait queue.</para>
        </listitem>
      </itemizedlist>

      <para>The reference count helps to find out when the futex is no
      longer needed and can be deallocated. The physical address is used as
      a key for the global futex hash table. Note that the kernel has to use
      the physical address to identify the futex because one futex can be
      used for synchronization among different address spaces and can have
      different virtual addresses in each of them. Finally, the kernel futex
      structure includes a wait queue. The wait queue is used to put threads
      that didn't win the futex to sleep until the winner wakes one of them
      up.</para>

      <para>A futex should be initialized by setting its userspace counter
      to one before it is used. When locking the futex via the userspace
      library function <code>futex_down_timeout()</code>, the library code
      atomically decrements the futex counter and tests if it dropped below
      zero. If it did, then the futex is locked by another thread and the
      library uses the <constant>SYS_FUTEX_SLEEP</constant> syscall to put
      the thread to sleep. If the counter decreased to zero, then there was
      no contention and the thread can enter the critical section protected
      by the futex. When the thread later leaves that critical section, it
      atomically increments the counter using the library function
      <code>futex_up()</code>. If the counter value increased to one, then
      there again was no contention and no action needs to be done. However,
      if it increased to zero or even a smaller number, then there are
      sleeping threads waiting for the futex to become available. In that
      case, the library has to use the
      <constant>SYS_FUTEX_WAKEUP</constant> syscall to wake one sleeping
      thread.</para>
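      <para>The userspace half of this protocol can be sketched with C11
      atomics, with the two syscalls replaced by counting stubs. This is an
      illustrative model of the counter discipline described above, not the
      HelenOS library code.</para>

```c
#include <stdatomic.h>

/* Futex model: the counter starts at one (unlocked); the stub fields
 * count how often the syscalls would be invoked. */
typedef struct {
    atomic_int count;
    int sleep_calls;    /* stands in for SYS_FUTEX_SLEEP invocations */
    int wakeup_calls;   /* stands in for SYS_FUTEX_WAKEUP invocations */
} futex_t;

void futex_initialize(futex_t *f)
{
    atomic_init(&f->count, 1);
    f->sleep_calls = 0;
    f->wakeup_calls = 0;
}

void futex_down(futex_t *f)
{
    /* atomically decrement; dropping below zero means contention */
    if (atomic_fetch_sub(&f->count, 1) - 1 < 0)
        f->sleep_calls++;   /* real code: SYS_FUTEX_SLEEP syscall */
}

void futex_up(futex_t *f)
{
    /* atomically increment; a result of zero or less means sleepers */
    if (atomic_fetch_add(&f->count, 1) + 1 <= 0)
        f->wakeup_calls++;  /* real code: SYS_FUTEX_WAKEUP syscall */
}
```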

      <para>So far, futexes are very elegant in that they don't interfere
      with the kernel when there is no contention for them. Another nice
      aspect of futexes is that they don't need to be registered anywhere
      prior to the first kernel intervention.</para>

      <para>Both futex related syscalls, <constant>SYS_FUTEX_SLEEP</constant>
      and <constant>SYS_FUTEX_WAKEUP</constant>, are mere wrappers for
      <code>waitq_sleep_timeout()</code> and <code>waitq_wakeup()</code>,
      respectively, with some housekeeping functionality added. Both syscalls
      need to translate the userspace virtual address of the futex counter
      to a physical address in order to support synchronization across
      shared memory. Once the physical address is known, the kernel checks
      whether the futex is already known to it by searching the global futex
      hash table for an item with the physical address of the futex counter
      as a key. When the search is successful, it returns an address of the
      kernel futex structure associated with the counter. If the hash table
      does not contain the key, the kernel creates a new kernel futex
      structure and inserts it into the hash table. At the same time, the
      current task's B+tree of known futexes is searched in order to find
      out if the task already uses the futex. If it does, no action is
      taken. Otherwise the reference count of the futex is incremented,
      provided that the futex already existed.</para>
    </section>
  </section>
</chapter>