WebSVN – HelenOS-doc – Blame – /design/trunk/src/ch_synchronization.xml

Rev	Author	Line No.	Line
9	bondari	1	<?xml version="1.0" encoding="UTF-8"?>
41	jermar	2	<chapter id="sync">
		3	<?dbhtml filename="sync.html"?>
9	bondari	4
82	jermar	5	<title>Synchronization</title>
9	bondari	6
41	jermar	7	<section>
		8	<title>Introduction</title>
9	bondari	9
45	jermar	10	<para>The HelenOS operating system is designed to make use of the
		11	parallelism offered by the hardware and to exploit concurrency of both the
		12	kernel and userspace tasks. This is achieved through multiprocessor
		13	support and several levels of multiprogramming such as multitasking,
171	jermar	14	multithreading and also through userspace fibrils. However, such a highly
		15	concurrent environment needs safe and efficient ways to handle mutual
		16	exclusion and synchronization of many execution flows.</para>
41	jermar	17	</section>
		18
		19	<section>
131	jermar	20	<title>Active Kernel Primitives</title>
41	jermar	21
9	bondari	22	<section>
72	bondari	23	<indexterm>
		24	<primary>synchronization</primary>
		25
		26	<secondary>- spinlock</secondary>
		27	</indexterm>
		28
41	jermar	29	<title>Spinlocks</title>
9	bondari	30
45	jermar	31	<para>The basic mutual exclusion primitive is the spinlock. The spinlock
		32	implements active waiting for the availability of a memory lock (i.e.
41	jermar	33	simple variable) in a multiprocessor-safe manner. This safety is
		34	achieved through the use of a specialized, architecture-dependent,
		35	atomic test-and-set operation which either locks the spinlock (i.e. sets
		36	the variable) or, provided that it is already locked, leaves it
		37	unaltered. In any case, the test-and-set operation returns a value, thus
		38	signalling either success (i.e. zero return value) or failure (i.e.
45	jermar	39	non-zero value) in acquiring the lock. Note that this makes a
41	jermar	40	fundamental difference between the naive algorithm that doesn't use the
		41	atomic operation and the spinlock algortihm. While the naive algorithm
45	jermar	42	is prone to race conditions on SMP configurations and thus is completely
41	jermar	43	SMP-unsafe, the spinlock algorithm eliminates the possibility of race
		44	conditions and is suitable for mutual exclusion use.</para>
9	bondari	45
41	jermar	46	<para>The semantics of the test-and-set operation is that the spinlock
		47	remains unavailable until this operation called on the respective
45	jermar	48	spinlock returns zero. HelenOS builds two functions on top of the
		49	test-and-set operation. The first function is the unconditional attempt
86	bondari	50	to acquire the spinlock and is called <code>spinlock_lock()</code>. It
57	jermar	51	simply loops until the test-and-set returns a zero value. The other
86	bondari	52	function, <code>spinlock_trylock()</code>, is the conditional lock
57	jermar	53	operation and calls the test-and-set only once to find out whether it
		54	managed to acquire the spinlock or not. The conditional operation is
		55	useful in situations in which an algorithm cannot acquire more spinlocks
		56	in the proper order and a deadlock cannot be avoided. In such a case,
		57	the algorithm would detect the danger and instead of possibly
		58	deadlocking the system it would simply release some spinlocks it already
		59	holds and retry the whole operation with the hope that it will succeed
86	bondari	60	next time. The unlock function, <code>spinlock_unlock()</code>, is quite
57	jermar	61	easy - it merely clears the spinlock variable.</para>
9	bondari	62
41	jermar	63	<para>Nevertheless, there is a special issue related to hardware
45	jermar	64	optimizations that modern processors implement. Particularly problematic
		65	is the out-of-order execution of instructions within the critical
		66	section protected by a spinlock. The processors are always
41	jermar	67	self-consistent so that they can carry out speculatively executed
		68	instructions in the right order with regard to dependencies among those
		69	instructions. However, the dependency between instructions inside the
		70	critical section and those that implement locking and unlocking of the
45	jermar	71	respective spinlock is not implicit on some processor architectures. As
		72	a result, the processor needs to be explicitly told about each
		73	occurrence of such a dependency. Therefore, HelenOS adds
86	bondari	74	architecture-specific hooks to all <code>spinlock_lock()</code>,
131	jermar	75	<code>spinlock_trylock()</code> and <code>spinlock_unlock()</code>
		76	functions to prevent the instructions inside the critical section from
		77	permeating out. On some architectures, these hooks can be void because
		78	the dependencies are implicitly there because of the special properties
		79	of locking and unlocking instructions. However, other architectures need
		80	to instrument these hooks with different memory barriers, depending on
		81	what operations could permeate out.</para>
9	bondari	82
41	jermar	83	<para>Spinlocks have one significant drawback: when held for longer time
45	jermar	84	periods, they harm both parallelism and concurrency. The processor
86	bondari	85	executing <code>spinlock_lock()</code> does not do any fruitful work and
57	jermar	86	is effectively halted until it can grab the lock and proceed.
45	jermar	87	Similarily, other execution flows cannot execute on the processor that
		88	holds the spinlock because the kernel disables preemption on that
		89	processor when a spinlock is held. The reason behind disabling
		90	preemption is priority inversion problem avoidance. For the same reason,
		91	threads are strongly discouraged from sleeping when they hold a
		92	spinlock.</para>
9	bondari	93
41	jermar	94	<para>To summarize, spinlocks represent very simple and essential mutual
		95	exclusion primitive for SMP systems. On the other hand, spinlocks scale
		96	poorly because of the active loop they are based on. Therefore,
45	jermar	97	spinlocks are used in HelenOS only for short-time mutual exclusion and
41	jermar	98	in cases where the mutual exclusion is required out of thread context.
		99	Lastly, spinlocks are used in the construction of passive
		100	synchronization primitives.</para>
		101	</section>
		102	</section>
9	bondari	103
41	jermar	104	<section>
131	jermar	105	<title>Passive Kernel Synchronization</title>
9	bondari	106
41	jermar	107	<section>
72	bondari	108	<indexterm>
		109	<primary>synchronization</primary>
		110
73	bondari	111	<secondary>- wait queue</secondary>
72	bondari	112	</indexterm>
		113
131	jermar	114	<title>Wait Queues</title>
9	bondari	115
43	jermar	116	<para>A wait queue is the basic passive synchronization primitive on
45	jermar	117	which all other passive synchronization primitives are built. Simply
		118	put, it allows a thread to sleep until an event associated with the
		119	particular wait queue occurs. Multiple threads are notified about
		120	incoming events in a first come, first served fashion. Moreover, should
		121	the event come before any thread waits for it, it is recorded in the
		122	wait queue as a missed wakeup and later forwarded to the first thread
		123	that decides to wait in the queue. The inner structures of the wait
		124	queue are protected by a spinlock.</para>
43	jermar	125
		126	<para>The thread that wants to wait for a wait queue event uses the
131	jermar	127	<code>waitq_sleep_timeout()</code> function. The algorithm then checks
		128	the wait queue's counter of missed wakeups and if there are any missed
57	jermar	129	wakeups, the call returns immediately. The call also returns immediately
		130	if only a conditional wait was requested. Otherwise the thread is
		131	enqueued in the wait queue's list of sleeping threads and its state is
		132	changed to <constant>Sleeping</constant>. It then sleeps until one of
		133	the following events happens:</para>
43	jermar	134
		135	<orderedlist>
		136	<listitem>
131	jermar	137	<para>another thread calls <code>waitq_wakeup()</code> and the
		138	thread is the first thread in the wait queue's list of sleeping
45	jermar	139	threads;</para>
43	jermar	140	</listitem>
		141
		142	<listitem>
131	jermar	143	<para>another thread calls <code>waitq_interrupt_sleep()</code> on
		144	the sleeping thread;</para>
43	jermar	145	</listitem>
		146
		147	<listitem>
45	jermar	148	<para>the sleep times out provided that none of the previous
		149	occurred within a specified time limit; the limit can be
		150	infinity.</para>
43	jermar	151	</listitem>
		152	</orderedlist>
		153
		154	<para>All five possibilities (immediate return on success, immediate
		155	return on failure, wakeup after sleep, interruption and timeout) are
131	jermar	156	distinguishable by the return value of
		157	<code>waitq_sleep_timeout()</code>. Being able to interrupt a sleeping
		158	thread is essential for externally initiated thread termination. The
		159	ability to wait only for a certain amount of time is used, for instance,
		160	to passively delay thread execution by several microseconds or even
		161	seconds in <code>thread_sleep()</code> function. Due to the fact that
		162	all other passive kernel synchronization primitives are based on wait
		163	queues, they also have the option of being interrutped and, more
		164	importantly, can timeout. All of them also implement the conditional
		165	operation. Furthemore, this very fundamental interface reaches up to the
57	jermar	166	implementation of futexes - userspace synchronization primitive, which
		167	makes it possible for a userspace thread to request a synchronization
		168	operation with a timeout or a conditional operation.</para>
43	jermar	169
		170	<para>From the description above, it should be apparent, that when a
86	bondari	171	sleeping thread is woken by <code>waitq_wakeup()</code> or when
131	jermar	172	<code>waitq_sleep_timeout()</code> succeeds immediately, the thread can
		173	be sure that the event has occurred. The thread need not and should not
57	jermar	174	verify this fact. This approach is called direct hand-off and is
		175	characteristic for all passive HelenOS synchronization primitives, with
		176	the exception as described below.</para>
41	jermar	177	</section>
9	bondari	178
41	jermar	179	<section>
72	bondari	180	<indexterm>
		181	<primary>synchronization</primary>
		182
73	bondari	183	<secondary>- semaphore</secondary>
72	bondari	184	</indexterm>
		185
41	jermar	186	<title>Semaphores</title>
9	bondari	187
43	jermar	188	<para>The interesting point about wait queues is that the number of
		189	missed wakeups is equal to the number of threads that will not block in
131	jermar	190	<code>watiq_sleep_timeout()</code> and would immediately succeed
		191	instead. On the other hand, semaphores are synchronization primitives
		192	that will let predefined amount of threads into their critical section
		193	and block any other threads above this count. However, these two cases
		194	are exactly the same. Semaphores in HelenOS are therefore implemented as
		195	wait queues with a single semantic change: their wait queue is
		196	initialized to have so many missed wakeups as is the number of threads
		197	that the semphore intends to let into its critical section
		198	simultaneously.</para>
43	jermar	199
		200	<para>In the semaphore language, the wait queue operation
86	bondari	201	<code>waitq_sleep_timeout()</code> corresponds to semaphore
57	jermar	202	<code>down</code> operation, represented by the function
86	bondari	203	<code>semaphore_down_timeout()</code> and by way of similitude the wait
57	jermar	204	queue operation waitq_wakeup corresponds to semaphore <code>up</code>
86	bondari	205	operation, represented by the function <code>sempafore_up()</code>. The
57	jermar	206	conditional down operation is called
86	bondari	207	<code>semaphore_trydown()</code>.</para>
41	jermar	208	</section>
9	bondari	209
41	jermar	210	<section>
43	jermar	211	<title>Mutexes</title>
9	bondari	212
72	bondari	213	<indexterm>
		214	<primary>synchronization</primary>
		215
		216	<secondary>- mutex</secondary>
		217	</indexterm>
		218
45	jermar	219	<para>Mutexes are sometimes referred to as binary sempahores. That means
		220	that mutexes are like semaphores that allow only one thread in its
43	jermar	221	critical section. Indeed, mutexes in HelenOS are implemented exactly in
45	jermar	222	this way: they are built on top of semaphores. From another point of
		223	view, they can be viewed as spinlocks without busy waiting. Their
		224	semaphore heritage provides good basics for both conditional operation
		225	and operation with timeout. The locking operation is called
86	bondari	226	<code>mutex_lock()</code>, the conditional locking operation is called
		227	<code>mutex_trylock()</code> and the unlocking operation is called
		228	<code>mutex_unlock()</code>.</para>
41	jermar	229	</section>
9	bondari	230
41	jermar	231	<section>
131	jermar	232	<title>Reader/Writer Locks</title>
9	bondari	233
72	bondari	234	<indexterm>
		235	<primary>synchronization</primary>
		236
		237	<secondary>- read/write lock</secondary>
		238	</indexterm>
		239
43	jermar	240	<para>Reader/writer locks, or rwlocks, are by far the most complicated
		241	synchronization primitive within the kernel. The goal of these locks is
45	jermar	242	to improve concurrency of applications, in which threads need to
		243	synchronize access to a shared resource, and that access can be
43	jermar	244	partitioned into a read-only mode and a write mode. Reader/writer locks
		245	should make it possible for several, possibly many, readers to enter the
		246	critical section, provided that no writer is currently in the critical
		247	section, or to be in the critical section contemporarily. Writers are
		248	allowed to enter the critical section only individually, provided that
45	jermar	249	no reader is in the critical section already. Applications, in which the
		250	majority of operations can be done in the read-only mode, can benefit
43	jermar	251	from increased concurrency created by reader/writer locks.</para>
		252
45	jermar	253	<para>During reader/writer lock construction, a decision should be made
43	jermar	254	whether readers will be prefered over writers or whether writers will be
		255	prefered over readers in cases when the lock is not currently held and
		256	both a reader and a writer want to gain the lock. Some operating systems
		257	prefer one group over the other, creating thus a possibility for
		258	starving the unprefered group. In the HelenOS operating system, none of
45	jermar	259	the two groups is prefered. The lock is granted on a first come, first
43	jermar	260	served basis with the additional note that readers are granted the lock
45	jermar	261	in the biggest possible batch.</para>
43	jermar	262
		263	<para>With this policy and the timeout modes of operation, the direct
		264	hand-off becomes much more complicated. For instance, a writer leaving
		265	the critical section must wake up all leading readers in the rwlock's
		266	wait queue or one leading writer or no-one if no thread is waiting.
		267	Similarily, the last reader leaving the critical section must wakeup the
45	jermar	268	sleeping writer if there are any sleeping threads left at all. As
		269	another example, if a writer at the beginning of the rwlock's wait queue
		270	times out and the lock is held by at least one reader, the writer which
		271	has timed out must first wake up all readers that follow him in the
		272	queue prior to signalling the timeout itself and giving up.</para>
43	jermar	273
45	jermar	274	<para>Due to the issues mentioned in the previous paragraph, the
		275	reader/writer lock imlpementation needs to walk the rwlock wait queue's
		276	list of sleeping threads directly, in order to find out the type of
43	jermar	277	access that the queueing threads demand. This makes the code difficult
		278	to understand and dependent on the internal implementation of the wait
		279	queue. Nevertheless, it remains unclear to the authors of HelenOS
		280	whether a simpler but equivalently fair solution exists.</para>
		281
		282	<para>The implementation of rwlocks as it has been already put, makes
		283	use of one single wait queue for both readers and writers, thus avoiding
		284	any possibility of starvation. In fact, rwlocks use a mutex rather than
131	jermar	285	a bare wait queue. This mutex is called <emphasis>exclusive</emphasis>
		286	and is used to synchronize writers. The writer's lock operation,
86	bondari	287	<code>rwlock_write_lock_timeout()</code>, simply tries to acquire the
57	jermar	288	exclusive mutex. If it succeeds, the writer is granted the rwlock.
44	jermar	289	However, if the operation fails (e.g. times out), the writer must check
		290	for potential readers at the head of the list of sleeping threads
45	jermar	291	associated with the mutex's wait queue and then proceed according to the
44	jermar	292	procedure outlined above.</para>
43	jermar	293
		294	<para>The exclusive mutex plays an important role in reader
		295	synchronization as well. However, a reader doing the reader's lock
131	jermar	296	operation, <code>rwlock_read_lock_timeout()</code>, may bypass this
		297	mutex when it detects that:</para>
43	jermar	298
		299	<orderedlist>
		300	<listitem>
45	jermar	301	<para>there are other readers in the critical section and</para>
43	jermar	302	</listitem>
		303
		304	<listitem>
		305	<para>there are no sleeping threads waiting for the exclusive
45	jermar	306	mutex.</para>
43	jermar	307	</listitem>
		308	</orderedlist>
		309
		310	<para>If both conditions are true, the reader will bypass the mutex,
45	jermar	311	increment the number of readers in the critical section and then enter
		312	the critical section. Note that if there are any sleeping threads at the
		313	beginning of the wait queue, the first must be a writer. If the
43	jermar	314	conditions are not fulfilled, the reader normally waits until the
		315	exclusive mutex is granted to it.</para>
41	jermar	316	</section>
9	bondari	317
		318	<section>
131	jermar	319	<title>Condition Variables</title>
9	bondari	320
72	bondari	321	<indexterm>
		322	<primary>synchronization</primary>
		323
		324	<secondary>- condition variable</secondary>
		325	</indexterm>
		326
48	jermar	327	<para>Condition variables can be used for waiting until a condition
		328	becomes true. In this respect, they are similar to wait queues. But
		329	contrary to wait queues, condition variables have different semantics
		330	that allows events to be lost when there is no thread waiting for them.
		331	In order to support this, condition variables don't use direct hand-off
		332	and operate in a way similar to the example below. A thread waiting for
		333	the condition becoming true does the following:</para>
		334
62	jermar	335	<example>
86	bondari	336	<title>Use of <code>condvar_wait_timeout()</code>.</title>
72	bondari	337
		338	<programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
48	jermar	339	while (!<varname>condition</varname>)
102	bondari	340	<function>condvar_wait_timeout</function>(<varname>cv</varname>, <varname>mtx</varname>); /* <remark>the condition is true, do something</remark> */
62	jermar	341	<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting>
72	bondari	342	</example>
48	jermar	343
		344	<para>A thread that causes the condition become true signals this event
		345	like this:</para>
		346
62	jermar	347	<example>
141	jermar	348	<title>Use of <code>condvar_signal()</code>.</title>
72	bondari	349
102	bondari	350	<programlisting language="C"><function>mutex_lock</function>(<varname>mtx</varname>);
48	jermar	351	<varname>condition</varname> = <constant>true</constant>;
		352	<function>condvar_signal</function>(<varname>cv</varname>); /* <remark>condvar_broadcast(cv);</remark> */
72	bondari	353	<function>mutex_unlock</function>(<varname>mtx</varname>);</programlisting>
		354	</example>
48	jermar	355
131	jermar	356	<para>The wait operation, <code>condvar_wait_timeout()</code>, always
		357	puts the calling thread to sleep. The thread then sleeps until another
		358	thread invokes <code>condvar_broadcast()</code> on the same condition
		359	variable or until it is woken up by <code>condvar_signal()</code>. The
		360	<code>condvar_signal()</code> operation unblocks the first thread
		361	blocking on the condition variable while the
		362	<code>condvar_broadcast()</code> operation unblocks all threads blocking
		363	there. If there are no blocking threads, these two operations have no
		364	efect.</para>
48	jermar	365
		366	<para>Note that the threads must synchronize over a dedicated mutex. To
86	bondari	367	prevent race condition between <code>condvar_wait_timeout()</code> and
131	jermar	368	<code>condvar_signal()</code> or <code>condvar_broadcast()</code>, the
		369	mutex is passed to <code>condvar_wait_timeout()</code> which then
		370	atomically puts the calling thread asleep and unlocks the mutex. When
		371	the thread eventually wakes up, <code>condvar_wait()</code> regains the
		372	mutex and returns.</para>
48	jermar	373
		374	<para>Also note, that there is no conditional operation for condition
		375	variables. Such an operation would make no sence since condition
		376	variables are defined to forget events for which there is no waiting
86	bondari	377	thread and because <code>condvar_wait()</code> must always go to sleep.
57	jermar	378	The operation with timeout is supported as usually.</para>
48	jermar	379
		380	<para>In HelenOS, condition variables are based on wait queues. As it is
		381	already mentioned above, wait queues remember missed events while
		382	condition variables must not do so. This is reasoned by the fact that
		383	condition variables are designed for scenarios in which an event might
		384	occur very many times without being picked up by any waiting thread. On
		385	the other hand, wait queues would remember any event that had not been
86	bondari	386	picked up by a call to <code>waitq_sleep_timeout()</code>. Therefore, if
57	jermar	387	wait queues were used directly and without any changes to implement
		388	condition variables, the missed_wakeup counter would hurt performance of
		389	the implementation: the <code>while</code> loop in
86	bondari	390	<code>condvar_wait_timeout()</code> would effectively do busy waiting
57	jermar	391	until all missed wakeups were discarded.</para>
48	jermar	392
		393	<para>The requirement on the wait operation to atomically put the caller
		394	to sleep and release the mutex poses an interesting problem on
86	bondari	395	<code>condvar_wait_timeout()</code>. More precisely, the thread should
57	jermar	396	sleep in the condvar's wait queue prior to releasing the mutex, but it
		397	must not hold the mutex when it is sleeping.</para>
48	jermar	398
		399	<para>Problems described in the two previous paragraphs are addressed in
86	bondari	400	HelenOS by dividing the <code>waitq_sleep_timeout()</code> function into
57	jermar	401	three pieces:</para>
48	jermar	402
		403	<orderedlist>
		404	<listitem>
131	jermar	405	<para><code>waitq_sleep_prepare()</code> prepares the thread to go
		406	to sleep by, among other things, locking the wait queue;</para>
48	jermar	407	</listitem>
		408
		409	<listitem>
86	bondari	410	<para><code>waitq_sleep_timeout_unsafe()</code> implements the core
57	jermar	411	blocking logic;</para>
48	jermar	412	</listitem>
		413
		414	<listitem>
86	bondari	415	<para><code>waitq_sleep_finish()</code> performs cleanup after
		416	<code>waitq_sleep_timeout_unsafe()</code>; after this call, the wait
57	jermar	417	queue spinlock is guaranteed to be unlocked by the caller</para>
48	jermar	418	</listitem>
		419	</orderedlist>
		420
131	jermar	421	<para>The stock <code>waitq_sleep_timeout()</code> is then a mere
		422	wrapper that calls these three functions. It is provided for convenience
		423	in cases where the caller doesn't require such a low level control.
86	bondari	424	However, the implementation of <code>condvar_wait_timeout()</code> does
57	jermar	425	need this finer-grained control because it has to interleave calls to
		426	these functions by other actions. It carries its operations out in the
		427	following order:</para>
48	jermar	428
		429	<orderedlist>
		430	<listitem>
86	bondari	431	<para>calls <code>waitq_sleep_prepare()</code> in order to lock the
57	jermar	432	condition variable's wait queue,</para>
48	jermar	433	</listitem>
		434
		435	<listitem>
		436	<para>releases the mutex,</para>
		437	</listitem>
		438
		439	<listitem>
		440	<para>clears the counter of missed wakeups,</para>
		441	</listitem>
		442
		443	<listitem>
86	bondari	444	<para>calls <code>waitq_sleep_timeout_unsafe()</code>,</para>
48	jermar	445	</listitem>
		446
		447	<listitem>
		448	<para>retakes the mutex,</para>
		449	</listitem>
		450
		451	<listitem>
86	bondari	452	<para>calls <code>waitq_sleep_finish()</code>.</para>
48	jermar	453	</listitem>
		454	</orderedlist>
9	bondari	455	</section>
41	jermar	456	</section>
9	bondari	457
41	jermar	458	<section>
131	jermar	459	<title>Userspace Synchronization</title>
9	bondari	460
41	jermar	461	<section>
		462	<title>Futexes</title>
		463
72	bondari	464	<indexterm>
		465	<primary>synchronization</primary>
		466
		467	<secondary>- futex</secondary>
		468	</indexterm>
		469
81	jermar	470	<para>In a multithreaded environment, userspace threads need an
		471	efficient way to synchronize. HelenOS borrows an idea from Linux<xref
		472	linkend="futex" /> to implement lightweight userspace synchronization
		473	and mutual exclusion primitive called futex. The key idea behind futexes
		474	is that non-contended synchronization is very fast and takes place only
		475	in userspace without kernel's intervention. When more threads contend
		476	for a futex, only one of them wins; other threads go to sleep via a
		477	dedicated syscall.</para>
		478
		479	<para>The userspace part of the futex is a mere integer variable, a
		480	counter, that can be atomically incremented or decremented. The kernel
		481	part is rather more complicated. For each userspace futex counter, there
		482	is a kernel structure describing the futex. This structure
		483	contains:</para>
		484
		485	<itemizedlist>
		486	<listitem>
		487	<para>number of references,</para>
		488	</listitem>
		489
		490	<listitem>
		491	<para>physical address of the userspace futex counter,</para>
		492	</listitem>
		493
		494	<listitem>
		495	<para>hash table link and</para>
		496	</listitem>
		497
		498	<listitem>
		499	<para>a wait queue.</para>
		500	</listitem>
		501	</itemizedlist>
		502
		503	<para>The reference count helps to find out when the futex is no longer
		504	needed and can be deallocated. The physical address is used as a key for
		505	the global futex hash table. Note that the kernel has to use physical
		506	address to identify the futex beacause one futex can be used for
		507	synchronization among different address spaces and can have different
		508	virtual addresses in each of them. Finally, the kernel futex structure
		509	includes a wait queue. The wait queue is used to put threads that didn't
		510	win the futex to sleep until the winner wakes one of them up.</para>
		511
		512	<para>A futex should be initialized by setting its userspace counter to
		513	one before it is used. When locking the futex via userspace library
86	bondari	514	function <code>futex_down_timeout()</code>, the library code atomically
82	jermar	515	decrements the futex counter and tests if it dropped below zero. If it
81	jermar	516	did, then the futex is locked by another thread and the library uses the
		517	<constant>SYS_FUTEX_SLEEP</constant> syscall to put the thread asleep.
		518	If the counter decreased to 0, then there was no contention and the
		519	thread can enter the critical section protected by the futex. When the
		520	thread later leaves that critical section, it, using library function
131	jermar	521	<code>futex_up()</code>, atomically increments the counter. If the
		522	counter value increased to one, then there again was no contention and
		523	no action needs to be done. However, if it increased to zero or even a
		524	smaller number, then there are sleeping threads waiting for the futex to
		525	become available. In that case, the library has to use the
81	jermar	526	<constant>SYS_FUTEX_WAKEUP</constant> syscall to wake one sleeping
		527	thread.</para>
		528
		529	<para>So far, futexes are very elegant in that they don't interfere with
		530	the kernel when there is no contention for them. Another nice aspect of
		531	futexes is that they don't need to be registered anywhere prior to the
		532	first kernel intervention.</para>
		533
		534	<para>Both futex related syscalls, <constant>SYS_FUTEX_SLEEP</constant>
		535	and <constant>SYS_FUTEX_WAKEUP</constant>, respectivelly, are mere
86	bondari	536	wrappers for <code>waitq_sleep_timeout()</code> and
		537	<code>waitq_sleep_wakeup()</code>, respectively, with some housekeeping
81	jermar	538	functionality added. Both syscalls need to translate the userspace
		539	virtual address of the futex counter to physical address in order to
		540	support synchronization accross shared memory. Once the physical address
		541	is known, the kernel checks whether the futexes are already known to it
		542	by searching the global futex hash table for an item with the physical
		543	address of the futex counter as a key. When the search is successful, it
		544	returns an address of the kernel futex structure associated with the
		545	counter. If the hash table does not contain the key, the kernel creates
		546	it and inserts it into the hash table. At the same time, the the current
		547	task's B+tree of known futexes is searched in order to find out if the
		548	task already uses the futex. If it does, no action is taken. Otherwise
		549	the reference count of the futex is incremented, provided that the futex
		550	already existed.</para>
41	jermar	551	</section>
		552	</section>
		553	</chapter>

Subversion Repositories HelenOS-doc

(root)/design/trunk/src/ch_synchronization.xml – Rev 171