<?xml version="1.0" encoding="UTF-8"?>
<chapter id="sync">
  <?dbhtml filename="sync.html"?>

  <title>Mutual exclusion and synchronization</title>

  <section>
    <title>Introduction</title>

    <para>The HelenOS operating system is designed to make use of the
    parallelism offered by hardware and to exploit the concurrency of both
    the kernel and userspace tasks. This is achieved through multiprocessor
    support and several levels of multiprogramming (i.e. multitasking,
    multithreading and userspace pseudo threads). However, such a highly
    concurrent environment needs safe and efficient ways to handle mutual
    exclusion and synchronization of many execution flows.</para>
  </section>

  <section>
    <title>Active kernel primitives</title>

    <section>
      <title>Spinlocks</title>

      <para>The basic mutual exclusion primitive is the spinlock. A spinlock
      implements busy waiting for the availability of a memory lock (i.e. a
      simple variable) in a multiprocessor-safe manner. This safety is
      achieved through the use of a specialized, architecture-dependent,
      atomic test-and-set operation which either locks the spinlock (i.e.
      sets the variable) or, provided that it is already locked, leaves it
      unaltered. In any case, the test-and-set operation returns a value,
      thus signalling either success (i.e. zero return value) or failure
      (i.e. non-zero return value) in acquiring the lock. Note that this
      makes the fundamental difference between the naive algorithm that
      doesn't use the atomic operation and the spinlock algorithm. While the
      naive algorithm is prone to race conditions on SMP configurations and
      thus is completely SMP-unsafe, the spinlock algorithm eliminates the
      possibility of race conditions and is suitable for mutual exclusion
      use.</para>

      <para>The semantics of the test-and-set operation is that the spinlock
      remains unavailable until this operation called on the respective
      spinlock returns zero. HelenOS builds two functions on top of the
      test-and-set operation. The first is the unconditional attempt to
      acquire the spinlock and is called <emphasis>spinlock_lock</emphasis>.
      It simply loops until the test-and-set returns zero. The other
      operation, <emphasis>spinlock_trylock</emphasis>, is the conditional
      lock operation and calls the test-and-set only once to find out whether
      it managed to acquire the spinlock or not. The conditional operation is
      useful in situations in which an algorithm cannot acquire more
      spinlocks in the proper order and a deadlock cannot be avoided. In such
      a case, the algorithm detects the danger and instead of possibly
      deadlocking the system it simply releases some spinlocks it already
      holds and retries the whole operation with the hope that it will
      succeed next time. The unlock operation,
      <emphasis>spinlock_unlock</emphasis>, is quite simple: it merely clears
      the spinlock variable.</para>
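
      <para>The spinlock interface described above can be sketched with C11
      atomics. This is a minimal single-file illustration, not HelenOS's
      architecture-specific implementation:</para>

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal C11 sketch of the spinlock interface; the real HelenOS
 * implementation uses an architecture-specific test-and-set operation. */
typedef struct {
    atomic_flag flag;               /* the "memory lock" variable */
} spinlock_t;

static inline void spinlock_initialize(spinlock_t *sl)
{
    atomic_flag_clear(&sl->flag);   /* start unlocked */
}

/* Unconditional acquire: loop until test-and-set signals success. */
static inline void spinlock_lock(spinlock_t *sl)
{
    while (atomic_flag_test_and_set_explicit(&sl->flag, memory_order_acquire))
        ;                           /* busy wait */
}

/* Conditional acquire: a single test-and-set attempt; true on success. */
static inline bool spinlock_trylock(spinlock_t *sl)
{
    return !atomic_flag_test_and_set_explicit(&sl->flag,
                                              memory_order_acquire);
}

/* Release: merely clear the variable; the release ordering acts as the
 * barrier that keeps critical-section accesses from bleeding out. */
static inline void spinlock_unlock(spinlock_t *sl)
{
    atomic_flag_clear_explicit(&sl->flag, memory_order_release);
}
```

      <para>In this sketch, the acquire and release memory orderings play
      the role of the architecture-specific barrier hooks.</para>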

      <para>Nevertheless, there is a special issue related to hardware
      optimizations that modern processors implement. Particularly
      problematic is the out-of-order execution of instructions within the
      critical section protected by a spinlock. The processors are always
      self-consistent, so they carry out speculatively executed instructions
      in the right order with regard to the dependencies among those
      instructions. However, the dependency between the instructions inside
      the critical section and those that implement locking and unlocking of
      the respective spinlock is not implicit on some processor
      architectures, and the processor needs to be explicitly told about each
      occurrence of such a dependency. Therefore, HelenOS adds
      architecture-specific hooks to <emphasis>spinlock_lock</emphasis>,
      <emphasis>spinlock_trylock</emphasis> and
      <emphasis>spinlock_unlock</emphasis> to prevent the instructions inside
      the critical section from bleeding out. On some architectures, these
      hooks can be a no-op because the dependencies are implicitly there due
      to the special properties of the locking and unlocking instructions.
      However, other architectures need to instrument these hooks with
      different memory barriers, depending on what operations can bleed
      out.</para>

      <para>Spinlocks have one significant drawback: when held for longer
      time periods, they harm both parallelism and concurrency. A processor
      executing <emphasis>spinlock_lock</emphasis> does not do any fruitful
      work and is effectively halted until it can grab the lock and proceed.
      Similarly, other threads cannot execute on the processor that holds the
      spinlock because the kernel disables preemption on that processor when
      a spinlock is held. The reason for disabling preemption is to avoid the
      priority inversion problem. For the same reason, threads are strongly
      discouraged from sleeping while they hold a spinlock.</para>

      <para>To summarize, spinlocks represent a very simple and essential
      mutual exclusion primitive for SMP systems. On the other hand,
      spinlocks scale poorly because of the active loop they are based on.
      Therefore, spinlocks are used in HelenOS only for short-term mutual
      exclusion and in cases where the mutual exclusion is required out of
      thread context. Lastly, spinlocks are used in the construction of
      passive synchronization primitives.</para>
    </section>
  </section>

  <section>
    <title>Passive kernel synchronization</title>

    <section>
      <title>Wait queues</title>

      <para>A wait queue is the basic passive synchronization primitive on
      which all other passive synchronization primitives are built. Simply
      put, it allows a thread to sleep until an event associated with the
      particular wait queue occurs. Multiple threads are notified about
      incoming events in a first-come, first-served fashion. Moreover, should
      the event come before any thread waits for it, it is recorded in the
      wait queue as a missed wakeup and later forwarded to the first thread
      that decides to wait in the queue. The inner structures of the wait
      queue are protected by a spinlock.</para>

      <para>A thread that wants to wait for a wait queue event uses the
      <emphasis>waitq_sleep_timeout</emphasis> function. The algorithm first
      checks the wait queue's counter of missed wakeups and, if there are any
      missed wakeups, the call returns immediately. The call also returns
      immediately if only a conditional wait was requested. Otherwise the
      thread is enqueued in the wait queue's list of sleeping threads and its
      state is changed to <emphasis>Sleeping</emphasis>. It then sleeps until
      one of the following events happens:</para>

      <orderedlist>
        <listitem>
          <para>another thread calls <emphasis>waitq_wakeup</emphasis> and
          the thread is the first thread in the wait queue's list of sleeping
          threads</para>
        </listitem>

        <listitem>
          <para>another thread calls
          <emphasis>waitq_interrupt_sleep</emphasis> on the sleeping
          thread</para>
        </listitem>

        <listitem>
          <para>the sleep times out, provided that none of the previous
          events occurred within a specified time limit; the limit can be
          infinity</para>
        </listitem>
      </orderedlist>

      <para>All five possibilities (immediate return on success, immediate
      return on failure, wakeup after sleep, interruption and timeout) are
      distinguishable by the return value of
      <emphasis>waitq_sleep_timeout</emphasis>. The ability to interrupt a
      sleeping thread is essential for externally initiated thread
      termination, and the ability to wait only for a certain amount of time
      is used, for instance, to passively delay thread execution by several
      microseconds or even seconds in the
      <emphasis>thread_sleep</emphasis> function. Because all other passive
      kernel synchronization primitives are based on wait queues, they also
      have the option of being interrupted and, more importantly, can time
      out. All of them also implement the conditional operation.
      Furthermore, this very fundamental interface reaches up to the
      implementation of futexes, the userspace synchronization primitive,
      which makes it possible for a userspace thread to request a
      synchronization operation with a timeout or a conditional
      operation.</para>
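
      <para>The five outcomes can be sketched as a return code enumeration.
      The enumerator names below are illustrative (modeled on the
      description above, not taken from the HelenOS sources), and the helper
      shows which outcomes mean the event actually arrived:</para>

```c
#include <stdbool.h>

/* Sketch of the five distinguishable outcomes of waitq_sleep_timeout.
 * Enumerator names are illustrative, not HelenOS's actual identifiers. */
typedef enum {
    ESYNCH_OK_ATOMIC,    /* missed wakeup consumed, returned immediately */
    ESYNCH_WOULD_BLOCK,  /* conditional wait requested, no missed wakeup */
    ESYNCH_OK_BLOCKED,   /* woken by waitq_wakeup after sleeping */
    ESYNCH_INTERRUPTED,  /* woken by waitq_interrupt_sleep */
    ESYNCH_TIMEOUT       /* the time limit expired first */
} waitq_rc_t;

/* Only the two OK outcomes tell the caller the event has really come. */
static bool waitq_event_received(waitq_rc_t rc)
{
    return rc == ESYNCH_OK_ATOMIC || rc == ESYNCH_OK_BLOCKED;
}
```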

      <para>From the description above, it should be apparent that when a
      sleeping thread is woken by <emphasis>waitq_wakeup</emphasis> or when
      <emphasis>waitq_sleep_timeout</emphasis> succeeds immediately, the
      thread can be sure the event has come and it need not and should not
      verify this fact. This approach is called direct hand-off and is
      characteristic of all passive HelenOS synchronization primitives, with
      one exception described below.</para>
    </section>

    <section>
      <title>Semaphores</title>

      <para>The interesting point about wait queues is that the number of
      missed wakeups is equal to the number of threads that will not block in
      <emphasis>waitq_sleep_timeout</emphasis> and will instead succeed
      immediately. On the other hand, semaphores are synchronization
      primitives that let a predefined number of threads into their critical
      section and block any other threads above this count. However, these
      two cases are exactly the same. Semaphores in HelenOS are therefore
      implemented as wait queues with a single semantic change: their wait
      queue is initialized to have as many missed wakeups as the number of
      threads that the semaphore intends to let into its critical section
      simultaneously.</para>

      <para>In the semaphore language, the wait queue operation
      <emphasis>waitq_sleep_timeout</emphasis> corresponds to the semaphore
      <emphasis>down</emphasis> operation, represented by the function
      <emphasis>semaphore_down_timeout</emphasis>, and similarly the wait
      queue operation <emphasis>waitq_wakeup</emphasis> corresponds to the
      semaphore <emphasis>up</emphasis> operation, represented by the
      function <emphasis>semaphore_up</emphasis>. The conditional down
      operation is called <emphasis>semaphore_trydown</emphasis>.</para>
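
      <para>The mapping of a semaphore onto a wait queue can be sketched in
      C. The sketch below is single-threaded, models only the conditional
      (<emphasis>trydown</emphasis>) path, and reduces the wait queue to its
      missed-wakeup counter; the type layout is illustrative:</para>

```c
#include <stdbool.h>

/* Single-threaded sketch: the wait queue's missed-wakeup counter doubles
 * as the semaphore count. Field names are illustrative. */
typedef struct {
    int missed_wakeups;         /* wait queue state reused as the count */
} waitq_t;

typedef struct {
    waitq_t wq;
} semaphore_t;

static void semaphore_initialize(semaphore_t *s, int count)
{
    /* count threads may pass before anyone would block */
    s->wq.missed_wakeups = count;
}

/* Conditional down: corresponds to a conditional waitq_sleep_timeout. */
static bool semaphore_trydown(semaphore_t *s)
{
    if (s->wq.missed_wakeups > 0) {
        s->wq.missed_wakeups--; /* consume one missed wakeup */
        return true;
    }
    return false;               /* a real down would now go to sleep */
}

/* Up: corresponds to waitq_wakeup; with no sleeper, the wakeup is
 * recorded as missed. */
static void semaphore_up(semaphore_t *s)
{
    s->wq.missed_wakeups++;
}
```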
    </section>

    <section>
      <title>Mutexes</title>

      <para>Mutexes are sometimes referred to as binary semaphores. That
      means that mutexes are like semaphores that allow only one thread into
      their critical section. Indeed, mutexes in HelenOS are implemented
      exactly in this way: they are built atop semaphores. From another
      point of view, they can be viewed as spinlocks without busy waiting.
      Their semaphore heritage provides a good basis for both the
      conditional operation and the operation with a timeout. The locking
      operation is called <emphasis>mutex_lock</emphasis>, the conditional
      locking operation is called <emphasis>mutex_trylock</emphasis> and the
      unlocking operation is called <emphasis>mutex_unlock</emphasis>.</para>
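
      <para>The mutex-as-binary-semaphore idea can be sketched the same way:
      a mutex is a semaphore whose count is initialized to one. Again a
      single-threaded sketch of the conditional path, with illustrative
      names:</para>

```c
#include <stdbool.h>

/* Illustrative sketch: a mutex built atop a counting semaphore whose
 * count is fixed at one, so only a single thread can hold it. */
typedef struct {
    int count;
} semaphore_t;

typedef struct {
    semaphore_t sem;
} mutex_t;

static void mutex_initialize(mutex_t *m)
{
    m->sem.count = 1;           /* binary semaphore */
}

/* mutex_trylock maps to the conditional semaphore down operation. */
static bool mutex_trylock(mutex_t *m)
{
    if (m->sem.count > 0) {
        m->sem.count--;
        return true;
    }
    return false;
}

/* mutex_unlock maps to the semaphore up operation. */
static void mutex_unlock(mutex_t *m)
{
    m->sem.count++;
}
```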
    </section>

    <section>
      <title>Reader/writer locks</title>

      <para>Reader/writer locks, or rwlocks, are by far the most complicated
      synchronization primitive within the kernel. The goal of these locks
      is to improve the concurrency of applications in which threads need to
      synchronize access to a shared resource and that access can be
      partitioned into a read-only mode and a write mode. Reader/writer
      locks should make it possible for several, possibly many, readers to
      be in the critical section at the same time, provided that no writer
      is currently in the critical section. Writers are allowed to enter the
      critical section only individually, provided that no reader is in the
      critical section already. Applications in which the majority of
      operations can be done in the read-only mode can benefit from the
      increased concurrency created by reader/writer locks.</para>

      <para>When a reader/writer lock is constructed, a decision should be
      made whether readers will be preferred over writers or vice versa in
      cases when the lock is not currently held and both a reader and a
      writer want to gain the lock. Some operating systems prefer one group
      over the other, thus creating a possibility of starving the
      unpreferred group. In the HelenOS operating system, neither of the two
      groups is preferred. The lock is granted on a first-come, first-served
      basis, with the additional note that readers are granted the lock in
      the biggest possible batches.</para>

      <para>With this policy and the timeout modes of operation, the direct
      hand-off becomes much more complicated. For instance, a writer leaving
      the critical section must wake up all leading readers in the rwlock's
      wait queue, or one leading writer, or no one if no thread is waiting.
      Similarly, the last reader leaving the critical section must wake up
      the sleeping writer, if there are any sleeping threads at all. As
      another example, if a writer at the beginning of the rwlock's wait
      queue times out and the lock is held by at least one reader, the
      timed-out writer must first wake up all readers that follow it in the
      queue prior to signalling the timeout itself and giving up.</para>

      <para>Because of the issues mentioned in the previous paragraph, the
      reader/writer lock implementation needs to walk the rwlock wait
      queue's list of sleeping threads directly in order to find out the
      type of access that the queueing threads demand. This makes the code
      difficult to understand and dependent on the internal implementation
      of the wait queue. Nevertheless, it remains unclear to the authors of
      HelenOS whether a simpler but equally fair solution exists.</para>

      <para>As has already been said, the implementation of rwlocks makes
      use of a single wait queue for both readers and writers, thus avoiding
      any possibility of starvation. In fact, rwlocks use a mutex rather
      than a bare wait queue. This mutex is called
      <emphasis>exclusive</emphasis> and is used to synchronize writers. The
      writer's lock operation,
      <emphasis>rwlock_write_lock_timeout</emphasis>, simply tries to
      acquire the exclusive mutex. If it succeeds, the writer is granted the
      rwlock. However, if the operation fails (e.g. times out), the writer
      must check for potential readers at the head of the list of sleeping
      threads associated with the mutex's wait queue and proceed according
      to the procedure outlined above.</para>

      <para>The exclusive mutex plays an important role in reader
      synchronization as well. However, a reader performing the reader's
      lock operation, <emphasis>rwlock_read_lock_timeout</emphasis>, may
      bypass this mutex when it detects that:</para>

      <orderedlist>
        <listitem>
          <para>there are other readers in the critical section</para>
        </listitem>

        <listitem>
          <para>there are no sleeping threads waiting for the exclusive
          mutex</para>
        </listitem>
      </orderedlist>

      <para>If both conditions are true, the reader will bypass the mutex,
      increment the number of readers in the critical section and enter the
      critical section. Note that if there are any sleeping threads at the
      beginning of the wait queue, the first of them must be a writer. If
      the conditions are not fulfilled, the reader normally waits until the
      exclusive mutex is granted to it.</para>
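
      <para>The reader's fast-path test can be sketched as a simple
      predicate over the rwlock state; the structure and field names below
      are illustrative, not HelenOS's actual data layout:</para>

```c
#include <stdbool.h>

/* Illustrative rwlock state for the reader bypass check. */
typedef struct {
    int readers_in;     /* readers currently inside the critical section */
    int sleepers;       /* threads sleeping on the exclusive mutex's queue */
} rwlock_t;

/* A reader may bypass the exclusive mutex only if other readers are
 * already inside and no thread (hence no writer) sleeps on the mutex. */
static bool rwlock_reader_can_bypass(const rwlock_t *rwl)
{
    return rwl->readers_in > 0 && rwl->sleepers == 0;
}
```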
    </section>

    <section>
      <title>Condition variables</title>

      <para>Condvars explanation</para>
    </section>
  </section>

  <section>
    <title>Userspace synchronization</title>

    <section>
      <title>Futexes</title>

      <para></para>
    </section>
  </section>
</chapter>