<?xml version="1.0" encoding="UTF-8"?>
<chapter id="scheduling">
  <?dbhtml filename="scheduling.html"?>

  <title>Scheduling</title>

  <para>One of the key aims of the operating system is to create and support
  the impression that several activities are executing concurrently. This is
  true for both uniprocessor and multiprocessor systems. In the case of
  multiprocessor systems, the activities are truly happening in parallel. The
  scheduler helps to materialize this impression by planning threads on as
  many processors as possible and, when this strategy reaches its limits, by
  quickly switching among threads executing on a single processor.</para>

  <section>
    <title>Contexts</title>

    <para>The term <emphasis>context</emphasis> refers to the set of processor
    resources that define the current state of the computation or the
    environment. The kernel understands it in several more or less narrow
    senses:</para>

    <itemizedlist>
      <listitem>
        <para>synchronous register context,</para>
      </listitem>

      <listitem>
        <para>asynchronous register context,</para>
      </listitem>

      <listitem>
        <para>FPU context and</para>
      </listitem>

      <listitem>
        <para>memory management context.</para>
      </listitem>
    </itemizedlist>

    <para>The narrowest sense refers to the synchronous register context. It
    includes all the preserved registers as defined by the architecture. To
    highlight some, the program counter and stack pointer take part in the
    synchronous register context. These registers must be preserved across a
    procedure call and during synchronous context switches.</para>

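    <para>One possible shape of the structure holding the synchronous register
    context is sketched below. The actual layout is architecture-specific and
    the fields shown here are illustrative only:</para>

    <programlisting><![CDATA[
/* Illustrative sketch of a synchronous register context structure. */
typedef struct context {
    uintptr_t sp;   /* stack pointer */
    uintptr_t pc;   /* program counter (return address) */
    /*
     * Callee-saved (preserved) registers of the given architecture follow
     * here; scratch registers need not be stored, because the switch
     * happens at a well-defined function call boundary.
     */
} context_t;
]]></programlisting>
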
48 | <para>The next type of the context understood by the kernel is the |
||
49 | asynchronous register context. On an interrupt, the interrupted execution |
||
50 | flow's state must be guaranteed to be eventually completely restored. |
||
51 | Therefore the interrupt context includes, among other things, the scratch |
||
52 | registers as defined by the architecture. As a special optimization and if |
||
53 | certain conditions are met, it need not include the architecture's |
||
54 | preserved registers. The condition mentioned in the previous sentence is |
||
55 | that the low-level assembly language interrupt routines don't modify the |
||
56 | preserved registers. The handlers usually call a higher-level C routine. |
||
57 | The preserved registers are then saved on the stack by the compiler |
||
58 | generated code of the higher-level function. In HelenOS, several |
||
59 | architectures can be compiled with this optimization.</para> |
||
60 | |||
    <para>Although the kernel does not do any floating point
    arithmetic<footnote>
        <para>Some architectures (e.g. ia64) inevitably use a fixed set of
        floating point registers to carry out their normal operations.</para>
      </footnote>, it must protect the FPU context of userspace threads against
    destruction by other threads. Moreover, only a fraction of userspace
    programs use the floating point unit. HelenOS contains a generic framework
    for switching the FPU context only when the switch is forced (i.e. a thread
    uses a floating point instruction and its FPU context is not loaded in the
    processor).</para>

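    <para>The essence of the lazy switching scheme can be summarized by the
    following sketch of an "FPU unavailable" trap handler. The identifiers
    (<code>fpu_owner</code>, <code>fpu_save_state()</code>,
    <code>fpu_load_state()</code>) are illustrative and do not map one to one
    onto the actual HelenOS interface:</para>

    <programlisting><![CDATA[
/* Illustrative sketch of lazy FPU context switching. */
static thread_t *fpu_owner = NULL;  /* thread whose state currently sits in the FPU */

void fpu_unavailable_trap(void)
{
    /* Preserve the FPU state of the previous owner, if there was one. */
    if (fpu_owner != NULL)
        fpu_save_state(&fpu_owner->fpu_context);

    /* Load the FPU state of the faulting thread and let it continue. */
    fpu_load_state(&current_thread->fpu_context);
    fpu_owner = current_thread;
    fpu_enable();   /* clear the "FPU disabled" condition */
}
]]></programlisting>
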
    <para>The last member of the context family is the memory management
    context. It includes the memory management registers that identify address
    spaces at the hardware level (i.e. ASIDs and page table pointers).</para>

    <section>
      <title>Synchronous Context Switches</title>

      <para>The scheduler, as well as other pieces of the kernel, makes heavy
      use of synchronous context switches, because they are a natural vehicle
      not only for changes in control flow, but also for switching between two
      kernel stacks. Two functions figure in a synchronous context switch
      implementation: <code>context_save()</code> and
      <code>context_restore()</code>. Note that these two functions break the
      natural perception of the linear C code execution flow starting at the
      function's entry point and ending at one of the function's exit
      points.</para>


      <para>When the <code>context_save()</code> function is called, the
      synchronous context is saved in a memory structure passed to it. After
      executing <code>context_save()</code>, the caller is returned 1 as a
      return value. The execution of instructions continues as normal until
      <code>context_restore()</code> is called. For the caller, it seems as if
      the call never returned<footnote>
          <para>Which might be a source of problems with variable liveness
          after <code>context_restore()</code>.</para>
        </footnote>. Nevertheless, the synchronous register context, which is
      saved in a memory structure passed to <code>context_restore()</code>, is
      restored, thus transferring the control flow to the place of occurrence
      of the corresponding call to <code>context_save()</code>. From the
      perspective of the caller of the corresponding
      <code>context_save()</code>, it looks like a return from
      <code>context_save()</code>. However, this time a return value of 0 is
      returned.</para>
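
      <para>The following fragment, a simplified version of what the scheduler
      does, shows how the two return values are typically told apart (the
      variable name is illustrative):</para>

      <programlisting><![CDATA[
context_t saved;    /* illustrative placeholder for the saved context */

if (context_save(&saved)) {
    /*
     * First return from context_save(): return value 1.
     * Execution continues normally; some later code, possibly a
     * different flow of control, calls context_restore(&saved).
     */
} else {
    /*
     * Second "return": control was transferred here by
     * context_restore(&saved); the return value is 0.
     */
}
]]></programlisting>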
    </section>
  </section>

  <section>
    <title>Threads</title>

    <para>A thread is the basic executable entity with some code and a stack.
    While the code, implemented by a C language function, can be shared by
    several threads, the stack is always private to each instance of the
    thread. Each thread belongs to exactly one task, through which it shares an
    address space with its sibling threads. Threads that execute purely in the
    kernel don't have any userspace memory allocated. However, when a thread
    has ambitions to run in userspace, it must be allocated a userspace stack.
    The distinction between the two groups is made by referring to the former
    as kernel threads and to the latter as userspace threads. Both kernel and
    userspace threads are visible to the scheduler and can become subject to
    kernel preemption and thread migration at any time when preemption is not
    disabled.</para>

    <formalpara>
      <title>Thread States</title>

      <para>At any moment, a thread exists in one of six possible thread
      states. When the thread is created and first inserted into the
      scheduler's run queues or when a thread is migrated to a new processor,
      it is put into the <constant>Entering</constant> state. After some time
      elapses, the scheduler picks up the thread and starts executing it. A
      thread being currently executed on a processor is in the
      <constant>Running</constant> state. From there, the thread has three
      possibilities. It either runs until it is preempted, in which case the
      state changes to <constant>Ready</constant>, goes to the
      <constant>Sleeping</constant> state by going to sleep, or enters the
      <constant>Exiting</constant> state when it reaches termination. When the
      thread exits, its kernel structure usually stays in memory until the
      thread is detached by another thread using the
      <code>thread_detach()</code> function. Terminated but undetached threads
      are in the <constant>Lingering</constant> state. When the thread is
      detached or detaches itself during its life, it is destroyed in the
      <constant>Exiting</constant> state and the
      <constant>Lingering</constant> state is never reached.<figure float="1">
          <title>Transitions among thread states.</title>

          <mediaobject id="thread_states" xreflabel="">
            <imageobject role="pdf">
              <imagedata fileref="images/thread_states.pdf" format="PDF" />
            </imageobject>

            <imageobject role="html">
              <imagedata fileref="images/thread_states.png" format="PNG" />
            </imageobject>

            <imageobject role="fop">
              <imagedata fileref="images/thread_states.svg" format="SVG" />
            </imageobject>
          </mediaobject>
        </figure></para>
    </formalpara>
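
    <para>The six states can be captured by a simple enumeration. The sketch
    below is illustrative; the actual kernel type may contain additional
    housekeeping values:</para>

    <programlisting><![CDATA[
/* Illustrative sketch of the thread state enumeration. */
typedef enum {
    Entering,   /* created or migrated, waiting in a run queue for the first time */
    Ready,      /* preempted, waiting in a run queue */
    Running,    /* currently executing on a processor */
    Sleeping,   /* waiting for an event */
    Exiting,    /* terminating */
    Lingering   /* terminated, but not yet detached */
} state_t;
]]></programlisting>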
||
58 | jermar | 163 | |
64 | jermar | 164 | <formalpara> |
171 | jermar | 165 | <title>Fibrils</title> |
58 | jermar | 166 | |
64 | jermar | 167 | <para>HelenOS userspace layer knows even smaller units of execution. |
171 | jermar | 168 | Each userspace thread can make use of an arbitrary number of fibrils. |
169 | These fibrils have their own synchronous register context, userspace |
||
170 | code and stack. They live their own life within the userspace thread and |
||
171 | the scheduler does not have any idea about them because they are |
||
172 | completely implemented by the userspace library. This implies several |
||
173 | things:<itemizedlist> |
||
64 | jermar | 174 | <listitem> |
171 | jermar | 175 | <para>fibrils schedule themselves cooperatively within the time |
176 | slice given to their userspace thread,</para> |
||
64 | jermar | 177 | </listitem> |
58 | jermar | 178 | |
64 | jermar | 179 | <listitem> |
171 | jermar | 180 | <para>fibrils share FPU context of their containing thread |
64 | jermar | 181 | and</para> |
182 | </listitem> |
||
183 | |||
184 | <listitem> |
||
171 | jermar | 185 | <para>all fibrils of one userspace thread block when one of them |
186 | goes to sleep.</para> |
||
64 | jermar | 187 | </listitem> |
188 | </itemizedlist></para> |
||
189 | </formalpara> |
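
    <para>From the programmer's point of view, working with fibrils boils down
    to creating them and letting them yield to one another cooperatively. The
    following fragment is only a sketch; the names
    (<code>fibril_create()</code>, <code>fibril_add_ready()</code>,
    <code>fibril_yield()</code>) are used for illustration and need not match
    the library interface exactly:</para>

    <programlisting><![CDATA[
/* Illustrative sketch of cooperative fibril usage within one userspace thread. */
static int counter = 0;

static int worker(void *arg)
{
    (void) arg;

    for (int i = 0; i < 3; i++) {
        counter++;          /* stand-in for a piece of real work */
        fibril_yield();     /* voluntarily let the other fibrils run */
    }

    return 0;
}

static void start_workers(void)
{
    /* Both fibrils live inside the current userspace thread. */
    fibril_add_ready(fibril_create(worker, "A"));
    fibril_add_ready(fibril_create(worker, "B"));
}
]]></programlisting>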
  </section>

  <section>
    <title>Scheduler</title>

    <section>
      <title>Run Queues</title>

      <para>There is an array of several run queues on each processor. The
      current version of HelenOS uses 16 run queues implemented by 16 doubly
      linked lists. Each of the run queues is associated with a thread
      priority. The lower the run queue index in the array, the higher the
      priority of the threads linked in that run queue and the shorter the
      time slice those threads are given when they execute. When kernel code
      wants to access a run queue, it must first acquire its lock.</para>
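
      <para>Conceptually, the per-processor run queues can be pictured as the
      following structure. The type and field names are illustrative and need
      not match the kernel sources exactly:</para>

      <programlisting><![CDATA[
#define RQ_COUNT  16    /* number of priority levels / run queues per CPU */

/* Illustrative sketch of one run queue. */
typedef struct {
    spinlock_t lock;    /* must be held when accessing the queue */
    list_t head;        /* doubly linked list of ready threads */
    size_t n;           /* number of threads linked in the queue */
} runq_t;

/* Each processor owns its private array of run queues. */
typedef struct cpu {
    runq_t rq[RQ_COUNT];    /* index 0 is the highest priority */
    /* ... other per-processor state ... */
} cpu_t;
]]></programlisting>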
    </section>

    <section>
      <title>Scheduler Operation</title>

      <para>The scheduler is invoked either explicitly, when a thread calls the
      <code>scheduler()</code> function (e.g. goes to sleep or merely wants to
      relinquish the processor for a while), or implicitly on a periodic basis,
      when the generic clock interrupt preempts the current thread. After its
      invocation, the scheduler saves the synchronous register context of the
      current thread and switches to its private stack. Afterwards, a new
      thread is selected according to the scheduling policy. If there is no
      suitable thread, the processor stays idle and no thread executes on it.
      Note that the act of switching to the private scheduler stack is
      essential. If the processor kept running on the stack of the preempted
      thread, it could damage it, because the old thread can be migrated to
      another processor and scheduled there. In the worst case scenario, two
      execution flows would be using the same stack.</para>

      <para>The scheduling policy is implemented in the function
      <code>find_best_thread()</code>. This function walks the processor's run
      queues from lower towards higher indices and looks for a thread. If the
      visited run queue is empty, it simply searches the next run queue. If it
      is known in advance that there are no ready threads waiting for
      execution, <code>find_best_thread()</code> interruptibly halts the
      processor or busy waits until some threads arrive. This process repeats
      until <code>find_best_thread()</code> succeeds.</para>

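      <para>Building on the run queue sketch above, the core of the policy can
      be outlined as follows. The sketch leaves out locking details and uses
      illustrative helper names (<code>rq_dequeue()</code>,
      <code>cpu_idle_or_wait()</code>) rather than the exact kernel
      interface:</para>

      <programlisting><![CDATA[
/* Illustrative sketch of find_best_thread(). */
static thread_t *find_best_thread(cpu_t *cpu)
{
    for (;;) {
        /* Walk the run queues from the highest priority (index 0) down. */
        for (unsigned int i = 0; i < RQ_COUNT; i++) {
            if (cpu->rq[i].n == 0)
                continue;   /* empty queue, try the next one */

            /* Remove and return the first ready thread. */
            return rq_dequeue(&cpu->rq[i]);
        }

        /* No ready thread: interruptibly halt or busy wait. */
        cpu_idle_or_wait(cpu);
    }
}
]]></programlisting>
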
      <para>After the best thread is chosen, the scheduler switches to the
      thread's task and memory management context. Finally, the saved
      synchronous register context is restored and the thread runs. Each
      scheduled thread is given a time slice depending on its priority (i.e.
      its run queue). The higher the priority, the shorter the time slice. To
      summarize, this policy schedules threads with high priorities more
      frequently but gives them smaller time slices. On the other hand, lower
      priority threads are scheduled less frequently, but run for longer
      periods of time.</para>

      <para>When a thread uses up its entire time slice, it is preempted and
      put back into the run queue that immediately follows the run queue from
      which the thread ran. Threads that are woken up from a sleep are put
      into the highest priority run queue. Low priority threads are therefore
      those that don't go to sleep very often and just occupy the
      processor.</para>

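      <para>The choice of the destination run queue can be summarized in a few
      lines (the function and parameter names are illustrative):</para>

      <programlisting><![CDATA[
/* Illustrative sketch: pick the run queue for a thread that stops running. */
static unsigned int next_run_queue(unsigned int current_rq, bool woken_up)
{
    if (woken_up)
        return 0;   /* woken threads go to the highest priority run queue */

    /* Preempted threads are demoted by one level, but stay within bounds. */
    return (current_rq + 1 < RQ_COUNT) ? current_rq + 1 : current_rq;
}
]]></programlisting>
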
      <para>In order to avoid complete starvation of the low priority threads,
      the scheduler will, from time to time, give them a bonus of a one-point
      priority increase. In other words, the scheduler will now and then move
      the entire contents of the run queues one level up.</para>
    </section>

    <section>
      <title>Processor Load Balancing</title>

      <para>Normally, for the sake of cache locality, threads are scheduled on
      one of the processors and don't leave it. Nevertheless, a situation in
      which one processor is heavily overloaded while others sit idle can
      occur. HelenOS deploys special kernel threads to help mitigate this
      problem. Each processor is associated with one load balancing thread
      called <code>kcpulb</code> that wakes up regularly to see whether its
      processor is underloaded or not. If it is, the thread attempts to
      migrate threads from other, overloaded processors to its own processor's
      run queues. When the job is done or there is no need for load balancing,
      the thread goes to sleep.</para>

      <para>The balancing threads operate very gently and try to migrate low
      priority threads first; a <code>kcpulb</code> thread never takes threads
      from the same processor twice in a row. Neither the load balancing
      threads nor threads that were just stolen can be migrated. The
      <code>kcpulb</code> threads are wired to their processors and cannot be
      migrated whatsoever. The ordinary threads are protected only until they
      are rescheduled.</para>
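
      <para>The overall behaviour of a balancing thread can be summarized by
      the following loop. It is only a sketch with illustrative helper names
      (<code>cpu_is_underloaded()</code>,
      <code>steal_threads_from_overloaded_cpus()</code>), not the actual
      <code>kcpulb</code> implementation:</para>

      <programlisting><![CDATA[
/* Illustrative sketch of one processor's load balancing thread. */
static void kcpulb_sketch(cpu_t *mycpu)
{
    for (;;) {
        /* Sleep until the next balancing period. */
        wait_for_next_period();

        if (!cpu_is_underloaded(mycpu))
            continue;

        /*
         * Walk the other processors' run queues, preferring low priority
         * threads, and move some of them into mycpu's run queues; never
         * take from the same processor twice in a row.
         */
        steal_threads_from_overloaded_cpus(mycpu);
    }
}
]]></programlisting>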
    </section>
  </section>
</chapter>