WebSVN – HelenOS-doc – Blame – /design/trunk/src/ch_time.xml

Rev	Author	Line No.	Line
52	jermar	1	<?xml version="1.0" encoding="UTF-8"?>
55	jermar	2	<chapter id="time">
		3	<?dbhtml filename="time.html"?>
52	jermar	4
131	jermar	5	<title>Time Management</title>
52	jermar	6
139	palkovsky	7	<para>Time is one of the dimensions in which the kernel, as well as the
		8	whole system, operates. It is of special importance to many kernel subsystems.
55	jermar	9	Knowledge of time makes it possible for the scheduler to preemptively plan
		10	threads for execution. Different parts of the kernel can request execution
139	palkovsky	11	of their callback function with a specified delay. A good example of such
55	jermar	12	kernel code is the synchronization subsystem which uses this functionality
		13	to implement synchronization primitives with timeouts.</para>
		14
		15	<section>
131	jermar	16	<title>System Clock</title>
53	jermar	17
55	jermar	18	<para>Every hardware architecture supported by HelenOS must provide some
		19	kind of device that can be programmed to yield periodic time signals
		20	(i.e. clock interrupts). Some architectures have an external clock that is
		21	merely programmed by the kernel to interrupt the processor multiple times
		22	per second. This is the case for the ia32 and amd64 architectures<footnote>
		23	<para>When running in uniprocessor mode.</para>
		24	</footnote>, which use the i8254 or a compatible chip for this
		25	purpose.</para>
		26
		27	<para>Other architectures' processors typically contain two registers. The
		28	first register is usually called a compare or a match register and can be
		29	set to an arbitrary value by the operating system. The contents of the
		30	compare register then stay unaltered until it is written by the kernel
		31	again. The second register, often called a counter register, can also be
		32	written by the kernel, but the processor automatically increments it after
		33	every executed instruction or in some fixed relation to processor speed.
		34	The point is that a clock interrupt is generated whenever the values of
		35	the counter and the compare registers match. Sometimes, the scheme of two
		36	registers is modified so that only one register is needed. Such a
		37	register, called a decrementer, then counts towards zero and an interrupt
		38	is generated when zero is reached.</para>
		39
		40	<para>In any case, the initial value of the decrementer or the initial
		41	difference between the counter and the compare registers, respectively,
		42	must be set according to a known relation between the real time and the
		43	speed of the decrementer or the counter register, respectively.</para>
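<para>As a small illustration of that relation, the following C sketch derives the number of counter ticks between two clock interrupts from the counter frequency and the desired interrupt rate. The helper <code>ticks_per_interrupt</code> and both of its parameters are hypothetical names chosen for this example; they are not taken from the HelenOS sources.</para>

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many counter ticks should separate two
 * clock interrupts, given how fast the counter register runs and how
 * many clock interrupts per second the kernel wants.
 */
uint32_t ticks_per_interrupt(uint32_t counter_hz, uint32_t interrupts_per_second)
{
    assert(interrupts_per_second > 0);
    assert(counter_hz >= interrupts_per_second);

    /* Integer division; any remainder is lost as a small rounding error. */
    return counter_hz / interrupts_per_second;
}
```

<para>For example, with a counter running at an assumed 100 MHz and 100 clock interrupts per second, the call <code>ticks_per_interrupt(100000000, 100)</code> yields 1000000 ticks.</para>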
		44
		45	<para>The rest of this section will, for the sake of clarity, focus on the
		46	two-register scheme. The decrementer scheme is very similar.</para>
		47
58	jermar	48	<para>The kernel must reinitialize one of the two registers after each
		49	clock interrupt in order to schedule the next interrupt. However, this step
		50	is tricky and must be done with caution. Imagine that the clock interrupt is
		51	masked either because the kernel is servicing another interrupt or because
		52	the processor locally disabled interrupts for a while. If the clock
139	palkovsky	53	interrupt occurs during this period, it will be pending until
		54	interrupts are enabled again. Theoretically, this could happen an arbitrary
		55	number of counter register ticks later. What is worse, the ideal time period
		56	between two non-delayed clock interrupts can also elapse an arbitrary number
		57	of times before the delayed interrupt gets serviced. The
		58	architecture-specific part of the clock interrupt driver must avoid time
		59	drifts caused by such behaviour by taking proactive
		60	countermeasures.</para>
55	jermar	61
		62	<para>Let us assume that the kernel wants a clock interrupt to be
		63	generated every <constant>TICKCONST</constant> ticks. This value
		64	represents the ideal number of ticks between two non-delayed clock
		65	interrupts and has some known relation to real time. On each clock
		66	interrupt, the kernel computes and records the value of the counter
		67	register that it expects to read on the next clock interrupt. When
		68	that interrupt comes, the kernel reads the counter register again and
		69	compares it with the recorded value. If the difference is smaller than
		70	or equal to <constant>TICKCONST</constant>, then there is little or no
		71	time drift and the next interrupt is scheduled earlier by as many ticks
		72	as the value of the difference. However, if the difference
		73	is bigger, then at least one clock signal was missed. In that case, the
		74	missed clock signal is recorded in a special counter. If there are
		75	more missed signals, each of them is recorded there. The next interrupt is
		76	scheduled with respect to the difference similarly to the former case.
		77	This time, the penalty is taken modulo <constant>TICKCONST</constant>. The
		78	effect of missed clock signals is remedied in the generic clock interrupt
		79	handler.</para>
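<para>The compensation described above can be sketched in C roughly as follows. This is an illustrative model only, not the actual architecture-specific driver: the structure <code>clock_state</code>, the function <code>clock_reschedule</code> and the value chosen for <code>TICKCONST</code> are assumptions made for this example, and a real driver would read and write hardware registers instead of taking the counter value as an argument.</para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TICKCONST 1000u   /* assumed ideal number of ticks between interrupts */

struct clock_state {
    uint32_t expected;     /* counter value the kernel hoped to read now */
    unsigned long missed;  /* missed clock signals (the special counter) */
};

/*
 * Called on a clock interrupt with the current counter value.
 * Returns the value to be written to the compare register.
 */
uint32_t clock_reschedule(struct clock_state *st, uint32_t now)
{
    assert(st != NULL);

    /* Unsigned subtraction stays correct when the counter wraps around. */
    uint32_t drift = now - st->expected;

    if (drift > TICKCONST) {
        /* At least one whole period elapsed unnoticed: remember each one. */
        st->missed += drift / TICKCONST;
        drift %= TICKCONST;
    }

    /* Schedule the next interrupt earlier by the remaining penalty. */
    uint32_t next = now + TICKCONST - drift;
    st->expected = next;
    return next;
}
```

<para>With <code>expected</code> equal to 1000, an interrupt serviced at counter value 1010 is rescheduled for 2000. If the following interrupt is only serviced at 4500, two signals (ideally due at 3000 and 4000) are counted as missed and the next interrupt is scheduled for 5000, so the ideal schedule is preserved.</para>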
		80	</section>
		81
		82	<section>
		83	<title>Timeouts</title>
		84
		85	<para>Kernel subsystems can register a callback function to be executed
		86	with a specified delay. Such a registration is represented by a kernel
		87	structure called <classname>timeout</classname>. Timeouts are registered
131	jermar	88	via the <code>timeout_register</code> function. This function takes a pointer
55	jermar	89	to a timeout structure, a callback function, a parameter of the callback
		90	function and a delay in microseconds as parameters. After the structure is
		91	initialized with all these values, it is inserted into the processor's sorted
56	jermar	92	list of active timeouts according to the number of clock interrupts remaining
		93	until its expiration, expressed relative to the timeouts already listed.</para>
55	jermar	94
131	jermar	95	<para>Timeouts can be unregistered via <code>timeout_unregister</code>.
		96	This function can, as opposed to <code>timeout_register</code>, fail when
55	jermar	97	it is too late to remove the timeout from the list of active
		98	timeouts.</para>
		99
		100	<para>Timeouts approach their expiration in the list of active timeouts,
		101	which exists on every processor in the system. The expiration counters are
		102	decremented on each clock interrupt by the generic clock interrupt
		103	handler. Due to the relative ordering of timeouts in the list, it is
		104	sufficient to decrement the expiration counter of only the first timeout in
		105	the list. Timeouts whose expiration counter reaches zero are removed from
		106	the list and their callback function is called with the respective
		107	parameter.</para>
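<para>The relative ordering can be illustrated by a simplified, single-processor sketch in C. It is not the kernel's implementation: all names carry a <code>sketch_</code> prefix or are otherwise hypothetical, the delay is given directly in clock ticks rather than microseconds, and locking is omitted. Each entry stores only the number of ticks remaining after the entry in front of it expires.</para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*timeout_handler_t)(void *arg);

typedef struct sketch_timeout {
    struct sketch_timeout *next;
    uint64_t ticks;              /* ticks remaining relative to the previous entry */
    timeout_handler_t handler;
    void *arg;
} sketch_timeout_t;

typedef struct {
    sketch_timeout_t *head;
} sketch_timeout_list_t;

/* Example callback: counts how many times it has been invoked. */
void count_expiry(void *arg)
{
    (*(int *) arg)++;
}

/* Insert a timeout that should expire after the given number of ticks. */
void sketch_timeout_register(sketch_timeout_list_t *list, sketch_timeout_t *t,
    uint64_t ticks, timeout_handler_t handler, void *arg)
{
    assert(list != NULL && t != NULL && handler != NULL && ticks > 0);

    /* Walk the list, converting the absolute delay into a relative one. */
    sketch_timeout_t **link = &list->head;
    while (*link != NULL && (*link)->ticks <= ticks) {
        ticks -= (*link)->ticks;
        link = &(*link)->next;
    }

    t->ticks = ticks;
    t->handler = handler;
    t->arg = arg;
    t->next = *link;

    /* The successor is now relative to the newly inserted entry. */
    if (*link != NULL)
        (*link)->ticks -= ticks;
    *link = t;
}

/* One clock tick: only the first entry needs to be decremented. */
void sketch_timeout_tick(sketch_timeout_list_t *list)
{
    assert(list != NULL);

    if (list->head != NULL && list->head->ticks > 0)
        list->head->ticks--;

    /* Expire the head and any followers that share its expiration time. */
    while (list->head != NULL && list->head->ticks == 0) {
        sketch_timeout_t *t = list->head;
        list->head = t->next;
        t->next = NULL;
        t->handler(t->arg);
    }
}
```

<para>Registering timeouts with delays of 3, 5 and 4 ticks in that order produces a list with relative counters 3, 1 and 1, so every tick touches a single counter regardless of the list length.</para>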
		108	</section>
		109
		110	<section>
131	jermar	111	<title>Generic Clock Interrupt Handler</title>
55	jermar	112
		113	<para>On each clock interrupt, the architecture-specific part of the clock
		114	interrupt handler makes a call to the generic clock interrupt handler
131	jermar	115	implemented by the <code>clock</code> function. The generic handler takes
55	jermar	116	care of several mission-critical tasks:</para>
		117
		118	<itemizedlist>
		119	<listitem>
		120	<para>expiration of timeouts,</para>
		121	</listitem>
		122
		123	<listitem>
		124	<para>updating time of the day counters for userspace and</para>
		125	</listitem>
		126
		127	<listitem>
		128	<para>preemption of threads.</para>
		129	</listitem>
		130	</itemizedlist>
		131
131	jermar	132	<para>The <code>clock</code> function checks for expired timeouts and
56	jermar	133	decrements unexpired timeout expiration counters exactly one more time
		134	than the number of missed clock signals (i.e. at least once and
		135	possibly more times, depending on the missed clock signals counter). The
		136	time of the day counters are also updated one more time than the
		137	number of missed clock signals. Finally, the remaining timeslice of
		138	the running thread is decremented with respect to this counter as well. By
		139	considering its value, the kernel performs actions that would otherwise be
		140	lost due to the occasional excessive time drift described
		141	above.</para>
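<para>The way the missed clock signals counter drives the handler can be sketched as follows. The sketch is a deliberately simplified model under stated assumptions: <code>cpu_sketch_t</code>, <code>clock_sketch</code> and <code>USEC_PER_TICK</code> are hypothetical, timeout processing is reduced to counting passes over the timeout list, and the time of the day is kept in two plain fields.</para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define USEC_PER_TICK 10000u   /* assumed: 100 clock interrupts per second */

typedef struct {
    unsigned missed_clock_ticks;  /* filled in by the architecture-specific part */
    uint32_t seconds;             /* time of the day, whole seconds */
    uint32_t useconds;            /* time of the day, microseconds */
    unsigned timeout_passes;      /* stands in for processing the timeout list */
    unsigned timeslice;           /* remaining ticks of the running thread */
    bool need_resched;            /* set when the running thread should be preempted */
} cpu_sketch_t;

void clock_sketch(cpu_sketch_t *cpu)
{
    assert(cpu != NULL);

    /* This interrupt itself plus every signal that was missed before it. */
    unsigned elapsed = cpu->missed_clock_ticks + 1;
    cpu->missed_clock_ticks = 0;

    for (unsigned i = 0; i < elapsed; i++) {
        /* Update the time of the day counters. */
        cpu->useconds += USEC_PER_TICK;
        if (cpu->useconds >= 1000000u) {
            cpu->useconds -= 1000000u;
            cpu->seconds++;
        }

        /* Decrement timeout counters and run expired callbacks (modelled). */
        cpu->timeout_passes++;
    }

    /* Charge the running thread for all elapsed ticks. */
    if (cpu->timeslice > elapsed) {
        cpu->timeslice -= elapsed;
    } else {
        cpu->timeslice = 0;
        cpu->need_resched = true;
    }
}
```

<para>With two missed clock signals recorded, a single call performs three rounds of timekeeping and timeout processing and charges three ticks to the running thread.</para>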
55	jermar	142	</section>
56	jermar	143
		144	<section>
131	jermar	145	<title>Time Source for Userspace</title>
56	jermar	146
		147	<para>In HelenOS, userspace tasks do not communicate with the kernel in
85	palkovsky	148	order to read the system time. Instead, a mechanism that shares the kernel
		149	time of the day counters with userspace address spaces is deployed. On
		150	the kernel side, during system initialization, HelenOS allocates a frame
		151	of physical memory and stores the time of the day counters there. The
		152	counters have the following structure:</para>
56	jermar	153
		154	<itemizedlist>
		155	<listitem>
		156	<para>first 32-bit counter for seconds,</para>
		157	</listitem>
		158
		159	<listitem>
		160	<para>32-bit counter for microseconds and</para>
		161	</listitem>
		162
		163	<listitem>
		164	<para>second 32-bit counter for seconds.</para>
		165	</listitem>
		166	</itemizedlist>
		167
		168	<para>One of the userspace tasks with memory manager capabilities (e.g.
85	palkovsky	169	ns) asks the kernel to map this frame into its address space. Other
		170	non-privileged tasks then use IPC to receive a read-only share of this
		171	memory. Reading time in a userspace task is therefore just a matter of
		172	reading memory.</para>
56	jermar	173
		174	<para>There are two interesting points about this. First, the counters are
		175	32-bit even on 64-bit machines. The goal is to provide subsecond precision
		176	with the possibility to span roughly 136 years. Note that a single 64-bit
		177	microsecond counter could not usually be read atomically on 32-bit
85	palkovsky	178	platforms. Unfortunately, on 32-bit platforms it is usually also impossible
		179	to read two 32-bit counters atomically. However, a generic protocol is
		180	used to guarantee that sequentially read times form a
		181	non-decreasing sequence.</para>
56	jermar	182
85	palkovsky	183	<para>The problematic part is incrementing the seconds counter and clearing
		184	the microseconds counter together once every second. Seconds must be
		185	incremented and microseconds must be reset. However, without any
		186	synchronization, the two kernel stores and the two userspace reads can
		187	interleave arbitrarily. Furthermore, the reader has no chance to detect
		188	that the counters were updated only partially. Therefore, three counters
		189	are used in HelenOS.</para>
56	jermar	190
		191	<para>If seconds need to be updated, the kernel increments the first
		192	seconds counter, issues a write memory barrier operation, updates the
		193	microsecond counter, issues another write memory barrier operation and
85	palkovsky	194	increments the second seconds counter. When only microseconds need to be
56	jermar	195	updated, no special action is taken by the kernel. On the other hand, the
85	palkovsky	196	userspace task must always read all three counters in reverse order. A
		197	read memory barrier operation must be issued between every two reads. A
56	jermar	198	non-atomic read is detected when the two seconds counters differ. The
85	palkovsky	199	userspace library resolves this situation by returning the higher of them with
		200	microseconds set to zero.</para>
56	jermar	201	</section>
52	jermar	202	</chapter>

(root)/design/trunk/src/ch_time.xml @ 97 – Rev 139