Rev | Author | Line No. | Line |
---|---|---|---|
52 | jermar | 1 | <?xml version="1.0" encoding="UTF-8"?> |
55 | jermar | 2 | <chapter id="time"> |
3 | <?dbhtml filename="time.html"?> |
||
52 | jermar | 4 | |
131 | jermar | 5 | <title>Time Management</title> |
52 | jermar | 6 | |
139 | palkovsky | 7 | <para>Time is one of the dimensions in which the kernel, as well as the whole |
8 | system, operates. It is of special importance to many kernel subsystems. |
||
55 | jermar | 9 | Knowledge of time makes it possible for the scheduler to preemptively plan |
10 | threads for execution. Different parts of the kernel can request execution |
||
139 | palkovsky | 11 | of their callback function with a specified delay. A good example of such |
55 | jermar | 12 | kernel code is the synchronization subsystem which uses this functionality |
13 | to implement variants of synchronization primitives that can time out.</para> |
||
14 | |||
15 | <section> |
||
131 | jermar | 16 | <title>System Clock</title> |
53 | jermar | 17 | |
55 | jermar | 18 | <para>Every hardware architecture supported by HelenOS must support some |
19 | kind of a device that can be programmed to yield periodic time signals |
||
20 | (i.e. clock interrupts). Some architectures have an external clock that is |
||
21 | merely programmed by the kernel to interrupt the processor multiple times |
||
22 | in a second. This is the case of ia32 and amd64 architectures<footnote> |
||
23 | <para>When running in uniprocessor mode.</para> |
||
24 | </footnote>, which use i8254 or a compatible chip to achieve the |
||
25 | goal.</para> |
||
26 | |||
27 | <para>Other architectures' processors typically contain two registers. The |
||
28 | first register is usually called a compare or a match register and can be |
||
29 | set to an arbitrary value by the operating system. The contents of the |
||
30 | compare register then stay unaltered until it is written by the kernel |
||
31 | again. The second register, often called a counter register, can also be |
||
32 | written by the kernel, but the processor automatically increments it after |
||
33 | every executed instruction or in some fixed relation to processor speed. |
||
34 | The point is that a clock interrupt is generated whenever the values of |
||
35 | the counter and the compare registers match. Sometimes, the scheme of two |
||
36 | registers is modified so that only one register is needed. Such a |
||
37 | register, called a decrementer, then counts towards zero and an interrupt |
||
38 | is generated when zero is reached.</para> |
||
39 | |||
40 | <para>In any case, the initial value of the decrementer or the initial |
||
41 | difference between the counter and the compare registers, respectively, |
||
42 | must be set according to a known relation between the real time and the |
||
43 | speed of the decrementer or the counter register, respectively.</para> |
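<para>The relation mentioned above can be sketched as a simple division.
The helper name <code>ticks_per_interrupt</code> and both of its parameters
are hypothetical and are not taken from the HelenOS sources. The sketch
assumes that the counter (or decrementer) runs at a known, constant
frequency.</para>

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a HelenOS function): number of counter or
 * decrementer ticks that must elapse between two clock interrupts.
 *
 * counter_hz         frequency at which the counter register advances
 * interrupts_per_sec desired number of clock interrupts per second
 */
uint64_t ticks_per_interrupt(uint64_t counter_hz, uint32_t interrupts_per_sec)
{
    assert(counter_hz > 0 && interrupts_per_sec > 0);

    /* Integer division; the remainder is a small systematic error. */
    return counter_hz / interrupts_per_sec;
}
```

<para>For instance, with the i8254 input clock of 1193182 Hz and an assumed
rate of 100 interrupts per second, the helper yields 11931 ticks. The
interrupt rate is an illustrative choice only.</para>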
||
44 | |||
45 | <para>The rest of this section will, for the sake of clarity, focus on the |
||
46 | two-register scheme. The decrementer scheme is very similar.</para> |
||
47 | |||
58 | jermar | 48 | <para>The kernel must reinitialize one of the two registers after each |
49 | clock interrupt in order to schedule the next interrupt. However, this step is |
||
50 | tricky and must be done with caution. Imagine that the clock interrupt is |
||
51 | masked either because the kernel is servicing another interrupt or because |
||
52 | the processor locally disabled interrupts for a while. If the clock |
||
139 | palkovsky | 53 | interrupt occurs during this period, it will be pending until the |
54 | interrupts are enabled again. Theoretically, this could happen an arbitrary |
||
55 | number of counter register ticks later. What is worse, the ideal time period |
||
56 | between two non-delayed clock interrupts can also elapse an arbitrary number |
||
57 | of times before the delayed interrupt gets serviced. The |
||
58 | architecture-specific part of the clock interrupt driver must avoid time |
||
59 | drifts caused by such behaviour by taking proactive |
||
60 | counter-measures.</para> |
||
55 | jermar | 61 | |
62 | <para>Let us assume that the kernel wants a clock interrupt to be |
||
63 | generated every <constant>TICKCONST</constant> ticks. This value |
||
64 | represents the ideal number of ticks between two non-delayed clock |
||
65 | interrupts and has some known relation to real time. On each clock |
||
66 | interrupt, the kernel computes and writes down the expected value of the |
||
67 | counter register as it hopes to read it on the next clock interrupt. When |
||
68 | that interrupt comes, the kernel reads the counter register again and |
||
69 | compares it with the written down value. If the difference is smaller than |
||
70 | or equal to <constant>TICKCONST</constant>, then there is little or no |
||
71 | time drift and the next interrupt is scheduled earlier, with a penalty of |
||
72 | as many ticks as the difference. However, if the difference |
||
73 | is bigger, then at least one clock signal was missed. In that case, the |
||
74 | missed clock signal is recorded in a special counter. If there are |
||
75 | more missed signals, each of them is recorded there. The next interrupt is |
||
76 | scheduled with respect to the difference similarly to the former case. |
||
77 | This time, the penalty is taken modulo <constant>TICKCONST</constant>. The |
||
78 | effect of missed clock signals is remedied in the generic clock interrupt |
||
79 | handler.</para> |
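<para>The compensation described above can be sketched as follows. This is
a minimal illustration and not the actual architecture-specific code: the
type <code>clock_state_t</code>, the function
<code>clock_reschedule</code> and all field names are hypothetical, and the
counter is assumed to be a free-running 32-bit register, so that unsigned
subtraction handles its wraparound.</para>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-processor state; names are illustrative only. */
typedef struct {
    uint32_t tick_const;     /* ideal ticks between interrupts (TICKCONST) */
    uint32_t expected_count; /* counter value expected at this interrupt */
    uint32_t missed_ticks;   /* missed clock signals, consumed elsewhere */
} clock_state_t;

/*
 * Called on a clock interrupt with the current counter value.
 * Returns the value to be written to the compare register.
 */
uint32_t clock_reschedule(clock_state_t *st, uint32_t counter_now)
{
    assert(st->tick_const > 0);

    /* How late the interrupt is being serviced, in counter ticks. */
    uint32_t drift = counter_now - st->expected_count;

    if (drift > st->tick_const) {
        /* Whole periods elapsed unnoticed: remember each of them. */
        st->missed_ticks += drift / st->tick_const;
        /* Only the remainder is used as the penalty. */
        drift %= st->tick_const;
    }

    /* Schedule the next interrupt earlier by the penalty. */
    st->expected_count = counter_now + st->tick_const - drift;
    return st->expected_count;
}
```

<para>With <constant>TICKCONST</constant> equal to 100 and interrupts
expected at 1000, 1100, 1200 and so on, an interrupt serviced at 1350
records two missed signals and schedules the next interrupt for 1400, which
keeps the interrupts aligned with the ideal grid.</para>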
||
80 | </section> |
||
81 | |||
82 | <section> |
||
83 | <title>Timeouts</title> |
||
84 | |||
85 | <para>Kernel subsystems can register a callback function to be executed |
||
86 | with a specified delay. Such a registration is represented by a kernel |
||
87 | structure called <classname>timeout</classname>. Timeouts are registered |
||
131 | jermar | 88 | via the <code>timeout_register</code> function. This function takes a pointer |
55 | jermar | 89 | to a timeout structure, a callback function, a parameter of the callback |
90 | function and a delay in microseconds as parameters. After the structure is |
||
91 | initialized with all these values, it is sorted into the processor's list |
||
56 | jermar | 92 | of active timeouts, according to the number of clock interrupts remaining |
93 | to their expiration and relative to the timeouts already in the list.</para> |
||
55 | jermar | 94 | |
131 | jermar | 95 | <para>Timeouts can be unregistered via <code>timeout_unregister</code>. |
96 | This function can, as opposed to <code>timeout_register</code>, fail when |
||
55 | jermar | 97 | it is too late to remove the timeout from the list of active |
98 | timeouts.</para> |
||
99 | |||
100 | <para>Timeouts approach their expiration in the list of active timeouts, |
||
101 | which exists on every processor in the system. The expiration counters are |
||
102 | decremented on each clock interrupt by the generic clock interrupt |
||
103 | handler. Due to the relative ordering of timeouts in the list, it is |
||
104 | sufficient to decrement the expiration counter of only the first timeout in |
||
105 | the list. Timeouts whose expiration counter reaches zero are removed from |
||
106 | the list and their callback function is called with the respective |
||
107 | parameter.</para> |
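<para>The relative ordering can be illustrated by a so-called delta list,
in which every entry stores only the number of ticks remaining after its
predecessor expires. The sketch below is an assumption-laden illustration
and not the HelenOS implementation: the structure layout and the functions
<code>delta_list_insert</code>, <code>delta_list_tick</code> and
<code>bump_counter</code> are hypothetical, the delay is given directly in
clock ticks rather than microseconds, and locking is omitted.</para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*timeout_handler_t)(void *arg);

/* Hypothetical layout; the real timeout structure differs. */
typedef struct timeout {
    struct timeout *next;
    uint64_t ticks;            /* ticks left after the predecessor expires */
    timeout_handler_t handler; /* callback function */
    void *arg;                 /* parameter of the callback function */
} timeout_t;

/* Insert t so that it expires `ticks` clock interrupts from now. */
void delta_list_insert(timeout_t **head, timeout_t *t, uint64_t ticks,
    timeout_handler_t handler, void *arg)
{
    assert(handler != NULL);
    t->handler = handler;
    t->arg = arg;

    /* Skip entries expiring no later, consuming their share of ticks. */
    timeout_t **link = head;
    while (*link != NULL && (*link)->ticks <= ticks) {
        ticks -= (*link)->ticks;
        link = &(*link)->next;
    }

    t->ticks = ticks;
    t->next = *link;
    if (t->next != NULL)
        t->next->ticks -= ticks; /* successor is now relative to t */
    *link = t;
}

/* One clock tick: only the first entry's counter is decremented. */
void delta_list_tick(timeout_t **head)
{
    if (*head != NULL && (*head)->ticks > 0)
        (*head)->ticks--;

    /* Expire every leading entry whose counter has reached zero. */
    while (*head != NULL && (*head)->ticks == 0) {
        timeout_t *expired = *head;
        *head = expired->next;
        expired->next = NULL;
        expired->handler(expired->arg);
    }
}

/* Example callback: increments the int passed as its parameter. */
void bump_counter(void *arg)
{
    (*(int *) arg)++;
}
```

<para>Inserting timeouts of 5, 2 and 5 ticks in this order produces a list
with relative counters 2, 3 and 0. A single decrement per tick therefore
advances all three timeouts at once.</para>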
||
108 | </section> |
||
109 | |||
110 | <section> |
||
131 | jermar | 111 | <title>Generic Clock Interrupt Handler</title> |
55 | jermar | 112 | |
113 | <para>On each clock interrupt, the architecture-specific part of the clock |
||
114 | interrupt handler makes a call to the generic clock interrupt handler |
||
131 | jermar | 115 | implemented by the <code>clock</code> function. The generic handler takes |
55 | jermar | 116 | care of several mission-critical tasks:</para> |
117 | |||
118 | <itemizedlist> |
||
119 | <listitem> |
||
120 | <para>expiration of timeouts,</para> |
||
121 | </listitem> |
||
122 | |||
123 | <listitem> |
||
124 | <para>updating time of the day counters for userspace and</para> |
||
125 | </listitem> |
||
126 | |||
127 | <listitem> |
||
128 | <para>preemption of threads.</para> |
||
129 | </listitem> |
||
130 | </itemizedlist> |
||
131 | |||
131 | jermar | 132 | <para>The <code>clock</code> function checks for expired timeouts and |
56 | jermar | 133 | decrements unexpired timeout expiration counters exactly one more time |
134 | than the number of missed clock signals (i.e. at least once and |
||
135 | possibly more times, depending on the missed clock signals counter). The |
||
136 | time of the day counters are also updated one more time than the |
||
137 | number of missed clock signals. And finally, the remaining timeslice of |
||
138 | the running thread is decremented with respect to this counter as well. By |
||
139 | considering its value, the kernel performs actions that would otherwise be |
||
140 | lost due to an occasional excessive time drift described in the previous |
||
141 | paragraphs.</para> |
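<para>The way the missed clock signals counter drives the generic handler
can be sketched as below. The structure <code>cpu_sketch_t</code>, its
fields and the function <code>clock_sketch</code> are hypothetical
stand-ins: timeout processing is reduced to a pass counter and the time of
the day counters to a single microsecond accumulator, so the sketch shows
only the repetition logic, not the real <code>clock</code> function.</para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-processor state; names are illustrative only. */
typedef struct {
    uint32_t missed_ticks;   /* set by the architecture-specific part */
    uint32_t usec_per_tick;  /* real time represented by one tick */
    uint64_t uptime_usec;    /* stand-in for the time of the day counters */
    uint32_t timeout_passes; /* stand-in for walking the timeout list */
    uint32_t timeslice;      /* ticks left for the running thread */
} cpu_sketch_t;

/* Returns true when the running thread has used up its timeslice. */
bool clock_sketch(cpu_sketch_t *cpu)
{
    assert(cpu->usec_per_tick > 0);

    /* The current tick plus every tick that was missed. */
    uint32_t passes = cpu->missed_ticks + 1;
    cpu->missed_ticks = 0;

    for (uint32_t i = 0; i < passes; i++) {
        cpu->timeout_passes++;                  /* expire/decrement timeouts */
        cpu->uptime_usec += cpu->usec_per_tick; /* advance the time of day */
    }

    /* Charge the running thread for all elapsed ticks, saturating at 0. */
    if (cpu->timeslice > passes)
        cpu->timeslice -= passes;
    else
        cpu->timeslice = 0;

    return cpu->timeslice == 0;
}
```

<para>With two missed signals recorded, a single call performs three passes
over the timeouts and the time counters and takes three ticks from the
timeslice.</para>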
||
55 | jermar | 142 | </section> |
56 | jermar | 143 | |
144 | <section> |
||
131 | jermar | 145 | <title>Time Source for Userspace</title> |
56 | jermar | 146 | |
147 | <para>In HelenOS, userspace tasks don't communicate with the kernel in |
||
85 | palkovsky | 148 | order to read the system time. Instead, a mechanism that shares kernel |
149 | time of the day counters with userspace address spaces is deployed. On |
||
150 | the kernel side, during system initialization, HelenOS allocates a frame |
||
151 | of physical memory and stores the time of the day counters there. The |
||
152 | counters have the following structure:</para> |
||
56 | jermar | 153 | |
154 | <itemizedlist> |
||
155 | <listitem> |
||
156 | <para>first 32-bit counter for seconds,</para> |
||
157 | </listitem> |
||
158 | |||
159 | <listitem> |
||
160 | <para>32-bit counter for microseconds and</para> |
||
161 | </listitem> |
||
162 | |||
163 | <listitem> |
||
164 | <para>second 32-bit counter for seconds.</para> |
||
165 | </listitem> |
||
166 | </itemizedlist> |
||
167 | |||
168 | <para>One of the userspace tasks with the capabilities of a memory manager (e.g. |
||
85 | palkovsky | 169 | ns) asks the kernel to map this frame into its address space. Other |
170 | non-privileged tasks then use IPC to receive a read-only share of this |
||
171 | memory. Reading time in a userspace task is therefore just a matter of |
||
172 | reading memory.</para> |
||
56 | jermar | 173 | |
174 | <para>There are two interesting points about this. First, the counters are |
||
175 | 32-bit even on 64-bit machines. The goal is to provide subsecond precision |
||
176 | with the possibility to span roughly 136 years. Note that a single 64-bit |
||
177 | microsecond counter usually could not be read atomically on 32-bit |
||
85 | palkovsky | 178 | platforms. Unfortunately, on 32-bit platforms it is usually impossible to |
179 | read two 32-bit counters atomically either. However, a generic protocol is |
||
180 | used to guarantee that sequentially read times will create a |
||
181 | non-decreasing sequence.</para> |
||
56 | jermar | 182 | |
85 | palkovsky | 183 | <para>The problematic part is incrementing the seconds counter and clearing |
184 | the microseconds counter together once every second. Seconds must be |
||
185 | incremented and microseconds must be reset. However, without any |
||
186 | synchronization, the two kernel stores and the two userspace reads can |
||
187 | arbitrarily interleave. Furthermore, the reader has no chance to detect |
||
188 | that the counters were updated only partially. Therefore three counters |
||
189 | are used in HelenOS.</para> |
||
56 | jermar | 190 | |
191 | <para>If seconds need to be updated, the kernel increments the first |
||
192 | seconds counter, issues a write memory barrier operation, updates the |
||
193 | microsecond counter, issues another write memory barrier operation and |
||
85 | palkovsky | 194 | increments the second seconds counter. When only microseconds need to be |
56 | jermar | 195 | updated, no special action is taken by the kernel. On the other hand, the |
85 | palkovsky | 196 | userspace task must always read all three counters in reverse order. A |
197 | read memory barrier operation must be issued between every two consecutive reads. A |
||
56 | jermar | 198 | non-atomic read is detected when the two seconds counters differ. The |
85 | palkovsky | 199 | userspace library solves this situation by returning the higher of them with |
200 | microseconds set to zero.</para> |
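<para>The protocol can be sketched in portable C11 as follows. This is an
illustration under stated assumptions and not the HelenOS code: the names
<code>shared_time_t</code>, <code>time_init</code>,
<code>time_advance</code> and <code>time_read</code> are hypothetical, the
write and read memory barriers are approximated by C11 release and acquire
fences, and the kernel is assumed to advance time by less than one second
per update.</para>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout of the shared frame: three 32-bit counters. */
typedef struct {
    _Atomic uint32_t seconds1; /* first seconds counter */
    _Atomic uint32_t useconds; /* microseconds counter */
    _Atomic uint32_t seconds2; /* second seconds counter */
} shared_time_t;

typedef struct {
    uint32_t sec;
    uint32_t usec;
} time_sample_t;

void time_init(shared_time_t *t, uint32_t sec, uint32_t usec)
{
    atomic_init(&t->seconds1, sec);
    atomic_init(&t->useconds, usec);
    atomic_init(&t->seconds2, sec);
}

/* Writer (kernel) side: advance the time by delta_usec microseconds. */
void time_advance(shared_time_t *t, uint32_t delta_usec)
{
    assert(delta_usec < 1000000);
    uint32_t usec =
        atomic_load_explicit(&t->useconds, memory_order_relaxed) + delta_usec;

    if (usec < 1000000) {
        /* Only microseconds change: no special action is needed. */
        atomic_store_explicit(&t->useconds, usec, memory_order_relaxed);
        return;
    }

    uint32_t sec =
        atomic_load_explicit(&t->seconds1, memory_order_relaxed) + 1;
    atomic_store_explicit(&t->seconds1, sec, memory_order_relaxed);
    atomic_thread_fence(memory_order_release); /* write barrier */
    atomic_store_explicit(&t->useconds, usec - 1000000, memory_order_relaxed);
    atomic_thread_fence(memory_order_release); /* write barrier */
    atomic_store_explicit(&t->seconds2, sec, memory_order_relaxed);
}

/* Reader (userspace) side: read the counters in reverse order. */
time_sample_t time_read(shared_time_t *t)
{
    uint32_t s2 = atomic_load_explicit(&t->seconds2, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire); /* read barrier */
    uint32_t us = atomic_load_explicit(&t->useconds, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire); /* read barrier */
    uint32_t s1 = atomic_load_explicit(&t->seconds1, memory_order_relaxed);

    if (s1 != s2) {
        /* An update was in progress: report the higher second, 0 usec. */
        return (time_sample_t) { .sec = s1 > s2 ? s1 : s2, .usec = 0 };
    }
    return (time_sample_t) { .sec = s1, .usec = us };
}
```

<para>Because the first seconds counter is written first and read last, a
reader that sees both seconds counters equal knows that the microseconds
value it read belongs to that same second.</para>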
||
56 | jermar | 201 | </section> |
52 | jermar | 202 | </chapter> |