Rev 56 | Rev 127 | Go to most recent revision | Details | Compare with Previous | Last modification | View Log | RSS feed
Rev | Author | Line No. | Line |
---|---|---|---|
52 | jermar | 1 | <?xml version="1.0" encoding="UTF-8"?> |
55 | jermar | 2 | <chapter id="time"> |
3 | <?dbhtml filename="time.html"?> |
||
52 | jermar | 4 | |
55 | jermar | 5 | <title>Time management</title> |
52 | jermar | 6 | |
55 | jermar | 7 | <para>Time is one of the dimensions in which kernel, as well as the whole |
8 | system, operates. It is of special importance to many kernel subsytems. |
||
9 | Knowledge of time makes it possible for the scheduler to preemptively plan |
||
10 | threads for execution. Different parts of the kernel can request execution |
||
11 | of their callback function with some specified delay. A good example of such |
||
12 | kernel code is the synchronization subsystem which uses this functionality |
||
13 | to implement timeouting versions of synchronization primitives.</para> |
||
14 | |||
15 | <section> |
||
53 | jermar | 16 | <title>System clock</title> |
17 | |||
55 | jermar | 18 | <para>Every hardware architecture supported by HelenOS must support some |
19 | kind of a device that can be programmed to yield periodic time signals |
||
20 | (i.e. clock interrupts). Some architectures have external clock that is |
||
21 | merely programmed by the kernel to interrupt the processor multiple times |
||
22 | in a second. This is the case of ia32 and amd64 architectures<footnote> |
||
23 | <para>When running in uniprocessor mode.</para> |
||
24 | </footnote>, which use i8254 or a compatible chip to achieve the |
||
25 | goal.</para> |
||
26 | |||
27 | <para>Other architectures' processors typically contain two registers. The |
||
28 | first register is usually called a compare or a match register and can be |
||
29 | set to an arbitrary value by the operating system. The contents of the |
||
30 | compare register then stays unaltered until it is written by the kernel |
||
31 | again. The second register, often called a counter register, can be also |
||
32 | written by the kernel, but the processor automatically increments it after |
||
33 | every executed instruction or in some fixed relation to processor speed. |
||
34 | The point is that a clock interrupt is generated whenever the values of |
||
35 | the counter and the compare registers match. Sometimes, the scheme of two |
||
36 | registers is modified so that only one register is needed. Such a |
||
37 | register, called a decrementer, then counts towards zero and an interrupt |
||
38 | is generated when zero is reached.</para> |
||
39 | |||
40 | <para>In any case, the initial value of the decrementer or the initial |
||
41 | difference between the counter and the compare registers, respectively, |
||
42 | must be set accordingly to a known relation between the real time and the |
||
43 | speed of the decrementer or the counter register, respectively.</para> |
||
44 | |||
45 | <para>The rest of this section will, for the sake of clarity, focus on the |
||
46 | two-register scheme. The decrementer scheme is very similar.</para> |
||
47 | |||
58 | jermar | 48 | <para>The kernel must reinitialize one of the two registers after each |
49 | clock interrupt in order to schedule next interrupt. However this step is |
||
50 | tricky and must be done with caution. Imagine that the clock interrupt is |
||
51 | masked either because the kernel is servicing another interrupt or because |
||
52 | the processor locally disabled interrupts for a while. If the clock |
||
53 | interrupt occurs during this period, it will be pending until interrupts |
||
54 | are enabled again. In theory, that could happen arbitrary counter register |
||
55 | ticks later. Which is worse, the ideal time period between two non-delayed |
||
56 | clock interrupts can also elapse arbitrary number of times before the |
||
57 | delayed interrupt gets serviced. The architecture-specific part of the |
||
58 | clock interrupt driver must avoid time drifts caused by this by taking |
||
59 | proactive counter-measures.</para> |
||
55 | jermar | 60 | |
61 | <para>Let us assume that the kernel wants each clock interrupt be |
||
62 | generated every <constant>TICKCONST</constant> ticks. This value |
||
63 | represents the ideal number of ticks between two non-delayed clock |
||
64 | interrupts and has some known relation to real time. On each clock |
||
65 | interrupt, the kernel computes and writes down the expected value of the |
||
66 | counter register as it hopes to read it on the next clock interrupt. When |
||
67 | that interrupt comes, the kernel reads the counter register again and |
||
68 | compares it with the written down value. If the difference is smaller than |
||
69 | or equal to <constant>TICKCONST</constant>, then the time drift is none or |
||
70 | small and the next interrupt is scheduled earlier with a penalty of so |
||
71 | many ticks as is the value of the difference. However, if the difference |
||
72 | is bigger, then at least one clock signal was missed. In that case, the |
||
73 | missed clock signal is remembered in the special counter. If there are |
||
74 | more missed signals, each of them is recorded there. The next interrupt is |
||
75 | scheduled with respect to the difference similarily to the former case. |
||
76 | This time, the penalty is taken modulo <constant>TICKCONST</constant>. The |
||
77 | effect of missed clock signals is remedied in the generic clock interrupt |
||
78 | handler.</para> |
||
79 | </section> |
||
80 | |||
81 | <section> |
||
82 | <title>Timeouts</title> |
||
83 | |||
84 | <para>Kernel subsystems can register a callback function to be executed |
||
85 | with a specified delay. Such a registration is represented by a kernel |
||
86 | structure called <classname>timeout</classname>. Timeouts are registered |
||
87 | via <code>timeout_register</code> function. This function takes a pointer |
||
88 | to a timeout structure, a callback function, a parameter of the callback |
||
89 | function and a delay in microseconds as parameters. After the structure is |
||
90 | initialized with all these values, it is sorted into the processor's list |
||
56 | jermar | 91 | of active timeouts, according to the number of clock interrupts remaining |
92 | to their expiration and relatively to already listed timeouts.</para> |
||
55 | jermar | 93 | |
94 | <para>Timeouts can be unregistered via <code>timeout_unregister</code>. |
||
95 | This function can, as opposed to <code>timeout_register</code>, fail when |
||
96 | it is too late to remove the timeout from the list of active |
||
97 | timeouts.</para> |
||
98 | |||
99 | <para>Timeouts are nearing their expiration in the list of active timeouts |
||
100 | which exists on every processor in the system. The expiration counters are |
||
101 | decremented on each clock interrupt by the generic clock interrupt |
||
102 | handler. Due to the relative ordering of timeouts in the list, it is |
||
103 | sufficient to decrement expiration counter only of the first timeout in |
||
104 | the list. Timeouts with expiration counter equal to zero are removed from |
||
105 | the list and their callback function is called with respective |
||
106 | parameter.</para> |
||
107 | </section> |
||
108 | |||
109 | <section> |
||
110 | <title>Generic clock interrupt handler</title> |
||
111 | |||
112 | <para>On each clock interrupt, the architecture specific part of the clock |
||
113 | interrupt handler makes a call to the generic clock interrupt handler |
||
114 | implemented by the <code>clock</code> function. The generic handler takes |
||
115 | care of several mission critical goals:</para> |
||
116 | |||
117 | <itemizedlist> |
||
118 | <listitem> |
||
119 | <para>expiration of timeouts,</para> |
||
120 | </listitem> |
||
121 | |||
122 | <listitem> |
||
123 | <para>updating time of the day counters for userspace and</para> |
||
124 | </listitem> |
||
125 | |||
126 | <listitem> |
||
127 | <para>preemption of threads.</para> |
||
128 | </listitem> |
||
129 | </itemizedlist> |
||
130 | |||
56 | jermar | 131 | <para>The <code>clock</code> function checks for expired timeouts and |
132 | decrements unexpired timeout expiration counters exactly one more times |
||
133 | than is the number of missed clock signals (i.e. at least once and |
||
134 | possibly more times, depending on the missed clock signals counter). The |
||
135 | time of the day counters are also updated one more times than is the |
||
136 | number of missed clock signals. And finally, the remaining timeslice of |
||
137 | the running thread is decremented with respect to this counter as well. By |
||
138 | considering its value, the kernel performs actions that would otherwise be |
||
139 | lost due to an occasional excessive time drift described in previous |
||
140 | paragraphs.</para> |
||
55 | jermar | 141 | </section> |
56 | jermar | 142 | |
143 | <section> |
||
144 | <title>Time source for userspace</title> |
||
145 | |||
146 | <para>In HelenOS, userspace tasks don't communicate with the kernel in |
||
147 | order to read system time. Instead, a mechanism that shares kernel time of |
||
148 | the day counters with userspace address spaces is deployed. On the kernel |
||
149 | side, during system initialization, HelenOS allocates a frame of physical |
||
150 | memory and stores the time of the day counters there. The counters have |
||
151 | the following structure:</para> |
||
152 | |||
153 | <itemizedlist> |
||
154 | <listitem> |
||
155 | <para>first 32-bit counter for seconds,</para> |
||
156 | </listitem> |
||
157 | |||
158 | <listitem> |
||
159 | <para>32-bit counter for microseconds and</para> |
||
160 | </listitem> |
||
161 | |||
162 | <listitem> |
||
163 | <para>second 32-bit counter for seconds.</para> |
||
164 | </listitem> |
||
165 | </itemizedlist> |
||
166 | |||
167 | <para>One of the userspace tasks with capabilities of memory manager (e.g. |
||
168 | ns) asks the kernel to map this frame into its address space. Other, |
||
169 | non-privileged, tasks then use IPC to communicate read-only sharing of |
||
170 | this memory. Reading time in a userspace task is therefore just a matter |
||
171 | of reading memory.</para> |
||
172 | |||
173 | <para>There are two interesting points about this. First, the counters are |
||
174 | 32-bit even on 64-bit machines. The goal is to provide subsecond precision |
||
175 | with the possibility to span roughly 136 years. Note that a single 64-bit |
||
176 | microsecond counter could not be usually read atomically on 32-bit |
||
177 | platforms. Now the second point is that 32-bit platforms cannot atomically |
||
178 | read two 32-bit counters either. However, a generic protocol is used to |
||
179 | guarantee that sequentially read times will create a non-decreasing |
||
180 | sequence.</para> |
||
181 | |||
182 | <para>The problematic part is updating both seconds and microseconds once |
||
183 | in a second. Seconds must be incremented and microseconds must be reset. |
||
184 | However, without any synchronization, the two kernel stores and the two |
||
185 | userspace reads can arbitrarily interleave. Furthemore, the reader has no |
||
186 | chance to detect that the counters were updated only from half. Therefore |
||
187 | three counters are used in HelenOS.</para> |
||
188 | |||
189 | <para>If seconds need to be updated, the kernel increments the first |
||
190 | second counter, issues a write memory barrier operation, updates the |
||
191 | microsecond counter, issues another write memory barrier operation and |
||
192 | increments the second second counter. When only microseconds need to be |
||
193 | updated, no special action is taken by the kernel. On the other hand, the |
||
194 | userspace task must always read all three counters and in reversed order. |
||
195 | A read memory barrier operation must be issued between each two reads. A |
||
196 | non-atomic read is detected when the two second counters differ. The |
||
197 | userspace library solves this situation by returning zero instead of the |
||
198 | value read from the microsecond counter.</para> |
||
199 | </section> |
||
52 | jermar | 200 | </chapter> |