Rev 131 | Only display areas with differences | Ignore whitespace | Details | Blame | Last modification | View Log | RSS feed
Rev 131 | Rev 139 | ||
---|---|---|---|
1 | <?xml version="1.0" encoding="UTF-8"?> |
1 | <?xml version="1.0" encoding="UTF-8"?> |
2 | <chapter id="time"> |
2 | <chapter id="time"> |
3 | <?dbhtml filename="time.html"?> |
3 | <?dbhtml filename="time.html"?> |
4 | 4 | ||
5 | <title>Time Management</title> |
5 | <title>Time Management</title> |
6 | 6 | ||
7 | <para>Time is one of the dimensions in which kernel, as well as the whole |
7 | <para>Time is one of the dimensions in which kernel as well as the whole |
8 | system, operates. It is of special importance to many kernel subsytems. |
8 | system operates. It is of special importance to many kernel subsytems. |
9 | Knowledge of time makes it possible for the scheduler to preemptively plan |
9 | Knowledge of time makes it possible for the scheduler to preemptively plan |
10 | threads for execution. Different parts of the kernel can request execution |
10 | threads for execution. Different parts of the kernel can request execution |
11 | of their callback function with some specified delay. A good example of such |
11 | of their callback function with a specified delay. A good example of such |
12 | kernel code is the synchronization subsystem which uses this functionality |
12 | kernel code is the synchronization subsystem which uses this functionality |
13 | to implement timeouting versions of synchronization primitives.</para> |
13 | to implement timeouting versions of synchronization primitives.</para> |
14 | 14 | ||
15 | <section> |
15 | <section> |
16 | <title>System Clock</title> |
16 | <title>System Clock</title> |
17 | 17 | ||
18 | <para>Every hardware architecture supported by HelenOS must support some |
18 | <para>Every hardware architecture supported by HelenOS must support some |
19 | kind of a device that can be programmed to yield periodic time signals |
19 | kind of a device that can be programmed to yield periodic time signals |
20 | (i.e. clock interrupts). Some architectures have external clock that is |
20 | (i.e. clock interrupts). Some architectures have external clock that is |
21 | merely programmed by the kernel to interrupt the processor multiple times |
21 | merely programmed by the kernel to interrupt the processor multiple times |
22 | in a second. This is the case of ia32 and amd64 architectures<footnote> |
22 | in a second. This is the case of ia32 and amd64 architectures<footnote> |
23 | <para>When running in uniprocessor mode.</para> |
23 | <para>When running in uniprocessor mode.</para> |
24 | </footnote>, which use i8254 or a compatible chip to achieve the |
24 | </footnote>, which use i8254 or a compatible chip to achieve the |
25 | goal.</para> |
25 | goal.</para> |
26 | 26 | ||
27 | <para>Other architectures' processors typically contain two registers. The |
27 | <para>Other architectures' processors typically contain two registers. The |
28 | first register is usually called a compare or a match register and can be |
28 | first register is usually called a compare or a match register and can be |
29 | set to an arbitrary value by the operating system. The contents of the |
29 | set to an arbitrary value by the operating system. The contents of the |
30 | compare register then stays unaltered until it is written by the kernel |
30 | compare register then stays unaltered until it is written by the kernel |
31 | again. The second register, often called a counter register, can be also |
31 | again. The second register, often called a counter register, can be also |
32 | written by the kernel, but the processor automatically increments it after |
32 | written by the kernel, but the processor automatically increments it after |
33 | every executed instruction or in some fixed relation to processor speed. |
33 | every executed instruction or in some fixed relation to processor speed. |
34 | The point is that a clock interrupt is generated whenever the values of |
34 | The point is that a clock interrupt is generated whenever the values of |
35 | the counter and the compare registers match. Sometimes, the scheme of two |
35 | the counter and the compare registers match. Sometimes, the scheme of two |
36 | registers is modified so that only one register is needed. Such a |
36 | registers is modified so that only one register is needed. Such a |
37 | register, called a decrementer, then counts towards zero and an interrupt |
37 | register, called a decrementer, then counts towards zero and an interrupt |
38 | is generated when zero is reached.</para> |
38 | is generated when zero is reached.</para> |
39 | 39 | ||
40 | <para>In any case, the initial value of the decrementer or the initial |
40 | <para>In any case, the initial value of the decrementer or the initial |
41 | difference between the counter and the compare registers, respectively, |
41 | difference between the counter and the compare registers, respectively, |
42 | must be set accordingly to a known relation between the real time and the |
42 | must be set accordingly to a known relation between the real time and the |
43 | speed of the decrementer or the counter register, respectively.</para> |
43 | speed of the decrementer or the counter register, respectively.</para> |
44 | 44 | ||
45 | <para>The rest of this section will, for the sake of clarity, focus on the |
45 | <para>The rest of this section will, for the sake of clarity, focus on the |
46 | two-register scheme. The decrementer scheme is very similar.</para> |
46 | two-register scheme. The decrementer scheme is very similar.</para> |
47 | 47 | ||
48 | <para>The kernel must reinitialize one of the two registers after each |
48 | <para>The kernel must reinitialize one of the two registers after each |
49 | clock interrupt in order to schedule next interrupt. However this step is |
49 | clock interrupt in order to schedule next interrupt. However this step is |
50 | tricky and must be done with caution. Imagine that the clock interrupt is |
50 | tricky and must be done with caution. Imagine that the clock interrupt is |
51 | masked either because the kernel is servicing another interrupt or because |
51 | masked either because the kernel is servicing another interrupt or because |
52 | the processor locally disabled interrupts for a while. If the clock |
52 | the processor locally disabled interrupts for a while. If the clock |
53 | interrupt occurs during this period, it will be pending until interrupts |
53 | interrupt occurs during this period, it will be pending until the |
54 | are enabled again. In theory, that could happen arbitrary counter register |
54 | interrupts are enabled again. Theoretically, it could happen an arbitrary |
55 | ticks later. Which is worse, the ideal time period between two non-delayed |
55 | counter register ticks later. Which is worse, the ideal time period |
56 | clock interrupts can also elapse arbitrary number of times before the |
56 | between two non-delayed clock interrupts can also elapse arbitrary number |
57 | delayed interrupt gets serviced. The architecture-specific part of the |
57 | of times before the delayed interrupt gets serviced. The |
58 | clock interrupt driver must avoid time drifts caused by this by taking |
58 | architecture-specific part of the clock interrupt driver must avoid time |
- | 59 | drifts caused by such behaviour by taking proactive |
|
59 | proactive counter-measures.</para> |
60 | counter-measures.</para> |
60 | 61 | ||
61 | <para>Let us assume that the kernel wants each clock interrupt be |
62 | <para>Let us assume that the kernel wants each clock interrupt be |
62 | generated every <constant>TICKCONST</constant> ticks. This value |
63 | generated every <constant>TICKCONST</constant> ticks. This value |
63 | represents the ideal number of ticks between two non-delayed clock |
64 | represents the ideal number of ticks between two non-delayed clock |
64 | interrupts and has some known relation to real time. On each clock |
65 | interrupts and has some known relation to real time. On each clock |
65 | interrupt, the kernel computes and writes down the expected value of the |
66 | interrupt, the kernel computes and writes down the expected value of the |
66 | counter register as it hopes to read it on the next clock interrupt. When |
67 | counter register as it hopes to read it on the next clock interrupt. When |
67 | that interrupt comes, the kernel reads the counter register again and |
68 | that interrupt comes, the kernel reads the counter register again and |
68 | compares it with the written down value. If the difference is smaller than |
69 | compares it with the written down value. If the difference is smaller than |
69 | or equal to <constant>TICKCONST</constant>, then the time drift is none or |
70 | or equal to <constant>TICKCONST</constant>, then the time drift is none or |
70 | small and the next interrupt is scheduled earlier with a penalty of so |
71 | small and the next interrupt is scheduled earlier with a penalty of so |
71 | many ticks as is the value of the difference. However, if the difference |
72 | many ticks as is the value of the difference. However, if the difference |
72 | is bigger, then at least one clock signal was missed. In that case, the |
73 | is bigger, then at least one clock signal was missed. In that case, the |
73 | missed clock signal is remembered in the special counter. If there are |
74 | missed clock signal is remembered in the special counter. If there are |
74 | more missed signals, each of them is recorded there. The next interrupt is |
75 | more missed signals, each of them is recorded there. The next interrupt is |
75 | scheduled with respect to the difference similarily to the former case. |
76 | scheduled with respect to the difference similarily to the former case. |
76 | This time, the penalty is taken modulo <constant>TICKCONST</constant>. The |
77 | This time, the penalty is taken modulo <constant>TICKCONST</constant>. The |
77 | effect of missed clock signals is remedied in the generic clock interrupt |
78 | effect of missed clock signals is remedied in the generic clock interrupt |
78 | handler.</para> |
79 | handler.</para> |
79 | </section> |
80 | </section> |
80 | 81 | ||
81 | <section> |
82 | <section> |
82 | <title>Timeouts</title> |
83 | <title>Timeouts</title> |
83 | 84 | ||
84 | <para>Kernel subsystems can register a callback function to be executed |
85 | <para>Kernel subsystems can register a callback function to be executed |
85 | with a specified delay. Such a registration is represented by a kernel |
86 | with a specified delay. Such a registration is represented by a kernel |
86 | structure called <classname>timeout</classname>. Timeouts are registered |
87 | structure called <classname>timeout</classname>. Timeouts are registered |
87 | via <code>timeout_register</code> function. This function takes a pointer |
88 | via <code>timeout_register</code> function. This function takes a pointer |
88 | to a timeout structure, a callback function, a parameter of the callback |
89 | to a timeout structure, a callback function, a parameter of the callback |
89 | function and a delay in microseconds as parameters. After the structure is |
90 | function and a delay in microseconds as parameters. After the structure is |
90 | initialized with all these values, it is sorted into the processor's list |
91 | initialized with all these values, it is sorted into the processor's list |
91 | of active timeouts, according to the number of clock interrupts remaining |
92 | of active timeouts, according to the number of clock interrupts remaining |
92 | to their expiration and relatively to already listed timeouts.</para> |
93 | to their expiration and relatively to already listed timeouts.</para> |
93 | 94 | ||
94 | <para>Timeouts can be unregistered via <code>timeout_unregister</code>. |
95 | <para>Timeouts can be unregistered via <code>timeout_unregister</code>. |
95 | This function can, as opposed to <code>timeout_register</code>, fail when |
96 | This function can, as opposed to <code>timeout_register</code>, fail when |
96 | it is too late to remove the timeout from the list of active |
97 | it is too late to remove the timeout from the list of active |
97 | timeouts.</para> |
98 | timeouts.</para> |
98 | 99 | ||
99 | <para>Timeouts are nearing their expiration in the list of active timeouts |
100 | <para>Timeouts are nearing their expiration in the list of active timeouts |
100 | which exists on every processor in the system. The expiration counters are |
101 | which exists on every processor in the system. The expiration counters are |
101 | decremented on each clock interrupt by the generic clock interrupt |
102 | decremented on each clock interrupt by the generic clock interrupt |
102 | handler. Due to the relative ordering of timeouts in the list, it is |
103 | handler. Due to the relative ordering of timeouts in the list, it is |
103 | sufficient to decrement expiration counter only of the first timeout in |
104 | sufficient to decrement expiration counter only of the first timeout in |
104 | the list. Timeouts with expiration counter equal to zero are removed from |
105 | the list. Timeouts with expiration counter equal to zero are removed from |
105 | the list and their callback function is called with respective |
106 | the list and their callback function is called with respective |
106 | parameter.</para> |
107 | parameter.</para> |
107 | </section> |
108 | </section> |
108 | 109 | ||
109 | <section> |
110 | <section> |
110 | <title>Generic Clock Interrupt Handler</title> |
111 | <title>Generic Clock Interrupt Handler</title> |
111 | 112 | ||
112 | <para>On each clock interrupt, the architecture specific part of the clock |
113 | <para>On each clock interrupt, the architecture specific part of the clock |
113 | interrupt handler makes a call to the generic clock interrupt handler |
114 | interrupt handler makes a call to the generic clock interrupt handler |
114 | implemented by the <code>clock</code> function. The generic handler takes |
115 | implemented by the <code>clock</code> function. The generic handler takes |
115 | care of several mission critical goals:</para> |
116 | care of several mission critical goals:</para> |
116 | 117 | ||
117 | <itemizedlist> |
118 | <itemizedlist> |
118 | <listitem> |
119 | <listitem> |
119 | <para>expiration of timeouts,</para> |
120 | <para>expiration of timeouts,</para> |
120 | </listitem> |
121 | </listitem> |
121 | 122 | ||
122 | <listitem> |
123 | <listitem> |
123 | <para>updating time of the day counters for userspace and</para> |
124 | <para>updating time of the day counters for userspace and</para> |
124 | </listitem> |
125 | </listitem> |
125 | 126 | ||
126 | <listitem> |
127 | <listitem> |
127 | <para>preemption of threads.</para> |
128 | <para>preemption of threads.</para> |
128 | </listitem> |
129 | </listitem> |
129 | </itemizedlist> |
130 | </itemizedlist> |
130 | 131 | ||
131 | <para>The <code>clock</code> function checks for expired timeouts and |
132 | <para>The <code>clock</code> function checks for expired timeouts and |
132 | decrements unexpired timeout expiration counters exactly one more times |
133 | decrements unexpired timeout expiration counters exactly one more times |
133 | than is the number of missed clock signals (i.e. at least once and |
134 | than is the number of missed clock signals (i.e. at least once and |
134 | possibly more times, depending on the missed clock signals counter). The |
135 | possibly more times, depending on the missed clock signals counter). The |
135 | time of the day counters are also updated one more times than is the |
136 | time of the day counters are also updated one more times than is the |
136 | number of missed clock signals. And finally, the remaining timeslice of |
137 | number of missed clock signals. And finally, the remaining timeslice of |
137 | the running thread is decremented with respect to this counter as well. By |
138 | the running thread is decremented with respect to this counter as well. By |
138 | considering its value, the kernel performs actions that would otherwise be |
139 | considering its value, the kernel performs actions that would otherwise be |
139 | lost due to an occasional excessive time drift described in previous |
140 | lost due to an occasional excessive time drift described in previous |
140 | paragraphs.</para> |
141 | paragraphs.</para> |
141 | </section> |
142 | </section> |
142 | 143 | ||
143 | <section> |
144 | <section> |
144 | <title>Time Source for Userspace</title> |
145 | <title>Time Source for Userspace</title> |
145 | 146 | ||
146 | <para>In HelenOS, userspace tasks don't communicate with the kernel in |
147 | <para>In HelenOS, userspace tasks don't communicate with the kernel in |
147 | order to read the system time. Instead, a mechanism that shares kernel |
148 | order to read the system time. Instead, a mechanism that shares kernel |
148 | time of the the day counters with userspace address spaces is deployed. On |
149 | time of the the day counters with userspace address spaces is deployed. On |
149 | the kernel side, during system initialization, HelenOS allocates a frame |
150 | the kernel side, during system initialization, HelenOS allocates a frame |
150 | of physical memory and stores the time of the day counters there. The |
151 | of physical memory and stores the time of the day counters there. The |
151 | counters have the following structure:</para> |
152 | counters have the following structure:</para> |
152 | 153 | ||
153 | <itemizedlist> |
154 | <itemizedlist> |
154 | <listitem> |
155 | <listitem> |
155 | <para>first 32-bit counter for seconds,</para> |
156 | <para>first 32-bit counter for seconds,</para> |
156 | </listitem> |
157 | </listitem> |
157 | 158 | ||
158 | <listitem> |
159 | <listitem> |
159 | <para>32-bit counter for microseconds and</para> |
160 | <para>32-bit counter for microseconds and</para> |
160 | </listitem> |
161 | </listitem> |
161 | 162 | ||
162 | <listitem> |
163 | <listitem> |
163 | <para>second 32-bit counter for seconds.</para> |
164 | <para>second 32-bit counter for seconds.</para> |
164 | </listitem> |
165 | </listitem> |
165 | </itemizedlist> |
166 | </itemizedlist> |
166 | 167 | ||
167 | <para>One of the userspace tasks with capabilities of memory manager (e.g. |
168 | <para>One of the userspace tasks with capabilities of memory manager (e.g. |
168 | ns) asks the kernel to map this frame into its address space. Other |
169 | ns) asks the kernel to map this frame into its address space. Other |
169 | non-privileged tasks then use IPC to receive read-only share of this |
170 | non-privileged tasks then use IPC to receive read-only share of this |
170 | memory. Reading time in a userspace task is therefore just a matter of |
171 | memory. Reading time in a userspace task is therefore just a matter of |
171 | reading memory.</para> |
172 | reading memory.</para> |
172 | 173 | ||
173 | <para>There are two interesting points about this. First, the counters are |
174 | <para>There are two interesting points about this. First, the counters are |
174 | 32-bit even on 64-bit machines. The goal is to provide subsecond precision |
175 | 32-bit even on 64-bit machines. The goal is to provide subsecond precision |
175 | with the possibility to span roughly 136 years. Note that a single 64-bit |
176 | with the possibility to span roughly 136 years. Note that a single 64-bit |
176 | microsecond counter could not be usually read atomically on 32-bit |
177 | microsecond counter could not be usually read atomically on 32-bit |
177 | platforms. Unfortunately, on 32-bit platforms it is usually impossible to |
178 | platforms. Unfortunately, on 32-bit platforms it is usually impossible to |
178 | read atomically two 32-bit counters either. However, a generic protocol is |
179 | read atomically two 32-bit counters either. However, a generic protocol is |
179 | used to guarantee that sequentially read times will create a |
180 | used to guarantee that sequentially read times will create a |
180 | non-decreasing sequence.</para> |
181 | non-decreasing sequence.</para> |
181 | 182 | ||
182 | <para>The problematic part is incrementing seconds counter and clearing |
183 | <para>The problematic part is incrementing seconds counter and clearing |
183 | microseconds counter together once every second. Seconds must be |
184 | microseconds counter together once every second. Seconds must be |
184 | incremented and microseconds must be reset. However, without any |
185 | incremented and microseconds must be reset. However, without any |
185 | synchronization, the two kernel stores and the two userspace reads can |
186 | synchronization, the two kernel stores and the two userspace reads can |
186 | arbitrarily interleave. Furthemore, the reader has no chance to detect |
187 | arbitrarily interleave. Furthemore, the reader has no chance to detect |
187 | that the counters were updated only paritally. Therefore three counters |
188 | that the counters were updated only paritally. Therefore three counters |
188 | are used in HelenOS.</para> |
189 | are used in HelenOS.</para> |
189 | 190 | ||
190 | <para>If seconds need to be updated, the kernel increments the first |
191 | <para>If seconds need to be updated, the kernel increments the first |
191 | second counter, issues a write memory barrier operation, updates the |
192 | second counter, issues a write memory barrier operation, updates the |
192 | microsecond counter, issues another write memory barrier operation and |
193 | microsecond counter, issues another write memory barrier operation and |
193 | increments the second second counter. When only microseconds needs to be |
194 | increments the second second counter. When only microseconds needs to be |
194 | updated, no special action is taken by the kernel. On the other hand, the |
195 | updated, no special action is taken by the kernel. On the other hand, the |
195 | userspace task must always read all three counters in reversed order. A |
196 | userspace task must always read all three counters in reversed order. A |
196 | read memory barrier operation must be issued between each two reads. A |
197 | read memory barrier operation must be issued between each two reads. A |
197 | non-atomic read is detected when the two second counters differ. The |
198 | non-atomic read is detected when the two second counters differ. The |
198 | userspace library solves this situation by returning higher of them with |
199 | userspace library solves this situation by returning higher of them with |
199 | microseconds set to zero.</para> |
200 | microseconds set to zero.</para> |
200 | </section> |
201 | </section> |
201 | </chapter> |
202 | </chapter> |