| Rev | Author | Line No. | Line |
|---|---|---|---|
| 52 | jermar | 1 | <?xml version="1.0" encoding="UTF-8"?> |
| 55 | jermar | 2 | <chapter id="time"> |
| 3 | <?dbhtml filename="time.html"?> |
||
| 52 | jermar | 4 | |
| 131 | jermar | 5 | <title>Time Management</title> |
| 52 | jermar | 6 | |
| 139 | palkovsky | 7 | <para>Time is one of the dimensions in which the kernel as well as the whole |
| 8 | system operates. It is of special importance to many kernel subsystems. |
||
| 55 | jermar | 9 | Knowledge of time makes it possible for the scheduler to preemptively plan |
| 10 | threads for execution. Different parts of the kernel can request execution |
||
| 139 | palkovsky | 11 | of their callback function with a specified delay. A good example of such |
| 55 | jermar | 12 | kernel code is the synchronization subsystem which uses this functionality |
| 13 | to implement variants of synchronization primitives that can time out.</para> |
||
| 14 | |||
| 15 | <section> |
||
| 131 | jermar | 16 | <title>System Clock</title> |
| 53 | jermar | 17 | |
| 55 | jermar | 18 | <para>Every hardware architecture supported by HelenOS must support some |
| 19 | kind of a device that can be programmed to yield periodic time signals |
||
| 20 | (i.e. clock interrupts). Some architectures have an external clock that is |
||
| 21 | merely programmed by the kernel to interrupt the processor multiple times |
||
| 22 | per second. This is the case on the ia32 and amd64 architectures<footnote> |
||
| 23 | <para>When running in uniprocessor mode.</para> |
||
| 24 | </footnote>, which use the i8254 or a compatible chip to achieve the |
||
| 25 | goal.</para> |
||
| 26 | |||
| 27 | <para>Other architectures' processors typically contain two registers. The |
||
| 28 | first register is usually called a compare or a match register and can be |
||
| 29 | set to an arbitrary value by the operating system. The contents of the |
||
| 30 | compare register then stay unaltered until it is written by the kernel |
||
| 31 | again. The second register, often called a counter register, can also be |
||
| 32 | written by the kernel, but the processor automatically increments it after |
||
| 33 | every executed instruction or in some fixed relation to processor speed. |
||
| 34 | The point is that a clock interrupt is generated whenever the values of |
||
| 35 | the counter and the compare registers match. Sometimes, the scheme of two |
||
| 36 | registers is modified so that only one register is needed. Such a |
||
| 37 | register, called a decrementer, then counts towards zero and an interrupt |
||
| 38 | is generated when zero is reached.</para> |
||
| 39 | |||
| 40 | <para>In any case, the initial value of the decrementer or the initial |
||
| 41 | difference between the counter and the compare registers, respectively, |
||
| 42 | must be set according to a known relation between real time and the |
||
| 43 | speed of the decrementer or the counter register, respectively.</para> |
||
| 44 | |||
| 45 | <para>The rest of this section will, for the sake of clarity, focus on the |
||
| 46 | two-register scheme. The decrementer scheme is very similar.</para> |
||
| 47 | |||
| 58 | jermar | 48 | <para>The kernel must reinitialize one of the two registers after each |
| 49 | clock interrupt in order to schedule the next interrupt. However, this step is |
||
| 50 | tricky and must be done with caution. Imagine that the clock interrupt is |
||
| 51 | masked either because the kernel is servicing another interrupt or because |
||
| 52 | the processor locally disabled interrupts for a while. If the clock |
||
| 139 | palkovsky | 53 | interrupt occurs during this period, it will be pending until the |
| 54 | interrupts are enabled again. Theoretically, this could happen an arbitrary |
||
| 55 | number of counter register ticks later. What is worse, the ideal time period |
||
| 56 | between two non-delayed clock interrupts can also elapse an arbitrary number |
||
| 57 | of times before the delayed interrupt gets serviced. The |
||
| 58 | architecture-specific part of the clock interrupt driver must avoid time |
||
| 59 | drifts caused by such behaviour by taking proactive |
||
| 60 | counter-measures.</para> |
||
| 55 | jermar | 61 | |
| 62 | <para>Let us assume that the kernel wants a clock interrupt to be |
||
| 63 | generated every <constant>TICKCONST</constant> ticks. This value |
||
| 64 | represents the ideal number of ticks between two non-delayed clock |
||
| 65 | interrupts and has some known relation to real time. On each clock |
||
| 66 | interrupt, the kernel computes and writes down the expected value of the |
||
| 67 | counter register as it hopes to read it on the next clock interrupt. When |
||
| 68 | that interrupt comes, the kernel reads the counter register again and |
||
| 69 | compares it with the written down value. If the difference is smaller than |
||
| 70 | or equal to <constant>TICKCONST</constant>, then the time drift is none or |
||
| 71 | small and the next interrupt is scheduled earlier with a penalty of as |
||
| 72 | many ticks as the value of the difference. However, if the difference |
||
| 73 | is bigger, then at least one clock signal was missed. In that case, the |
||
| 74 | missed clock signal is recorded in a special counter. If there are |
||
| 75 | more missed signals, each of them is recorded there. The next interrupt is |
||
| 76 | scheduled with respect to the difference similarly to the former case. |
||
| 77 | This time, the penalty is taken modulo <constant>TICKCONST</constant>. The |
||
| 78 | effect of missed clock signals is remedied in the generic clock interrupt |
||
| 79 | handler.</para> |
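The compensation scheme described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the actual HelenOS driver: the names `clock_state_t` and `clock_reschedule`, the value chosen for `TICKCONST`, and the 32-bit wrapping counter are all hypothetical.

```c
#include <stdint.h>

/* Ideal number of counter ticks between two clock interrupts
 * (the value is an arbitrary assumption for this sketch). */
#define TICKCONST 1000u

typedef struct {
    uint32_t expected;      /* counter value expected at the next interrupt */
    uint32_t missed_ticks;  /* clock signals missed so far */
} clock_state_t;

/*
 * Called on each clock interrupt with the current counter value.
 * Returns the value to be written to the compare register.
 */
static uint32_t clock_reschedule(clock_state_t *st, uint32_t counter_now)
{
    /* Unsigned subtraction stays correct across counter wraparound. */
    uint32_t drift = counter_now - st->expected;
    uint32_t penalty = drift;

    if (drift > TICKCONST) {
        /* At least one whole period elapsed unnoticed; record each miss. */
        st->missed_ticks += drift / TICKCONST;
        penalty = drift % TICKCONST;
    }

    /* Schedule the next interrupt earlier by the penalty. */
    st->expected = counter_now + TICKCONST - penalty;
    return st->expected;
}
```

With `expected` equal to 1000 and a counter reading of 1010, the next interrupt is scheduled for 2000, so the 10-tick delay does not accumulate. A reading of 4500 against an expected value of 2000 records two missed signals and schedules the next interrupt for 5000.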
||
| 80 | </section> |
||
| 81 | |||
| 82 | <section> |
||
| 83 | <title>Timeouts</title> |
||
| 84 | |||
| 85 | <para>Kernel subsystems can register a callback function to be executed |
||
| 86 | with a specified delay. Such a registration is represented by a kernel |
||
| 87 | structure called <classname>timeout</classname>. Timeouts are registered |
||
| 131 | jermar | 88 | via the <code>timeout_register</code> function. This function takes a pointer |
| 55 | jermar | 89 | to a timeout structure, a callback function, a parameter of the callback |
| 90 | function and a delay in microseconds as parameters. After the structure is |
||
| 91 | initialized with all these values, it is sorted into the processor's list |
||
| 56 | jermar | 92 | of active timeouts, according to the number of clock interrupts remaining |
| 93 | to its expiration and relative to the timeouts already listed.</para> |
||
| 55 | jermar | 94 | |
| 131 | jermar | 95 | <para>Timeouts can be unregistered via <code>timeout_unregister</code>. |
| 96 | This function can, as opposed to <code>timeout_register</code>, fail when |
||
| 55 | jermar | 97 | it is too late to remove the timeout from the list of active |
| 98 | timeouts.</para> |
||
| 99 | |||
| 100 | <para>Timeouts are nearing their expiration in the list of active timeouts |
||
| 101 | which exists on every processor in the system. The expiration counters are |
||
| 102 | decremented on each clock interrupt by the generic clock interrupt |
||
| 103 | handler. Due to the relative ordering of timeouts in the list, it is |
||
| 104 | sufficient to decrement the expiration counter of only the first timeout in |
||
| 105 | the list. Timeouts whose expiration counter reaches zero are removed from |
||
| 106 | the list and their callback function is called with the respective |
||
| 107 | parameter.</para> |
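The relative ordering described above is often called a delta list. The following C sketch shows one possible way to keep such a list. It is an illustration only: the type and function names (`timeout_sketch_t`, `delta_list_insert`, `delta_list_tick`, `count_expiry`) are hypothetical, the delay is assumed to be already converted to clock ticks, and the locking that a real kernel needs is omitted.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*timeout_handler_t)(void *arg);

typedef struct timeout_sketch {
    struct timeout_sketch *next;
    uint64_t ticks;             /* ticks remaining relative to the predecessor */
    timeout_handler_t handler;  /* callback to run on expiration */
    void *arg;                  /* parameter passed to the callback */
} timeout_sketch_t;

/* Insert t so that it expires `ticks` clock interrupts from now. */
static void delta_list_insert(timeout_sketch_t **head, timeout_sketch_t *t,
    uint64_t ticks, timeout_handler_t handler, void *arg)
{
    timeout_sketch_t **link = head;

    t->handler = handler;
    t->arg = arg;

    /* Skip timeouts that expire no later, consuming their deltas. */
    while (*link != NULL && (*link)->ticks <= ticks) {
        ticks -= (*link)->ticks;
        link = &(*link)->next;
    }

    t->ticks = ticks;
    t->next = *link;
    if (t->next != NULL)
        t->next->ticks -= ticks;  /* keep the successor relative to t */
    *link = t;
}

/* Process one clock tick: only the head counter is decremented. */
static void delta_list_tick(timeout_sketch_t **head)
{
    if (*head != NULL && (*head)->ticks > 0)
        (*head)->ticks--;

    /* Remove every timeout whose counter reached zero and run its callback. */
    while (*head != NULL && (*head)->ticks == 0) {
        timeout_sketch_t *t = *head;
        *head = t->next;
        t->next = NULL;
        t->handler(t->arg);
    }
}

/* Example callback: counts expirations in the int pointed to by arg. */
static void count_expiry(void *arg)
{
    (*(int *) arg)++;
}
```

Inserting timeouts of 3, 1 and 3 ticks yields stored deltas of 1, 2 and 0, so the third tick expires the last two timeouts together.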
||
| 108 | </section> |
||
| 109 | |||
| 110 | <section> |
||
| 131 | jermar | 111 | <title>Generic Clock Interrupt Handler</title> |
| 55 | jermar | 112 | |
| 113 | <para>On each clock interrupt, the architecture-specific part of the clock |
||
| 114 | interrupt handler makes a call to the generic clock interrupt handler |
||
| 131 | jermar | 115 | implemented by the <code>clock</code> function. The generic handler takes |
| 55 | jermar | 116 | care of several mission-critical tasks:</para> |
| 117 | |||
| 118 | <itemizedlist> |
||
| 119 | <listitem> |
||
| 120 | <para>expiration of timeouts,</para> |
||
| 121 | </listitem> |
||
| 122 | |||
| 123 | <listitem> |
||
| 124 | <para>updating time of the day counters for userspace and</para> |
||
| 125 | </listitem> |
||
| 126 | |||
| 127 | <listitem> |
||
| 128 | <para>preemption of threads.</para> |
||
| 129 | </listitem> |
||
| 130 | </itemizedlist> |
||
| 131 | |||
| 131 | jermar | 132 | <para>The <code>clock</code> function checks for expired timeouts and |
| 56 | jermar | 133 | decrements unexpired timeout expiration counters exactly one more time |
| 134 | than the number of missed clock signals (i.e. at least once and |
||
| 135 | possibly more times, depending on the missed clock signals counter). The |
||
| 136 | time of the day counters are also updated one more time than the |
||
| 137 | number of missed clock signals. Finally, the remaining timeslice of |
||
| 138 | the running thread is decremented with respect to this counter as well. By |
||
| 139 | considering its value, the kernel performs actions that would otherwise be |
||
| 140 | lost due to an occasional excessive time drift described in the previous |
||
| 141 | paragraphs.</para> |
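The way the generic handler replays missed clock signals can be sketched as below. This is a simplified model, not the real `clock` function: the structure `cpu_sketch_t` and all of its fields are hypothetical stand-ins for the per-processor state, a single counter stands in for the time of the day counters, and one timeout counter stands in for the whole list of active timeouts.

```c
#include <stdint.h>

typedef struct {
    uint32_t missed_clock_ticks;  /* filled in by the architecture-specific part */
    uint64_t uptime_ticks;        /* stands in for the time of the day counters */
    uint32_t head_timeout_ticks;  /* stands in for the first active timeout */
    uint32_t expired_timeouts;    /* stands in for running timeout callbacks */
    uint32_t timeslice;           /* remaining ticks of the running thread */
    int needs_preemption;         /* set when the timeslice is used up */
} cpu_sketch_t;

static void clock_sketch(cpu_sketch_t *cpu)
{
    /* One round for this interrupt plus one for each missed signal. */
    uint32_t rounds = cpu->missed_clock_ticks + 1;
    cpu->missed_clock_ticks = 0;

    for (uint32_t i = 0; i < rounds; i++) {
        cpu->uptime_ticks++;
        if (cpu->head_timeout_ticks > 0 && --cpu->head_timeout_ticks == 0)
            cpu->expired_timeouts++;
    }

    /* The timeslice is charged for all rounds at once. */
    if (cpu->timeslice > rounds) {
        cpu->timeslice -= rounds;
    } else {
        cpu->timeslice = 0;
        cpu->needs_preemption = 1;
    }
}
```

With two missed signals recorded, a single call advances the time by three ticks and charges the running thread three ticks of its timeslice.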
||
| 55 | jermar | 142 | </section> |
| 56 | jermar | 143 | |
| 144 | <section> |
||
| 131 | jermar | 145 | <title>Time Source for Userspace</title> |
| 56 | jermar | 146 | |
| 147 | <para>In HelenOS, userspace tasks don't communicate with the kernel in |
||
| 85 | palkovsky | 148 | order to read the system time. Instead, a mechanism that shares kernel |
| 149 | time of the day counters with userspace address spaces is deployed. On |
||
| 150 | the kernel side, during system initialization, HelenOS allocates a frame |
||
| 151 | of physical memory and stores the time of the day counters there. The |
||
| 152 | counters have the following structure:</para> |
||
| 56 | jermar | 153 | |
| 154 | <itemizedlist> |
||
| 155 | <listitem> |
||
| 156 | <para>first 32-bit counter for seconds,</para> |
||
| 157 | </listitem> |
||
| 158 | |||
| 159 | <listitem> |
||
| 160 | <para>32-bit counter for microseconds and</para> |
||
| 161 | </listitem> |
||
| 162 | |||
| 163 | <listitem> |
||
| 164 | <para>second 32-bit counter for seconds.</para> |
||
| 165 | </listitem> |
||
| 166 | </itemizedlist> |
||
| 167 | |||
| 168 | <para>One of the userspace tasks with the capabilities of a memory manager (e.g. |
||
| 85 | palkovsky | 169 | ns) asks the kernel to map this frame into its address space. Other |
| 170 | non-privileged tasks then use IPC to receive a read-only share of this |
||
| 171 | memory. Reading time in a userspace task is therefore just a matter of |
||
| 172 | reading memory.</para> |
||
| 56 | jermar | 173 | |
| 174 | <para>There are two interesting points about this. First, the counters are |
||
| 175 | 32-bit even on 64-bit machines. The goal is to provide subsecond precision |
||
| 176 | with the possibility to span roughly 136 years. Note that a single 64-bit |
||
| 177 | microsecond counter usually could not be read atomically on 32-bit |
||
| 85 | palkovsky | 178 | platforms. Unfortunately, on 32-bit platforms it is usually impossible to |
| 179 | read two 32-bit counters atomically either. However, a generic protocol is |
||
| 180 | used to guarantee that sequentially read times will create a |
||
| 181 | non-decreasing sequence.</para> |
||
| 56 | jermar | 182 | |
| 85 | palkovsky | 183 | <para>The problematic part is incrementing the seconds counter and clearing |
| 184 | the microseconds counter together once every second. Seconds must be |
||
| 185 | incremented and microseconds must be reset. However, without any |
||
| 186 | synchronization, the two kernel stores and the two userspace reads can |
||
| 187 | arbitrarily interleave. Furthermore, the reader has no chance to detect |
||
| 188 | that the counters were updated only partially. Therefore, three counters |
||
| 189 | are used in HelenOS.</para> |
||
| 56 | jermar | 190 | |
| 191 | <para>If seconds need to be updated, the kernel increments the first |
||
| 192 | seconds counter, issues a write memory barrier operation, updates the |
||
| 193 | microsecond counter, issues another write memory barrier operation and |
||
| 85 | palkovsky | 194 | increments the second seconds counter. When only microseconds need to be |
| 56 | jermar | 195 | updated, no special action is taken by the kernel. On the other hand, the |
| 85 | palkovsky | 196 | userspace task must always read all three counters in reverse order. A |
| 197 | read memory barrier operation must be issued between each two reads. A |
||
| 56 | jermar | 198 | non-atomic read is detected when the two seconds counters differ. The |
| 85 | palkovsky | 199 | userspace library resolves this situation by returning the higher of them with |
| 200 | microseconds set to zero.</para> |
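The update and read protocol described above can be sketched with C11 atomics. This is an approximation under stated assumptions rather than the HelenOS code: the names `shared_time_t`, `kernel_advance` and `user_read` are hypothetical, release and acquire fences stand in for the write and read memory barriers, and the microsecond increment is assumed to be smaller than one second.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Layout assumed from the text: seconds, microseconds, seconds. */
typedef struct {
    _Atomic uint32_t seconds1;
    _Atomic uint32_t useconds;
    _Atomic uint32_t seconds2;
} shared_time_t;

/* Kernel side: advance the time by usec_delta (assumed < 1000000). */
static void kernel_advance(shared_time_t *t, uint32_t usec_delta)
{
    uint32_t usec =
        atomic_load_explicit(&t->useconds, memory_order_relaxed) + usec_delta;

    if (usec >= 1000000u) {
        uint32_t sec =
            atomic_load_explicit(&t->seconds1, memory_order_relaxed) + 1;

        atomic_store_explicit(&t->seconds1, sec, memory_order_relaxed);
        atomic_thread_fence(memory_order_release);  /* write barrier */
        atomic_store_explicit(&t->useconds, usec - 1000000u,
            memory_order_relaxed);
        atomic_thread_fence(memory_order_release);  /* write barrier */
        atomic_store_explicit(&t->seconds2, sec, memory_order_relaxed);
    } else {
        /* Only microseconds change: a single store, no barrier needed. */
        atomic_store_explicit(&t->useconds, usec, memory_order_relaxed);
    }
}

/* Userspace side: read the counters in reverse order. */
static void user_read(shared_time_t *t, uint32_t *sec, uint32_t *usec)
{
    uint32_t s2 = atomic_load_explicit(&t->seconds2, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);      /* read barrier */
    uint32_t us = atomic_load_explicit(&t->useconds, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);      /* read barrier */
    uint32_t s1 = atomic_load_explicit(&t->seconds1, memory_order_relaxed);

    if (s1 != s2) {
        /* An update was in progress: return the higher value, zero usec. */
        *sec = (s1 > s2) ? s1 : s2;
        *usec = 0;
    } else {
        *sec = s1;
        *usec = us;
    }
}
```

Because the writer touches the counters in the order first, microseconds, second and the reader visits them in the opposite order, a reader that overlaps an update sees differing seconds counters and falls back to the start of the newer second, which keeps successive readings non-decreasing.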
||
| 56 | jermar | 201 | </section> |
| 52 | jermar | 202 | </chapter> |