WebSVN – HelenOS-doc – Diff – /design/trunk/src/ch_time.xml

 <?xml version="1.0" encoding="UTF-8"?>
 <chapter id="time">
   <?dbhtml filename="time.html"?>
   <title>Time management</title>
   <para>Time is one of the dimensions in which kernel, as well as the whole
   system, operates. It is of special importance to many kernel subsytems.
   Knowledge of time makes it possible for the scheduler to preemptively plan
   threads for execution. Different parts of the kernel can request execution
   of their callback function with some specified delay. A good example of such
   kernel code is the synchronization subsystem which uses this functionality
   to implement timeouting versions of synchronization primitives.</para>
   <section>
     <title>System clock</title>
     <para>Every hardware architecture supported by HelenOS must support some
     kind of a device that can be programmed to yield periodic time signals
     (i.e. clock interrupts). Some architectures have external clock that is
     merely programmed by the kernel to interrupt the processor multiple times
     in a second. This is the case of ia32 and amd64 architectures<footnote>
         <para>When running in uniprocessor mode.</para>
       </footnote>, which use i8254 or a compatible chip to achieve the
     goal.</para>
     <para>Other architectures' processors typically contain two registers. The
     first register is usually called a compare or a match register and can be
     set to an arbitrary value by the operating system. The contents of the
     compare register then stays unaltered until it is written by the kernel
     again. The second register, often called a counter register, can be also
     written by the kernel, but the processor automatically increments it after
     every executed instruction or in some fixed relation to processor speed.
     The point is that a clock interrupt is generated whenever the values of
     the counter and the compare registers match. Sometimes, the scheme of two
     registers is modified so that only one register is needed. Such a
     register, called a decrementer, then counts towards zero and an interrupt
     is generated when zero is reached.</para>
     <para>In any case, the initial value of the decrementer or the initial
     difference between the counter and the compare registers, respectively,
     must be set accordingly to a known relation between the real time and the
     speed of the decrementer or the counter register, respectively.</para>
     <para>The rest of this section will, for the sake of clarity, focus on the
     two-register scheme. The decrementer scheme is very similar.</para>
     <para>The kernel must reinitialize the counter registers after each clock
     interrupt in order to schedule next interrupt. However this step is tricky
     and must be done with caution. Imagine that the clock interrupt is masked
     either because the kernel is servicing another interrupt or because the
     processor locally disabled interrupts for a while. If the clock interrupt
     occurs during this period, it will be pending until interrupts are enabled
     again. In theory, that could happen arbitrary counter register ticks
     later. Which is worse, the ideal time period between two non-delayed clock
     interrupts can also elapse arbitrary number of times before the delayed
     interrupt gets serviced. The architecture-specific part of the clock
     interrupt driver must avoid time drifts caused by this by taking proactive
     counter-measures.</para>
     <para>Let us assume that the kernel wants each clock interrupt be
     generated every <constant>TICKCONST</constant> ticks. This value
     represents the ideal number of ticks between two non-delayed clock
     interrupts and has some known relation to real time. On each clock
     interrupt, the kernel computes and writes down the expected value of the
     counter register as it hopes to read it on the next clock interrupt. When
     that interrupt comes, the kernel reads the counter register again and
     compares it with the written down value. If the difference is smaller than
     or equal to <constant>TICKCONST</constant>, then the time drift is none or
     small and the next interrupt is scheduled earlier with a penalty of so
     many ticks as is the value of the difference. However, if the difference
     is bigger, then at least one clock signal was missed. In that case, the
     missed clock signal is remembered in the special counter. If there are
     more missed signals, each of them is recorded there. The next interrupt is
     scheduled with respect to the difference similarily to the former case.
     This time, the penalty is taken modulo <constant>TICKCONST</constant>. The
     effect of missed clock signals is remedied in the generic clock interrupt
     handler.</para>
   </section>
   <section>
     <title>Timeouts</title>
     <para>Kernel subsystems can register a callback function to be executed
     with a specified delay. Such a registration is represented by a kernel
     structure called <classname>timeout</classname>. Timeouts are registered
     via <code>timeout_register</code> function. This function takes a pointer
     to a timeout structure, a callback function, a parameter of the callback
     function and a delay in microseconds as parameters. After the structure is
     initialized with all these values, it is sorted into the processor's list
-    of active timeouts.Timeouts are sorted in this list according to the
+    of active timeouts, according to the number of clock interrupts remaining
-    number of clock interrupts remaining to their expiration and relative to
+    to their expiration and relatively to already listed timeouts.</para>
-    each other. </para>
     <para>Timeouts can be unregistered via <code>timeout_unregister</code>.
     This function can, as opposed to <code>timeout_register</code>, fail when
     it is too late to remove the timeout from the list of active
     timeouts.</para>
     <para>Timeouts are nearing their expiration in the list of active timeouts
     which exists on every processor in the system. The expiration counters are
     decremented on each clock interrupt by the generic clock interrupt
     handler. Due to the relative ordering of timeouts in the list, it is
     sufficient to decrement expiration counter only of the first timeout in
     the list. Timeouts with expiration counter equal to zero are removed from
     the list and their callback function is called with respective
     parameter.</para>
   </section>
   <section>
     <title>Generic clock interrupt handler</title>
     <para>On each clock interrupt, the architecture specific part of the clock
     interrupt handler makes a call to the generic clock interrupt handler
     implemented by the <code>clock</code> function. The generic handler takes
     care of several mission critical goals:</para>
     <itemizedlist>
       <listitem>
         <para>expiration of timeouts,</para>
       </listitem>
       <listitem>
         <para>updating time of the day counters for userspace and</para>
       </listitem>
       <listitem>
         <para>preemption of threads.</para>
       </listitem>
     </itemizedlist>
-    <para>The first two goals are performed exactly one more times than is the
+    <para>The <code>clock</code> function checks for expired timeouts and
+    decrements unexpired timeout expiration counters exactly one more times
-    number of missed clock signals (i.e. at least once and possibly more
+    than is the number of missed clock signals (i.e. at least once and
-    times, depending on the missed clock signals counter). The remaining
+    possibly more times, depending on the missed clock signals counter). The
+    time of the day counters are also updated one more times than is the
+    number of missed clock signals. And finally, the remaining timeslice of
-    timeslice of the running thread is decremented also with respect to this
+    the running thread is decremented with respect to this counter as well. By
-    counter. By considering its value, the kernel performs actions that would
+    considering its value, the kernel performs actions that would otherwise be
-    otherwise be lost due to an occasional excessive time drift described in
+    lost due to an occasional excessive time drift described in previous
-    previous paragraphs.</para>
+    paragraphs.</para>
+  </section>
+  <section>
+    <title>Time source for userspace</title>
+    <para>In HelenOS, userspace tasks don't communicate with the kernel in
+    order to read system time. Instead, a mechanism that shares kernel time of
+    the day counters with userspace address spaces is deployed. On the kernel
+    side, during system initialization, HelenOS allocates a frame of physical
+    memory and stores the time of the day counters there. The counters have
+    the following structure:</para>
+    <itemizedlist>
+      <listitem>
+        <para>first 32-bit counter for seconds,</para>
+      </listitem>
+      <listitem>
+        <para>32-bit counter for microseconds and</para>
+      </listitem>
+      <listitem>
+        <para>second 32-bit counter for seconds.</para>
+      </listitem>
+    </itemizedlist>
+    <para>One of the userspace tasks with capabilities of memory manager (e.g.
+    ns) asks the kernel to map this frame into its address space. Other,
+    non-privileged, tasks then use IPC to communicate read-only sharing of
+    this memory. Reading time in a userspace task is therefore just a matter
+    of reading memory.</para>
+    <para>There are two interesting points about this. First, the counters are
+-bit even on 64-bit machines. The goal is to provide subsecond precision
+    with the possibility to span roughly 136 years. Note that a single 64-bit
+    microsecond counter could not be usually read atomically on 32-bit
+    platforms. Now the second point is that 32-bit platforms cannot atomically
+    read two 32-bit counters either. However, a generic protocol is used to
+    guarantee that sequentially read times will create a non-decreasing
+    sequence.</para>
+    <para>The problematic part is updating both seconds and microseconds once
+    in a second. Seconds must be incremented and microseconds must be reset.
+    However, without any synchronization, the two kernel stores and the two
+    userspace reads can arbitrarily interleave. Furthemore, the reader has no
+    chance to detect that the counters were updated only from half. Therefore
+    three counters are used in HelenOS.</para>
+    <para>If seconds need to be updated, the kernel increments the first
+    second counter, issues a write memory barrier operation, updates the
+    microsecond counter, issues another write memory barrier operation and
+    increments the second second counter. When only microseconds need to be
+    updated, no special action is taken by the kernel. On the other hand, the
+    userspace task must always read all three counters and in reversed order.
+    A read memory barrier operation must be issued between each two reads. A
+    non-atomic read is detected when the two second counters differ. The
+    userspace library solves this situation by returning zero instead of the
+    value read from the microsecond counter.</para>
   </section>
 </chapter>

Subversion Repositories HelenOS-doc

(root)/design/trunk/src/ch_time.xml – Rev 55 → 56