Subversion Repositories HelenOS-doc

Rev

Rev 131 | Blame | Compare with Previous | Last modification | View Log | Download | RSS feed

  1. <?xml version="1.0" encoding="UTF-8"?>
  2. <chapter id="time">
  3.   <?dbhtml filename="time.html"?>
  4.  
  5.   <title>Time Management</title>
  6.  
  7.   <para>Time is one of the dimensions in which kernel as well as the whole
  8.   system operates. It is of special importance to many kernel subsytems.
  9.   Knowledge of time makes it possible for the scheduler to preemptively plan
  10.   threads for execution. Different parts of the kernel can request execution
  11.   of their callback function with a specified delay. A good example of such
  12.   kernel code is the synchronization subsystem which uses this functionality
  13.   to implement timeouting versions of synchronization primitives.</para>
  14.  
  15.   <section>
  16.     <title>System Clock</title>
  17.  
  18.     <para>Every hardware architecture supported by HelenOS must support some
  19.     kind of a device that can be programmed to yield periodic time signals
  20.     (i.e. clock interrupts). Some architectures have external clock that is
  21.     merely programmed by the kernel to interrupt the processor multiple times
  22.     in a second. This is the case of ia32 and amd64 architectures<footnote>
  23.         <para>When running in uniprocessor mode.</para>
  24.       </footnote>, which use i8254 or a compatible chip to achieve the
  25.     goal.</para>
  26.  
  27.     <para>Other architectures' processors typically contain two registers. The
  28.    first register is usually called a compare or a match register and can be
  29.    set to an arbitrary value by the operating system. The contents of the
  30.    compare register then stays unaltered until it is written by the kernel
  31.    again. The second register, often called a counter register, can be also
  32.    written by the kernel, but the processor automatically increments it after
  33.    every executed instruction or in some fixed relation to processor speed.
  34.    The point is that a clock interrupt is generated whenever the values of
  35.    the counter and the compare registers match. Sometimes, the scheme of two
  36.    registers is modified so that only one register is needed. Such a
  37.    register, called a decrementer, then counts towards zero and an interrupt
  38.    is generated when zero is reached.</para>
  39.  
  40.    <para>In any case, the initial value of the decrementer or the initial
  41.    difference between the counter and the compare registers, respectively,
  42.    must be set accordingly to a known relation between the real time and the
  43.    speed of the decrementer or the counter register, respectively.</para>
  44.  
  45.    <para>The rest of this section will, for the sake of clarity, focus on the
  46.    two-register scheme. The decrementer scheme is very similar.</para>
  47.  
  48.    <para>The kernel must reinitialize one of the two registers after each
  49.    clock interrupt in order to schedule next interrupt. However this step is
  50.    tricky and must be done with caution. Imagine that the clock interrupt is
  51.    masked either because the kernel is servicing another interrupt or because
  52.    the processor locally disabled interrupts for a while. If the clock
  53.    interrupt occurs during this period, it will be pending until the
  54.    interrupts are enabled again. Theoretically, it could happen an arbitrary
  55.    counter register ticks later. Which is worse, the ideal time period
  56.    between two non-delayed clock interrupts can also elapse arbitrary number
  57.    of times before the delayed interrupt gets serviced. The
  58.    architecture-specific part of the clock interrupt driver must avoid time
  59.    drifts caused by such behaviour by taking proactive
  60.    counter-measures.</para>
  61.  
  62.    <para>Let us assume that the kernel wants each clock interrupt be
  63.    generated every <constant>TICKCONST</constant> ticks. This value
  64.    represents the ideal number of ticks between two non-delayed clock
  65.    interrupts and has some known relation to real time. On each clock
  66.    interrupt, the kernel computes and writes down the expected value of the
  67.    counter register as it hopes to read it on the next clock interrupt. When
  68.    that interrupt comes, the kernel reads the counter register again and
  69.    compares it with the written down value. If the difference is smaller than
  70.    or equal to <constant>TICKCONST</constant>, then the time drift is none or
  71.    small and the next interrupt is scheduled earlier with a penalty of so
  72.    many ticks as is the value of the difference. However, if the difference
  73.    is bigger, then at least one clock signal was missed. In that case, the
  74.    missed clock signal is remembered in the special counter. If there are
  75.    more missed signals, each of them is recorded there. The next interrupt is
  76.    scheduled with respect to the difference similarily to the former case.
  77.    This time, the penalty is taken modulo <constant>TICKCONST</constant>. The
  78.    effect of missed clock signals is remedied in the generic clock interrupt
  79.    handler.</para>
  80.  </section>
  81.  
  82.  <section>
  83.    <title>Timeouts</title>
  84.  
  85.    <para>Kernel subsystems can register a callback function to be executed
  86.    with a specified delay. Such a registration is represented by a kernel
  87.    structure called <classname>timeout</classname>. Timeouts are registered
  88.    via <code>timeout_register</code> function. This function takes a pointer
  89.    to a timeout structure, a callback function, a parameter of the callback
  90.    function and a delay in microseconds as parameters. After the structure is
  91.    initialized with all these values, it is sorted into the processor's list
  92.     of active timeouts, according to the number of clock interrupts remaining
  93.     to their expiration and relatively to already listed timeouts.</para>
  94.  
  95.     <para>Timeouts can be unregistered via <code>timeout_unregister</code>.
  96.     This function can, as opposed to <code>timeout_register</code>, fail when
  97.     it is too late to remove the timeout from the list of active
  98.     timeouts.</para>
  99.  
  100.     <para>Timeouts are nearing their expiration in the list of active timeouts
  101.     which exists on every processor in the system. The expiration counters are
  102.     decremented on each clock interrupt by the generic clock interrupt
  103.     handler. Due to the relative ordering of timeouts in the list, it is
  104.     sufficient to decrement expiration counter only of the first timeout in
  105.     the list. Timeouts with expiration counter equal to zero are removed from
  106.     the list and their callback function is called with respective
  107.     parameter.</para>
  108.   </section>
  109.  
  110.   <section>
  111.     <title>Generic Clock Interrupt Handler</title>
  112.  
  113.     <para>On each clock interrupt, the architecture specific part of the clock
  114.     interrupt handler makes a call to the generic clock interrupt handler
  115.     implemented by the <code>clock</code> function. The generic handler takes
  116.     care of several mission critical goals:</para>
  117.  
  118.     <itemizedlist>
  119.       <listitem>
  120.         <para>expiration of timeouts,</para>
  121.       </listitem>
  122.  
  123.       <listitem>
  124.         <para>updating time of the day counters for userspace and</para>
  125.       </listitem>
  126.  
  127.       <listitem>
  128.         <para>preemption of threads.</para>
  129.       </listitem>
  130.     </itemizedlist>
  131.  
  132.     <para>The <code>clock</code> function checks for expired timeouts and
  133.     decrements unexpired timeout expiration counters exactly one more times
  134.     than is the number of missed clock signals (i.e. at least once and
  135.     possibly more times, depending on the missed clock signals counter). The
  136.     time of the day counters are also updated one more times than is the
  137.     number of missed clock signals. And finally, the remaining timeslice of
  138.     the running thread is decremented with respect to this counter as well. By
  139.     considering its value, the kernel performs actions that would otherwise be
  140.     lost due to an occasional excessive time drift described in previous
  141.     paragraphs.</para>
  142.   </section>
  143.  
  144.   <section>
  145.     <title>Time Source for Userspace</title>
  146.  
  147.     <para>In HelenOS, userspace tasks don't communicate with the kernel in
  148.    order to read the system time. Instead, a mechanism that shares kernel
  149.    time of the the day counters with userspace address spaces is deployed. On
  150.    the kernel side, during system initialization, HelenOS allocates a frame
  151.    of physical memory and stores the time of the day counters there. The
  152.    counters have the following structure:</para>
  153.  
  154.    <itemizedlist>
  155.      <listitem>
  156.        <para>first 32-bit counter for seconds,</para>
  157.      </listitem>
  158.  
  159.      <listitem>
  160.        <para>32-bit counter for microseconds and</para>
  161.      </listitem>
  162.  
  163.      <listitem>
  164.        <para>second 32-bit counter for seconds.</para>
  165.      </listitem>
  166.    </itemizedlist>
  167.  
  168.    <para>One of the userspace tasks with capabilities of memory manager (e.g.
  169.    ns) asks the kernel to map this frame into its address space. Other
  170.    non-privileged tasks then use IPC to receive read-only share of this
  171.    memory. Reading time in a userspace task is therefore just a matter of
  172.    reading memory.</para>
  173.  
  174.    <para>There are two interesting points about this. First, the counters are
  175.    32-bit even on 64-bit machines. The goal is to provide subsecond precision
  176.    with the possibility to span roughly 136 years. Note that a single 64-bit
  177.    microsecond counter could not be usually read atomically on 32-bit
  178.    platforms. Unfortunately, on 32-bit platforms it is usually impossible to
  179.    read atomically two 32-bit counters either. However, a generic protocol is
  180.    used to guarantee that sequentially read times will create a
  181.    non-decreasing sequence.</para>
  182.  
  183.    <para>The problematic part is incrementing seconds counter and clearing
  184.    microseconds counter together once every second. Seconds must be
  185.    incremented and microseconds must be reset. However, without any
  186.    synchronization, the two kernel stores and the two userspace reads can
  187.    arbitrarily interleave. Furthemore, the reader has no chance to detect
  188.    that the counters were updated only paritally. Therefore three counters
  189.    are used in HelenOS.</para>
  190.  
  191.    <para>If seconds need to be updated, the kernel increments the first
  192.    second counter, issues a write memory barrier operation, updates the
  193.    microsecond counter, issues another write memory barrier operation and
  194.    increments the second second counter. When only microseconds needs to be
  195.    updated, no special action is taken by the kernel. On the other hand, the
  196.    userspace task must always read all three counters in reversed order. A
  197.    read memory barrier operation must be issued between each two reads. A
  198.    non-atomic read is detected when the two second counters differ. The
  199.    userspace library solves this situation by returning higher of them with
  200.    microseconds set to zero.</para>
  201.  </section>
  202. </chapter>