<?xml version="1.0" encoding="UTF-8"?>
<chapter id="ipc">
  <?dbhtml filename="ipc.html"?>

  <title>IPC</title>

  <para>Due to the high intertask communication traffic, IPC becomes a
  critical subsystem for microkernels, putting high demands on the speed,
  latency and reliability of the IPC model and implementation. Although the
  use of an asynchronous messaging system looks promising in theory, it is
  not often implemented because it complicates the implementation of end
  user applications. HelenOS implements a fully asynchronous messaging
  system with a special layer that provides the user application developer
  with a reasonably synchronous multithreaded environment sufficient to
  develop complex protocols.</para>

  <section>
    <title>Kernel Services</title>

    <para>Every message consists of four numeric arguments (32-bit or 64-bit,
    depending on the platform), of which the first one is considered a
    method number on message receipt and a return value on answer receipt.
    The received message contains the identification of the incoming
    connection, so that the receiving application can distinguish between
    messages from different senders. Internally, the message contains a
    pointer to the originating task and to the source of the communication
    channel. If the message is forwarded, the originating task identifies the
    recipient of the answer, while the source channel identifies the
    connection in case of a hangup response.</para>
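
    <para>The message layout described above can be sketched in C as
    follows. This is only an illustration: the type and function names are
    hypothetical and do not necessarily match the kernel sources. The sketch
    shows how the first argument slot serves as the method number in a
    request and as the return value in an answer.</para>

    <programlisting><![CDATA[
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names; the real kernel structures may differ. */
#define IPC_ARG_COUNT 4

/* 32-bit or 64-bit wide, following the platform. */
typedef uintptr_t ipc_arg_t;

typedef struct {
    ipc_arg_t args[IPC_ARG_COUNT];
} ipc_data_t;

/* The first argument is the method number on message receipt... */
static ipc_arg_t ipc_get_method(const ipc_data_t *d)
{
    return d->args[0];
}

/* ...and the return value on answer receipt. */
static void ipc_set_retval(ipc_data_t *d, ipc_arg_t retval)
{
    d->args[0] = retval;
}

static ipc_arg_t ipc_get_retval(const ipc_data_t *d)
{
    return d->args[0];
}

int main(void)
{
    /* Method 7 followed by three payload arguments. */
    ipc_data_t msg = { .args = { 7, 10, 20, 30 } };
    printf("method = %lu\n", (unsigned long) ipc_get_method(&msg));

    /* The answer reuses the same slot for the return value. */
    ipc_set_retval(&msg, 0);
    printf("retval = %lu\n", (unsigned long) ipc_get_retval(&msg));
    return 0;
}
]]></programlisting>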

    <para>Every message must eventually be answered. The system keeps track
    of all messages, so that it can answer them with an appropriate error
    code should one of the connection parties fail unexpectedly. To limit the
    buffering of messages in the kernel, every task has a limit on the number
    of asynchronous messages it can have in flight simultaneously. If the
    limit is reached, the kernel refuses to send any further message until
    some active message is answered.</para>
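
    <para>A minimal sketch of such a per-task limit is shown below. The
    limit value and all names are assumptions made for the example. The
    sketch only models the counting rule: a send is refused while the number
    of unanswered messages equals the limit, and it succeeds again once an
    answer arrives.</para>

    <programlisting><![CDATA[
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical limit and names, chosen only for illustration. */
#define MAX_ACTIVE_CALLS 4

typedef struct {
    int active_calls;   /* messages sent but not yet answered */
} task_t;

/* Returns false when the task is at its limit; nothing is sent then. */
static bool send_async(task_t *t)
{
    if (t->active_calls >= MAX_ACTIVE_CALLS)
        return false;
    t->active_calls++;
    return true;
}

/* Called when an answer to one of the task's messages arrives. */
static void answer_received(task_t *t)
{
    if (t->active_calls > 0)
        t->active_calls--;
}

int main(void)
{
    task_t task = { 0 };

    /* One attempt more than the limit allows: the last one is refused. */
    for (int i = 0; i < MAX_ACTIVE_CALLS + 1; i++)
        printf("send %d: %s\n", i, send_async(&task) ? "queued" : "refused");

    /* After one answer arrives, sending works again. */
    answer_received(&task);
    printf("after one answer: %s\n",
        send_async(&task) ? "queued" : "refused");
    return 0;
}
]]></programlisting>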

    <para>To facilitate kernel-to-user communication, the IPC subsystem
    provides notification messages. Applications can subscribe to a
    notification channel and receive messages directed to this channel. Such
    messages can be freely sent even from interrupt context, as they are
    primarily intended to deliver IRQ events to userspace device drivers.
    These messages need not be answered, because there is no party that could
    receive such a response.</para>

    <section>
      <title>Low Level IPC</title>

      <para>The whole IPC subsystem consists of one-way communication
      channels. Each task has one associated message queue (answerbox). The
      task can call other tasks and connect its phones to their answerboxes,
      send and forward messages through these connections and answer received
      messages. Every sent message is identified by a unique number, so that
      the response can later be matched against it. The message is sent over
      the phone to the target answerbox. The server application periodically
      checks the answerbox and pulls messages from the several queues
      associated with it. After completing the requested action, the server
      sends a reply back to the answerbox of the originating task. If the
      need arises, it is possible to <emphasis>forward</emphasis> a received
      message through any of the open phones to another task. This mechanism
      is used, for example, for opening new connections to services via the
      naming service.</para>

      <para>The answerbox contains four different message queues:</para>

      <itemizedlist>
        <listitem>
          <para>Incoming call queue</para>
        </listitem>

        <listitem>
          <para>Dispatched call queue</para>
        </listitem>

        <listitem>
          <para>Answer queue</para>
        </listitem>

        <listitem>
          <para>Notification queue</para>
        </listitem>
      </itemizedlist>

      <figure float="1">
        <title>Low level IPC</title>

        <mediaobject id="ipc1">
          <imageobject role="pdf">
            <imagedata fileref="images/ipc1.pdf" format="PDF" />
          </imageobject>

          <imageobject role="html">
            <imagedata fileref="images/ipc1.png" format="PNG" />
          </imageobject>

          <imageobject role="fop">
            <imagedata fileref="images/ipc1.svg" format="SVG" />
          </imageobject>
        </mediaobject>
      </figure>

      <para>The communication between task A and task B, to which task A is
      connected, looks as follows: task A sends a message over its phone to
      the target answerbox. The message is saved in task B's incoming call
      queue. When task B fetches the message for processing, it is
      automatically moved into the dispatched call queue. After the server
      decides to answer the message, it is removed from the dispatched call
      queue and the result is moved into the answer queue of task A.</para>
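
      <para>The path of a single call through the queues can be traced with
      the following sketch. It is a simplified model rather than the kernel
      implementation: instead of real queues, each call merely records whose
      answerbox holds it and in which queue. All names are
      hypothetical.</para>

      <programlisting><![CDATA[
#include <stdio.h>

/* Hypothetical model: a call records its holder and its queue. */
typedef enum { Q_INCOMING, Q_DISPATCHED, Q_ANSWER } queue_t;

typedef struct {
    char sender;     /* task that sent the call */
    char holder;     /* task whose answerbox currently holds the call */
    queue_t queue;   /* queue inside that answerbox */
} call_t;

static const char *queue_name(queue_t q)
{
    switch (q) {
    case Q_INCOMING:
        return "incoming call queue";
    case Q_DISPATCHED:
        return "dispatched call queue";
    case Q_ANSWER:
        return "answer queue";
    }
    return "unknown queue";
}

static void show(const char *step, const call_t *c)
{
    printf("%-8s call is in the %s of task %c\n",
        step, queue_name(c->queue), c->holder);
}

int main(void)
{
    /* Task A sends over its phone: the call lands in B's incoming queue. */
    call_t call = { .sender = 'A', .holder = 'B', .queue = Q_INCOMING };
    show("send:", &call);

    /* Task B fetches the call: it moves to B's dispatched call queue. */
    call.queue = Q_DISPATCHED;
    show("fetch:", &call);

    /* Task B answers: the result moves to the sender's answer queue. */
    call.holder = call.sender;
    call.queue = Q_ANSWER;
    show("answer:", &call);
    return 0;
}
]]></programlisting>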

      <para>The arguments contained in the message are completely arbitrary
      and decided by the user. The low level part of kernel IPC fills in
      appropriate error codes if there is an error during communication. It
      is assured that the applications are correctly notified about the
      communication state. If a program closes the outgoing connection, the
      target answerbox receives a hangup message. The connection
      identification is not reused until the hangup message is acknowledged
      and all other pending messages are answered.</para>

      <para>Closing an incoming connection is done by responding to any
      incoming message with an EHANGUP error code. The connection is then
      immediately closed. The client connection identification (phone id) is
      not reused until the client closes its own side of the connection
      ("hangs its phone up").</para>

      <para>When a task dies (whether voluntarily or by being killed), a
      cleanup process is started. The cleanup process:</para>

      <orderedlist>
        <listitem>
          <para>hangs up all outgoing connections and sends hangup messages
          to all target answerboxes,</para>
        </listitem>

        <listitem>
          <para>disconnects all incoming connections,</para>
        </listitem>

        <listitem>
          <para>disconnects from all notification channels,</para>
        </listitem>

        <listitem>
          <para>answers all unanswered messages from the answerbox queues
          with an appropriate error code and</para>
        </listitem>

        <listitem>
          <para>waits until all outgoing messages are answered and all
          remaining answerbox queues are empty.</para>
        </listitem>
      </orderedlist>
    </section>

    <section>
      <title>System Call IPC Layer</title>

      <para>On top of this simple protocol, the kernel provides special
      services closely related to inter-process communication. A range of
      method numbers is allocated and a protocol is defined for these
      functions. These messages are interpreted by the kernel layer and
      appropriate actions are taken depending on the parameters of the
      message and the answer.</para>

      <para>The kernel provides the following services:</para>

      <itemizedlist>
        <listitem>
          <para>creating a new outgoing connection,</para>
        </listitem>

        <listitem>
          <para>creating a callback connection,</para>
        </listitem>

        <listitem>
          <para>sending an address space area and</para>
        </listitem>

        <listitem>
          <para>asking for an address space area.</para>
        </listitem>
      </itemizedlist>

      <para>On startup, every task is automatically connected to a
      <emphasis>naming service task</emphasis>, which provides switchboard
      functionality. In order to open a new outgoing connection, the client
      sends a <constant>CONNECT_ME_TO</constant> message using any of its
      phones. If the recipient of this message answers with an accepting
      answer, a new connection is created. By itself, this mechanism would
      only allow duplicating an existing connection. However, if the message
      is forwarded, the new connection is made to the final recipient.</para>
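
      <para>The effect of forwarding on the resulting connection can be
      illustrated by a toy model. In the sketch below, which uses
      hypothetical names and ignores refusals and error handling, each task
      either answers the request itself or forwards it to another task. The
      new phone ends up connected to whichever task finally answers.</para>

      <programlisting><![CDATA[
#include <stdio.h>

/* Hypothetical model of a task that may forward connection requests. */
typedef struct task {
    const char *name;
    struct task *forward_to;   /* NULL: the task answers the request itself */
} task_t;

/*
 * Follow the chain of forwards starting at the task that the client's
 * phone points to. The task that finally answers becomes the target
 * of the new connection.
 */
static task_t *connect_me_to(task_t *first_recipient)
{
    task_t *t = first_recipient;
    while (t->forward_to != NULL)
        t = t->forward_to;
    return t;
}

int main(void)
{
    task_t service = { "service", NULL };
    task_t naming = { "naming service", &service };
    task_t plain = { "plain task", NULL };

    /* Forwarded request: the new phone leads to the final recipient. */
    printf("via %s: connected to %s\n",
        naming.name, connect_me_to(&naming)->name);

    /* No forwarding: the existing connection is merely duplicated. */
    printf("via %s: connected to %s\n",
        plain.name, connect_me_to(&plain)->name);
    return 0;
}
]]></programlisting>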

      <para>In order for a task to be able to forward a message, it must have
      a phone connected to the destination task. The destination task
      establishes such a connection by sending the
      <constant>CONNECT_TO_ME</constant> message to the forwarding task. A
      callback connection is opened afterwards. Every service that wants to
      receive connections has to ask the naming service to create the
      callback connection via this mechanism.</para>

      <para>Tasks can share their address space areas using IPC messages. The
      two message types, <constant>AS_AREA_SEND</constant> and
      <constant>AS_AREA_RECV</constant>, are used for sending and receiving
      an address space area respectively. The shared area can be accessed as
      soon as the message is acknowledged.</para>
    </section>
  </section>

  <section>
    <title>Userspace View</title>

    <para>The conventional design of the asynchronous API seems to produce
    applications with one event loop and several big switch statements.
    However, by intensive use of userspace fibrils, it was possible to create
    an environment that is not necessarily restricted to this type of
    event-driven programming and allows for a more fluent expression of
    application programs.</para>

    <section>
      <title>Single Point of Entry</title>

      <para>Each task is associated with only one answerbox. If a
      multithreaded application needs to communicate, it must be able not
      only to send a message, but also to retrieve the answer. If several
      fibrils pull messages from the task answerbox, it is a matter of
      coincidence which fibril receives which message. If a particular fibril
      needs to wait for a message answer, an idle
      <emphasis>manager</emphasis> fibril is found or a new one is created
      and control is transferred to this manager fibril. The manager fibrils
      pop messages from the answerbox and put them into the appropriate
      queues of running fibrils. If a fibril waiting for a message is not
      running, control is transferred to it.</para>

      <figure float="1">
        <title>Single point of entry</title>

        <mediaobject id="ipc2">
          <imageobject role="pdf">
            <imagedata fileref="images/ipc2.pdf" format="PDF" />
          </imageobject>

          <imageobject role="html">
            <imagedata fileref="images/ipc2.png" format="PNG" />
          </imageobject>

          <imageobject role="fop">
            <imagedata fileref="images/ipc2.svg" format="SVG" />
          </imageobject>
        </mediaobject>
      </figure>

      <para>A very similar situation arises when a task decides to send a lot
      of messages and reaches the kernel limit of asynchronous messages. In
      such a situation, two remedies are available: the userspace library can
      either cache the message locally and resend it when some answers
      arrive, or it can block the fibril and let it continue only after the
      message is finally sent to the kernel layer. With one exception,
      HelenOS uses the second approach: when the kernel responds that the
      maximum limit of asynchronous messages was reached, control is
      transferred to a manager fibril. The manager fibril then handles
      incoming replies and, when space is available, sends the message to the
      kernel and resumes the application fibril execution.</para>
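
      <para>The blocking approach can be approximated by the sketch below.
      It does not use real fibrils: a plain loop stands in for the switch to
      the manager fibril, the replies are simulated by a counter, and the
      limit and all names are assumptions. The sketch only shows the order of
      events, in which the sender does not continue until its message has
      been accepted by the kernel.</para>

      <programlisting><![CDATA[
#include <stdbool.h>
#include <stdio.h>

#define KERNEL_LIMIT 2            /* hypothetical kernel limit */

static int active = 0;            /* messages sent and not yet answered */
static int replies_waiting = 3;   /* simulated replies in the answerbox */

/* Stand-in for the kernel send: refused while the limit is reached. */
static bool kernel_send(void)
{
    if (active >= KERNEL_LIMIT)
        return false;
    active++;
    return true;
}

/* Stand-in for the manager fibril: process one incoming reply. */
static bool manager_handle_reply(void)
{
    if (replies_waiting == 0 || active == 0)
        return false;
    replies_waiting--;
    active--;
    return true;
}

/* Returns only once the message has reached the kernel. */
static bool send_blocking(int msg)
{
    while (!kernel_send()) {
        printf("message %d: limit reached, switching to manager\n", msg);
        if (!manager_handle_reply())
            return false;   /* simulation only: no reply frees a slot */
    }
    printf("message %d: sent\n", msg);
    return true;
}

int main(void)
{
    for (int msg = 1; msg <= 4; msg++) {
        if (!send_blocking(msg))
            return 1;
    }
    return 0;
}
]]></programlisting>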

      <para>If a kernel notification is received, the servicing procedure is
      run in the context of the manager fibril. Although it would not be
      impossible to allow recursive calling, it could potentially lead to an
      explosion of manager fibrils. Thus, the kernel notification procedures
      are not allowed to wait for a message result; they can only answer
      messages and send new ones without waiting for their results. If the
      kernel limit for outgoing messages is reached, the data is
      automatically cached within the application. This behaviour is
      enforced automatically and the decision making is hidden from the
      developer.</para>

      <figure float="1">
        <title>Single point of entry solution</title>

        <mediaobject id="ipc3">
          <imageobject role="pdf">
            <imagedata fileref="images/ipc3.pdf" format="PDF" />
          </imageobject>

          <imageobject role="html">
            <imagedata fileref="images/ipc3.png" format="PNG" />
          </imageobject>

          <imageobject role="fop">
            <imagedata fileref="images/ipc3.svg" format="SVG" />
          </imageobject>
        </mediaobject>
      </figure>
    </section>

    <section>
      <title>Ordering Problem</title>

      <para>Unfortunately, the real world is never so simple. For example, if
      a server handles incoming requests and sends asynchronous messages as a
      part of its response, it can easily be preempted and another thread may
      start intervening. This can happen even if the application utilizes
      only one userspace thread. Classical synchronization using semaphores
      is not possible, as locking on them would block the thread completely
      so that the answer could never be processed. The IPC framework allows a
      developer to specify that a part of the code should not be preempted by
      any other fibril (except notification handlers) while still being able
      to queue messages belonging to other fibrils and regain control when
      the answer arrives.</para>

      <para>This mechanism works transparently in a multithreaded
      environment, where an additional locking mechanism (futexes) should be
      used. The IPC framework ensures that there will always be enough free
      userspace threads to handle incoming answers and allow the application
      to run more fibrils inside the userspace threads without the danger of
      locking all userspace threads in futexes.</para>
    </section>

    <section>
      <title>The Interface</title>

      <para>The interface was developed to be as simple to use as possible.
      Typical applications simply send messages and occasionally wait for an
      answer and check results. If the number of sent messages is higher than
      the kernel limit, the flow of the application is stopped until some
      answers arrive. On the other hand, server applications are expected to
      work in a multithreaded environment.</para>

      <para>The server interface requires the developer to specify a
      <function>connection_fibril</function> function. When a new connection
      is detected, a new fibril is automatically created and control is
      transferred to this function. The code then decides whether to accept
      the connection and creates a normal event loop. The userspace IPC
      library ensures correct switching between several threads within the
      kernel environment.</para>
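
      <para>The shape of such a function can be outlined as follows. Apart
      from <function>connection_fibril</function> itself, every name in the
      sketch is a hypothetical stand-in, and the function signature is an
      assumption as well. The calls arriving on the connection are simulated
      by a fixed array so that the example is self-contained. The function
      accepts the connection and then answers each call in an ordinary event
      loop until the client hangs up.</para>

      <programlisting><![CDATA[
#include <stdio.h>

/* All names except connection_fibril are hypothetical stand-ins. */
enum { METHOD_HANGUP = 0, METHOD_ECHO = 1 };
enum { RET_OK = 0, RET_ENOTSUP = -1 };

typedef struct {
    int method;
    int arg;
} call_t;

/* Simulated stream of calls arriving on one connection. */
static call_t incoming[] = {
    { METHOD_ECHO, 42 }, { 99, 0 }, { METHOD_HANGUP, 0 }
};
static unsigned next_call = 0;

/* The stream ends with a hangup, so the loop never reads past it. */
static call_t get_call(void)
{
    return incoming[next_call++];
}

static void answer(const call_t *c, int retval, int value)
{
    printf("method %d answered: retval=%d value=%d\n",
        c->method, retval, value);
}

static void connection_fibril(const call_t *initial)
{
    /* Accept the connection by answering the initial request. */
    answer(initial, RET_OK, 0);

    for (;;) {
        call_t c = get_call();
        switch (c.method) {
        case METHOD_HANGUP:
            answer(&c, RET_OK, 0);
            return;
        case METHOD_ECHO:
            answer(&c, RET_OK, c.arg);
            break;
        default:
            /* Every message must be answered, even unknown ones. */
            answer(&c, RET_ENOTSUP, 0);
            break;
        }
    }
}

int main(void)
{
    call_t connect = { -1, 0 };   /* stands in for the connection request */
    connection_fibril(&connect);
    return 0;
}
]]></programlisting>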
    </section>
  </section>
</chapter>