<?xml version="1.0" encoding="UTF-8"?>
<chapter id="ipc">
<?dbhtml filename="ipc.html"?>
<title>IPC</title>
<para>Due to the high intertask communication traffic, IPC becomes a
critical subsystem for microkernels, putting high demands on the speed,
latency and reliability of the IPC model and its implementation. Although
a fully asynchronous messaging system looks promising in theory, it is
rarely implemented because it complicates the implementation of end user
applications. HelenOS implements a fully asynchronous messaging system
with a special layer that provides application developers with a
reasonably synchronous multithreaded environment, sufficient for
developing complex protocols.</para>
<section>
<title>Kernel Services</title>
<para>Every message consists of 4 numeric arguments (32-bit or 64-bit,
depending on the platform), of which the first is interpreted as a method
number on message receipt and as a return value on answer receipt. The
received message contains an identification of the incoming connection, so
that the receiving application can distinguish messages from different
senders. Internally, the message contains a pointer to the originating
task and to the source of the communication channel. If the message is
forwarded, the originating task identifies the recipient of the answer,
while the source channel identifies the connection in case of a hangup
response.</para>
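<para>The following fragment is a minimal illustrative sketch of such a
message structure in C; the type and field names are assumptions made for
this example and do not correspond to the actual kernel sources.</para>
<programlisting><![CDATA[
#include <stdint.h>

typedef uintptr_t sysarg_t;       /* 32-bit or 64-bit, matching the platform */

#define IPC_CALL_LEN  4

typedef struct {
    /* args[0] is read as the method number in a request and
     * as the return value in an answer. */
    sysarg_t args[IPC_CALL_LEN];
} ipc_data_t;

typedef struct call {
    ipc_data_t data;              /* the four user-visible arguments         */
    struct task *sender;          /* originating task; receives the answer   */
    struct phone *caller_phone;   /* source channel, consulted on hangup     */
    struct call *next;            /* link within one of the answerbox queues */
} call_t;
]]></programlisting>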
<para>Every message must eventually be answered. The system keeps track of
all messages, so that it can answer them with an appropriate error code
should one of the connection parties fail unexpectedly. To limit buffering
of messages in the kernel, every task has a limited number of
asynchronous messages it can have outstanding simultaneously. When the
limit is reached, the kernel refuses to accept any further message until
some active message is answered.</para>
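<para>A minimal sketch of how such a per-task limit might be enforced is
shown below; the constant and function names are illustrative assumptions
only.</para>
<programlisting><![CDATA[
#define IPC_MAX_ASYNC_CALLS  4       /* assumed limit, not the real constant */

typedef struct {
    int active_calls;                /* calls sent but not yet answered */
} task_ipc_t;

/* Returns 0 if the call may be sent, -1 if the task must first wait
 * for one of its active calls to be answered. */
static int ipc_try_reserve_slot(task_ipc_t *ipc)
{
    if (ipc->active_calls >= IPC_MAX_ASYNC_CALLS)
        return -1;
    ipc->active_calls++;
    return 0;
}

static void ipc_release_slot(task_ipc_t *ipc)
{
    ipc->active_calls--;             /* invoked when an answer arrives */
}
]]></programlisting>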
<para>To facilitate kernel-to-user communication, the IPC subsystem
provides notification messages. Applications can subscribe to a
notification channel and receive messages directed to it. Such messages
can be freely sent even from interrupt context, as they are primarily
intended to deliver IRQ events to userspace device drivers. These
messages need not be answered; there is no party that could receive such
a response.</para>
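<para>The loop below sketches how a userspace driver might consume such
notifications; the subscription and receive primitives are hypothetical
names standing in for the real system calls.</para>
<programlisting><![CDATA[
#include <stdint.h>

typedef uintptr_t sysarg_t;

typedef struct {
    sysarg_t args[4];            /* same four arguments as an ordinary call */
} ipc_notification_t;

/* Hypothetical primitives standing in for the real system calls. */
extern int  ipc_irq_subscribe(int irq_number);
extern void ipc_wait_for_notification(ipc_notification_t *notif);

static void irq_driver_loop(int irq_number)
{
    ipc_notification_t notif;

    ipc_irq_subscribe(irq_number);           /* attach to the channel */
    for (;;) {
        ipc_wait_for_notification(&notif);
        /* Handle the IRQ event; notifications are never answered. */
    }
}
]]></programlisting>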
<section>
<title>Low Level IPC</title>
<para>The whole IPC subsystem consists of one-way communication
channels. Each task has one associated message queue (answerbox). The
task can call other tasks and connect its phones to their answerboxes,
send and forward messages through these connections and answer received
messages. Every sent message is identified by a unique number, so that
the response can later be matched against it. The message is sent over
a phone to the target answerbox. The server application periodically
checks the answerbox and pulls messages from the several queues
associated with it. After completing the requested action, the server
sends a reply back to the answerbox of the originating task. If a need
arises, it is possible to <emphasis>forward</emphasis> a received
message through any of the open phones to another task. This mechanism
is used e.g. for opening new connections.</para>
<para>The answerbox contains four different message queues:</para>
<itemizedlist>
<listitem>
<para>Incoming call queue</para>
</listitem>
<listitem>
<para>Dispatched call queue</para>
</listitem>
<listitem>
<para>Answer queue</para>
</listitem>
<listitem>
<para>Notification queue</para>
</listitem>
</itemizedlist>
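<para>A simplified sketch of the corresponding data structures is given
below; it is an illustration under assumed names, not a copy of the
kernel sources.</para>
<programlisting><![CDATA[
/* One queue per category listed above; list_t stands for a generic
 * doubly-linked list head, defined here only for the sketch. */
typedef struct list {
    struct list *prev, *next;
} list_t;

typedef struct answerbox {
    list_t calls;                /* incoming call queue    */
    list_t dispatched_calls;     /* dispatched call queue  */
    list_t answers;              /* answer queue           */
    list_t notifications;        /* notification queue     */
} answerbox_t;

typedef struct phone {
    answerbox_t *callee;         /* answerbox this phone is connected to */
} phone_t;
]]></programlisting>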
<figure float="1">
<title>Low level IPC</title>
<mediaobject id="ipc1">
<imageobject role="pdf">
<imagedata fileref="images/ipc1.pdf" format="PDF" />
</imageobject>
<imageobject role="html">
<imagedata fileref="images/ipc1.png" format="PNG" />
</imageobject>
<imageobject role="fop">
<imagedata fileref="images/ipc1.svg" format="SVG" />
</imageobject>
</mediaobject>
</figure>
<para>The communication between task A, which is connected to task B,
looks as follows: task A sends a message over its phone to the target
answerbox. The message is saved in task B's incoming call queue. When
task B fetches the message for processing, it is automatically moved
into the dispatched call queue. When the server decides to answer the
message, it is removed from the dispatched queue and the result is moved
into the answer queue of task A.</para>
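<para>Building on the sketches of the call and answerbox structures
above, the three steps of this exchange could be written roughly as
follows; the list helpers and function names are assumptions for the
illustration.</para>
<programlisting><![CDATA[
/* Assumed generic list helpers operating on the sketched list_t queues. */
extern void list_append(list_t *list, call_t *call);
extern void list_remove(list_t *list, call_t *call);

/* Step 1: task A sends the call to task B's answerbox. */
static void send_call(answerbox_t *b_box, call_t *call)
{
    list_append(&b_box->calls, call);
}

/* Step 2: task B fetches the call; it moves to the dispatched queue. */
static void dispatch_call(answerbox_t *b_box, call_t *call)
{
    list_remove(&b_box->calls, call);
    list_append(&b_box->dispatched_calls, call);
}

/* Step 3: task B answers; the result lands in task A's answer queue. */
static void answer_call(answerbox_t *b_box, answerbox_t *a_box, call_t *call)
{
    list_remove(&b_box->dispatched_calls, call);
    list_append(&a_box->answers, call);
}
]]></programlisting>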
<para>The arguments contained in the message are completely arbitrary
and are decided by the user. The low level part of the kernel IPC fills
in appropriate error codes if an error occurs during communication, thus
ensuring that the applications are correctly notified about the
communication state. If a program closes its outgoing connection, the
target answerbox receives a hangup message. The connection
identification is not reused until the hangup message is acknowledged
and all other pending messages are answered.</para>
<para>An incoming connection is closed by responding to any incoming
message with an EHANGUP error code. The connection is then immediately
closed. The client connection identification (phone id) is not reused
until the client closes its own side of the connection ("hangs up its
phone").</para>
<para>When a task dies (whether voluntarily or by being killed), the
following cleanup process is started (see the sketch after this
list):</para>
<orderedlist>
<listitem>
<para>Hangs up all outgoing connections and sends hangup messages to
all target answerboxes.</para>
</listitem>
<listitem>
<para>Disconnects all incoming connections.</para>
</listitem>
<listitem>
<para>Disconnects from all notification channels.</para>
</listitem>
<listitem>
<para>Answers all unanswered messages from the answerbox queues with
an appropriate error code.</para>
</listitem>
<listitem>
<para>Waits until all outgoing messages are answered and all
remaining answerbox queues are empty.</para>
</listitem>
</orderedlist>
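<para>The steps above can be summarized by the following sketch; the
helper names are assumptions made for the illustration and do not match
the actual kernel routines.</para>
<programlisting><![CDATA[
struct task;

/* Assumed helpers, standing in for the real kernel routines. */
extern void hangup_outgoing_phones(struct task *t);          /* step 1 */
extern void disconnect_incoming_connections(struct task *t); /* step 2 */
extern void unsubscribe_notifications(struct task *t);       /* step 3 */
extern void answer_pending_calls(struct task *t);            /* step 4 */
extern int  cleanup_finished(struct task *t);                /* step 5 test */
extern void wait_for_ipc_event(struct task *t);

static void ipc_cleanup(struct task *t)
{
    hangup_outgoing_phones(t);          /* 1 */
    disconnect_incoming_connections(t); /* 2 */
    unsubscribe_notifications(t);       /* 3 */
    answer_pending_calls(t);            /* 4 */

    /* 5: wait until all outgoing calls are answered and the
     * remaining answerbox queues drain. */
    while (!cleanup_finished(t))
        wait_for_ipc_event(t);
}
]]></programlisting>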
</section>
<section>
<title>System Call IPC Layer</title>
<para>On top of this simple protocol the kernel provides special
services closely related to inter-process communication. A range of
method numbers is reserved for them and a protocol is defined for these
functions. The messages are interpreted by the kernel layer and
appropriate actions are taken depending on the parameters of the message
and of the answer.</para>
<para>The kernel provides the following services:</para>
<itemizedlist>
<listitem>
<para>Creating a new outgoing connection</para>
</listitem>
<listitem>
<para>Creating a callback connection</para>
</listitem>
<listitem>
<para>Sending an address space area</para>
</listitem>
<listitem>
<para>Asking for an address space area</para>
</listitem>
</itemizedlist>
<para>On startup, every task is automatically connected to a
<emphasis>name service task</emphasis>, which provides a switchboard
functionality. To open a new outgoing connection, the client sends a
<constant>CONNECT_ME_TO</constant> message using any of its phones. If
the recipient of this message answers with an accepting answer, a new
connection is created. In itself, this mechanism would only allow
duplicating an existing connection. However, if the message is
forwarded, the new connection is made to the final recipient.</para>
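<para>A sketch of the client side of this exchange follows; the wrapper
name, the phone handles and the service number are assumptions, not
necessarily the real library interface.</para>
<programlisting><![CDATA[
/* Assumed wrapper: sends CONNECT_ME_TO over an existing phone and, on
 * success, returns the id of the newly created phone.  The three
 * arguments are delivered to the final recipient (e.g. to select the
 * requested service). */
extern int ipc_connect_me_to(int phoneid, int arg1, int arg2, int arg3);

#define PHONE_NS        0   /* assumed: phone 0 leads to the name service */
#define SERVICE_EXAMPLE 42  /* hypothetical service number                */

static int connect_to_service(void)
{
    /* The name service forwards the request, so the resulting
     * connection leads to the service task itself. */
    int phone = ipc_connect_me_to(PHONE_NS, SERVICE_EXAMPLE, 0, 0);
    if (phone < 0)
        return -1;          /* connection refused */
    return phone;           /* new outgoing connection */
}
]]></programlisting>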
<para>The name service task acts as a switchboard and forwards requests
for connections to specific services. To be able to forward a message,
it must have a phone connected to the service task. The service task
creates this connection by sending a
<constant>CONNECT_TO_ME</constant> message, which establishes a callback
connection. Every service that wants to receive connections therefore
asks the name service task to create such a callback connection.</para>
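<para>The sketch below shows how a service might register itself this
way; the wrapper name and its parameters are assumptions for the
example.</para>
<programlisting><![CDATA[
/* Assumed wrapper: sends CONNECT_TO_ME over the given phone.  The callee
 * (here the name service) obtains a new phone connected back to this
 * task's answerbox; the assigned identification is returned through
 * *phonehash so the service can later recognize the connection. */
extern int ipc_connect_to_me(int phoneid, int arg1, int arg2, int arg3,
    unsigned long *phonehash);

#define PHONE_NS        0   /* assumed: phone 0 leads to the name service */
#define SERVICE_EXAMPLE 42  /* hypothetical service number                */

static int register_service(unsigned long *phonehash)
{
    /* Ask the name service to create a callback connection, which it
     * will later use to forward client connection requests to us. */
    return ipc_connect_to_me(PHONE_NS, SERVICE_EXAMPLE, 0, 0, phonehash);
}
]]></programlisting>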
<para>Tasks can share their address space areas using IPC messages. Two
message types - <constant>AS_AREA_SEND</constant> and
<constant>AS_AREA_RECV</constant> - are used for sending and receiving
an address space area, respectively. The shared area can be accessed as
soon as the message is acknowledged.</para>
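<para>A hedged sketch of the sending side is given below; the wrapper
name, the method number value and the argument layout are illustrative
assumptions.</para>
<programlisting><![CDATA[
#include <stdint.h>

typedef uintptr_t sysarg_t;

#define AS_AREA_SEND   1025   /* illustrative method number only */

/* Assumed synchronous call wrapper: sends a method with three arguments
 * and waits for the answer; returns the answer's return value. */
extern sysarg_t ipc_call_sync_3(int phoneid, sysarg_t method,
    sysarg_t arg1, sysarg_t arg2, sysarg_t arg3);

/* Offer the address space area starting at 'base' with the given flags
 * to the task on the other side of 'phone'.  Once the recipient
 * acknowledges the message, both tasks may access the shared area. */
static int share_area_out(int phone, void *base, unsigned int flags)
{
    sysarg_t rc = ipc_call_sync_3(phone, AS_AREA_SEND,
        (sysarg_t) base, 0, (sysarg_t) flags);
    return (rc == 0) ? 0 : -1;
}
]]></programlisting>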
</section>
</section>
<section>
<title>Userspace View</title>
<para>The conventional design of an asynchronous API tends to produce
applications built around one event loop and several big switch
statements. However, by intensive utilization of userspace threads, it
was possible to create an environment that is not necessarily restricted
to this type of event-driven programming and allows for a more fluent
expression of application programs.</para>
<section>
<title>Single Point of Entry</title>
<para>Each task is associated with only one answerbox. If a
multithreaded application needs to communicate, it must not only be able
to send a message, but it should be able to retrieve the answer as well.
If several threads pull messages from the task answerbox, it is a matter
of chance which thread receives which message. Therefore, if a
particular thread needs to wait for a message answer, an idle
<emphasis>manager</emphasis> thread is found or a new one is created and
control is transferred to this manager thread. The manager thread pops
messages from the answerbox and puts them into the appropriate queues of
running threads. If the thread waiting for a particular message is not
running, control is transferred to it.</para>
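<para>The following fragment sketches this waiting logic in the
userspace library; the queue layout and function names are assumptions
made for the illustration.</para>
<programlisting><![CDATA[
#include <stddef.h>

struct message;                 /* an answered message, details omitted */

/* Assumed helpers of the userspace IPC library. */
extern struct message *answer_queue_get(int msg_id);   /* per-thread queue */
extern void switch_to_manager_thread(void);            /* may create one   */

/* Block the calling userspace thread until the answer to msg_id arrives.
 * While blocked, a manager thread drains the answerbox and routes each
 * answer into the queue of the thread that sent the request. */
static struct message *wait_for_answer(int msg_id)
{
    struct message *answer;

    while ((answer = answer_queue_get(msg_id)) == NULL) {
        /* Nothing for us yet: hand the CPU over to a manager thread,
         * which will wake us once the answer reaches our queue. */
        switch_to_manager_thread();
    }
    return answer;
}
]]></programlisting>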
<figure float="1">
<title>Single point of entry</title>
<mediaobject id="ipc2">
<imageobject role="pdf">
<imagedata fileref="images/ipc2.pdf" format="PDF" />
</imageobject>
<imageobject role="html">
<imagedata fileref="images/ipc2.png" format="PNG" />
</imageobject>
<imageobject role="fop">
<imagedata fileref="images/ipc2.svg" format="SVG" />
</imageobject>
</mediaobject>
</figure>
<para>A very similar situation arises when a task decides to send a lot
of messages and reaches the kernel limit of asynchronous messages. In
such a situation two remedies are available - the userspace library can
either cache the message locally and resend it when some answers arrive,
or it can block the thread and let it continue only after the message is
finally accepted by the kernel layer. With one exception, HelenOS uses
the second approach: when the kernel responds that the limit of
asynchronous messages has been reached, control is transferred to a
manager thread. The manager thread then handles incoming replies and,
when space becomes available, sends the message to the kernel and
resumes execution of the application thread.</para>
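<para>A sketch of that sending path might look as follows; the error
code and helper names are assumptions.</para>
<programlisting><![CDATA[
/* Assumed primitives of the userspace library. */
extern int  kernel_ipc_send(int phone, const void *msg);  /* may refuse */
extern void switch_to_manager_thread(void);               /* see above  */

#define ELIMIT  (-4)          /* illustrative "too many messages" code */

/* Keep retrying until the kernel accepts the message; every refusal
 * yields to a manager thread so that answers can be processed and an
 * asynchronous-call slot can be freed. */
static void send_blocking(int phone, const void *msg)
{
    while (kernel_ipc_send(phone, msg) == ELIMIT)
        switch_to_manager_thread();
}
]]></programlisting>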
<para>If a kernel notification is received, the servicing procedure is
run in the context of the manager thread. Although it would be possible
to allow recursive calling there, it could potentially lead to an
explosion of manager threads. Thus, the kernel notification procedures
are not allowed to wait for a message result; they can only answer
messages and send new ones without waiting for their results. If the
kernel limit for outgoing messages is reached, the data is automatically
cached within the application. This behaviour is enforced automatically
and the decision making is hidden from the developer's view.</para>
<figure float="1">
<title>Single point of entry solution</title>
<mediaobject id="ipc3">
<imageobject role="pdf">
<imagedata fileref="images/ipc3.pdf" format="PDF" />
</imageobject>
<imageobject role="html">
<imagedata fileref="images/ipc3.png" format="PNG" />
</imageobject>
<imageobject role="fop">
<imagedata fileref="images/ipc3.svg" format="SVG" />
</imageobject>
</mediaobject>
</figure>
</section>
<section>
<title>Ordering Problem</title>
<para>Unfortunately, the real world is never so simple. For example, if
a server handles incoming requests and, as a part of its response, sends
asynchronous messages, it can easily be preempted and another thread may
start intervening. This can happen even if the application utilizes only
one kernel thread. Classical synchronization using semaphores is not
possible, as blocking on them would block the thread completely, so that
the answer could never be processed. The IPC framework therefore allows
a developer to specify that a part of the code must not be preempted by
any other thread (except notification handlers), while the framework is
still able to queue messages belonging to other threads and to regain
control when the answer arrives.</para>
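<para>As an illustration, such a non-preemptible section might be
expressed with a pair of calls like the ones below; the names are
hypothetical and only sketch the intended usage.</para>
<programlisting><![CDATA[
/* Hypothetical library calls delimiting a section in which the current
 * userspace thread is not preempted by other application threads
 * (notification handlers may still run). */
extern void ipc_serialize_start(void);
extern void ipc_serialize_end(void);

extern void send_request(int phone, int method, int arg);  /* assumed */

static void reply_with_two_messages(int phone)
{
    ipc_serialize_start();
    /* The two messages are sent back to back, without another
     * application thread interleaving its own messages on the
     * same connection. */
    send_request(phone, /* method */ 1, /* arg */ 10);
    send_request(phone, /* method */ 2, /* arg */ 20);
    ipc_serialize_end();
}
]]></programlisting>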
<para>This mechanism works transparently in a multithreaded environment,
where an additional locking mechanism (futexes) should be used. The IPC
framework ensures that there will always be enough free kernel threads
to handle incoming answers and allows the application to run more
userspace threads on top of the kernel threads without the danger of
locking all kernel threads in futexes.</para>
</section>
<section>
<title>The Interface</title>
<para>The interface was developed to be as simple to use as possible.
Classical applications simply send messages and occasionally wait for an
answer and check the result. If the number of sent messages exceeds the
kernel limit, the flow of the application is stopped until some answers
arrive. Server applications, on the other hand, are expected to work in
a multithreaded environment.</para>
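<para>A hedged sketch of a typical client exchange follows; the send and
wait helper names are assumptions standing in for the real library
interface.</para>
<programlisting><![CDATA[
#include <stdint.h>

typedef uintptr_t sysarg_t;
typedef long aid_t;                  /* identifier of a pending request */

/* Assumed library interface: fire off an asynchronous request and later
 * wait for its answer. */
extern aid_t async_send(int phone, sysarg_t method, sysarg_t arg1,
    sysarg_t arg2);
extern void  async_wait(aid_t request, sysarg_t *retval);

static int do_request(int phone)
{
    sysarg_t retval;

    /* Send the request; the call returns immediately. */
    aid_t req = async_send(phone, /* method */ 1, /* args */ 2, 3);

    /* ... other work can be done here ... */

    /* Wait for the answer and check the result. */
    async_wait(req, &retval);
    return (retval == 0) ? 0 : -1;
}
]]></programlisting>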
<para>The server interface requires the developer to specify a
<function>connection_thread</function> function. When a new connection
is detected, a new userspace thread is automatically created and control
is transferred to this function. The code then decides whether to accept
the connection and runs a normal event loop. The userspace IPC library
ensures correct switching between the several userspace threads within
the kernel environment.</para>
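<para>Such a connection handler could, as a sketch, look like the
following; apart from <function>connection_thread</function> itself, the
helper names and error code values are assumed for the example.</para>
<programlisting><![CDATA[
#include <stdint.h>

typedef uintptr_t sysarg_t;
typedef sysarg_t ipc_callid_t;

typedef struct {
    sysarg_t args[4];               /* args[0] holds the method number */
} ipc_call_t;

/* Assumed library helpers used inside the per-connection thread. */
extern ipc_callid_t async_get_call(ipc_call_t *call);   /* next request */
extern void answer_call(ipc_callid_t callid, sysarg_t retval);

#define METHOD_HANGUP  0     /* illustrative: client closed the connection */
#define EOK            0
#define ENOTSUP        (-1)  /* illustrative error code values */

/* Run in a freshly created userspace thread for every new connection. */
static void connection_thread(ipc_callid_t iid, ipc_call_t *icall)
{
    (void) icall;                    /* initial request data, unused here */
    answer_call(iid, EOK);           /* accept the connection */

    for (;;) {
        ipc_call_t call;
        ipc_callid_t callid = async_get_call(&call);

        if (call.args[0] == METHOD_HANGUP) {
            answer_call(callid, EOK);   /* acknowledge and quit */
            return;
        }

        /* Dispatch on the method number and answer the request. */
        answer_call(callid, ENOTSUP);
    }
}
]]></programlisting>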
</section>
</section>
</chapter>