| Rev | Author | Line No. | Line |
|---|---|---|---|
| 9 | bondari | 1 | <?xml version="1.0" encoding="UTF-8"?> |
| 85 | palkovsky | 2 | <chapter id="ipc"> |
| 3 | <?dbhtml filename="ipc.html"?> |
||
| 9 | bondari | 4 | |
| 85 | palkovsky | 5 | <title>IPC</title> |
| 9 | bondari | 6 | |
| 85 | palkovsky | 7 | <para>Due to the high intertask communication traffic, IPC becomes a critical |
| 8 | subsystem for microkernels, putting high demands on the speed, latency and |
||
| 9 | reliability of the IPC model and implementation. Although the use |
||
| 10 | of an asynchronous messaging system looks promising in theory, it is not often |
||
| 11 | implemented because it complicates the development of end-user |
||
| 112 | palkovsky | 12 | applications. HelenOS implements a fully asynchronous messaging system with |
| 13 | a special layer that provides the application developer with a reasonably |
||
| 85 | palkovsky | 14 | synchronous multithreaded environment sufficient to develop complex |
| 15 | protocols.</para> |
||
| 38 | bondari | 16 | |
| 85 | palkovsky | 17 | <section> |
| 18 | <title>Services provided by kernel</title> |
||
| 9 | bondari | 19 | |
| 85 | palkovsky | 20 | <para>Every message consists of four numeric arguments (32-bit or 64-bit, depending on |
| 21 | the platform), of which the first is interpreted as a |
||
| 22 | method number on message receipt and a return value on answer receipt. The |
||
| 23 | received message contains an identification of the incoming connection, so |
||
| 99 | palkovsky | 24 | that the receiving application can distinguish messages from |
| 25 | different senders. Internally, the message contains pointers to the |
||
| 26 | originating task and to the source of the communication channel. If the |
||
| 27 | message is forwarded, the originating task identifies the recipient of the |
||
| 28 | answer, while the source channel identifies the connection in case of a hangup |
||
| 29 | response.</para> |
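The message layout described above can be pictured with a small C sketch. Every name here (`message_t`, `msg_arg_t`, `in_phone`, and the accessor functions) is hypothetical and chosen only for illustration; none of it is taken from the HelenOS sources. The sketch assumes that `uintptr_t` matches the native word size of the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; not the real HelenOS definitions. */
#define MSG_ARG_COUNT 4

/* Assumption: a native-word argument, 32-bit or 64-bit by platform. */
typedef uintptr_t msg_arg_t;

typedef struct {
    msg_arg_t args[MSG_ARG_COUNT]; /* args[0]: method number or return value */
    int in_phone;                  /* identification of the incoming connection */
} message_t;

/* On message receipt, the first argument is read as the method number. */
static msg_arg_t msg_method(const message_t *m)
{
    assert(m != NULL);
    return m->args[0];
}

/* Answering reuses the same slot: the first argument becomes the return value. */
static void msg_set_retval(message_t *m, msg_arg_t retval)
{
    assert(m != NULL);
    m->args[0] = retval;
}

/* On answer receipt, the first argument is read as the return value. */
static msg_arg_t msg_retval(const message_t *m)
{
    assert(m != NULL);
    return m->args[0];
}
```

The point of the sketch is only that one slot carries two meanings depending on direction; the remaining three arguments are left untouched by the accessors.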
||
| 85 | palkovsky | 30 | |
| 31 | <para>Every message must eventually be answered. The system keeps track of |
||
| 32 | all messages, so that it can answer them with an appropriate error code |
||
| 33 | should one of the connection parties fail unexpectedly. To limit buffering |
||
| 99 | palkovsky | 34 | of the messages in the kernel, every task has a limited quota of |
| 35 | asynchronous messages it can send simultaneously. If the limit is reached, |
||
| 36 | the kernel refuses to send any further message until some active message is |
||
| 37 | answered.</para> |
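The per-task limit on unanswered asynchronous messages can be modelled as a simple counter. This is a minimal sketch under stated assumptions: the names `task_quota_t`, `quota_try_send` and `quota_on_answer`, as well as the value of `MAX_ACTIVE_CALLS`, are invented for illustration and do not come from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative limit; the real kernel value is not stated in the text. */
#define MAX_ACTIVE_CALLS 4

/* Hypothetical per-task bookkeeping of unanswered asynchronous messages. */
typedef struct {
    unsigned active_calls;
} task_quota_t;

/* Returns true if the message may be sent, false if the limit is reached. */
static bool quota_try_send(task_quota_t *q)
{
    assert(q != NULL);
    if (q->active_calls >= MAX_ACTIVE_CALLS)
        return false;
    q->active_calls++;
    return true;
}

/* Called when an answer arrives: one slot becomes free again. */
static void quota_on_answer(task_quota_t *q)
{
    assert(q != NULL);
    if (q->active_calls > 0)
        q->active_calls--;
}
```

In this model a refused send becomes possible again as soon as a single outstanding message is answered.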
||
| 85 | palkovsky | 38 | |
| 99 | palkovsky | 39 | <para>To facilitate kernel-to-user communication, the IPC subsystem |
| 40 | provides notification messages. Applications can subscribe to a |
||
| 41 | notification channel and receive messages directed to this channel. Such |
||
| 42 | messages can be freely sent even from interrupt context as they are |
||
| 43 | primarily destined to deliver IRQ events to userspace device drivers. |
||
| 44 | These messages need not be answered, because no party could receive |
||
| 45 | such a response.</para> |
||
| 46 | |||
| 85 | palkovsky | 47 | <section> |
| 48 | <title>Low level IPC</title> |
||
| 49 | |||
| 50 | <para>The whole IPC subsystem consists of one-way communication |
||
| 51 | channels. Each task has one associated message queue (answerbox). The |
||
| 112 | palkovsky | 52 | task can call other tasks and connect its phones to their answerboxes, |
| 53 | send and forward messages through these connections and answer received |
||
| 85 | palkovsky | 54 | messages. Every sent message is identified by a unique number, so that |
| 55 | the response can be later matched against it. The message is sent over |
||
| 56 | the phone to the target answerbox. The server application periodically |
||
| 57 | checks the answerbox and pulls messages from several queues associated |
||
| 58 | with it. After completing the requested action, the server sends a reply |
||
| 99 | palkovsky | 59 | back to the answerbox of the originating task. If a need arises, it is |
| 60 | possible to <emphasis>forward</emphasis> a received message through any |
||
| 61 | of the open phones to another task. This mechanism is used e.g. for |
||
| 62 | opening new connections.</para> |
||
| 85 | palkovsky | 63 | |
| 112 | palkovsky | 64 | <para>The answerbox contains four different message queues:</para> |
| 65 | |||
| 66 | <itemizedlist> |
||
| 67 | <listitem> |
||
| 68 | <para>Incoming call queue</para> |
||
| 69 | </listitem> |
||
| 70 | |||
| 71 | <listitem> |
||
| 72 | <para>Dispatched call queue</para> |
||
| 73 | </listitem> |
||
| 74 | |||
| 75 | <listitem> |
||
| 76 | <para>Answer queue</para> |
||
| 77 | </listitem> |
||
| 78 | |||
| 79 | <listitem> |
||
| 80 | <para>Notification queue</para> |
||
| 81 | </listitem> |
||
| 82 | </itemizedlist> |
||
| 83 | |||
| 114 | bondari | 84 | <figure float="1"> |
| 85 | <mediaobject id="ipc1"> |
||
| 86 | <imageobject role="pdf"> |
||
| 87 | <imagedata fileref="images/ipc1.pdf" format="PDF" /> |
||
| 88 | </imageobject> |
||
| 89 | |||
| 90 | <imageobject role="html"> |
||
| 91 | <imagedata fileref="images/ipc1.png" format="PNG" /> |
||
| 92 | </imageobject> |
||
| 93 | |||
| 94 | <imageobject role="fop"> |
||
| 95 | <imagedata fileref="images/ipc1.svg" format="SVG" /> |
||
| 96 | </imageobject> |
||
| 97 | </mediaobject> |
||
| 98 | |||
| 99 | <title>Low level IPC</title> |
||
| 100 | </figure> |
||
| 101 | |||
| 112 | palkovsky | 102 | <para>Communication between task A and task B, to which A is connected, |
| 103 | proceeds as follows: task A sends a message over its phone to the target |
||
| 104 | answerbox. The message is saved in task B's incoming call queue. When task |
||
| 105 | B fetches the message for processing, it is automatically moved into the |
||
| 106 | dispatched call queue. After the server decides to answer the message, |
||
| 107 | it is removed from the dispatched call queue and the result is moved into the |
||
| 108 | answer queue of task A.</para> |
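The movement of a call between the queues can be traced with a toy model. The sketch below is an assumption-laden illustration, not the kernel implementation: the types `call_t`, `queue_t`, `answerbox_t` and the functions `ipc_send`, `ipc_fetch`, `ipc_answer` are hypothetical, queues are fixed-size arrays, and the notification queue and all locking are left out.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 8 /* illustrative capacity */

/* Hypothetical call record; the real kernel structure differs. */
typedef struct {
    int id;     /* unique number used to match the answer */
    int result; /* filled in when the call is answered */
} call_t;

typedef struct {
    call_t *items[QUEUE_CAP];
    size_t len;
} queue_t;

/* Three of the four answerbox queues; notifications are omitted here. */
typedef struct {
    queue_t incoming;
    queue_t dispatched;
    queue_t answers;
} answerbox_t;

static int queue_push(queue_t *q, call_t *c)
{
    if (q->len == QUEUE_CAP)
        return -1;
    q->items[q->len++] = c;
    return 0;
}

/* Removes one specific call, keeping the order of the rest. */
static int queue_remove(queue_t *q, call_t *c)
{
    for (size_t i = 0; i < q->len; i++) {
        if (q->items[i] == c) {
            for (size_t j = i + 1; j < q->len; j++)
                q->items[j - 1] = q->items[j];
            q->len--;
            return 0;
        }
    }
    return -1;
}

/* Task A sends: the call is saved in the target's incoming call queue. */
static int ipc_send(answerbox_t *target, call_t *c)
{
    assert(target != NULL && c != NULL);
    return queue_push(&target->incoming, c);
}

/* Task B fetches: the oldest incoming call moves to the dispatched queue. */
static call_t *ipc_fetch(answerbox_t *box)
{
    assert(box != NULL);
    if (box->incoming.len == 0 || box->dispatched.len == QUEUE_CAP)
        return NULL;
    call_t *c = box->incoming.items[0];
    queue_remove(&box->incoming, c);
    queue_push(&box->dispatched, c);
    return c;
}

/* Task B answers: the call leaves B's dispatched queue and the result
 * is placed in the answer queue of the caller. */
static int ipc_answer(answerbox_t *server, answerbox_t *caller,
    call_t *c, int result)
{
    assert(server != NULL && caller != NULL && c != NULL);
    if (caller->answers.len == QUEUE_CAP)
        return -1;
    if (queue_remove(&server->dispatched, c) != 0)
        return -1;
    c->result = result;
    return queue_push(&caller->answers, c);
}
```

A call therefore sits in exactly one queue at any time: incoming, then dispatched on the server side, then the answer queue on the caller side.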
||
| 109 | |||
| 99 | palkovsky | 110 | <para>The arguments contained in the message are completely arbitrary |
| 111 | and decided by the user. The low level part of kernel IPC fills in |
||
| 112 | appropriate error codes if an error occurs during communication. |
||
| 112 | palkovsky | 113 | Applications are guaranteed to be correctly notified about the communication |
| 114 | state. If a program closes the outgoing connection, the target answerbox |
||
| 115 | receives a hangup message. The connection identification is not reused |
||
| 116 | until the hangup message is acknowledged and all other pending messages |
||
| 117 | are answered.</para> |
||
| 99 | palkovsky | 118 | |
| 112 | palkovsky | 119 | <para>Closing an incoming connection is done by responding to any |
| 120 | incoming message with an EHANGUP error code. The connection is then |
||
| 121 | immediately closed. The client connection identification (phone id) is |
||
| 122 | not reused until the client closes its own side of the |
||
| 123 | connection ("hangs up its phone").</para> |
||
| 99 | palkovsky | 124 | |
| 112 | palkovsky | 125 | <para>When a task dies (whether voluntarily or by being killed), a cleanup |
| 114 | bondari | 126 | process is started. It performs the following steps:</para> |
| 99 | palkovsky | 127 | |
| 128 | <orderedlist> |
||
| 129 | <listitem> |
||
| 130 | <para>Hangs up all outgoing connections and sends hangup messages to |
||
| 131 | all target answerboxes.</para> |
||
| 132 | </listitem> |
||
| 133 | |||
| 134 | <listitem> |
||
| 135 | <para>Disconnects all incoming connections.</para> |
||
| 136 | </listitem> |
||
| 137 | |||
| 138 | <listitem> |
||
| 139 | <para>Disconnects from all notification channels.</para> |
||
| 140 | </listitem> |
||
| 141 | |||
| 142 | <listitem> |
||
| 143 | <para>Answers all unanswered messages from answerbox queues with |
||
| 144 | an appropriate error code.</para> |
||
| 145 | </listitem> |
||
| 146 | |||
| 147 | <listitem> |
||
| 148 | <para>Waits until all outgoing messages are answered and all |
||
| 149 | remaining answerbox queues are empty.</para> |
||
| 150 | </listitem> |
||
| 151 | </orderedlist> |
||
| 85 | palkovsky | 152 | </section> |
| 153 | |||
| 154 | <section> |
||
| 99 | palkovsky | 155 | <title>System call IPC layer</title> |
| 85 | palkovsky | 156 | |
| 157 | <para>On top of this simple protocol the kernel provides special |
||
| 99 | palkovsky | 158 | services closely related to the inter-process communication. A range of |
| 159 | method numbers is allocated and a protocol is defined for these functions. |
||
| 160 | The messages are interpreted by the kernel layer and appropriate actions |
||
| 114 | bondari | 161 | are taken depending on the parameters of the message and the answer.</para> |
| 99 | palkovsky | 162 | |
| 163 | <para>The kernel provides the following services:</para> |
||
| 164 | |||
| 165 | <itemizedlist> |
||
| 166 | <listitem> |
||
| 167 | <para>Creating a new outgoing connection</para> |
||
| 168 | </listitem> |
||
| 169 | |||
| 170 | <listitem> |
||
| 171 | <para>Creating a callback connection</para> |
||
| 172 | </listitem> |
||
| 173 | |||
| 174 | <listitem> |
||
| 175 | <para>Sending an address space area</para> |
||
| 176 | </listitem> |
||
| 177 | |||
| 178 | <listitem> |
||
| 179 | <para>Asking for an address space area</para> |
||
| 180 | </listitem> |
||
| 181 | </itemizedlist> |
||
| 112 | palkovsky | 182 | |
| 183 | <para>On startup every task is automatically connected to a |
||
| 184 | <emphasis>name service task</emphasis>, which provides switchboard |
||
| 185 | functionality. To open a new outgoing connection, the client sends a |
||
| 186 | <constant>CONNECT_ME_TO</constant> message using any of its phones. If |
||
| 187 | the recipient of this message accepts the request in its answer, a new |
||
| 188 | connection is created. By itself, this mechanism would only allow |
||
| 189 | duplicating an existing connection. However, if the message is forwarded, |
||
| 114 | bondari | 190 | the new connection is made to the final recipient.</para> |
| 112 | palkovsky | 191 | |
| 192 | <para>Because every task is connected to the name service task on startup, |
||
| 193 | the name service can act as a switchboard and forward connection requests |
||
| 194 | to specific services. To be able to forward a message, it must have a |
||
| 195 | phone connected to the target service task. A service task creates this connection |
||
| 196 | using a <constant>CONNECT_TO_ME</constant> message, which creates a |
||
| 197 | callback connection. Every service that wants to receive connections |
||
| 198 | asks the name service task to create a callback connection.</para> |
||
| 199 | |||
| 200 | <para>Tasks can share their address space areas using IPC messages. The |
||
| 201 | two message types, AS_AREA_SEND and AS_AREA_RECV, are used for sending and |
||
| 202 | receiving an address space area, respectively. The shared area can be accessed |
||
| 114 | bondari | 203 | as soon as the message is acknowledged.</para> |
| 85 | palkovsky | 204 | </section> |
| 205 | </section> |
||
| 206 | |||
| 207 | <section> |
||
| 208 | <title>Userspace view</title> |
||
| 209 | |||
| 210 | <para>The conventional design of the asynchronous API seems to produce |
||
| 211 | applications with one event loop and several big switch statements. |
||
| 212 | However, by intensive utilization of user-space threads, it was possible |
||
| 213 | to create an environment that is not necessarily restricted to this type |
||
| 214 | of event-driven programming and allows for more fluent expression of |
||
| 99 | palkovsky | 215 | application programs.</para> |
| 85 | palkovsky | 216 | |
| 217 | <section> |
||
| 218 | <title>Single point of entry</title> |
||
| 219 | |||
| 220 | <para>Each task is associated with only one answerbox. If a |
||
| 221 | multi-threaded application needs to communicate, it must be able not only |
||
| 222 | to send a message but also to retrieve the answer. |
||
| 223 | If several threads pull messages from the task's answerbox, it is a |
||
| 224 | matter of chance which thread receives which message. If a particular |
||
| 225 | thread needs to wait for an answer to a message, an idle |
||
| 226 | <emphasis>manager</emphasis> thread is found or a new one is created and |
||
| 227 | control is transferred to this manager thread. The manager thread pops |
||
| 228 | messages from the answerbox and puts them into the appropriate queues of |
||
| 229 | running threads. If a thread waiting for a message is not running, the |
||
| 99 | palkovsky | 230 | control is transferred to it.</para> |
| 114 | bondari | 231 | |
| 232 | <figure float="1"> |
||
| 233 | <mediaobject id="ipc2"> |
||
| 234 | <imageobject role="pdf"> |
||
| 235 | <imagedata fileref="images/ipc2.pdf" format="PDF" /> |
||
| 236 | </imageobject> |
||
| 85 | palkovsky | 237 | |
| 114 | bondari | 238 | <imageobject role="html"> |
| 239 | <imagedata fileref="images/ipc2.png" format="PNG" /> |
||
| 240 | </imageobject> |
||
| 241 | |||
| 242 | <imageobject role="fop"> |
||
| 243 | <imagedata fileref="images/ipc2.svg" format="SVG" /> |
||
| 244 | </imageobject> |
||
| 245 | </mediaobject> |
||
| 246 | |||
| 247 | <title>Single point of entry</title> |
||
| 248 | </figure> |
||
| 249 | |||
| 250 | |||
| 85 | palkovsky | 251 | <para>A very similar situation arises when a task decides to send a lot of |
| 252 | messages and reaches the kernel limit of asynchronous messages. In such a |
||
| 253 | situation two remedies are available: the userspace library can either |
||
| 254 | cache the message locally and resend it when some answers |
||
| 255 | arrive, or it can block the thread and let it continue only after the |
||
| 256 | message is finally sent to the kernel layer. With one exception, HelenOS |
||
| 257 | uses the second approach: when the kernel responds that the limit |
||
| 258 | of asynchronous messages has been reached, control is transferred to a manager |
||
| 259 | thread. The manager thread then handles incoming replies and, when space |
||
| 260 | is available, sends the message to the kernel and resumes execution of the |
||
| 261 | application thread.</para> |
||
| 262 | |||
| 263 | <para>If a kernel notification is received, the servicing procedure is |
||
| 264 | run in the context of the manager thread. Although it wouldn't be |
||
| 265 | impossible to allow recursive calling, it could potentially lead to an |
||
| 266 | explosion of manager threads. Thus, the kernel notification procedures |
||
| 267 | are not allowed to wait for a message result; they can only answer |
||
| 268 | messages and send new ones without waiting for their results. If the |
||
| 269 | kernel limit for outgoing messages is reached, the data is automatically |
||
| 270 | cached within the application. This behaviour is enforced automatically |
||
| 271 | and the decision making is hidden from the developer's view.</para> |
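The choice between blocking and caching described in the two paragraphs above can be summarised as a small decision function. This is only a sketch of the stated policy: the enum `send_action_t`, the struct `sender_state_t` and the function `choose_send_action` are hypothetical names and do not correspond to an actual library interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of an attempt to send an asynchronous message. */
typedef enum {
    SEND_NOW,         /* the kernel accepts the message */
    BLOCK_IN_MANAGER, /* switch to a manager thread until space is free */
    CACHE_LOCALLY     /* keep the message in the application for later */
} send_action_t;

/* Hypothetical snapshot of the sender's situation. */
typedef struct {
    bool limit_reached;           /* kernel refused: too many active messages */
    bool in_notification_handler; /* running in the manager thread's context */
} sender_state_t;

static send_action_t choose_send_action(const sender_state_t *s)
{
    assert(s != NULL);
    if (!s->limit_reached)
        return SEND_NOW;
    /* Notification handlers must not wait, so their messages are cached. */
    if (s->in_notification_handler)
        return CACHE_LOCALLY;
    /* Ordinary threads block and let a manager thread process replies. */
    return BLOCK_IN_MANAGER;
}
```

Caching in notification handlers is the single exception mentioned in the text; every other sender follows the blocking path.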
||
| 272 | </section> |
||
| 273 | |||
| 274 | <section> |
||
| 112 | palkovsky | 275 | <title>Ordering problem</title> |
| 85 | palkovsky | 276 | |
| 277 | <para>Unfortunately, in the real world it is never so easy. For example, if a |
||
| 278 | server handles incoming requests and, as a part of its response, sends |
||
| 279 | asynchronous messages, it can easily be preempted and another thread may |
||
| 280 | start intervening. This can happen even if the application uses only |
||
| 281 | one kernel thread. Classical synchronization using semaphores is not |
||
| 282 | possible, as locking on them would block the thread completely and the |
||
| 283 | answer could never be processed. The IPC framework allows a developer |
||
| 284 | to specify that the thread should not be preempted by any other thread |
||
| 285 | (except notification handlers) while still being able to queue messages |
||
| 99 | palkovsky | 286 | belonging to other threads and regain control when the answer |
| 287 | arrives.</para> |
||
| 85 | palkovsky | 288 | |
| 289 | <para>This mechanism works transparently in a multithreaded environment, |
||
| 290 | where classical locking mechanisms (futexes) should be used. The IPC |
||
| 291 | framework ensures that there will always be enough free threads to |
||
| 292 | handle the threads requiring correct synchronization and allow the |
||
| 293 | application to run more user-space threads inside the kernel threads |
||
| 294 | without the danger of locking all kernel threads in futexes.</para> |
||
| 295 | </section> |
||
| 296 | |||
| 297 | <section> |
||
| 298 | <title>The interface</title> |
||
| 299 | |||
| 300 | <para></para> |
||
| 301 | </section> |
||
| 302 | </section> |
||
| 303 | </chapter> |