<?xml version="1.0" encoding="UTF-8"?>
<chapter id="ipc">
  <?dbhtml filename="ipc.html"?>

  <title>IPC</title>

  <para>Because of the heavy inter-task communication traffic, IPC is a
  critical subsystem of any microkernel, and it places high demands on the
  speed, latency and reliability of both the IPC model and its
  implementation. Although an asynchronous messaging system looks promising
  in theory, it is seldom implemented because it makes end-user applications
  difficult to write. HelenOS implements a fully asynchronous messaging
  system together with a special layer that gives the application developer
  a reasonably synchronous multithreaded environment, sufficient for
  developing complex protocols.</para>

  <section>
    <title>Kernel Services</title>

    <para>Every message consists of four numeric arguments (32-bit or 64-bit,
    depending on the platform). The first argument is interpreted as a method
    number when a message is received and as a return value when an answer is
    received. A received message also identifies the incoming connection, so
    that the receiving application can tell messages from different senders
    apart. Internally, the message contains a pointer to the originating task
    and a pointer to the source of the communication channel. If the message
    is forwarded, the originating task identifies the recipient of the
    answer, while the source channel identifies the connection in case of a
    hangup response.</para>
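
    <para>The message layout described above might be sketched in C roughly
    as follows. This is only an illustration: every name in it
    (<literal>sketch_ipc_msg_t</literal>, <literal>in_phone_hash</literal> and
    the helper functions) is hypothetical and is not taken from the HelenOS
    sources, and <literal>uintptr_t</literal> merely stands in for the
    platform-native word.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the HelenOS definition. */
#define SKETCH_IPC_ARGS 4

typedef struct {
    uintptr_t args[SKETCH_IPC_ARGS]; /* pointer-sized words (32 or 64 bits) */
    uintptr_t in_phone_hash;         /* identifies the incoming connection */
} sketch_ipc_msg_t;

/* In a received request, the first argument is the method number. */
static uintptr_t sketch_msg_method(const sketch_ipc_msg_t *msg)
{
    assert(msg != NULL);
    return msg->args[0];
}

/* In a received answer, the same slot carries the return value. */
static uintptr_t sketch_msg_retval(const sketch_ipc_msg_t *msg)
{
    assert(msg != NULL);
    return msg->args[0];
}

/* Answering overwrites the method number with the return value. */
static void sketch_msg_set_retval(sketch_ipc_msg_t *msg, uintptr_t rc)
{
    assert(msg != NULL);
    msg->args[0] = rc;
}
```
]]></programlisting>

    <para>The sketch deliberately reuses one slot for both roles, which is
    why the receiver has to know whether it is looking at a request or at an
    answer before interpreting the first argument.</para>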

    <para>Every message must eventually be answered. The system keeps track of
    all messages, so that it can answer them with an appropriate error code
    should one of the communicating parties fail unexpectedly. To limit the
    buffering of messages in the kernel, every task has a limit on the number
    of asynchronous messages it can have outstanding at the same time. Once
    the limit is reached, the kernel refuses to send any further message
    until some active message is answered.</para>
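
    <para>The per-task limit can be pictured as a simple counter of
    unanswered messages. The following is a minimal sketch under stated
    assumptions: the limit value and all names
    (<literal>SKETCH_MAX_ASYNC_CALLS</literal>,
    <literal>sketch_task_t</literal>) are invented for illustration and say
    nothing about how the kernel actually stores this state.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit; the real value is a kernel implementation detail. */
#define SKETCH_MAX_ASYNC_CALLS 4

typedef struct {
    int active_calls; /* messages sent but not yet answered */
} sketch_task_t;

/* Try to send a message: refused once the per-task limit is reached. */
static bool sketch_send(sketch_task_t *task)
{
    assert(task != NULL);
    if (task->active_calls >= SKETCH_MAX_ASYNC_CALLS)
        return false;
    task->active_calls++;
    return true;
}

/* An answer arrived, so one more message may be sent. */
static void sketch_answer_received(sketch_task_t *task)
{
    assert(task != NULL && task->active_calls > 0);
    task->active_calls--;
}
```
]]></programlisting>

    <para>In this model a refused send is not an error in the message itself;
    the caller simply has to wait until one of its active messages is
    answered and then try again.</para>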

    <para>To facilitate kernel-to-user communication, the IPC subsystem
    provides notification messages. Applications can subscribe to a
    notification channel and receive the messages directed to it. Such
    messages can be sent freely even from interrupt context, as their
    primary purpose is to deliver IRQ events to userspace device drivers.
    These messages need not be answered, because there is no party that
    could receive the response.</para>

    <section>
      <title>Low Level IPC</title>

      <para>The whole IPC subsystem consists of one-way communication
      channels. Each task has one associated message queue (answerbox). A
      task can call other tasks and connect its phones to their answerboxes,
      send and forward messages through these connections, and answer
      received messages. Every sent message is identified by a unique number,
      so that the response can later be matched against it. The message is
      sent over the phone to the target answerbox. The server application
      periodically checks its answerbox and pulls messages from the several
      queues associated with it. After completing the requested action, the
      server sends a reply back to the answerbox of the originating task. If
      the need arises, it is possible to <emphasis>forward</emphasis> a
      received message through any of the open phones to another task. This
      mechanism is used, for example, for opening new connections to services
      via the naming service.</para>

      <para>The answerbox contains four different message queues:</para>

      <itemizedlist>
        <listitem>
          <para>Incoming call queue</para>
        </listitem>

        <listitem>
          <para>Dispatched call queue</para>
        </listitem>

        <listitem>
          <para>Answer queue</para>
        </listitem>

        <listitem>
          <para>Notification queue</para>
        </listitem>
      </itemizedlist>

      <figure float="1">
        <title>Low level IPC</title>

        <mediaobject id="ipc1">
          <imageobject role="pdf">
            <imagedata fileref="images/ipc1.pdf" format="PDF" />
          </imageobject>

          <imageobject role="html">
            <imagedata fileref="images/ipc1.png" format="PNG" />
          </imageobject>

          <imageobject role="fop">
            <imagedata fileref="images/ipc1.svg" format="SVG" />
          </imageobject>
        </mediaobject>
      </figure>

      <para>The communication between task A and task B, to which A is
      connected, proceeds as follows. Task A sends a message over its phone
      to the target answerbox. The message is stored in task B's incoming
      call queue. When task B fetches the message for processing, it is
      automatically moved into the dispatched call queue. When the server
      decides to answer the message, it is removed from the dispatched call
      queue and the result is placed into the answer queue of task
      A.</para>
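
      <para>The path of a single call through these queues can be modelled
      as a small state machine. The sketch below is illustrative only: it
      tracks which queue a call currently sits in rather than implementing
      real queues, and all identifiers are hypothetical.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states naming the queue a call currently sits in. */
typedef enum {
    SKETCH_CALL_INCOMING,   /* in task B's incoming call queue */
    SKETCH_CALL_DISPATCHED, /* fetched by B, in B's dispatched call queue */
    SKETCH_CALL_ANSWERED    /* result placed in task A's answer queue */
} sketch_call_state_t;

typedef struct {
    sketch_call_state_t state;
    long result;
} sketch_call_t;

/* Task A sends the call: it lands in B's incoming call queue. */
static void sketch_call_send(sketch_call_t *call)
{
    assert(call != NULL);
    call->state = SKETCH_CALL_INCOMING;
    call->result = 0;
}

/* Task B fetches the call: incoming -> dispatched. Returns 0 on success. */
static int sketch_call_fetch(sketch_call_t *call)
{
    assert(call != NULL);
    if (call->state != SKETCH_CALL_INCOMING)
        return -1;
    call->state = SKETCH_CALL_DISPATCHED;
    return 0;
}

/* Task B answers: dispatched -> A's answer queue. Returns 0 on success. */
static int sketch_call_answer(sketch_call_t *call, long result)
{
    assert(call != NULL);
    if (call->state != SKETCH_CALL_DISPATCHED)
        return -1;
    call->result = result;
    call->state = SKETCH_CALL_ANSWERED;
    return 0;
}
```
]]></programlisting>

      <para>The checks in the two transition functions reflect the ordering
      in the text: a call cannot be answered before it has been fetched, and
      it cannot be fetched twice.</para>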

      <para>The arguments contained in a message are completely arbitrary
      and are decided by the user. The low-level part of kernel IPC fills in
      the appropriate error codes if an error occurs during communication,
      which ensures that applications are always correctly notified about the
      state of the communication. If a program closes an outgoing connection,
      the target answerbox receives a hangup message. The connection
      identification is not reused until the hangup message is acknowledged
      and all other pending messages are answered.</para>

      <para>An incoming connection is closed by responding to any incoming
      message with the EHANGUP error code. The connection is then closed
      immediately. The client's connection identification (phone id) is not
      reused until the client closes its own side of the connection ("hangs
      up its phone").</para>
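
      <para>The rule that a phone id is released only after the client
      itself hangs up can be illustrated with three phone states. This is a
      sketch with hypothetical names; it models only the id-reuse rule
      described above, not the delivery of the EHANGUP answer.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical phone states used only for this illustration. */
typedef enum {
    SKETCH_PHONE_FREE,      /* the id may be handed out again */
    SKETCH_PHONE_CONNECTED, /* both sides of the connection are open */
    SKETCH_PHONE_HUNGUP     /* server answered EHANGUP, client still holds id */
} sketch_phone_state_t;

/* Server side: closing the connection by answering with EHANGUP. */
static void sketch_server_hangup(sketch_phone_state_t *phone)
{
    assert(phone != NULL);
    if (*phone == SKETCH_PHONE_CONNECTED)
        *phone = SKETCH_PHONE_HUNGUP;
}

/* Client side: hanging up its own phone finally releases the id. */
static void sketch_client_hangup(sketch_phone_state_t *phone)
{
    assert(phone != NULL && *phone != SKETCH_PHONE_FREE);
    *phone = SKETCH_PHONE_FREE;
}

/* The id can be reused only once the client side is closed. */
static bool sketch_phone_reusable(sketch_phone_state_t phone)
{
    return phone == SKETCH_PHONE_FREE;
}
```
]]></programlisting>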

      <para>When a task dies (whether voluntarily or by being killed), a
      cleanup process is started. The cleanup process:</para>

      <orderedlist>
        <listitem>
          <para>hangs up all outgoing connections and sends hangup messages to
          all target answerboxes,</para>
        </listitem>

        <listitem>
          <para>disconnects all incoming connections,</para>
        </listitem>

        <listitem>
          <para>disconnects from all notification channels,</para>
        </listitem>

        <listitem>
          <para>answers all unanswered messages from answerbox queues with
          an appropriate error code and</para>
        </listitem>

        <listitem>
          <para>waits until all outgoing messages are answered and all
          remaining answerbox queues are empty.</para>
        </listitem>
      </orderedlist>
    </section>

    <section>
      <title>System Call IPC Layer</title>

      <para>On top of this simple protocol, the kernel provides special
      services closely related to inter-process communication. A range of
      method numbers is reserved and a protocol is defined for these
      services. Messages with these method numbers are interpreted by the
      kernel layer, which takes the appropriate action depending on the
      parameters of the message and of the answer.</para>

      <para>The kernel provides the following services:</para>

      <itemizedlist>
        <listitem>
          <para>creating a new outgoing connection,</para>
        </listitem>

        <listitem>
          <para>creating a callback connection,</para>
        </listitem>

        <listitem>
          <para>sending an address space area and</para>
        </listitem>

        <listitem>
          <para>asking for an address space area.</para>
        </listitem>
      </itemizedlist>

      <para>On startup, every task is automatically connected to a
      <emphasis>naming service task</emphasis>, which acts as a switchboard.
      To open a new outgoing connection, the client sends a
      <constant>CONNECT_ME_TO</constant> message over any of its phones. If
      the recipient of this message answers with an accepting answer, a new
      connection is created. On its own, this mechanism would only allow
      duplicating an existing connection. However, if the message is
      forwarded, the new connection is made to the final recipient.</para>
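
      <para>The effect of forwarding on a <constant>CONNECT_ME_TO</constant>
      request can be shown with a toy model in which tasks are plain
      integers. All names below are hypothetical and the model ignores the
      possibility of the request being refused; it only shows why forwarding
      is what makes the mechanism useful.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a connection request; tasks are plain integers. */
typedef struct {
    int client;    /* task that asked for the new connection */
    int recipient; /* task that currently holds the request */
} sketch_connect_req_t;

/* The client sends CONNECT_ME_TO over a phone leading to phone_target. */
static sketch_connect_req_t sketch_connect_me_to(int client, int phone_target)
{
    sketch_connect_req_t req = { client, phone_target };
    return req;
}

/* The current holder forwards the request over one of its own phones. */
static void sketch_forward(sketch_connect_req_t *req, int next_task)
{
    assert(req != NULL);
    req->recipient = next_task;
}

/* An accepting answer: returns the task the client's new phone leads to. */
static int sketch_accept(const sketch_connect_req_t *req)
{
    assert(req != NULL);
    return req->recipient;
}
```
]]></programlisting>

      <para>If the naming service accepts the request itself, the client
      merely gains a second phone to the naming service. If the naming
      service forwards the request to a registered service and that service
      accepts, the new phone leads to the service.</para>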

      <para>For a task to be able to forward a message, it must have a phone
      connected to the destination task. The destination task establishes
      such a connection by sending a <constant>CONNECT_TO_ME</constant>
      message to the forwarding task, after which a callback connection is
      opened. Every service that wants to receive connections has to ask the
      naming service to create a callback connection via this
      mechanism.</para>

      <para>Tasks can share their address space areas using IPC messages. Two
      message types, <constant>AS_AREA_SEND</constant> and
      <constant>AS_AREA_RECV</constant>, are used for sending and receiving
      an address space area, respectively. The shared area can be accessed as
      soon as the message is acknowledged.</para>
    </section>
  </section>

  <section>
    <title>Userspace View</title>

    <para>The conventional design of an asynchronous API tends to produce
    applications with one event loop and several big switch statements.
    However, through intensive use of userspace fibrils, it was possible to
    create an environment that is not restricted to this type of
    event-driven programming and that allows application programs to be
    expressed more fluently.</para>

    <section>
      <title>Single Point of Entry</title>

      <para>Each task is associated with only one answerbox. If a
      multithreaded application needs to communicate, it must be able not
      only to send a message, but also to retrieve the answer. If several
      fibrils pull messages from the task answerbox, it is a matter of
      coincidence which fibril receives which message. If a particular fibril
      needs to wait for an answer to a message, an idle
      <emphasis>manager</emphasis> fibril is found or a new one is created,
      and control is transferred to this manager fibril. The manager fibrils
      pop messages from the answerbox and put them into the appropriate
      queues of running fibrils. If a fibril waiting for a message is not
      running, control is transferred to it.</para>
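
      <para>The routing performed by a manager fibril might look roughly
      like the following sketch, which matches an arriving answer to the
      fibril waiting for it by the unique number of the call. Everything
      here is an assumption made for illustration: the fixed-size table, the
      integer fibril identifiers and all function names are hypothetical,
      and actual fibril switching is left out.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PENDING 8

/* One fibril waiting for the answer to one call (hypothetical layout). */
typedef struct {
    int in_use;            /* slot is occupied */
    unsigned long call_id; /* unique number of the sent message */
    int fibril;            /* made-up fibril identifier */
    int answered;          /* set once the answer has been routed */
    long retval;           /* return value taken from the answer */
} sketch_wait_t;

typedef struct {
    sketch_wait_t slots[SKETCH_MAX_PENDING];
} sketch_router_t;

/* A fibril registers its wait; returns its slot or NULL if the table is full. */
static sketch_wait_t *sketch_wait_for(sketch_router_t *r,
    unsigned long call_id, int fibril)
{
    assert(r != NULL);
    for (size_t i = 0; i < SKETCH_MAX_PENDING; i++) {
        if (!r->slots[i].in_use) {
            r->slots[i] = (sketch_wait_t){ 1, call_id, fibril, 0, 0 };
            return &r->slots[i];
        }
    }
    return NULL;
}

/*
 * Manager side: route an answer popped from the answerbox. Returns the
 * fibril that should receive control, or -1 if no fibril waits for it.
 */
static int sketch_route_answer(sketch_router_t *r, unsigned long call_id,
    long retval)
{
    assert(r != NULL);
    for (size_t i = 0; i < SKETCH_MAX_PENDING; i++) {
        sketch_wait_t *w = &r->slots[i];
        if (w->in_use && !w->answered && w->call_id == call_id) {
            w->retval = retval;
            w->answered = 1;
            return w->fibril;
        }
    }
    return -1;
}
```
]]></programlisting>

      <para>Because the lookup is keyed by the call number, it does not
      matter in which order the answers arrive or which fibril happened to
      be running when they were popped from the answerbox. Releasing slots
      after use is omitted from the sketch.</para>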

      <figure float="1">
        <title>Single point of entry</title>

        <mediaobject id="ipc2">
          <imageobject role="pdf">
            <imagedata fileref="images/ipc2.pdf" format="PDF" />
          </imageobject>

          <imageobject role="html">
            <imagedata fileref="images/ipc2.png" format="PNG" />
          </imageobject>

          <imageobject role="fop">
            <imagedata fileref="images/ipc2.svg" format="SVG" />
          </imageobject>
        </mediaobject>
      </figure>

      <para>A very similar situation arises when a task decides to send a
      lot of messages and reaches the kernel limit on asynchronous messages.
      In such a situation, two remedies are available: the userspace library
      can either cache the message locally and resend it when some answers
      arrive, or it can block the fibril and let it continue only after the
      message is finally sent to the kernel layer. With one exception,
      HelenOS uses the second approach: when the kernel responds that the
      maximum number of asynchronous messages has been reached, control is
      transferred to a manager fibril. The manager fibril then handles
      incoming replies and, when space is available, sends the message to the
      kernel and resumes the execution of the application fibril.</para>

      <para>If a kernel notification is received, the servicing procedure is
      run in the context of the manager fibril. Although it would not be
      impossible to allow recursive calling, it could potentially lead to an
      explosion of manager fibrils. Thus, the kernel notification procedures
      are not allowed to wait for the result of a message; they can only
      answer messages and send new ones without waiting for their results.
      If the kernel limit for outgoing messages is reached in this case, the
      data is automatically cached within the application. This behaviour is
      enforced automatically and the decision making is hidden from the
      developer.</para>

      <figure float="1">
        <title>Single point of entry solution</title>

        <mediaobject id="ipc3">
          <imageobject role="pdf">
            <imagedata fileref="images/ipc3.pdf" format="PDF" />
          </imageobject>

          <imageobject role="html">
            <imagedata fileref="images/ipc3.png" format="PNG" />
          </imageobject>

          <imageobject role="fop">
            <imagedata fileref="images/ipc3.svg" format="SVG" />
          </imageobject>
        </mediaobject>
      </figure>
    </section>

    <section>
      <title>Ordering Problem</title>

      <para>Unfortunately, the real world is never so simple. For example, if
      a server handles incoming requests and sends asynchronous messages as
      part of its response, it can easily be preempted and another thread may
      start intervening. This can happen even if the application uses only
      one userspace thread. Classical synchronization using semaphores is not
      possible, because locking on them would block the thread completely, so
      that the answer could never be processed. The IPC framework allows the
      developer to specify that a part of the code should not be preempted by
      any other fibril (except notification handlers), while still being able
      to queue messages belonging to other fibrils and to regain control when
      the answer arrives.</para>

      <para>This mechanism works transparently in a multithreaded
      environment, where an additional locking mechanism (futexes) should be
      used. The IPC framework ensures that there will always be enough free
      userspace threads to handle incoming answers, and it allows the
      application to run more fibrils inside the userspace threads without
      the danger of locking all userspace threads in futexes.</para>
    </section>

    <section>
      <title>The Interface</title>

      <para>The interface was developed to be as simple to use as possible.
      Typical applications simply send messages and occasionally wait for an
      answer and check the results. If the number of sent messages exceeds
      the kernel limit, the flow of the application is stopped until some
      answers arrive. Server applications, on the other hand, are expected to
      work in a multithreaded environment.</para>

      <para>The server interface requires the developer to specify a
      <function>connection_fibril</function> function. When a new connection
      is detected, a new fibril is automatically created and control is
      transferred to this function. The code then decides whether to accept
      the connection and creates a normal event loop. The userspace IPC
      library ensures correct switching between several threads within the
      kernel environment.</para>
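
      <para>The general shape of such a connection handler, accepting the
      connection and then running an ordinary event loop, could look like
      the sketch below. It does not use the real HelenOS library: the calls
      are read from a canned array instead of an answerbox, and every
      identifier (<literal>sk_conn_t</literal>,
      <literal>sk_get_call</literal>, <literal>sk_answer</literal>, the
      method numbers) is a hypothetical stand-in.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Made-up method numbers for this illustration. */
enum { SK_METHOD_HANGUP = 0, SK_METHOD_ADD = 1 };

typedef struct {
    int method;
    long arg1;
    long arg2;
} sk_call_t;

/* Stand-in for a connection: calls come from a canned array. */
typedef struct {
    const sk_call_t *calls;
    size_t count;
    size_t next;
    long last_answer; /* value of the most recent answer */
    size_t answered;  /* number of answers sent so far */
} sk_conn_t;

/* Fetch the next call on this connection, or NULL when none is left. */
static const sk_call_t *sk_get_call(sk_conn_t *conn)
{
    if (conn->next >= conn->count)
        return NULL;
    return &conn->calls[conn->next++];
}

/* Record an answer; a real library would send it back to the caller. */
static void sk_answer(sk_conn_t *conn, long retval)
{
    conn->last_answer = retval;
    conn->answered++;
}

/* Per-connection handler: accept, then serve calls until hangup. */
static void connection_fibril_sketch(sk_conn_t *conn)
{
    const sk_call_t *call;

    assert(conn != NULL);
    sk_answer(conn, 0); /* accept the connection */

    while ((call = sk_get_call(conn)) != NULL) {
        switch (call->method) {
        case SK_METHOD_HANGUP:
            sk_answer(conn, 0);
            return; /* the connection is closed, leave the loop */
        case SK_METHOD_ADD:
            sk_answer(conn, call->arg1 + call->arg2);
            break;
        default:
            sk_answer(conn, -1); /* unknown method */
            break;
        }
    }
}
```
]]></programlisting>

      <para>Since each connection gets its own invocation of the handler,
      per-connection state can live in local variables of the function
      rather than in a table shared by one global event loop.</para>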
    </section>
  </section>
</chapter>