<?xml version="1.0" encoding="UTF-8"?>
<chapter id="ipc">
  <?dbhtml filename="ipc.html"?>

  <title>IPC</title>

  <para>Due to the high intertask communication traffic, IPC becomes a
  critical subsystem for microkernels, putting high demands on the speed,
  latency and reliability of the IPC model and its implementation. Although
  the use of an asynchronous messaging system looks promising in theory, it is
  seldom implemented because it makes the implementation of end user
  applications problematic. HelenOS implements a fully asynchronous messaging
  system with a special layer that provides the application developer with a
  reasonably synchronous multithreaded environment sufficient for developing
  complex protocols.</para>

  <section>
    <title>Kernel Services</title>

    <para>Every message consists of four numeric arguments (32-bit or 64-bit,
    depending on the platform), of which the first is interpreted as a method
    number on message receipt and as a return value on answer receipt. The
    received message contains an identification of the incoming connection, so
    that the receiving application can distinguish messages from different
    senders. Internally, the message contains a pointer to the originating
    task and to the source of the communication channel. If the message is
    forwarded, the originating task identifies the recipient of the answer,
    while the source channel identifies the connection in case of a hangup
    response.</para>
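
    <para>As a rough illustration, such a message could be modelled by the
    following C structure. The sketch is illustrative only; the type and field
    names do not reproduce the actual HelenOS kernel definitions.</para>

    <programlisting><![CDATA[
#include <stdint.h>

#define IPC_CALL_ARGS  4   /* four numeric arguments per message */

/* Illustrative model of a kernel IPC message. */
typedef struct {
    /* args[0] carries the method number in a request and the
     * return value in an answer. */
    uintptr_t args[IPC_CALL_ARGS];

    /* Filled in by the kernel on receipt: identifies the incoming
     * connection, so the receiver can tell senders apart. */
    uintptr_t in_phone_hash;

    /* Kernel-internal bookkeeping: the originating task (recipient of
     * the answer) and the source channel (used to deliver a hangup
     * response when the message has been forwarded). */
    struct task  *sender;
    struct phone *source_phone;
} ipc_call_t;
]]></programlisting>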

    <para>Every message must eventually be answered. The system keeps track of
    all messages, so that it can answer them with an appropriate error code
    should one of the connection parties fail unexpectedly. To limit buffering
    of messages in the kernel, every task has a limit on the number of
    asynchronous messages it can have sent simultaneously. If the limit is
    reached, the kernel refuses to send any other message until some active
    message is answered.</para>

    <para>To facilitate kernel-to-user communication, the IPC subsystem
    provides notification messages. Applications can subscribe to a
    notification channel and receive messages directed to this channel. Such
    messages can be freely sent even from interrupt context as they are
    primarily destined to deliver IRQ events to userspace device drivers.
    These messages need not be answered; there is no party that could receive
    such a response.</para>

    <section>
      <title>Low Level IPC</title>

      <para>The whole IPC subsystem consists of one-way communication
      channels. Each task has one associated message queue (answerbox). The
      task can call other tasks and connect its phones to their answerboxes,
      send and forward messages through these connections and answer received
      messages. Every sent message is identified by a unique number, so that
      the response can later be matched against it. The message is sent over
      the phone to the target answerbox. The server application periodically
      checks the answerbox and pulls messages from the several queues
      associated with it. After completing the requested action, the server
      sends a reply back to the answerbox of the originating task. If a need
      arises, it is possible to <emphasis>forward</emphasis> a received
      message through any of the open phones to another task. This mechanism
      is used e.g. for opening new connections to services via the naming
      service.</para>

      <para>The answerbox contains four different message queues:</para>

      <itemizedlist>
        <listitem>
          <para>Incoming call queue</para>
        </listitem>

        <listitem>
          <para>Dispatched call queue</para>
        </listitem>

        <listitem>
          <para>Answer queue</para>
        </listitem>

        <listitem>
          <para>Notification queue</para>
        </listitem>
      </itemizedlist>
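
      <para>Structurally, the answerbox can be thought of as these four queues
      protected by a lock, roughly as in the following sketch. The sketch is
      illustrative only and does not reproduce the actual kernel data
      structure.</para>

      <programlisting><![CDATA[
#include <pthread.h>

/* Illustrative list head; the real kernel uses its own linked lists. */
typedef struct list {
    struct list *prev, *next;
} list_t;

/* Rough structural model of an answerbox and its four message queues. */
typedef struct {
    pthread_mutex_t lock;     /* protects all four queues                */

    list_t calls;             /* incoming call queue                     */
    list_t dispatched_calls;  /* calls fetched by the server, not yet
                                 answered                                */
    list_t answers;           /* answers to calls sent by this task      */
    list_t notifications;     /* notification (IRQ) queue                */
} answerbox_t;
]]></programlisting>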

      <figure float="1">
        <title>Low level IPC</title>

        <mediaobject id="ipc1">
          <imageobject role="pdf">
            <imagedata fileref="images/ipc1.pdf" format="PDF" />
          </imageobject>

          <imageobject role="html">
            <imagedata fileref="images/ipc1.png" format="PNG" />
          </imageobject>

          <imageobject role="fop">
            <imagedata fileref="images/ipc1.svg" format="SVG" />
          </imageobject>
        </mediaobject>
      </figure>

      <para>The communication between task A, which is connected to task B,
      looks as follows: task A sends a message over its phone to the target
      answerbox. The message is saved in task B's incoming call queue. When
      task B fetches the message for processing, it is automatically moved
      into the dispatched call queue. After the server decides to answer the
      message, it is removed from the dispatched call queue and the result is
      moved into the answer queue of task A.</para>

      <para>The arguments contained in the message are completely arbitrary
      and decided by the user. The low level part of kernel IPC fills in
      appropriate error codes if an error occurs during communication, which
      assures that the applications are correctly notified about the
      communication state. If a program closes an outgoing connection, the
      target answerbox receives a hangup message. The connection
      identification is not reused until the hangup message is acknowledged
      and all other pending messages are answered.</para>

      <para>Closing an incoming connection is done by responding to any
      incoming message with an EHANGUP error code. The connection is then
      immediately closed. The client connection identification (phone id) is
      not reused until the client closes its own side of the connection
      ("hangs up its phone").</para>

      <para>When a task dies (whether voluntarily or by being killed), a
      cleanup process is started that:</para>

      <orderedlist>
        <listitem>
          <para>hangs up all outgoing connections and sends hangup messages to
          all target answerboxes,</para>
        </listitem>

        <listitem>
          <para>disconnects all incoming connections,</para>
        </listitem>

        <listitem>
          <para>disconnects from all notification channels,</para>
        </listitem>

        <listitem>
          <para>answers all unanswered messages from the answerbox queues with
          an appropriate error code and</para>
        </listitem>

        <listitem>
          <para>waits until all outgoing messages are answered and all
          remaining answerbox queues are empty.</para>
        </listitem>
      </orderedlist>
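
      <para>A skeleton of this cleanup sequence could look as follows. The
      function and type names are hypothetical and serve only to mirror the
      steps listed above.</para>

      <programlisting><![CDATA[
typedef struct task task_t;   /* hypothetical task type */

/* Hypothetical helpers standing in for the real kernel routines. */
static void hangup_all_phones(task_t *t)         { (void) t; }
static void disconnect_incoming(task_t *t)       { (void) t; }
static void unsubscribe_notifications(task_t *t) { (void) t; }
static void answer_pending_calls(task_t *t)      { (void) t; }
static void wait_for_outgoing_answers(task_t *t) { (void) t; }

/* Sketch of the IPC cleanup performed when a task dies. */
void ipc_cleanup(task_t *task)
{
    hangup_all_phones(task);          /* 1. hang up outgoing connections   */
    disconnect_incoming(task);        /* 2. drop incoming connections      */
    unsubscribe_notifications(task);  /* 3. leave notification channels    */
    answer_pending_calls(task);       /* 4. answer queued calls with error */
    wait_for_outgoing_answers(task);  /* 5. wait for our own answers       */
}
]]></programlisting>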
    </section>

    <section>
      <title>System Call IPC Layer</title>

      <para>On top of this simple protocol, the kernel provides special
      services closely related to inter-process communication. A range of
      method numbers is allocated and a protocol is defined for these
      functions. These messages are interpreted by the kernel layer and
      appropriate actions are taken depending on the parameters of the message
      and the answer.</para>

      <para>The kernel provides the following services:</para>

      <itemizedlist>
        <listitem>
          <para>creating a new outgoing connection,</para>
        </listitem>

        <listitem>
          <para>creating a callback connection,</para>
        </listitem>

        <listitem>
          <para>sending an address space area and</para>
        </listitem>

        <listitem>
          <para>asking for an address space area.</para>
        </listitem>
      </itemizedlist>

      <para>On startup, every task is automatically connected to a
      <emphasis>naming service task</emphasis>, which provides switchboard
      functionality. In order to open a new outgoing connection, the client
      sends a <constant>CONNECT_ME_TO</constant> message using any of its
      phones. If the recipient of this message answers with an accepting
      answer, a new connection is created. In itself, this mechanism would
      only allow duplicating an existing connection. However, if the message
      is forwarded, the new connection is made to the final recipient.</para>
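
      <para>From the client's point of view, opening a connection could look
      roughly like the sketch below. The wrapper function and the
      <constant>PHONE_NS</constant> constant are hypothetical; they only
      illustrate how a <constant>CONNECT_ME_TO</constant> message travels
      through the naming service.</para>

      <programlisting><![CDATA[
#include <stdint.h>

/* Hypothetical declarations, for illustration only. */
#define PHONE_NS  0           /* phone to the naming service, connected
                                 automatically on task startup           */

typedef intptr_t ipcarg_t;

/* Sends CONNECT_ME_TO over 'phone'; on an accepting answer the kernel
 * creates the new connection and the id of the new phone is returned,
 * otherwise a negative error code. */
extern int ipc_connect_me_to(int phone, ipcarg_t service);

int connect_to_service(ipcarg_t service)
{
    /*
     * The CONNECT_ME_TO message is sent to the naming service, which
     * forwards it to the task implementing 'service'. Because the
     * message is forwarded, the new connection ends up between this
     * task and the final recipient.
     */
    return ipc_connect_me_to(PHONE_NS, service);
}
]]></programlisting>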

      <para>In order for a task to be able to forward a message, it must have
      a phone connected to the destination task. The destination task
      establishes such a connection by sending the
      <constant>CONNECT_TO_ME</constant> message to the forwarding task. A
      callback connection is opened afterwards. Every service that wants to
      receive connections has to ask the naming service to create the
      callback connection via this mechanism.</para>
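
      <para>A server could thus register its callback connection with the
      naming service roughly as in the following sketch; again, the wrapper
      names are hypothetical.</para>

      <programlisting><![CDATA[
#include <stdint.h>

/* Hypothetical declarations, for illustration only. */
#define PHONE_NS  0           /* phone to the naming service */

typedef intptr_t ipcarg_t;

/* Sends CONNECT_TO_ME over 'phone'; on success the callee ends up with
 * a phone connected back to our answerbox (the callback connection). */
extern int ipc_connect_to_me(int phone, ipcarg_t service,
    ipcarg_t *phonehash);

int register_with_naming_service(ipcarg_t service)
{
    ipcarg_t phonehash;

    /*
     * After this call, the naming service holds a phone to our
     * answerbox, so it can forward CONNECT_ME_TO requests from future
     * clients of 'service' to us.
     */
    return ipc_connect_to_me(PHONE_NS, service, &phonehash);
}
]]></programlisting>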

      <para>Tasks can share their address space areas using IPC messages. The
      two message types <constant>AS_AREA_SEND</constant> and
      <constant>AS_AREA_RECV</constant> are used for sending and receiving an
      address space area, respectively. The shared area can be accessed as
      soon as the message is acknowledged.</para>
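
      <para>On the sending side, sharing a buffer could then be as simple as
      the following sketch; the wrapper name is hypothetical and stands for
      whatever interface the userspace library provides around
      <constant>AS_AREA_SEND</constant>.</para>

      <programlisting><![CDATA[
#include <stddef.h>

/* Hypothetical wrapper, for illustration only: sends an AS_AREA_SEND
 * message over 'phone', offering the address space area that starts at
 * 'base'. The peer can access the area once it acknowledges the
 * message. */
extern int ipc_share_area_out(int phone, void *base, size_t size);

int share_buffer(int phone, void *buf, size_t size)
{
    return ipc_share_area_out(phone, buf, size);
}
]]></programlisting>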
    </section>
  </section>

  <section>
    <title>Userspace View</title>

    <para>The conventional design of an asynchronous API seems to produce
    applications with one event loop and several big switch statements.
    However, by intensive utilization of userspace fibrils, it was possible to
    create an environment that is not necessarily restricted to this type of
    event-driven programming and that allows for a more fluent expression of
    application programs.</para>

    <section>
      <title>Single Point of Entry</title>

      <para>Each task is associated with only one answerbox. If a
      multithreaded application needs to communicate, it must not only be
      able to send a message, but also to retrieve the answer. If several
      fibrils pull messages from the task answerbox, it is a matter of
      coincidence which fibril receives which message. Therefore, if a
      particular fibril needs to wait for a message answer, an idle
      <emphasis>manager</emphasis> fibril is found or a new one is created,
      and control is transferred to this manager fibril. The manager fibrils
      pop messages from the answerbox and put them into the appropriate
      queues of the running fibrils. If a fibril waiting for a message is not
      running, control is transferred to it.</para>
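
      <para>The following sketch outlines the idea in pseudo-C. All names are
      hypothetical; they do not reproduce the actual HelenOS asynchronous
      library.</para>

      <programlisting><![CDATA[
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types and helpers, for illustration only. */
typedef struct fibril fibril_t;
typedef struct { uintptr_t args[4]; } ipc_call_t;

extern bool answer_arrived(uintptr_t msgid, ipc_call_t *answer);
extern fibril_t *get_idle_manager_or_create(void);
extern void fibril_switch_to(fibril_t *f);
extern bool answerbox_pop(ipc_call_t *msg);
extern fibril_t *fibril_waiting_for(ipc_call_t *msg);
extern void enqueue_for_fibril(fibril_t *f, ipc_call_t *msg);

/* Called by an application fibril that wants the answer to 'msgid'. */
void wait_for_answer(uintptr_t msgid, ipc_call_t *answer)
{
    while (!answer_arrived(msgid, answer)) {
        /* Hand control over to a manager fibril; it will wake us up
         * once our answer has been routed to us. */
        fibril_switch_to(get_idle_manager_or_create());
    }
}

/* Main loop of a manager fibril. */
void manager_fibril_loop(void)
{
    ipc_call_t msg;

    while (answerbox_pop(&msg)) {
        fibril_t *owner = fibril_waiting_for(&msg);

        /* Route the message to the fibril that waits for it and,
         * if that fibril is not running, switch to it. */
        enqueue_for_fibril(owner, &msg);
        fibril_switch_to(owner);
    }
}
]]></programlisting>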

      <figure float="1">
        <title>Single point of entry</title>

        <mediaobject id="ipc2">
          <imageobject role="pdf">
            <imagedata fileref="images/ipc2.pdf" format="PDF" />
          </imageobject>

          <imageobject role="html">
            <imagedata fileref="images/ipc2.png" format="PNG" />
          </imageobject>

          <imageobject role="fop">
            <imagedata fileref="images/ipc2.svg" format="SVG" />
          </imageobject>
        </mediaobject>
      </figure>

      <para>A very similar situation arises when a task decides to send a lot
      of messages and reaches the kernel limit on asynchronous messages. In
      such a situation, two remedies are available: the userspace library can
      either cache the message locally and resend it when some answers
      arrive, or it can block the fibril and let it continue only after the
      message is finally sent to the kernel layer. With one exception,
      HelenOS uses the second approach: when the kernel responds that the
      maximum limit of asynchronous messages has been reached, control is
      transferred to a manager fibril. The manager fibril then handles
      incoming replies and, when space becomes available, sends the message
      to the kernel and resumes the execution of the application
      fibril.</para>

      <para>If a kernel notification is received, the servicing procedure is
      run in the context of a manager fibril. Although it would not be
      impossible to allow recursive calling, it could potentially lead to an
      explosion of manager fibrils. Thus, kernel notification procedures are
      not allowed to wait for a message result; they can only answer messages
      and send new ones without waiting for their results. If the kernel
      limit for outgoing messages is reached, the data is automatically
      cached within the application. This behaviour is enforced automatically
      and the decision making is hidden from the developer.</para>

      <figure float="1">
        <title>Single point of entry solution</title>

        <mediaobject id="ipc3">
          <imageobject role="pdf">
            <imagedata fileref="images/ipc3.pdf" format="PDF" />
          </imageobject>

          <imageobject role="html">
            <imagedata fileref="images/ipc3.png" format="PNG" />
          </imageobject>

          <imageobject role="fop">
            <imagedata fileref="images/ipc3.svg" format="SVG" />
          </imageobject>
        </mediaobject>
      </figure>
    </section>

    <section>
      <title>Ordering Problem</title>

      <para>Unfortunately, the real world is never so simple. For example, if
      a server handles incoming requests and, as a part of its response,
      sends asynchronous messages, it can easily be preempted and another
      fibril may start intervening. This can happen even if the application
      utilizes only one userspace thread. Classical synchronization using
      semaphores is not possible, as blocking on them would block the whole
      thread so that the answer could never be processed. The IPC framework
      therefore allows the developer to specify that a part of the code
      should not be preempted by any other fibril (except notification
      handlers), while still being able to queue messages belonging to other
      fibrils and to regain control when the answer arrives.</para>
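
      <para>Conceptually, such a non-preemptible region could be used as in
      the sketch below. The function names are hypothetical placeholders for
      the corresponding IPC framework primitives.</para>

      <programlisting><![CDATA[
/* Hypothetical API, for illustration only. */
extern void async_serialize_start(void);  /* from here on, no other
                                             application fibril (except
                                             notification handlers) may
                                             preempt us                 */
extern void async_serialize_end(void);
extern int async_req(int phone, int method, int arg);  /* send and wait */

void handle_request(int backend_phone)
{
    async_serialize_start();

    /*
     * While waiting for the answer inside the serialized block, the
     * library may still queue messages belonging to other fibrils,
     * but no other application fibril runs until we reach
     * async_serialize_end().
     */
    int result = async_req(backend_phone, /* method */ 1, /* arg */ 42);

    async_serialize_end();

    (void) result;
}
]]></programlisting>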

      <para>This mechanism works transparently in a multithreaded
      environment, where an additional locking mechanism (futexes) should be
      used. The IPC framework ensures that there will always be enough free
      userspace threads to handle incoming answers, allowing the application
      to run more fibrils inside the userspace threads without the danger of
      locking all userspace threads in futexes.</para>
    </section>

    <section>
      <title>The Interface</title>

      <para>The interface was developed to be as simple to use as possible.
      Typical applications simply send messages and occasionally wait for an
      answer and check the results. If the number of sent messages exceeds
      the kernel limit, the flow of the application is stopped until some
      answers arrive. Server applications, on the other hand, are expected to
      work in a multithreaded environment.</para>
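
      <para>A typical client interaction could thus be as short as the
      following sketch; the wrapper names are hypothetical stand-ins for the
      userspace IPC library.</para>

      <programlisting><![CDATA[
#include <stdint.h>

/* Hypothetical asynchronous IPC wrappers, for illustration only. */
typedef uintptr_t aid_t;       /* identifier of a pending message */
typedef intptr_t  ipcarg_t;

extern aid_t async_send(int phone, ipcarg_t method,
    ipcarg_t arg1, ipcarg_t arg2);
extern void  async_wait_for(aid_t msgid, ipcarg_t *retval);

int client_example(int phone)
{
    ipcarg_t rc1, rc2;

    /* Fire off two requests; if the kernel limit were exceeded, the
     * library would transparently block this fibril until space frees. */
    aid_t req1 = async_send(phone, /* method */ 1, 10, 20);
    aid_t req2 = async_send(phone, /* method */ 2, 30, 40);

    /* ... do other work ... */

    /* Wait for both answers and check the results. */
    async_wait_for(req1, &rc1);
    async_wait_for(req2, &rc2);

    return (rc1 == 0 && rc2 == 0) ? 0 : -1;
}
]]></programlisting>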

      <para>The server interface requires the developer to specify a
      <function>connection_fibril</function> function. When a new connection
      is detected, a new fibril is automatically created and control is
      transferred to this function. The code then decides whether to accept
      the connection and creates a normal event loop. The userspace IPC
      library ensures correct switching between several threads within the
      kernel environment.</para>
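
      <para>A connection fibril could then be structured as in this sketch;
      the types and helper names are hypothetical, not the actual library
      interface.</para>

      <programlisting><![CDATA[
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types and helpers, for illustration only. */
typedef struct {
    uintptr_t args[4];
} ipc_call_t;

#define METHOD(call)   ((call).args[0])
#define EOK            0
#define ENOTSUP        (-2)

extern void async_answer(ipc_call_t *call, intptr_t retval);
extern bool async_get_call(ipc_call_t *call);  /* false when the client
                                                  hangs up              */

/* Run in a freshly created fibril whenever a new connection arrives. */
void connection_fibril(ipc_call_t *initial_call)
{
    /* Accept the connection by answering the initial request. */
    async_answer(initial_call, EOK);

    ipc_call_t call;

    /* The usual event loop of one server connection. */
    while (async_get_call(&call)) {
        switch (METHOD(call)) {
        case 1:   /* some service-specific method */
            async_answer(&call, EOK);
            break;
        default:
            async_answer(&call, ENOTSUP);
            break;
        }
    }

    /* The client hung up; the loop ends and the fibril terminates. */
}
]]></programlisting>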
    </section>
  </section>
</chapter>