<?xml version="1.0" encoding="UTF-8"?>
<chapter id="scheduling">
  <?dbhtml filename="scheduling.html"?>

  <title>Scheduling</title>

  <para>One of the key aims of the operating system is to create and support
  the impression that several activities are executing concurrently. This is
  true for both uniprocessor and multiprocessor systems. In the case of
  multiprocessor systems, the activities are truly happening in parallel. The
  scheduler helps to materialize this impression by planning threads on as
  many processors as possible and, when this strategy reaches its limits, by
  quickly switching among threads executing on a single processor.</para>

  <section>
    <title>Contexts</title>

    <para>The term <emphasis>context</emphasis> refers to the set of processor
    resources that define the current state of the computation or the
    environment. The kernel understands it in several more or less narrow
    senses:</para>

    <itemizedlist>
      <listitem>
        <para>synchronous register context,</para>
      </listitem>

      <listitem>
        <para>asynchronous register context,</para>
      </listitem>

      <listitem>
        <para>FPU context and</para>
      </listitem>

      <listitem>
        <para>memory management context.</para>
      </listitem>
    </itemizedlist>

    <para>The most narrow sense refers to the synchronous register
    context. It includes all the preserved registers as defined by the
    architecture. To highlight some, the program counter and stack pointer
    take part in the synchronous register context. These registers must be
    preserved across a procedure call and during synchronous context
    switches.</para>

    <para>The next type of context understood by the kernel is the
    asynchronous register context. On an interrupt, the interrupted execution
    flow's state must be guaranteed to be eventually completely restored.
    Therefore the interrupt context includes, among other things, the scratch
    registers as defined by the architecture. As a special optimization and if
    certain conditions are met, it need not include the architecture's
    preserved registers. The condition mentioned in the previous sentence is
    that the low-level assembly language interrupt routines don't modify the
    preserved registers. The handlers usually call a higher-level C routine.
    The preserved registers are then saved on the stack by the
    compiler-generated code of the higher-level function. In HelenOS, several
    architectures can be compiled with this optimization.</para>

    <para>Although the kernel does not do any floating point
    arithmetic<footnote>
        <para>Some architectures (e.g. ia64) inevitably use a fixed set of
        floating point registers to carry out their normal operations.</para>
      </footnote>, it must protect the FPU context of userspace threads against
    destruction by other threads. Moreover, only a fraction of userspace
    programs use the floating point unit. HelenOS contains a generic framework
    for switching the FPU context only when the switch is forced (i.e. a thread
    uses a floating point instruction and its FPU context is not loaded in the
    processor).</para>
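
    <para>The following fragment is a minimal sketch of how such a lazy FPU
    switch can look when the "FPU unavailable" trap fires; the types, field
    names and helper functions used here are illustrative assumptions for this
    sketch, not the exact HelenOS interface.</para>

    <programlisting>#include &lt;stddef.h&gt;

/* Minimal illustrative types; the real kernel structures are richer. */
typedef struct { char data[512]; } fpu_context_t;

typedef struct thread {
	fpu_context_t fpu_context;
} thread_t;

typedef struct cpu {
	thread_t *fpu_owner;
} cpu_t;

void fpu_enable(void);                        /* architecture-specific */
void fpu_context_save(fpu_context_t *ctx);    /* architecture-specific */
void fpu_context_restore(fpu_context_t *ctx); /* architecture-specific */

/* Sketch: called from the "FPU unavailable" trap handler. */
void fpu_lazy_switch(cpu_t *cpu, thread_t *current)
{
	fpu_enable();                  /* FPU instructions are legal again */

	if (cpu->fpu_owner == current)
		return;                /* context already loaded in the FPU */

	if (cpu->fpu_owner != NULL)
		fpu_context_save(&amp;cpu->fpu_owner->fpu_context);

	fpu_context_restore(&amp;current->fpu_context);
	cpu->fpu_owner = current;      /* the current thread now owns the FPU */
}</programlisting>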

    <para>The last member of the context family is the memory management
    context. It includes the memory management registers that identify address
    spaces on the hardware level (i.e. ASIDs and page table pointers).</para>

    <section>
      <title>Synchronous Context Switches</title>

      <para>The scheduler, but also other pieces of the kernel, make heavy use
      of synchronous context switches, because they are a natural vehicle not
      only for changes in control flow, but also for switching between two
      kernel stacks. Two functions figure in a synchronous context switch
      implementation: <code>context_save()</code> and
      <code>context_restore()</code>. Note that these two functions break the
      natural perception of the linear C code execution flow starting at the
      function's entry point and ending at one of the function's exit
      points.</para>

      <para>When the <code>context_save()</code> function is called, the
      synchronous context is saved in a memory structure passed to it. After
      executing <code>context_save()</code>, the caller is returned 1 as a
      return value. The execution of instructions then continues as normal
      until <code>context_restore()</code> is called. For the caller, it seems
      like the call never returns<footnote>
          <para>Which might be a source of problems with variable liveness
          after <code>context_restore()</code>.</para>
        </footnote>. Nevertheless, the synchronous register context, which is
      saved in a memory structure passed to <code>context_restore()</code>, is
      restored, thus transferring the control flow to the place of occurrence
      of the corresponding call to <code>context_save()</code>. From the
      perspective of the caller of the corresponding
      <code>context_save()</code>, it looks like a return from
      <code>context_save()</code>. However, this time a return value of 0 is
      returned.</para>
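
      <para>The usage pattern resembles, with an inverted return-value
      convention, the standard C <code>setjmp()</code>/<code>longjmp()</code>
      pair. The sketch below only relies on the return-value convention
      described above; the helper functions are placeholders and the internals
      of the saved structure are not assumed.</para>

      <programlisting>context_t ctx;

if (context_save(&amp;ctx)) {
	/*
	 * First return: the synchronous register context has just been
	 * saved into ctx and context_save() returned 1.
	 */
	do_some_work();               /* placeholder */
	context_restore(&amp;ctx);        /* never returns to this point */
} else {
	/*
	 * Second "return" from context_save(): control arrived here via
	 * context_restore() and the return value was 0.
	 */
	continue_after_switch();      /* placeholder */
}</programlisting>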
    </section>
  </section>

  <section>
    <title>Threads</title>

    <para>A thread is the basic executable entity with some code and a stack.
    While the code, implemented by a C language function, can be shared by
    several threads, the stack is always private to each instance of the
    thread. Each thread belongs to exactly one task, through which it shares
    the address space with its sibling threads. Threads that execute purely in
    the kernel don't have any userspace memory allocated. However, when a
    thread has ambitions to run in userspace, it must be allocated a userspace
    stack. The distinction between the purely kernel threads and threads
    running also in userspace is made by referring to the former group as
    kernel threads and to the latter group as userspace threads. Both kernel
    and userspace threads are visible to the scheduler and can become subject
    to kernel preemption and thread migration anytime preemption is not
    disabled.</para>

    <formalpara>
      <title>Thread States</title>

      <para>At each moment, a thread exists in one of six possible thread
      states. When the thread is created and first inserted into the
      scheduler's run queues or when a thread is migrated to a new processor,
      it is put into the <constant>Entering</constant> state. After some time
      elapses, the scheduler picks up the thread and starts executing it. A
      thread currently being executed on a processor is in the
      <constant>Running</constant> state. From there, the thread has three
      possibilities. It either runs until it is preempted, in which case its
      state changes to <constant>Ready</constant>, goes to the
      <constant>Sleeping</constant> state by going to sleep or enters the
      <constant>Exiting</constant> state when it reaches termination. When the
      thread exits, its kernel structure usually stays in memory until the
      thread is detached by another thread using the
      <code>thread_detach()</code> function. Terminated but undetached threads
      are in the <constant>Lingering</constant> state. When the thread is
      detached or detaches itself during its life, it is destroyed in the
      <constant>Exiting</constant> state and the
      <constant>Lingering</constant> state is not reached.<figure float="1">
          <title>Transitions among thread states.</title>

          <mediaobject id="thread_states" xreflabel="">
            <imageobject role="pdf">
              <imagedata fileref="images/thread_states.pdf" format="PDF" />
            </imageobject>

            <imageobject role="html">
              <imagedata fileref="images/thread_states.png" format="PNG" />
            </imageobject>

            <imageobject role="fop">
              <imagedata fileref="images/thread_states.svg" format="SVG" />
            </imageobject>
          </mediaobject>
        </figure></para>
    </formalpara>
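
    <para>For illustration, the six states enumerated above can be captured by
    a C enumeration such as the following; the identifier spellings are chosen
    for this sketch and need not match the constants used in the HelenOS
    sources.</para>

    <programlisting>/* Illustrative enumeration of the thread states described above. */
typedef enum {
	Entering,   /* created or migrated, waiting in a run queue */
	Running,    /* currently executing on a processor */
	Ready,      /* preempted, ready to run again */
	Sleeping,   /* waiting for an event */
	Exiting,    /* reached termination */
	Lingering   /* terminated, but not yet detached */
} thread_state_t;</programlisting>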

    <formalpara>
      <title>Fibrils</title>

      <para>The HelenOS userspace layer knows even smaller units of execution.
      Each userspace thread can make use of an arbitrary number of fibrils.
      These fibrils have their own synchronous register context, userspace
      code and stack. They live their own life within the userspace thread and
      the scheduler does not have any idea about them because they are
      completely implemented by the userspace library. This implies several
      things:<itemizedlist>
          <listitem>
            <para>fibrils schedule themselves cooperatively within the time
            slice given to their userspace thread,</para>
          </listitem>

          <listitem>
            <para>fibrils share the FPU context of their containing thread
            and</para>
          </listitem>

          <listitem>
            <para>all fibrils of one userspace thread block when one of them
            goes to sleep.</para>
          </listitem>
        </itemizedlist></para>
    </formalpara>
  </section>

  <section>
    <title>Scheduler</title>

    <section>
      <title>Run Queues</title>

      <para>There is an array of several run queues on each processor. The
      current version of HelenOS uses 16 run queues implemented by 16 doubly
      linked lists. Each of the run queues is associated with a thread
      priority. The lower the run queue index in the array is, the higher is
      the priority of the threads linked in that run queue and the shorter is
      the time in which those threads will execute. When kernel code wants to
      access a run queue, it must first acquire its lock.</para>
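
      <para>A minimal sketch of how such per-processor run queues can be laid
      out follows. The structure and field names are illustrative assumptions
      made for this sketch, with <code>spinlock_t</code> and
      <code>list_t</code> standing in for the kernel's locking and linked list
      primitives; they are not the exact definitions from the HelenOS
      sources.</para>

      <programlisting>#define RQ_COUNT  16   /* one run queue per priority level */

/* Illustrative per-priority run queue: a lock plus a doubly linked list. */
typedef struct {
	spinlock_t lock;       /* must be held when touching this queue */
	list_t threads;        /* doubly linked list of ready threads */
	size_t count;          /* number of threads currently linked */
} run_queue_t;

/* Illustrative per-processor scheduling data; index 0 is the highest priority. */
typedef struct {
	run_queue_t rq[RQ_COUNT];
} cpu_sched_t;</programlisting>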
    </section>

    <section>
      <title>Scheduler Operation</title>

      <para>The scheduler is invoked either explicitly, when a thread calls the
      <code>scheduler()</code> function (e.g. because it goes to sleep or
      merely wants to relinquish the processor for a while), or implicitly, on
      a periodic basis, when the generic clock interrupt preempts the current
      thread. After its invocation, the scheduler saves the synchronous
      register context of the current thread and switches to its private
      stack. Afterwards, a new thread is selected according to the scheduling
      policy. If there is no suitable thread, the processor is idle and no
      thread executes on it. Note that the act of switching to the private
      scheduler stack is essential. If the processor kept running using the
      stack of the preempted thread, it could damage it, because the old
      thread can be migrated to another processor and scheduled there. In the
      worst case scenario, two execution flows would be using the same
      stack.</para>

      <para>The scheduling policy is implemented in the function
      <code>find_best_thread()</code>. This function walks the processor's run
      queues from lower towards higher indices and looks for a thread. If the
      visited run queue is empty, it simply searches the next run queue. If it
      is known in advance that there are no ready threads waiting for
      execution, <code>find_best_thread()</code> interruptibly halts the
      processor or busy waits until some threads arrive. This process repeats
      until <code>find_best_thread()</code> succeeds.</para>
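
      <para>The core of this policy can be sketched as a simple loop over the
      run queue array, reusing the illustrative <code>run_queue_t</code> and
      <code>cpu_sched_t</code> structures from the Run Queues sketch above.
      The helper names, the locking detail and the idle-wait primitive are
      simplifying assumptions, not the exact HelenOS implementation.</para>

      <programlisting>/* Sketch: pick the highest-priority ready thread on this processor. */
thread_t *find_best_thread_sketch(cpu_sched_t *sched)
{
	while (true) {
		/* Lower index means higher priority, so scan from index 0. */
		for (unsigned int i = 0; i &lt; RQ_COUNT; i++) {
			run_queue_t *rq = &amp;sched->rq[i];

			spinlock_lock(&amp;rq->lock);
			if (rq->count > 0) {
				/* Unlink the first ready thread from this queue. */
				thread_t *t = list_remove_first(&amp;rq->threads);
				rq->count--;
				spinlock_unlock(&amp;rq->lock);
				return t;
			}
			spinlock_unlock(&amp;rq->lock);
		}

		/* No ready thread anywhere: idle or busy wait until one arrives. */
		cpu_idle_or_busy_wait();
	}
}</programlisting>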

      <para>After the best thread is chosen, the scheduler switches to the
      thread's task and memory management context. Finally, the saved
      synchronous register context is restored and the thread runs. Each
      scheduled thread is given a time slice depending on its priority (i.e.
      run queue). The higher the priority, the shorter the time slice. To
      summarize, this policy schedules threads with high priorities more
      frequently but gives them smaller time slices. On the other hand, lower
      priority threads are scheduled less frequently, but run for longer
      periods of time.</para>
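
      <para>One simple way to realize the "higher priority, shorter time
      slice" rule is a per-priority quantum table indexed by the run queue
      number (using the illustrative <code>RQ_COUNT</code> constant from the
      earlier sketch); the concrete values below are made up for illustration
      and do not come from the HelenOS sources.</para>

      <programlisting>/* Illustrative time slices in clock ticks, indexed by run queue (priority). */
static const unsigned int rq_quantum[RQ_COUNT] = {
	1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
};

/* A thread taken from run queue i would run for rq_quantum[i] ticks. */</programlisting>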

      <para>When a thread uses up its entire time slice, it is preempted and
      put back into the run queue that immediately follows the one from which
      the thread ran. Threads that are woken up from a sleep are put into the
      highest priority run queue. Low priority threads are therefore those
      that don't go to sleep so often and just occupy the processor.</para>

      <para>In order to avoid complete starvation of the low priority threads,
      from time to time, the scheduler will provide them with a bonus of a
      one-point priority increase. In other words, the scheduler will now and
      then move the entire run queues one level up.</para>
    </section>

    <section>
      <title>Processor Load Balancing</title>

      <para>Normally, for the sake of cache locality, threads are scheduled on
      one of the processors and don't leave it. Nevertheless, a situation in
      which one processor is heavily overloaded while others sit idle can
      occur. HelenOS deploys special kernel threads to help mitigate this
      problem. Each processor is associated with one load balancing thread
      called <code>kcpulb</code> that wakes up regularly to see whether its
      processor is underbalanced or not. If it is, the thread attempts to
      migrate threads from other overloaded processors to its own processor's
      run queues. When the job is done or there is no need for load balancing,
      the thread goes to sleep.</para>
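
      <para>Schematically, each <code>kcpulb</code> thread can be thought of
      as the following loop; the helper names and the notion of an "average"
      ready-thread count are assumptions made for this sketch, not the exact
      logic of the HelenOS balancer.</para>

      <programlisting>/* Sketch: the per-processor load balancing thread. */
void kcpulb_sketch(cpu_sched_t *my_cpu)
{
	while (true) {
		sleep_until_woken_by_clock();

		/* Only act when this processor has noticeably fewer ready threads. */
		if (ready_count(my_cpu) >= average_ready_count())
			continue;

		/*
		 * Steal low priority threads from an overloaded processor,
		 * never taking from the same processor twice in a row.
		 */
		cpu_sched_t *victim = pick_overloaded_cpu_except_last_victim();
		if (victim != NULL)
			migrate_some_low_priority_threads(victim, my_cpu);
	}
}</programlisting>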

      <para>The balancing threads operate very gently and try to migrate low
      priority threads first; one <code>kcpulb</code> never takes from one
      processor twice in a row. The load balancing threads as well as threads
      that were just stolen cannot be migrated. The <code>kcpulb</code>
      threads are wired to their processors and cannot be migrated whatsoever.
      The ordinary threads are protected only until they are
      rescheduled.</para>
    </section>
  </section>
</chapter>