<?xml version="1.0" encoding="UTF-8"?>
<chapter id="scheduling">
  <?dbhtml filename="scheduling.html"?>

  <title>Scheduling</title>

  <para>One of the key aims of the operating system is to create and support
  the impression that several activities are executing concurrently. This is
  true for uniprocessor as well as multiprocessor systems. In the case of
  multiprocessor systems, the activities are truly happening in parallel. The
  scheduler helps to materialize this impression by planning threads on as
  many processors as possible and, where this means reaches its limits, by
  quickly switching among the threads executing on a single processor.</para>

  <section>
    <title>Contexts</title>

    <para>The term context refers to the set of processor resources that
    define the current state of the computation or the environment. The
    kernel understands it in several more or less narrow senses:</para>

    <itemizedlist>
      <listitem>
        <para>synchronous register context,</para>
      </listitem>

      <listitem>
        <para>asynchronous register context,</para>
      </listitem>

      <listitem>
        <para>FPU context and</para>
      </listitem>

      <listitem>
        <para>memory management context.</para>
      </listitem>
    </itemizedlist>

    <para>The most narrow sense refers to the synchronous register
    context. It includes all the preserved registers as defined by the
    architecture. To highlight some, the program counter and stack pointer
    take part in the synchronous register context. These are the registers
    that must be preserved across a procedure call and during synchronous
    context switches.</para>

    <para>The next type of context understood by the kernel is the
    asynchronous register context. On an interrupt, the interrupted execution
    flow's state must be guaranteed to be eventually completely restored.
    Therefore the interrupt context includes, among other things, the scratch
    registers as defined by the architecture. As a special optimization and if
    certain conditions are met, it need not include the architecture's
    preserved registers. The condition is that the low-level assembly language
    interrupt routines don't modify the preserved registers. The handlers
    usually call a higher-level C routine. The preserved registers are then
    saved on the stack by the compiler-generated code of the higher-level
    function. In HelenOS, several architectures can be compiled with this
    optimization.</para>

    <para>Although the kernel does not do any floating point
    arithmetic<footnote>
        <para>Some architectures (e.g. ia64) inevitably use a fixed set of
        floating point registers to carry out their normal
        operations.</para>
      </footnote>, it must protect the FPU context of userspace threads
    against destruction by other threads. Moreover, only a fraction of
    userspace programs use the floating point unit. HelenOS contains a
    generic framework for switching the FPU context only when the switch is
    forced (i.e. a thread uses a floating point instruction and its FPU
    context is not loaded in the processor).</para>

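The forced-switch rule above can be sketched in C. This is a minimal illustration only, under assumed names (`fpu_context_t`, `cpu_fpu_owner`, a handler invoked from the "FPU unavailable" trap); the actual HelenOS framework differs in detail:

```c
#include <stddef.h>

typedef struct {
    double regs[8];              /* hypothetical FPU register file */
} fpu_context_t;

typedef struct thread {
    fpu_context_t fpu_ctx;       /* per-thread saved FPU state */
} thread_t;

static thread_t *cpu_fpu_owner;  /* thread whose state sits in the FPU */
static fpu_context_t fpu_hw;     /* stands in for the hardware FPU */

/* Called from the trap raised by the first FPU instruction a thread
 * executes after being scheduled; only then is the context switched. */
void fpu_lazy_switch(thread_t *current)
{
    if (cpu_fpu_owner == current)
        return;                            /* state already loaded */
    if (cpu_fpu_owner != NULL)
        cpu_fpu_owner->fpu_ctx = fpu_hw;   /* save the previous owner */
    fpu_hw = current->fpu_ctx;             /* load the new owner */
    cpu_fpu_owner = current;
}
```

Threads that never touch the floating point unit never take the trap, so their scheduling pays no FPU save/restore cost at all.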
    <para>The last member of the context family is the memory management
    context. It includes the memory management registers that identify
    address spaces at the hardware level (i.e. ASIDs and page table
    pointers).</para>

    <section>
      <title>Synchronous context switches</title>

      <para>The scheduler, as well as other pieces of the kernel, makes
      heavy use of synchronous context switches, because they are a natural
      vehicle not only for changes in control flow, but also for switching
      between two kernel stacks. Two functions figure in a synchronous
      context switch implementation: <code>context_save</code> and
      <code>context_restore</code>. Note that these two functions break the
      natural perception of a linear C code execution flow starting at the
      function's entry point and ending at one of the function's exit
      points.</para>

      <para>When the <code>context_save</code> function is called, the
      synchronous context is saved in a memory structure passed to it. After
      executing <code>context_save</code>, the caller is returned 1 as a
      return value. The execution of instructions then continues normally
      until <code>context_restore</code> is called. For the caller, it seems
      as if the call never returns<footnote>
          <para>Which might be a source of problems with variable liveness
          after <code>context_restore</code>.</para>
        </footnote>. Nevertheless, the synchronous register context, which
      is saved in a memory structure passed to
      <code>context_restore</code>, is restored, thus transferring the
      control flow to the place of occurrence of the corresponding call to
      <code>context_save</code>. From the perspective of the caller of the
      corresponding <code>context_save</code>, it looks as though it has
      just returned from <code>context_save</code>. However, this time a
      return value of 0 is returned.</para>
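This save-then-return-twice behaviour closely resembles the standard C `setjmp`/`longjmp` pair, with the return values inverted (`setjmp` yields 0 on the direct return, while `context_save` yields 1). A runnable analogy, which also demonstrates the variable-liveness pitfall mentioned in the footnote:

```c
#include <setjmp.h>

static jmp_buf ctx;   /* plays the role of the saved synchronous context */

/* Returns the number of times control passed through the save point. */
int save_and_restore(void)
{
    /* `volatile` matters here: as after context_restore, values cached
     * in registers across the jump are otherwise indeterminate. */
    volatile int passes = 0;

    if (setjmp(ctx) == 0) {     /* direct return (context_save -> 1) */
        passes = 1;
        longjmp(ctx, 1);        /* like context_restore: never returns */
    }
    /* second "return" from the save point (context_save -> 0) */
    return passes + 1;
}
```

In the kernel the two calls typically occur in different threads, which is exactly what makes the pattern a stack-switching vehicle rather than a mere non-local jump.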
    </section>
  </section>

  <section>
    <title>Threads</title>

    <para>A thread is the basic executable entity with some code and stack.
    While the code, implemented by a C language function, can be shared by
    several threads, the stack is always private to each instance of the
    thread. Each thread belongs to exactly one task, through which it shares
    an address space with its sibling threads. Threads that execute purely
    in the kernel don't have any userspace memory allocated. However, when a
    thread has ambitions to run in userspace, it must be allocated a
    userspace stack. The distinction between the purely kernel threads and
    the threads that also run in userspace is made by referring to the
    former group as kernel threads and to the latter group as userspace
    threads. Both kernel and userspace threads are visible to the scheduler
    and can become subject to kernel preemption and thread migration during
    times when preemption is possible.</para>

    <formalpara>
      <title>Thread states</title>

      <para>At any given moment, a thread exists in one of five possible
      states. When the thread is created and first readied into the
      scheduler's run queues, or when it is migrated to a new processor, it
      is put into the <code>Entering</code> state. After some time, the
      scheduler picks the thread up and starts executing it. A thread
      currently being executed on a processor is in the <code>Running</code>
      state. From there, the thread has three possibilities. It either runs
      until it is preempted, in which case the state changes to
      <code>Ready</code>, goes to the <code>Sleeping</code> state by going
      to sleep, or enters the <code>Exiting</code> state when it reaches
      termination.<figure>
          <title>Transitions among thread states.</title>

          <mediaobject id="thread_states" xreflabel="">
            <imageobject role="html">
              <imagedata fileref="images/thread_states.png" format="PNG" />
            </imageobject>

            <imageobject role="fop">
              <imagedata fileref="images.vector/thread_states.svg"
                         format="SVG" />
            </imageobject>
          </mediaobject>
        </figure></para>
    </formalpara>
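The transitions just described can be captured in a small table. The state names follow the text; the encoding and the transition predicate are illustrative only (the woken-up transition <code>Sleeping</code> to <code>Ready</code> is taken from the run queue discussion later in this chapter):

```c
#include <stdbool.h>

typedef enum {
    Entering,   /* readied into the run queues or migrated */
    Ready,      /* preempted, waiting in a run queue */
    Running,    /* currently executing on a processor */
    Sleeping,   /* went to sleep, waiting for a wakeup */
    Exiting     /* reached termination */
} thread_state_t;

/* true iff the text above allows the transition */
bool thread_transition_allowed(thread_state_t from, thread_state_t to)
{
    switch (from) {
    case Entering: return to == Running;   /* scheduler picks it up */
    case Ready:    return to == Running;
    case Running:  return to == Ready || to == Sleeping || to == Exiting;
    case Sleeping: return to == Ready;     /* woken up and readied */
    case Exiting:  return false;           /* terminal state */
    }
    return false;
}
```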

    <formalpara>
      <title>Pseudo threads</title>

      <para>The HelenOS userspace layer knows even smaller units of
      execution. Each userspace thread can make use of an arbitrary number
      of pseudo threads. These pseudo threads have their own synchronous
      register context, userspace code and stack. They live their own life
      within the userspace thread, and the scheduler does not have any idea
      about them because they are completely implemented by the userspace
      library. This implies several things:<itemizedlist>
          <listitem>
            <para>pseudo threads schedule themselves cooperatively within
            the time slice given to their userspace thread,</para>
          </listitem>

          <listitem>
            <para>pseudo threads share the FPU context of their containing
            thread and</para>
          </listitem>

          <listitem>
            <para>all pseudo threads of one userspace thread block when one
            of them goes to sleep.</para>
          </listitem>
        </itemizedlist></para>
    </formalpara>
  </section>

  <section>
    <title>Scheduler</title>

    <section>
      <title>Run queues</title>

      <para>There is an array of several run queues on each processor. The
      current version of HelenOS uses 16 run queues implemented by 16 doubly
      linked lists. Each of the run queues is associated with a thread
      priority. The lower the run queue index in the array, the higher the
      priority of the threads linked in that run queue and the shorter the
      time slices in which those threads execute. When kernel code wants to
      access a run queue, it must first acquire its lock.</para>
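The per-processor layout can be sketched as follows; the list and lock types are simplified stand-ins for the kernel's own structures:

```c
#define RQ_COUNT 16          /* one queue per priority level */

typedef struct link {
    struct link *prev, *next;
} link_t;

typedef struct {
    int lock;                /* stand-in for the kernel spinlock */
    link_t head;             /* doubly linked list of ready threads */
    int n;                   /* number of threads linked in this queue */
} runq_t;

typedef struct {
    runq_t rq[RQ_COUNT];     /* index 0 = highest priority */
} cpu_t;

/* Initialize every run queue to an empty circular list. */
void cpu_init_runqs(cpu_t *cpu)
{
    for (int i = 0; i < RQ_COUNT; i++) {
        cpu->rq[i].lock = 0;
        cpu->rq[i].head.prev = cpu->rq[i].head.next = &cpu->rq[i].head;
        cpu->rq[i].n = 0;
    }
}
```

Keeping the lock per queue rather than per processor lets enqueue and dequeue operations on different priority levels proceed without contending for a single lock.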
    </section>

    <section>
      <title>Scheduler operation</title>

      <para>The scheduler is invoked either explicitly, when a thread calls
      the <code>scheduler</code> function (e.g. goes to sleep or merely
      wants to relinquish the processor for a while), or implicitly, on a
      periodic basis, when the generic clock interrupt preempts the current
      thread. After its invocation, the scheduler saves the synchronous
      register context of the current thread and switches to its private
      stack. Afterwards, a new thread is selected according to the
      scheduling policy. If there is no suitable thread, the processor stays
      idle and no thread executes on it. Note that the act of switching to
      the private scheduler stack is essential. If the processor kept
      running on the stack of the preempted thread, it could damage it,
      because the old thread can be migrated to another processor and
      scheduled there. In the worst case scenario, two execution flows would
      be using the same stack.</para>

      <para>The scheduling policy is implemented in the function
      <code>find_best_thread</code>. This function walks the processor's run
      queues from lower towards higher indices and looks for a thread. If
      the visited run queue is empty, it simply searches the next run queue.
      If it is known in advance that there are no ready threads waiting for
      execution, <code>find_best_thread</code> interruptibly halts the
      processor or busy waits until some threads arrive. This process
      repeats until <code>find_best_thread</code> succeeds.</para>
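The walk itself reduces to a scan from index 0 upwards. A simplified sketch over per-queue counters only (the real function also locks each queue it visits and handles the idle case described above):

```c
#define RQ_COUNT 16

/* Return the index of the first non-empty run queue (i.e. the highest
 * priority level with a ready thread), or -1 if the processor would go
 * idle and the caller should halt or busy wait until threads arrive. */
int find_best_queue(const int nready[RQ_COUNT])
{
    for (int i = 0; i < RQ_COUNT; i++)
        if (nready[i] > 0)
            return i;
    return -1;
}
```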

      <para>After the best thread is chosen, the scheduler switches to the
      thread's task and memory management context. Finally, the saved
      synchronous register context is restored and the thread runs. Each
      scheduled thread is given a time slice depending on its priority (i.e.
      run queue). The higher the priority, the shorter the time slice. To
      summarize, this policy schedules threads with high priorities more
      frequently but gives them smaller time slices. On the other hand,
      lower priority threads are scheduled less frequently, but run for
      longer periods of time.</para>

      <para>When a thread uses up its entire time slice, it is preempted and
      put into the run queue that immediately follows the run queue from
      which the thread previously ran. Threads that are woken up from a
      sleep are put into the highest priority run queue. Low priority
      threads are therefore those that don't go to sleep very often and just
      occupy the processor.</para>

      <para>In order to avoid complete starvation of the low priority
      threads, from time to time, the scheduler provides them with a bonus
      of a one-point priority increase. In other words, the scheduler now
      and then moves entire run queues one level up.</para>
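Both rules, the one-level drop after a used-up time slice and the occasional one-level boost, amount to simple index arithmetic. A sketch, noting that the clamping at the array bounds is an assumption (the text does not specify the edge behaviour):

```c
#define RQ_COUNT 16   /* run queue 0 = highest priority */

/* A preempted thread drops to the next (lower priority) run queue. */
int requeue_after_preemption(int prio)
{
    return prio + 1 < RQ_COUNT ? prio + 1 : RQ_COUNT - 1;
}

/* The anti-starvation bonus moves a thread one level up. */
int boost_priority(int prio)
{
    return prio > 0 ? prio - 1 : 0;
}
```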
    </section>

    <section>
      <title>Processor load balancing</title>

      <para>Normally, for the sake of cache locality, threads are scheduled
      on one of the processors and don't leave it. Nevertheless, a situation
      in which one processor is heavily overloaded while others sit idle can
      occur. HelenOS deploys special kernel threads to help mitigate this
      problem. Each processor is associated with one load balancing thread
      called <code>kcpulb</code> that wakes up regularly to see whether its
      processor is underbalanced or not. If it is, the thread attempts to
      migrate threads from other, overloaded processors to its own
      processor's run queues. When the job is done or there is no need for
      load balancing, the thread goes to sleep.</para>

      <para>The balancing threads operate very gently and try to migrate low
      priority threads first; one <code>kcpulb</code> never takes threads
      from the same processor twice in a row. Neither the load balancing
      threads nor the threads that were just stolen can be migrated. The
      <code>kcpulb</code> threads are wired to their processors and cannot
      be migrated whatsoever. The ordinary threads are protected only until
      they are rescheduled.</para>
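One balancing pass can be sketched as follows. The averaging criterion and the one-thread-per-pass limit are assumptions made for the sake of the example; the rules above (wired threads, never stealing from the same processor twice in a row) are noted in comments but not modelled:

```c
/* One kcpulb pass over per-CPU ready counts. Moves at most one thread
 * from the most loaded CPU to `self` if `self` is below the average
 * load; returns the donor CPU index, or -1 if no balancing is needed.
 * The real kcpulb also skips wired and freshly stolen threads and
 * avoids the previous donor. */
int kcpulb_pass(int nrdy[], int ncpus, int self)
{
    int total = 0, busiest = -1;

    for (int i = 0; i < ncpus; i++) {
        total += nrdy[i];
        if (i != self && (busiest < 0 || nrdy[i] > nrdy[busiest]))
            busiest = i;
    }

    int avg = total / ncpus;
    if (nrdy[self] >= avg || busiest < 0 || nrdy[busiest] <= avg)
        return -1;              /* balanced enough: go back to sleep */

    nrdy[busiest]--;            /* steal one thread ... */
    nrdy[self]++;               /* ... into our own run queues */
    return busiest;
}
```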
    </section>
  </section>
</chapter>