Rev 58 | Rev 61 | Go to most recent revision | Details | Compare with Previous | Last modification | View Log | RSS feed
Rev | Author | Line No. | Line |
---|---|---|---|
9 | bondari | 1 | <?xml version="1.0" encoding="UTF-8"?> |
57 | jermar | 2 | <chapter id="scheduling"> |
3 | <?dbhtml filename="scheduling.html"?> |
||
9 | bondari | 4 | |
57 | jermar | 5 | <title>Scheduling</title> |
9 | bondari | 6 | |
57 | jermar | 7 | <para>One of the key aims of the operating system is to create and support |
8 | the impression that several activities are executing contemporarily. This is |
||
9 | true for both uniprocessor as well as multiprocessor systems. In the case of |
||
10 | multiprocessor systems, the activities are trully happening in parallel. The |
||
11 | scheduler helps to materialize this impression by planning threads on as |
||
12 | many processors as possible and, where this means reaches its limits, by |
||
13 | quickly switching among threads executing on a single processor.</para> |
||
14 | |||
15 | <section> |
||
16 | <title>Contexts</title> |
||
17 | |||
18 | <para>The term context refers to the set of processor resources that |
||
19 | define the current state of the computation or the environment and the |
||
20 | kernel understands it in several more or less narrow sences:</para> |
||
21 | |||
22 | <itemizedlist> |
||
23 | <listitem> |
||
24 | <para>synchronous register context,</para> |
||
25 | </listitem> |
||
26 | |||
27 | <listitem> |
||
28 | <para>asynchronous register context,</para> |
||
29 | </listitem> |
||
30 | |||
31 | <listitem> |
||
32 | <para>FPU context and</para> |
||
33 | </listitem> |
||
34 | |||
35 | <listitem> |
||
36 | <para>memory management context.</para> |
||
37 | </listitem> |
||
38 | </itemizedlist> |
||
39 | |||
40 | <para>The most narrow sence refers to the the synchronous register |
||
41 | context. It includes all the preserved registers as defined by the |
||
42 | architecture. To highlight some, the program counter and stack pointer |
||
43 | take part in the synchronous register context. These are the registers |
||
44 | that must be preserved across a procedure call and during synchronous |
||
58 | jermar | 45 | context switches.</para> |
57 | jermar | 46 | |
47 | <para>The next type of the context understood by the kernel is the |
||
48 | asynchronous register context. On an interrupt, the interrupted execution |
||
49 | flow's state must be guaranteed to be eventually completely restored. |
||
50 | Therefore the interrupt context includes, among other things, the scratch |
||
51 | registers as defined by the architecture. As a special optimization and if |
||
52 | certain conditions are met, it need not include the architecture's |
||
53 | preserved registers. The condition mentioned in the previous sentence is |
||
54 | that the low-level assembly language interrupt routines don't modify the |
||
55 | preserved registers. The handlers usually call a higher-level C routine. |
||
56 | The preserved registers are then saved on the stack by the compiler |
||
57 | generated code of the higher-level function. In HelenOS, several |
||
58 | architectures can be compiled with this optimization.</para> |
||
59 | |||
60 | <para>Although the kernel does not do any floating point |
||
61 | arithmetics<footnote> |
||
62 | <para>Some architectures (e.g. ia64) inevitably use a fixed set of |
||
63 | floating point registers to carry out its normal operations.</para> |
||
64 | </footnote>, it must protect FPU context of userspace threads against |
||
65 | destruction by other threads. Moreover, only a fraction of userspace |
||
66 | programs use the floating point unit. HelenOS contains a generic framework |
||
67 | for switching FPU context only when the switch is forced.</para> |
||
68 | |||
69 | <para>The last member of the context family is the memory management |
||
70 | context. It includes memory management registers that identify address |
||
71 | spaces on hardware level (i.e. ASIDs and page tables pointers).</para> |
||
72 | |||
9 | bondari | 73 | <section> |
57 | jermar | 74 | <title>Synchronous context switches</title> |
9 | bondari | 75 | |
57 | jermar | 76 | <para>The scheduler, but also other pieces of the kernel, make heavy use |
77 | of synchronous context switches, because it is a natural vehicle not |
||
78 | only for changes in control flow, but also for switching between two |
||
79 | kernel stacks. Two functions figure in a synchronous context switch |
||
80 | implementation: <code>context_save</code> and |
||
81 | <code>context_restore</code>. Note that these two functions break the |
||
82 | natural perception of the linear C code execution flow starting at |
||
83 | function's entry point and ending on one of the function's exit |
||
84 | points.</para> |
||
85 | |||
86 | <para>When the <code>context_save</code> function is called, the |
||
87 | synchronous context is saved in a memory structure passed to it. After |
||
88 | executing <code>context_save</code>, the caller is returned 1 as a |
||
89 | return value. The execution of instructions continues as normally until |
||
90 | <code>context_restore</code> is called. For the caller, it seems like |
||
91 | the call never returns<footnote> |
||
92 | <para>Which might be a source of problems with variable liveliness |
||
93 | after <code>context_restore</code>.</para> |
||
94 | </footnote>. Nevertheless, a synchronous register context, which is |
||
95 | saved in a memory structure passed to <code>context_restore,</code> is |
||
96 | restored, thus transfering the control flow to the place of occurrence |
||
97 | of the corresponding call to <code>context_save</code>. From the |
||
98 | perspective of the caller of the corresponding |
||
99 | <code>context_save</code>, it looks as though a return from |
||
100 | <code>context_save</code>. However, this time a return value of 0 is |
||
101 | returned.</para> |
||
9 | bondari | 102 | </section> |
57 | jermar | 103 | </section> |
9 | bondari | 104 | |
57 | jermar | 105 | <section> |
58 | jermar | 106 | <title>Threads</title> |
9 | bondari | 107 | |
58 | jermar | 108 | <para>A thread is the basic executable entity with some code and stack. |
109 | While the code, implemented by a C language function, can be shared by |
||
110 | several threads, the stack is always private to each instance of the |
||
111 | thread. Each thread belongs to exactly one task through which it shares |
||
112 | address space with its sibling threads. Threads that execute purely in the |
||
113 | kernel don't have any userspace memory allocated. However, when a thread |
||
114 | has ambitions to run in userspace, it must be allocated a userspace stack. |
||
115 | The distinction between the purely kernel threads and threads running also |
||
116 | in userspace is made by refering to the former group as to kernel threads |
||
117 | and to the latter group as to userspace threads. Both kernel and userspace |
||
118 | threads are visible to the scheduler and can become a subject of kernel |
||
119 | preemption and thread migration during times when preemption is |
||
120 | possible.</para> |
||
121 | |||
122 | <para>HelenOS userspace layer knows even smaller units of execution. Each |
||
123 | userspace thread can make use of an arbitrary number of pseudo threads. |
||
124 | These pseudo threads have their own synchronous register context, |
||
125 | userspace code and stack. They live their own life within the userspace |
||
126 | thread and the scheduler does not have any idea about them because they |
||
127 | are completely implemented by the userspace library. This implies several |
||
128 | things:</para> |
||
129 | |||
130 | <itemizedlist> |
||
131 | <listitem> |
||
132 | <para>pseudothreads schedule themselves cooperatively within the time |
||
133 | slice given to their userspace thread,</para> |
||
134 | </listitem> |
||
135 | |||
136 | <listitem> |
||
137 | <para>pseudothreads share FPU context of their containing thread |
||
138 | and</para> |
||
139 | </listitem> |
||
140 | |||
141 | <listitem> |
||
142 | <para>all pseudothreads of one userspace thread block when one of them |
||
143 | goes to sleep.</para> |
||
144 | </listitem> |
||
145 | </itemizedlist> |
||
59 | jermar | 146 | </section> |
58 | jermar | 147 | |
59 | jermar | 148 | <section> |
149 | <title>Scheduler</title> |
||
150 | |||
151 | <section> |
||
152 | <title>Run queues</title> |
||
153 | |||
154 | <para>There is an array of several run queues on each processor. The |
||
155 | current version of HelenOS uses 16 run queues implemented by 16 doubly |
||
156 | linked lists. Each of the run queues is associated with thread priority. |
||
157 | The lower the run queue index in the array is, the higher is the |
||
158 | priority of threads linked in that run queue and the shorter is the time |
||
159 | in which those threads will execute. When kernel code wants to access |
||
160 | the run queue, it must first acquire its lock.</para> |
||
161 | </section> |
||
162 | |||
163 | <section> |
||
164 | <title>Scheduler operation</title> |
||
165 | |||
166 | <para>The scheduler is invoked either explicitly when a thread calls the |
||
167 | <code>scheduler</code> function (e.g. goes to sleep or merely wants to |
||
168 | relinquish the processor for a while) or implicitly on a periodic basis |
||
169 | when the generic clock interrupt preempts the current thread. After its |
||
170 | invocation, the scheduler saves the synchronous register context of the |
||
171 | current thread and switches to its private stack. Afterwards, a new |
||
172 | thread is selected according to the scheduling policy. If there is no |
||
173 | suitable thread, the processor is idle and no thread executes on it. |
||
174 | Note that the act of switching to the private scheduler stack is |
||
175 | essential. If the processor kept running using the stack of the |
||
176 | preempted thread it could damage it because the old thread can be |
||
177 | migrated to another processor and scheduled there. In the worst case |
||
178 | scenario, two execution flows would be using the same stack. </para> |
||
179 | |||
180 | <para>The scheduling policy is implemented in function |
||
181 | <code>find_best_thread</code>. This function walks the processor run |
||
182 | queues from lower towards higher indices and looks for a thread. If the |
||
183 | visited run queue is empty, it simply searches the next run queue. If it |
||
184 | is known in advance that there are no ready threads waiting for |
||
185 | execution, <code>find_best_thread</code> interruptibly halts the |
||
186 | processor or busy waits until some threads arrive. This process repeats |
||
187 | until <code>find_best_thread</code> succeeds.</para> |
||
188 | |||
189 | <para>After the best thread is chosen, the scheduler switches to the |
||
190 | thread's task and memory management context. Finally, the saved |
||
191 | synchronous register context is restored and the thread runs. Each |
||
192 | scheduled thread is given a time slice depending on its priority (i.e. |
||
193 | run queue). The higher priority, the shorter timeslice. To summarize, |
||
194 | this policy schedules threads with high priorities more frequently but |
||
195 | gives them smaller time slices. On the other hand, lower priority |
||
196 | threads are scheduled less frequently, but run for longer periods of |
||
197 | time.</para> |
||
198 | |||
199 | <para>When a thread uses its entire time slice, it is preempted and put |
||
200 | back into the run queue that immediately follows the previous run queue |
||
201 | from which the thread ran. Threads that are woken up from a sleep are |
||
202 | put into the biggest priority run queue. Low priority threads are |
||
203 | therefore those that don't go to sleep so often and just occupy the |
||
204 | processor.</para> |
||
205 | |||
206 | <para>In order to avoid complete starvation of the low priority threads, |
||
207 | from time to time, the scheduler will provide them with a bonus of one |
||
208 | point priority increase. In other words, the scheduler will now and then |
||
209 | move the entire run queues one level up.</para> |
||
210 | </section> |
||
211 | |||
212 | <section> |
||
213 | <title>Processor load balancing</title> |
||
214 | |||
215 | <para>Normally, for the sake of cache locality, threads are scheduled on |
||
216 | one of the processors and don't leave it. Nevertheless, a situation in |
||
217 | which one processor is heavily overloaded while others sit idle can |
||
218 | occur. HelenOS deploys special kernel threads to help to mitigate this |
||
219 | problem. Each processor is associated with one load balancing thread |
||
220 | called <code>kcpulb</code> that wakes up regularily to see whether its |
||
221 | processor is underbalanced or not. If yes, the thread attempts to |
||
222 | migrate threads from other overloaded processors to its own processor's |
||
223 | run queues. When the job is done or there is no need for load balancing, |
||
224 | the thread goes to sleep.</para> |
||
225 | |||
226 | <para>The balancing threads operate very gently and try to migrate low |
||
227 | priority threads first; one <code>kcpulb</code> never takes from one |
||
228 | processor twice in a row. The load balancing threads as well as threads |
||
229 | that were just stolen cannot be migrated. The <code>kcpulb</code> |
||
230 | threads are wired to their processors and cannot be migrated whatsoever. |
||
231 | The ordinary threads are protected only until they are |
||
232 | rescheduled.</para> |
||
233 | </section> |
||
57 | jermar | 234 | </section> |
235 | </chapter> |