<?xml version="1.0" encoding="UTF-8"?>
<chapter id="scheduling">
  <?dbhtml filename="scheduling.html"?>

  <title>Scheduling</title>

  <para>One of the key aims of the operating system is to create and
  support the impression that several activities are executing
  concurrently. This is true for uniprocessor as well as multiprocessor
  systems. On multiprocessor systems, the activities indeed happen in
  parallel. The scheduler helps to maintain this impression by spreading
  threads over as many processors as possible and, when this strategy
  reaches its limits, by quickly switching among the threads executing on
  a single processor.</para>
| 14 | |||
| 15 | <section> |
||
| 16 | <title>Contexts</title> |
||
| 17 | |||
| 140 | palkovsky | 18 | <para>The term <emphasis>context</emphasis> refers to the set of processor |
| 19 | resources that define the current state of the computation or the |
||
| 20 | environment and the kernel understands it in several more or less narrow |
||
| 21 | sences:</para> |
||
| 57 | jermar | 22 | |
    <itemizedlist>
      <listitem>
        <para>synchronous register context,</para>
      </listitem>

      <listitem>
        <para>asynchronous register context,</para>
      </listitem>

      <listitem>
        <para>FPU context and</para>
      </listitem>

      <listitem>
        <para>memory management context.</para>
      </listitem>
    </itemizedlist>

    <para>The narrowest sense refers to the synchronous register context.
    It includes all the preserved registers as defined by the
    architecture. To highlight some, the program counter and the stack
    pointer take part in the synchronous register context. These
    registers must be preserved across a procedure call and during
    synchronous context switches.</para>

    <para>The next type of context understood by the kernel is the
    asynchronous register context. On an interrupt, the interrupted
    execution flow's state must be guaranteed to be eventually completely
    restored. Therefore the interrupt context includes, among other
    things, the scratch registers as defined by the architecture. As a
    special optimization, it need not include the architecture's
    preserved registers, provided that the low-level assembly language
    interrupt routines don't modify them. The handlers usually call a
    higher-level C routine, so the preserved registers are then saved on
    the stack by the compiler-generated code of the higher-level
    function. In HelenOS, several architectures can be compiled with this
    optimization.</para>

    <para>Although the kernel does not do any floating point
    arithmetic<footnote>
        <para>Some architectures (e.g. ia64) inevitably use a fixed set
        of floating point registers to carry out their normal
        operations.</para>
      </footnote>, it must protect the FPU context of userspace threads
    against destruction by other threads. Moreover, only a fraction of
    userspace programs use the floating point unit. HelenOS contains a
    generic framework for switching the FPU context only when the switch
    is forced (i.e. a thread uses a floating point instruction and its
    FPU context is not loaded in the processor).</para>

    <para>The last member of the context family is the memory management
    context. It includes the memory management registers that identify
    address spaces at the hardware level (i.e. ASIDs and page table
    pointers).</para>

    <section>
      <title>Synchronous Context Switches</title>

      <para>The scheduler, but also other pieces of the kernel, make
      heavy use of synchronous context switches, because they are a
      natural vehicle not only for changes in control flow, but also for
      switching between two kernel stacks. Two functions figure in a
      synchronous context switch implementation:
      <code>context_save()</code> and <code>context_restore()</code>.
      Note that these two functions break the natural perception of
      linear C code execution flow starting at the function's entry point
      and ending at one of the function's exit points.</para>

      <para>When the <code>context_save()</code> function is called, the
      synchronous context is saved in a memory structure passed to it.
      After executing <code>context_save()</code>, the caller is returned
      1 as a return value. The execution of instructions then continues
      normally until <code>context_restore()</code> is called. For the
      caller, it seems as if the call never returns<footnote>
          <para>Which might be a source of problems with variable
          liveness after <code>context_restore()</code>.</para>
        </footnote>. Nevertheless, the synchronous register context saved
      in the memory structure passed to <code>context_restore()</code> is
      restored, thus transferring the control flow to the place of the
      corresponding call to <code>context_save()</code>. From the
      perspective of the caller of the corresponding
      <code>context_save()</code>, it looks like a return from
      <code>context_save()</code>. However, this time a return value of 0
      is returned.</para>
    </section>
  </section>

  <section>
    <title>Threads</title>

    <para>A thread is the basic executable entity with some code and a
    stack. While the code, implemented by a C language function, can be
    shared by several threads, the stack is always private to each
    instance of the thread. Each thread belongs to exactly one task,
    through which it shares its address space with its sibling threads.
    Threads that execute purely in the kernel don't have any userspace
    memory allocated. However, when a thread has ambitions to run in
    userspace, it must be allocated a userspace stack. The distinction
    between the two groups is made by referring to the former as kernel
    threads and to the latter as userspace threads. Both kernel and
    userspace threads are visible to the scheduler and can become subject
    to kernel preemption and thread migration anytime preemption is not
    disabled.</para>

    <formalpara>
      <title>Thread States</title>

      <para>At any moment, a thread exists in one of six possible thread
      states. When the thread is created and first inserted into the
      scheduler's run queues, or when it is migrated to a new processor,
      it is put into the <constant>Entering</constant> state. After some
      time elapses, the scheduler picks the thread up and starts
      executing it. A thread currently executing on a processor is in the
      <constant>Running</constant> state. From there, the thread has
      three possibilities. It either runs until it is preempted, in which
      case the state changes to <constant>Ready</constant>, goes to the
      <constant>Sleeping</constant> state by going to sleep, or enters
      the <constant>Exiting</constant> state when it reaches termination.
      When the thread exits, its kernel structure usually stays in memory
      until the thread is detached by another thread using the
      <code>thread_detach()</code> function. Terminated but undetached
      threads are in the <constant>Undead</constant> state. When the
      thread is detached or detaches itself during its life, it is
      destroyed in the <constant>Exiting</constant> state and the
      <constant>Undead</constant> state is not reached.<figure float="1">
          <title>Transitions among thread states.</title>

          <mediaobject id="thread_states" xreflabel="">
            <imageobject role="pdf">
              <imagedata fileref="images/thread_states.pdf" format="PDF" />
            </imageobject>

            <imageobject role="html">
              <imagedata fileref="images/thread_states.png" format="PNG" />
            </imageobject>

            <imageobject role="fop">
              <imagedata fileref="images/thread_states.svg" format="SVG" />
            </imageobject>
          </mediaobject>
        </figure></para>
    </formalpara>

    <formalpara>
      <title>Pseudo Threads</title>

      <para>The HelenOS userspace layer knows even smaller units of
      execution. Each userspace thread can make use of an arbitrary
      number of pseudo threads. These pseudo threads have their own
      synchronous register context, userspace code and stack. They live
      their own life within the userspace thread and the scheduler does
      not know about them because they are completely implemented by the
      userspace library. This implies several things:<itemizedlist>
          <listitem>
            <para>pseudo threads schedule themselves cooperatively within
            the time slice given to their userspace thread,</para>
          </listitem>

          <listitem>
            <para>pseudo threads share the FPU context of their
            containing thread and</para>
          </listitem>

          <listitem>
            <para>all pseudo threads of one userspace thread block when
            one of them goes to sleep.</para>
          </listitem>
        </itemizedlist></para>
    </formalpara>

  </section>

  <section>
    <title>Scheduler</title>

    <section>
      <title>Run Queues</title>

      <para>There is an array of several run queues on each processor.
      The current version of HelenOS uses 16 run queues implemented by 16
      doubly linked lists. Each of the run queues is associated with a
      thread priority. The lower the run queue index in the array, the
      higher the priority of the threads linked in that run queue and the
      shorter the time slice in which those threads will execute. When
      kernel code wants to access a run queue, it must first acquire its
      lock.</para>
    </section>

    <section>
      <title>Scheduler Operation</title>

      <para>The scheduler is invoked either explicitly, when a thread
      calls the <code>scheduler()</code> function (e.g. goes to sleep or
      merely wants to relinquish the processor for a while), or
      implicitly on a periodic basis, when the generic clock interrupt
      preempts the current thread. After its invocation, the scheduler
      saves the synchronous register context of the current thread and
      switches to its private stack. Afterwards, a new thread is selected
      according to the scheduling policy. If there is no suitable thread,
      the processor stays idle and no thread executes on it. Note that
      the act of switching to the private scheduler stack is essential.
      If the processor kept running on the stack of the preempted thread,
      it could damage it, because the old thread can be migrated to
      another processor and scheduled there. In the worst case scenario,
      two execution flows would be using the same stack.</para>

      <para>The scheduling policy is implemented in the function
      <code>find_best_thread()</code>. This function walks the processor
      run queues from lower towards higher indices and looks for a
      thread. If the visited run queue is empty, it simply searches the
      next run queue. If it is known in advance that there are no ready
      threads waiting for execution, <code>find_best_thread()</code>
      interruptibly halts the processor or busy waits until some threads
      arrive. This process repeats until
      <code>find_best_thread()</code> succeeds.</para>

      <para>After the best thread is chosen, the scheduler switches to
      the thread's task and memory management context. Finally, the saved
      synchronous register context is restored and the thread runs. Each
      scheduled thread is given a time slice depending on its priority
      (i.e. its run queue). The higher the priority, the shorter the time
      slice. To summarize, this policy schedules threads with high
      priorities more frequently but gives them smaller time slices. On
      the other hand, lower priority threads are scheduled less
      frequently, but run for longer periods of time.</para>

      <para>When a thread uses up its entire time slice, it is preempted
      and put back into the run queue that immediately follows the run
      queue from which it ran. Threads that are woken up from a sleep are
      put into the highest priority run queue. Low priority threads are
      therefore those that don't go to sleep so often and just occupy the
      processor.</para>

      <para>In order to avoid complete starvation of the low priority
      threads, from time to time the scheduler provides them with a bonus
      of a one point priority increase. In other words, the scheduler now
      and then moves the entire run queues one level up.</para>
    </section>

    <section>
      <title>Processor Load Balancing</title>

      <para>Normally, for the sake of cache locality, threads are
      scheduled on one of the processors and don't leave it.
      Nevertheless, a situation in which one processor is heavily
      overloaded while others sit idle can occur. HelenOS deploys special
      kernel threads to help mitigate this problem. Each processor is
      associated with one load balancing thread called
      <code>kcpulb</code> that wakes up regularly to see whether its
      processor is underbalanced or not. If it is, the thread attempts to
      migrate threads from other overloaded processors to its own
      processor's run queues. When the job is done or there is no need
      for load balancing, the thread goes to sleep.</para>

      <para>The balancing threads operate very gently and try to migrate
      low priority threads first; one <code>kcpulb</code> never steals
      from the same processor twice in a row. The load balancing threads,
      as well as threads that were just stolen, cannot themselves be
      migrated. The <code>kcpulb</code> threads are wired to their
      processors and cannot be migrated whatsoever. The ordinary threads
      are protected only until they are rescheduled.</para>

    </section>
  </section>
</chapter>