Rev 69 | Rev 71 | Go to most recent revision | Details | Compare with Previous | Last modification | View Log | RSS feed
Rev | Author | Line No. | Line |
---|---|---|---|
9 | bondari | 1 | <?xml version="1.0" encoding="UTF-8"?> |
11 | bondari | 2 | <chapter id="mm"> |
3 | <?dbhtml filename="mm.html"?> |
||
9 | bondari | 4 | |
11 | bondari | 5 | <title>Memory management</title> |
9 | bondari | 6 | |
67 | jermar | 7 | <para>In previous chapters, this book described the scheduling subsystem as |
8 | the creator of the impression that threads execute in parallel. The memory |
||
9 | management subsystem, on the other hand, creates the impression that there |
||
10 | is enough physical memory for the kernel and that userspace tasks have the |
||
11 | entire address space only for themselves.</para> |
||
64 | jermar | 12 | |
26 | bondari | 13 | <section> |
64 | jermar | 14 | <title>Physical memory management</title> |
15 | |||
16 | <section id="zones_and_frames"> |
||
17 | <title>Zones and frames</title> |
||
18 | |||
67 | jermar | 19 | <para>HelenOS represents continuous areas of physical memory in |
20 | structures called frame zones (abbreviated as zones). Each zone contains |
||
21 | information about the number of allocated and unallocated physical |
||
22 | memory frames as well as the physical base address of the zone and |
||
23 | number of frames contained in it. A zone also contains an array of frame |
||
24 | structures describing each frame of the zone and, in the last, but not |
||
25 | the least important, front, each zone is equipped with a buddy system |
||
26 | that faciliates effective allocation of power-of-two sized block of |
||
27 | frames.</para> |
||
64 | jermar | 28 | |
67 | jermar | 29 | <para>This organization of physical memory provides good preconditions |
30 | for hot-plugging of more zones. There is also one currently unused zone |
||
31 | attribute: <code>flags</code>. The attribute could be used to give a |
||
32 | special meaning to some zones in the future.</para> |
||
64 | jermar | 33 | |
67 | jermar | 34 | <para>The zones are linked in a doubly-linked list. This might seem a |
35 | bit ineffective because the zone list is walked everytime a frame is |
||
36 | allocated or deallocated. However, this does not represent a significant |
||
37 | performance problem as it is expected that the number of zones will be |
||
38 | rather low. Moreover, most architectures merge all zones into |
||
39 | one.</para> |
||
40 | |||
41 | <para>For each physical memory frame found in a zone, there is a frame |
||
42 | structure that contains number of references and data used by buddy |
||
43 | system.</para> |
||
64 | jermar | 44 | </section> |
45 | |||
46 | <section id="frame_allocator"> |
||
47 | <title>Frame allocator</title> |
||
48 | |||
67 | jermar | 49 | <para>The frame allocator satisfies kernel requests to allocate |
50 | power-of-two sized blocks of physical memory. Because of zonal |
||
51 | organization of physical memory, the frame allocator is always working |
||
52 | within a context of some frame zone. In order to carry out the |
||
53 | allocation requests, the frame allocator is tightly integrated with the |
||
54 | buddy system belonging to the zone. The frame allocator is also |
||
55 | responsible for updating information about the number of free and busy |
||
56 | frames in the zone. <figure> |
||
57 | <mediaobject id="frame_alloc"> |
||
58 | <imageobject role="html"> |
||
59 | <imagedata fileref="images/frame_alloc.png" format="PNG" /> |
||
60 | </imageobject> |
||
64 | jermar | 61 | |
67 | jermar | 62 | <imageobject role="fop"> |
63 | <imagedata fileref="images.vector/frame_alloc.svg" format="SVG" /> |
||
64 | </imageobject> |
||
65 | </mediaobject> |
||
64 | jermar | 66 | |
67 | jermar | 67 | <title>Frame allocator scheme.</title> |
68 | </figure></para> |
||
64 | jermar | 69 | |
70 | <formalpara> |
||
71 | <title>Allocation / deallocation</title> |
||
72 | |||
67 | jermar | 73 | <para>Upon allocation request via function <code>frame_alloc</code>, |
74 | the frame allocator first tries to find a zone that can satisfy the |
||
75 | request (i.e. has the required amount of free frames). Once a suitable |
||
76 | zone is found, the frame allocator uses the buddy allocator on the |
||
77 | zone's buddy system to perform the allocation. During deallocation, |
||
78 | which is triggered by a call to <code>frame_free</code>, the frame |
||
79 | allocator looks up the respective zone that contains the frame being |
||
80 | deallocated. Afterwards, it calls the buddy allocator again, this time |
||
81 | to take care of deallocation within the zone's buddy system.</para> |
||
64 | jermar | 82 | </formalpara> |
83 | </section> |
||
84 | |||
85 | <section id="buddy_allocator"> |
||
86 | <title>Buddy allocator</title> |
||
87 | |||
67 | jermar | 88 | <para>In the buddy system, the memory is broken down into power-of-two |
89 | sized naturally aligned blocks. These blocks are organized in an array |
||
90 | of lists, in which the list with index i contains all unallocated blocks |
||
91 | of size <mathphrase>2<superscript>i</superscript></mathphrase>. The |
||
92 | index i is called the order of block. Should there be two adjacent |
||
93 | equally sized blocks in the list i<mathphrase />(i.e. buddies), the |
||
94 | buddy allocator would coalesce them and put the resulting block in list |
||
95 | <mathphrase>i + 1</mathphrase>, provided that the resulting block would |
||
96 | be naturally aligned. Similarily, when the allocator is asked to |
||
97 | allocate a block of size |
||
98 | <mathphrase>2<superscript>i</superscript></mathphrase>, it first tries |
||
99 | to satisfy the request from the list with index i. If the request cannot |
||
100 | be satisfied (i.e. the list i is empty), the buddy allocator will try to |
||
101 | allocate and split a larger block from the list with index i + 1. Both |
||
102 | of these algorithms are recursive. The recursion ends either when there |
||
103 | are no blocks to coalesce in the former case or when there are no blocks |
||
104 | that can be split in the latter case.</para> |
||
64 | jermar | 105 | |
67 | jermar | 106 | <para>This approach greatly reduces external fragmentation of memory and |
107 | helps in allocating bigger continuous blocks of memory aligned to their |
||
108 | size. On the other hand, the buddy allocator suffers increased internal |
||
109 | fragmentation of memory and is not suitable for general kernel |
||
110 | allocations. This purpose is better addressed by the <link |
||
111 | linkend="slab">slab allocator</link>.<figure> |
||
64 | jermar | 112 | <mediaobject id="buddy_alloc"> |
113 | <imageobject role="html"> |
||
114 | <imagedata fileref="images/buddy_alloc.png" format="PNG" /> |
||
115 | </imageobject> |
||
116 | |||
117 | <imageobject role="fop"> |
||
118 | <imagedata fileref="images.vector/buddy_alloc.svg" format="SVG" /> |
||
119 | </imageobject> |
||
120 | </mediaobject> |
||
121 | |||
122 | <title>Buddy system scheme.</title> |
||
67 | jermar | 123 | </figure></para> |
64 | jermar | 124 | |
125 | <section> |
||
126 | <title>Implementation</title> |
||
127 | |||
128 | <para>The buddy allocator is, in fact, an abstract framework wich can |
||
129 | be easily specialized to serve one particular task. It knows nothing |
||
130 | about the nature of memory it helps to allocate. In order to beat the |
||
131 | lack of this knowledge, the buddy allocator exports an interface that |
||
132 | each of its clients is required to implement. When supplied with an |
||
133 | implementation of this interface, the buddy allocator can use |
||
134 | specialized external functions to find a buddy for a block, split and |
||
135 | coalesce blocks, manipulate block order and mark blocks busy or |
||
67 | jermar | 136 | available.</para> |
64 | jermar | 137 | |
138 | <formalpara> |
||
139 | <title>Data organization</title> |
||
140 | |||
141 | <para>Each entity allocable by the buddy allocator is required to |
||
142 | contain space for storing block order number and a link variable |
||
143 | used to interconnect blocks within the same order.</para> |
||
144 | |||
145 | <para>Whatever entities are allocated by the buddy allocator, the |
||
146 | first entity within a block is used to represent the entire block. |
||
147 | The first entity keeps the order of the whole block. Other entities |
||
148 | within the block are assigned the magic value |
||
149 | <constant>BUDDY_INNER_BLOCK</constant>. This is especially important |
||
150 | for effective identification of buddies in a one-dimensional array |
||
151 | because the entity that represents a potential buddy cannot be |
||
152 | associated with <constant>BUDDY_INNER_BLOCK</constant> (i.e. if it |
||
153 | is associated with <constant>BUDDY_INNER_BLOCK</constant> then it is |
||
154 | not a buddy).</para> |
||
155 | </formalpara> |
||
156 | </section> |
||
157 | </section> |
||
158 | |||
159 | <section id="slab"> |
||
70 | jermar | 160 | <title>Slab allocator</title> |
64 | jermar | 161 | |
67 | jermar | 162 | <para>The majority of memory allocation requests in the kernel is for |
163 | small, frequently used data structures. The basic idea behind the slab |
||
164 | allocator is that commonly used objects are preallocated in continuous |
||
165 | areas of physical memory called slabs<footnote> |
||
166 | <para>Slabs are in fact blocks of physical memory frames allocated |
||
167 | from the frame allocator.</para> |
||
168 | </footnote>. Whenever an object is to be allocated, the slab allocator |
||
169 | returns the first available item from a suitable slab corresponding to |
||
170 | the object type<footnote> |
||
171 | <para>The mechanism is rather more complicated, see the next |
||
172 | paragraph.</para> |
||
173 | </footnote>. Due to the fact that the sizes of the requested and |
||
174 | allocated object match, the slab allocator significantly reduces |
||
175 | internal fragmentation.</para> |
||
64 | jermar | 176 | |
67 | jermar | 177 | <para>Slabs of one object type are organized in a structure called slab |
178 | cache. There are ususally more slabs in the slab cache, depending on |
||
179 | previous allocations. If the the slab cache runs out of available slabs, |
||
180 | new slabs are allocated. In order to exploit parallelism and to avoid |
||
181 | locking of shared spinlocks, slab caches can have variants of |
||
182 | processor-private slabs called magazines. On each processor, there is a |
||
183 | two-magazine cache. Full magazines that are not part of any |
||
184 | per-processor magazine cache are stored in a global list of full |
||
185 | magazines.</para> |
||
64 | jermar | 186 | |
67 | jermar | 187 | <para>Each object begins its life in a slab. When it is allocated from |
188 | there, the slab allocator calls a constructor that is registered in the |
||
189 | respective slab cache. The constructor initializes and brings the object |
||
190 | into a known state. The object is then used by the user. When the user |
||
191 | later frees the object, the slab allocator puts it into a processor |
||
192 | private magazine cache, from where it can be precedently allocated |
||
193 | again. Note that allocations satisfied from a magazine are already |
||
194 | initialized by the constructor. When both of the processor cached |
||
195 | magazines get full, the allocator will move one of the magazines to the |
||
196 | list of full magazines. Similarily, when allocating from an empty |
||
197 | processor magazine cache, the kernel will reload only one magazine from |
||
198 | the list of full magazines. In other words, the slab allocator tries to |
||
199 | keep the processor magazine cache only half-full in order to prevent |
||
200 | thrashing when allocations and deallocations interleave on magazine |
||
70 | jermar | 201 | boundaries. The advantage of this setup is that during most of the |
202 | allocations, no global spinlock needs to be held.</para> |
||
65 | jermar | 203 | |
67 | jermar | 204 | <para>Should HelenOS run short of memory, it would start deallocating |
205 | objects from magazines, calling slab cache destructor on them and |
||
206 | putting them back into slabs. When a slab contanins no allocated object, |
||
207 | it is immediately freed.</para> |
||
66 | bondari | 208 | |
67 | jermar | 209 | <para><figure> |
64 | jermar | 210 | <mediaobject id="slab_alloc"> |
211 | <imageobject role="html"> |
||
212 | <imagedata fileref="images/slab_alloc.png" format="PNG" /> |
||
213 | </imageobject> |
||
214 | </mediaobject> |
||
215 | |||
216 | <title>Slab allocator scheme.</title> |
||
67 | jermar | 217 | </figure></para> |
64 | jermar | 218 | |
67 | jermar | 219 | <section> |
220 | <title>Implementation</title> |
||
221 | |||
222 | <para>The slab allocator is closely modelled after OpenSolaris slab |
||
223 | allocator by Jeff Bonwick and Jonathan Adams with the following |
||
224 | exceptions:<itemizedlist> |
||
64 | jermar | 225 | <listitem> |
70 | jermar | 226 | empty slabs are immediately deallocated and |
64 | jermar | 227 | </listitem> |
66 | bondari | 228 | |
229 | <listitem> |
||
70 | jermar | 230 | <para>empty magazines are deallocated when not needed.</para> |
66 | bondari | 231 | </listitem> |
70 | jermar | 232 | </itemizedlist>The following features are not currently supported |
233 | but would be easy to do: <itemizedlist> |
||
64 | jermar | 234 | <listitem> |
70 | jermar | 235 | cache coloring and |
64 | jermar | 236 | </listitem> |
237 | |||
238 | <listitem> |
||
70 | jermar | 239 | dynamic magazine grow (different magazine sizes are already supported, but the allocation strategy would need to be adjusted). |
64 | jermar | 240 | </listitem> |
241 | </itemizedlist></para> |
||
242 | |||
243 | <section> |
||
244 | <title>Allocation/deallocation</title> |
||
245 | |||
70 | jermar | 246 | <para>The following two paragraphs summarize and complete the |
247 | description of the slab allocator operation (i.e. |
||
248 | <code>slab_alloc</code> and <code>slab_free</code> |
||
249 | operations).</para> |
||
64 | jermar | 250 | |
251 | <formalpara> |
||
252 | <title>Allocation</title> |
||
253 | |||
70 | jermar | 254 | <para><emphasis>Step 1.</emphasis> When an allocation request |
255 | comes, the slab allocator checks availability of memory in the |
||
256 | current magazine of the local processor magazine cache. If the |
||
257 | available memory is there, the allocator just pops the magazine |
||
258 | and returns pointer to the object.</para> |
||
64 | jermar | 259 | |
70 | jermar | 260 | <para><emphasis>Step 2.</emphasis> If the current magazine in the |
261 | processor magazine cache is empty, the allocator will attempt to |
||
262 | swap it with the last magazine from the cache and return to the |
||
263 | first step. If also the last magazine is empty, the algorithm will |
||
264 | fall through to Step 3.</para> |
||
64 | jermar | 265 | |
70 | jermar | 266 | <para><emphasis>Step 3.</emphasis> Now the allocator is in the |
267 | situation when both magazines in the processor magazine cache are |
||
268 | empty. The allocator reloads one magazine from the shared list of |
||
269 | full magazines. If the reload is successful (i.e. there are full |
||
270 | magazines in the list), the algorithm continues with Step |
||
271 | 1.</para> |
||
64 | jermar | 272 | |
70 | jermar | 273 | <para><emphasis>Step 4.</emphasis> In this fail-safe step, an |
274 | object is allocated from the conventional slab layer and a pointer |
||
275 | to it is returned. If also the last magazine is full,</para> |
||
64 | jermar | 276 | </formalpara> |
277 | |||
278 | <formalpara> |
||
279 | <title>Deallocation</title> |
||
280 | |||
70 | jermar | 281 | <para><emphasis>Step 1.</emphasis> During a deallocation request, |
282 | the slab allocator checks if the current magazine of the local |
||
283 | processor magazine cache is not full. If yes, the pointer to the |
||
284 | objects is just pushed into the magazine and the algorithm |
||
285 | returns.</para> |
||
64 | jermar | 286 | |
70 | jermar | 287 | <para><emphasis>Step 2.</emphasis> If the current magazine is |
288 | full, the allocator will attempt to swap it with the last magazine |
||
289 | from the cache and return to the first step. If also the last |
||
290 | magazine is empty, the algorithm will fall through to Step |
||
291 | 3.</para> |
||
64 | jermar | 292 | |
70 | jermar | 293 | <para><emphasis>Step 3.</emphasis> Now the allocator is in the |
294 | situation when both magazines in the processor magazine cache are |
||
295 | full. The allocator moves one magazine to the shared list of full |
||
296 | magazines. The algoritm continues with Step 1.</para> |
||
64 | jermar | 297 | </formalpara> |
298 | </section> |
||
299 | </section> |
||
300 | </section> |
||
301 | </section> |
||
302 | |||
303 | <section> |
||
67 | jermar | 304 | <title>Virtual memory management</title> |
9 | bondari | 305 | |
67 | jermar | 306 | <section> |
307 | <title>Introduction</title> |
||
308 | |||
309 | <para>Virtual memory is a special memory management technique, used by |
||
310 | kernel to achieve a bunch of mission critical goals. <itemizedlist> |
||
311 | <listitem> |
||
312 | Isolate each task from other tasks that are running on the system at the same time. |
||
313 | </listitem> |
||
314 | |||
315 | <listitem> |
||
316 | Allow to allocate more memory, than is actual physical memory size of the machine. |
||
317 | </listitem> |
||
318 | |||
319 | <listitem> |
||
320 | Allowing, in general, to load and execute two programs that are linked on the same address without complicated relocations. |
||
321 | </listitem> |
||
322 | </itemizedlist></para> |
||
323 | |||
324 | <para><!-- |
||
325 | |||
70 | jermar | 326 | TLB shootdown ASID/ASID:PAGE/ALL. |
327 | TLB shootdown requests can come in asynchroniously |
||
328 | so there is a cache of TLB shootdown requests. Upon cache overflow TLB shootdown ALL is executed |
||
329 | |||
330 | |||
67 | jermar | 331 | <para> |
332 | Address spaces. Address space area (B+ tree). Only for uspace. Set of syscalls (shrink/extend etc). |
||
333 | Special address space area type - device - prohibits shrink/extend syscalls to call on it. |
||
334 | Address space has link to mapping tables (hierarchical - per Address space, hash - global tables). |
||
335 | </para> |
||
336 | |||
337 | --></para> |
||
338 | </section> |
||
339 | |||
340 | <section> |
||
70 | jermar | 341 | <title>Paging</title> |
342 | |||
343 | <para>Virtual memory is usually using paged memory model, where virtual |
||
344 | memory address space is divided into the <emphasis>pages</emphasis> |
||
345 | (usually having size 4096 bytes) and physical memory is divided into the |
||
346 | frames (same sized as a page, of course). Each page may be mapped to |
||
347 | some frame and then, upon memory access to the virtual address, CPU |
||
348 | performs <emphasis>address translation</emphasis> during the instruction |
||
349 | execution. Non-existing mapping generates page fault exception, calling |
||
350 | kernel exception handler, thus allowing kernel to manipulate rules of |
||
351 | memory access. Information for pages mapping is stored by kernel in the |
||
352 | <link linkend="page_tables">page tables</link></para> |
||
353 | |||
354 | <para>The majority of the architectures use multi-level page tables, |
||
355 | which means need to access physical memory several times before getting |
||
356 | physical address. This fact would make serios performance overhead in |
||
357 | virtual memory management. To avoid this <link linkend="tlb">Traslation |
||
358 | Lookaside Buffer (TLB)</link> is used.</para> |
||
359 | </section> |
||
360 | |||
361 | <section> |
||
67 | jermar | 362 | <title>Address spaces</title> |
363 | |||
364 | <section> |
||
70 | jermar | 365 | <title>Address space areas</title> |
67 | jermar | 366 | |
367 | <para>Each address space consists of mutually disjunctive continuous |
||
368 | address space areas. Address space area is precisely defined by its |
||
369 | base address and the number of frames/pages is contains.</para> |
||
370 | |||
371 | <para>Address space area , that define behaviour and permissions on |
||
372 | the particular area. <itemizedlist> |
||
373 | <listitem> |
||
374 | |||
375 | |||
376 | <emphasis>AS_AREA_READ</emphasis> |
||
377 | |||
378 | flag indicates reading permission. |
||
379 | </listitem> |
||
380 | |||
381 | <listitem> |
||
382 | |||
383 | |||
384 | <emphasis>AS_AREA_WRITE</emphasis> |
||
385 | |||
386 | flag indicates writing permission. |
||
387 | </listitem> |
||
388 | |||
389 | <listitem> |
||
390 | |||
391 | |||
392 | <emphasis>AS_AREA_EXEC</emphasis> |
||
393 | |||
394 | flag indicates code execution permission. Some architectures do not support execution persmission restriction. In this case this flag has no effect. |
||
395 | </listitem> |
||
396 | |||
397 | <listitem> |
||
398 | |||
399 | |||
400 | <emphasis>AS_AREA_DEVICE</emphasis> |
||
401 | |||
402 | marks area as mapped to the device memory. |
||
403 | </listitem> |
||
404 | </itemizedlist></para> |
||
405 | |||
406 | <para>Kernel provides possibility tasks create/expand/shrink/share its |
||
407 | address space via the set of syscalls.</para> |
||
408 | </section> |
||
409 | |||
410 | <section> |
||
411 | <title>Address Space ID (ASID)</title> |
||
412 | |||
413 | <para>When switching to the different task, kernel also require to |
||
414 | switch mappings to the different address space. In case TLB cannot |
||
415 | distinguish address space mappings, all mapping information in TLB |
||
416 | from the old address space must be flushed, which can create certain |
||
417 | uncessary overhead during the task switching. To avoid this, some |
||
418 | architectures have capability to segregate different address spaces on |
||
419 | hardware level introducing the address space identifier as a part of |
||
420 | TLB record, telling the virtual address space translation unit to |
||
421 | which address space this record is applicable.</para> |
||
422 | |||
423 | <para>HelenOS kernel can take advantage of this hardware supported |
||
424 | identifier by having an ASID abstraction which is somehow related to |
||
425 | the corresponding architecture identifier. I.e. on ia64 kernel ASID is |
||
426 | derived from RID (region identifier) and on the mips32 kernel ASID is |
||
427 | actually the hardware identifier. As expected, this ASID information |
||
428 | record is the part of <emphasis>as_t</emphasis> structure.</para> |
||
429 | |||
430 | <para>Due to the hardware limitations, hardware ASID has limited |
||
431 | length from 8 bits on ia64 to 24 bits on mips32, which makes it |
||
432 | impossible to use it as unique address space identifier for all tasks |
||
433 | running in the system. In such situations special ASID stealing |
||
434 | algoritm is used, which takes ASID from inactive task and assigns it |
||
435 | to the active task.</para> |
||
436 | |||
437 | <para><classname>ASID stealing algoritm here.</classname></para> |
||
438 | </section> |
||
439 | </section> |
||
440 | |||
441 | <section> |
||
442 | <title>Virtual address translation</title> |
||
443 | |||
70 | jermar | 444 | <section id="page_tables"> |
445 | <title>Page tables</title> |
||
67 | jermar | 446 | |
70 | jermar | 447 | <para>HelenOS kernel has two different approaches to the paging |
448 | implementation: <emphasis>4 level page tables</emphasis> and |
||
449 | <emphasis>global hash tables</emphasis>, which are accessible via |
||
450 | generic paging abstraction layer. Such different functionality was |
||
451 | caused by the major architectural differences between supported |
||
452 | platforms. This abstraction is implemented with help of the global |
||
453 | structure of pointers to basic mapping functions |
||
454 | <emphasis>page_mapping_operations</emphasis>. To achieve different |
||
455 | functionality of page tables, corresponding layer must implement |
||
456 | functions, declared in |
||
457 | <emphasis>page_mapping_operations</emphasis></para> |
||
67 | jermar | 458 | |
70 | jermar | 459 | <formalpara> |
460 | <title>4-level page tables</title> |
||
67 | jermar | 461 | |
70 | jermar | 462 | <para>4-level page tables are the generalization of the hardware |
463 | capabilities of several architectures.<itemizedlist> |
||
67 | jermar | 464 | <listitem> |
465 | ia32 uses 2-level page tables, with full hardware support. |
||
466 | </listitem> |
||
467 | |||
468 | <listitem> |
||
469 | amd64 uses 4-level page tables, also coming with full hardware support. |
||
470 | </listitem> |
||
471 | |||
472 | <listitem> |
||
473 | mips and ppc32 have 2-level tables, software simulated support. |
||
474 | </listitem> |
||
475 | </itemizedlist></para> |
||
70 | jermar | 476 | </formalpara> |
67 | jermar | 477 | |
70 | jermar | 478 | <formalpara> |
479 | <title>Global hash tables</title> |
||
67 | jermar | 480 | |
70 | jermar | 481 | <para>- global page hash table: existuje jen jedna v celem systemu |
482 | (vyuziva ji ia64), pozn. ia64 ma zatim vypnuty VHPT. Pouziva se |
||
483 | genericke hash table s oddelenymi collision chains. ASID support is |
||
484 | required to use global hash tables.</para> |
||
485 | </formalpara> |
||
67 | jermar | 486 | |
70 | jermar | 487 | <para>Thanks to the abstract paging interface, there is possibility |
488 | left have more paging implementations, for example B-Tree page |
||
489 | tables.</para> |
||
67 | jermar | 490 | </section> |
491 | |||
492 | <section id="tlb"> |
||
493 | <title>Translation Lookaside buffer</title> |
||
494 | |||
495 | <para>Due to the extensive overhead during the page mapping lookup in |
||
496 | the page tables, all architectures has fast assotiative cache memory |
||
497 | built-in CPU. This memory called TLB stores recently used page table |
||
498 | entries.</para> |
||
499 | |||
500 | <section id="tlb_shootdown"> |
||
501 | <title>TLB consistency. TLB shootdown algorithm.</title> |
||
502 | |||
503 | <para>Operating system is responsible for keeping TLB consistent by |
||
504 | invalidating the contents of TLB, whenever there is some change in |
||
505 | page tables. Those changes may occur when page or group of pages |
||
506 | were unmapped, mapping is changed or system switching active address |
||
70 | jermar | 507 | space to schedule a new system task (which is a batch unmap of all |
508 | address space mappings). Moreover, this invalidation operation must |
||
509 | be done an all system CPUs because each CPU has its own independent |
||
510 | TLB cache. Thus maintaining TLB consistency on SMP configuration as |
||
511 | not as trivial task as it looks at the first glance. Naive solution |
||
512 | would assume remote TLB invalidatation, which is not possible on the |
||
513 | most of the architectures, because of the simple fact - flushing TLB |
||
514 | is allowed only on the local CPU and there is no possibility to |
||
515 | access other CPUs' TLB caches.</para> |
||
67 | jermar | 516 | |
517 | <para>Technique of remote invalidation of TLB entries is called "TLB |
||
518 | shootdown". HelenOS uses a variation of the algorithm described by |
||
519 | D. Black et al., "Translation Lookaside Buffer Consistency: A |
||
520 | Software Approach," Proc. Third Int'l Conf. Architectural Support |
||
521 | for Programming Languages and Operating Systems, 1989, pp. |
||
522 | 113-122.</para> |
||
523 | |||
524 | <para>As the situation demands, you will want partitial invalidation |
||
525 | of TLB caches. In case of simple memory mapping change it is |
||
526 | necessary to invalidate only one or more adjacent pages. In case if |
||
70 | jermar | 527 | the architecture is aware of ASIDs, during the address space |
528 | switching, kernel invalidates only entries from this particular |
||
529 | address space. Final option of the TLB invalidation is the complete |
||
530 | TLB cache invalidation, which is the operation that flushes all |
||
531 | entries in TLB.</para> |
||
67 | jermar | 532 | |
70 | jermar | 533 | <para>TLB shootdown is performed in two phases. First, the initiator |
534 | process sends an IPI message indicating the TLB shootdown request to |
||
535 | the rest of the CPUs. Then, it waits until all CPUs confirm TLB |
||
536 | invalidating action execution.</para> |
||
537 | </section> |
||
538 | </section> |
||
539 | </section> |
||
67 | jermar | 540 | |
70 | jermar | 541 | <section> |
542 | <title>---</title> |
||
67 | jermar | 543 | |
70 | jermar | 544 | <para>At the moment HelenOS does not support swapping.</para> |
67 | jermar | 545 | |
70 | jermar | 546 | <para>- pouzivame vypadky stranky k alokaci ramcu on-demand v ramci |
547 | as_area - na architekturach, ktere to podporuji, podporujeme non-exec |
||
548 | stranky</para> |
||
67 | jermar | 549 | </section> |
26 | bondari | 550 | </section> |
11 | bondari | 551 | </chapter> |