<section>
  <title>Physical memory management</title>

  <section id="zones_and_frames">
    <indexterm><primary>memory management</primary><secondary>memory zone</secondary></indexterm>
    <title>Zones and frames</title>

    <para>HelenOS represents continuous areas of physical memory in
    structures called frame zones (abbreviated as zones). Each zone contains
    information about the number of allocated and unallocated physical
    <!-- ... -->
    structure that contains the number of references and data used by the
    buddy system.</para>
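
    <para>The following C sketch illustrates one possible shape of these two
    structures. It is a minimal illustration only; the field names used here
    are assumptions and do not reproduce the exact HelenOS
    definitions:</para>

    <programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch only; not the exact HelenOS definitions. */

typedef struct {
    uint32_t refcount;      /* number of references to this frame */
    uint8_t  buddy_order;   /* order of the block this frame belongs to */
    void    *buddy_link;    /* linkage used by the zone's buddy system */
} frame_t;

typedef struct {
    uintptr_t base;         /* physical address of the first frame */
    size_t    frame_count;  /* total number of frames in the zone */
    size_t    free_count;   /* number of unallocated frames */
    size_t    busy_count;   /* number of allocated frames */
    frame_t  *frames;       /* per-frame bookkeeping array */
} zone_t;
]]></programlisting>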
  </section>

  <section id="frame_allocator">
    <indexterm><primary>memory management</primary><secondary>frame allocator</secondary></indexterm>
    <title>Frame allocator</title>

    <para>The frame allocator satisfies kernel requests to allocate
    power-of-two sized blocks of physical memory. Because of the zonal
    organization of physical memory, the frame allocator is always working
    <!-- ... -->
    to take care of deallocation within the zone's buddy system.</para>
    </formalpara>
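
    <para>A minimal sketch of such a power-of-two allocation interface
    follows. The toy stubs below only stand in for the real kernel code so
    that the example is self-contained; the actual HelenOS interface differs
    in detail:</para>

    <programlisting><![CDATA[
#include <stdint.h>
#include <stdio.h>

#define FRAME_SIZE 4096u

/* Toy stand-ins for the real allocator; illustration only. */
static uintptr_t next_free = 0x1000000;

static uintptr_t frame_alloc(unsigned int order)
{
    uintptr_t block = next_free;            /* 2^order contiguous frames */
    next_free += (uintptr_t) FRAME_SIZE << order;
    return block;
}

static void frame_free(uintptr_t block)
{
    /* A real zone would hand the block back to its buddy system. */
    (void) block;
}

int main(void)
{
    /* Request a block of 2^3 = 8 contiguous frames. */
    uintptr_t block = frame_alloc(3);
    printf("allocated 8 frames at 0x%lx\n", (unsigned long) block);
    frame_free(block);
    return 0;
}
]]></programlisting>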
  </section>

  <section id="buddy_allocator">
    <indexterm><primary>memory management</primary><secondary>buddy system</secondary></indexterm>
    <title>Buddy allocator</title>

    <para>In the buddy system, the memory is broken down into power-of-two
    sized naturally aligned blocks. These blocks are organized in an array
    of lists, in which the list with index i contains all unallocated blocks
    <!-- ... -->
      </formalpara>
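
      <para>Because the blocks are naturally aligned and power-of-two sized,
      the buddy of any block can be computed by flipping a single bit of its
      frame index, as the following self-contained sketch shows:</para>

      <programlisting><![CDATA[
#include <stdint.h>
#include <stdio.h>

/* For naturally aligned power-of-two blocks, the buddy of the block that
 * starts at frame index idx and has order o (i.e. spans 2^o frames) is
 * the block whose index differs from idx only in bit o. */
static uint32_t buddy_index(uint32_t idx, unsigned int order)
{
    return idx ^ (1u << order);
}

int main(void)
{
    /* The order-2 block at frame 8 (frames 8-11) has its buddy at
     * frame 12 (frames 12-15); together they coalesce into an
     * order-3 block starting at frame 8. */
    printf("buddy of 8 at order 2: %u\n", buddy_index(8, 2));
    printf("buddy of 12 at order 2: %u\n", buddy_index(12, 2));
    return 0;
}
]]></programlisting>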
    </section>
  </section>

  <section id="slab">
    <indexterm><primary>memory management</primary><secondary>slab</secondary></indexterm>
    <title>Slab allocator</title>

    <para>The majority of memory allocation requests in the kernel is for
    small, frequently used data structures. The basic idea behind the slab
    allocator is that commonly used objects are preallocated in continuous
    areas of physical memory called slabs<footnote>
    <!-- ... -->
    list of full magazines. Similarly, when allocating from an empty
    processor magazine cache, the kernel will reload only one magazine from
    the list of full magazines. In other words, the slab allocator tries to
    keep the processor magazine cache only half-full in order to prevent
    thrashing when allocations and deallocations interleave on magazine
    boundaries. The advantage of this setup is that during most of the
    allocations, no global spinlock needs to be held.</para>

    <para>Should HelenOS run short of memory, it would start deallocating
    objects from magazines, calling the slab cache destructor on them and
    putting them back into slabs. When a slab contains no allocated
    objects, it is immediately freed.</para>
    <!-- ... -->

    <para>The slab allocator is closely modelled after the OpenSolaris slab
    allocator by Jeff Bonwick and Jonathan Adams with the following
    exceptions:<itemizedlist>
      <listitem>
        <para>empty slabs are immediately deallocated and</para>
      </listitem>

      <listitem>
        <para>empty magazines are deallocated when not needed.</para>
      </listitem>
    </itemizedlist>The following features are not currently supported but
    would be easy to do:<itemizedlist>
      <listitem>
        <para>cache coloring and</para>
      </listitem>

      <listitem>
        <para>dynamic magazine growing (different magazine sizes are
        already supported, but the allocation strategy would need to be
        adjusted).</para>
      </listitem>
    </itemizedlist></para>

    <section>
      <indexterm><primary>memory management</primary><secondary>slab magazine</secondary></indexterm>
      <title>Magazine layer</title>

      <para>Due to the extensive bottleneck on SMP architectures caused by
      the global slab locking mechanism, which serializes the processing of
      all slab allocation requests, a new layer was introduced into the
      classic slab allocator design. The slab allocator was extended to
      support per-CPU caches called magazines in order to achieve good SMP
      scaling. <termdef>The slab SMP performance bottleneck was resolved by
      introducing a per-CPU caching scheme called the <glossterm>magazine
      layer</glossterm></termdef>.</para>

      <para>A magazine is an N-element cache of objects, so each magazine
      can satisfy N allocations. A magazine behaves like an automatic
      weapon magazine (LIFO, stack), so allocation and deallocation become
      simple push and pop pointer operations. The trick is that the CPU
      does not access global slab allocator data when allocating from its
      magazine, which makes parallel allocations on different CPUs
      possible.</para>

      <para>The implementation also requires one more feature: the
      CPU-bound magazine is actually a pair of magazines, in order to avoid
      thrashing when allocations and deallocations of a single item occur
      at the magazine size boundary. LIFO order is enforced, which should
      avoid fragmentation as much as possible.</para>

      <para>Another important entity of the magazine layer is the common
      full magazine list (also called a depot), which stores full magazines
      that may be used by any of the CPU magazine caches to reload the
      active CPU magazine. This list of magazines can be pre-filled with
      full magazines during initialization, but in the current
      implementation it is filled during object deallocation, when a CPU
      magazine becomes full.</para>

      <para>Slab allocator control structures are allocated from special
      slabs that are marked by a special flag, indicating that they should
      not be used by the slab magazine layer. This is done to avoid
      possible infinite recursion and deadlock during conventional slab
      allocation requests.</para>
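
      <para>The following C sketch shows the magazine pair and its LIFO
      push/pop fast paths. It is a simplified illustration under assumed
      names (<code>magazine_t</code>, <code>cpu_magazine_cache_t</code>),
      not the kernel's actual definitions:</para>

      <programlisting><![CDATA[
#include <stddef.h>

#define MAGAZINE_SIZE 8  /* N: objects per magazine (illustrative) */

typedef struct {
    size_t nobjs;                   /* number of cached objects */
    void *objs[MAGAZINE_SIZE];      /* LIFO stack of object pointers */
} magazine_t;

typedef struct {
    magazine_t *current;            /* magazine served first */
    magazine_t *last;               /* second magazine of the pair */
} cpu_magazine_cache_t;

/* Allocation fast path: pop from the current magazine; no global slab
 * allocator data is touched, so CPUs can allocate in parallel. */
static void *magazine_pop(magazine_t *mag)
{
    if (mag->nobjs == 0)
        return NULL;                /* empty: caller must swap or reload */
    return mag->objs[--mag->nobjs];
}

/* Deallocation fast path: push onto the current magazine. */
static int magazine_push(magazine_t *mag, void *obj)
{
    if (mag->nobjs == MAGAZINE_SIZE)
        return -1;                  /* full: caller must swap or flush */
    mag->objs[mag->nobjs++] = obj;
    return 0;
}
]]></programlisting>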
    </section>

    <section>
      <title>Allocation/deallocation</title>

      <para>Every cache contains a list of full slabs and a list of
      partially full slabs. Empty slabs are immediately freed (thrashing
      will be avoided because of the magazines).</para>

      <para>The slab allocator allocates a lot of space and does not free
      it. When the frame allocator fails to allocate a frame, it calls
      <code>slab_reclaim()</code>. It tries 'light reclaim' first and then
      brutal reclaim. The light reclaim releases slabs from the cpu-shared
      magazine list until at least one slab is deallocated in each cache
      (this algorithm should probably change). The brutal reclaim removes
      all cached objects, even from the CPU-bound magazines. The following
      two formal paragraphs summarize and complete the description of the
      slab allocator operation (i.e. the <code>slab_alloc</code> and
      <code>slab_free</code> operations).</para>

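      <para>A minimal sketch of this two-pass reclaim policy follows; the
      per-cache hook and the names used here are hypothetical and only
      illustrate the escalation from light to brutal reclaim:</para>

      <programlisting><![CDATA[
#include <stddef.h>

typedef enum {
    RECLAIM_LIGHT,   /* release slabs from the shared magazine list only */
    RECLAIM_BRUTAL   /* additionally flush all CPU-bound magazines */
} reclaim_mode_t;

/* Hypothetical per-cache hook: returns the number of freed frames. */
typedef size_t (*cache_reclaim_fn)(void *cache, reclaim_mode_t mode);

/* Called when the frame allocator fails: try the cheap pass first and
 * escalate to the brutal pass only if nothing was freed. */
static size_t slab_reclaim_sketch(void **caches, size_t ncaches,
                                  cache_reclaim_fn reclaim)
{
    size_t freed = 0;
    for (size_t i = 0; i < ncaches; i++)
        freed += reclaim(caches[i], RECLAIM_LIGHT);
    if (freed == 0)
        for (size_t i = 0; i < ncaches; i++)
            freed += reclaim(caches[i], RECLAIM_BRUTAL);
    return freed;
}
]]></programlisting>
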
      <formalpara>
        <title>Allocation</title>

        <para><emphasis>Step 1.</emphasis> When an allocation request
        comes, the slab allocator checks availability of memory in the
        current magazine of the local processor magazine cache. If the
        memory is available, the allocator just pops an object off the
        magazine and returns a pointer to it.</para>

        <para><emphasis>Step 2.</emphasis> If the current magazine in the
        processor magazine cache is empty, the allocator will attempt to
        swap it with the second magazine of the cache and return to the
        first step. If the second magazine is empty as well, the algorithm
        falls through to Step 3.</para>

        <para><emphasis>Step 3.</emphasis> Now the allocator is in the
        situation when both magazines in the processor magazine cache are
        empty. The allocator reloads one magazine from the shared list of
        full magazines (the depot). If the reload is successful (i.e. there
        are full magazines in the list), the algorithm continues with Step
        1.</para>

        <para><emphasis>Step 4.</emphasis> In this final, fail-safe step,
        an object is allocated from the conventional slab layer and a
        pointer to it is returned.</para>
      </formalpara>
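
      <para>The four steps can be summarized in C as follows. This is a
      sketch only: the depot and slab-layer helpers are assumed stand-ins,
      and the handling of the emptied magazine (which a complete
      implementation would return to the depot) is omitted:</para>

      <programlisting><![CDATA[
#include <stddef.h>

#define MAGAZINE_SIZE 8

typedef struct {
    size_t nobjs;
    void *objs[MAGAZINE_SIZE];
} magazine_t;

/* Hypothetical helpers standing in for the depot and the slab layer. */
extern magazine_t *depot_get_full(void);          /* NULL if none */
extern void *slab_layer_alloc(void);              /* conventional layer */

/* Steps 1-4 from the text, for one processor magazine cache. */
static void *slab_alloc_sketch(magazine_t **cur, magazine_t **last)
{
    for (;;) {
        if ((*cur)->nobjs > 0)                    /* Step 1: pop */
            return (*cur)->objs[--(*cur)->nobjs];

        if ((*last)->nobjs > 0) {                 /* Step 2: swap */
            magazine_t *tmp = *cur; *cur = *last; *last = tmp;
            continue;
        }

        magazine_t *full = depot_get_full();      /* Step 3: reload */
        if (full != NULL) {
            *cur = full;
            continue;
        }

        return slab_layer_alloc();                /* Step 4: fail-safe */
    }
}
]]></programlisting>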

      <formalpara>
        <title>Deallocation</title>

        <para><emphasis>Step 1.</emphasis> During a deallocation request,
        the slab allocator checks whether the current magazine of the local
        processor magazine cache is full. If it is not, the pointer to the
        object is simply pushed into the magazine and the algorithm
        returns.</para>

        <para><emphasis>Step 2.</emphasis> If the current magazine is
        full, the allocator will attempt to swap it with the second
        magazine of the cache and return to the first step. If the second
        magazine is full as well, the algorithm falls through to Step
        3.</para>

        <para><emphasis>Step 3.</emphasis> Now the allocator is in the
        situation when both magazines in the processor magazine cache are
        full. One of the magazines is moved to the shared list of full
        magazines and replaced by a new empty magazine. The algorithm
        continues with Step 1.</para>
      </formalpara>
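
      <para>The deallocation path mirrors the allocation path; the sketch
      below again uses assumed depot helpers and is not the kernel's actual
      code:</para>

      <programlisting><![CDATA[
#include <stddef.h>

#define MAGAZINE_SIZE 8

typedef struct {
    size_t nobjs;
    void *objs[MAGAZINE_SIZE];
} magazine_t;

/* Hypothetical depot helpers, for illustration only. */
extern void depot_put_full(magazine_t *mag);      /* store a full magazine */
extern magazine_t *depot_get_empty(void);         /* fresh empty magazine */

static void slab_free_sketch(magazine_t **cur, magazine_t **last, void *obj)
{
    for (;;) {
        if ((*cur)->nobjs < MAGAZINE_SIZE) {      /* Step 1: push */
            (*cur)->objs[(*cur)->nobjs++] = obj;
            return;
        }

        if ((*last)->nobjs < MAGAZINE_SIZE) {     /* Step 2: swap */
            magazine_t *tmp = *cur; *cur = *last; *last = tmp;
            continue;
        }

        depot_put_full(*cur);                     /* Step 3: hand off */
        *cur = depot_get_empty();
    }
}
]]></programlisting>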
    </section>
  </section>
</section>

<!-- End of Physmem -->
</section>

<section>
  <title>Virtual memory management</title>

  <!-- ... -->
      </listitem>
    </itemizedlist></para>

    <para><!--
    TLB shootdown ASID/ASID:PAGE/ALL. TLB shootdown requests can come in
    asynchronously, so there is a cache of TLB shootdown requests. Upon
    cache overflow, TLB shootdown ALL is executed.

    Address spaces. Address space area (B+ tree). Only for uspace. Set of
    syscalls (shrink/extend etc.). Special address space area type - device
    - prohibits shrink/extend syscalls to call on it. Address space has
    link to mapping tables (hierarchical - per Address space, hash - global
    tables).
    --></para>
  </section>

  <section>
    <title>Paging</title>

    <para>Virtual memory is usually implemented using a paged memory model,
    where the virtual memory address space is divided into
    <emphasis>pages</emphasis> (usually 4096 bytes in size) and physical
    memory is divided into frames (of the same size as a page, of course).
    Each page may be mapped to some frame, and then, upon a memory access to
    a virtual address, the CPU performs <emphasis>address
    translation</emphasis> during the instruction execution. A non-existing
    mapping generates a page fault exception, which invokes the kernel
    exception handler and thus allows the kernel to enforce its rules of
    memory access. The information needed for mapping pages is stored by the
    kernel in the <link linkend="page_tables">page tables</link>.</para>

    <para>The majority of architectures use multi-level page tables, which
    means that physical memory must be accessed several times before the
    physical address is obtained. This would incur a serious performance
    overhead in virtual memory management. To avoid it, the <link
    linkend="tlb">Translation Lookaside Buffer (TLB)</link> is used.</para>
  </section>

  <section>
    <title>Address spaces</title>

    <section>
      <indexterm><primary>memory management</primary><secondary>address space area</secondary></indexterm>
      <title>Address space areas</title>

      <para>Each address space consists of mutually disjunctive continuous
      address space areas. An address space area is precisely defined by
      its base address and the number of frames/pages it contains.</para>
      <!-- ... -->
      <para>The kernel provides tasks with the possibility to create,
      expand, shrink and share their address spaces via a set of
      syscalls.</para>
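
      <para>Such an interface could look roughly like the following C
      prototypes. These names and signatures are hypothetical, chosen only
      to illustrate the create/resize/share/destroy operations; they are
      not the actual HelenOS syscall interface:</para>

      <programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prototypes, for illustration only. */
typedef intptr_t sysarg_t;

sysarg_t sys_as_area_create(uintptr_t base, size_t size, unsigned int flags);
sysarg_t sys_as_area_resize(uintptr_t base, size_t new_size);
sysarg_t sys_as_area_share(int task_id, uintptr_t base);
sysarg_t sys_as_area_destroy(uintptr_t base);
]]></programlisting>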
    </section>

    <section>
      <indexterm><primary>memory management</primary><secondary>asid</secondary></indexterm>
      <title>Address Space ID (ASID)</title>

      <para>When switching to a different task, the kernel also needs to
      switch to a different address space mapping. If the TLB cannot
      distinguish between mappings belonging to different address spaces,
      then all mapping information in the TLB
      <!-- ... --></para>
    </section>
  </section>

  <section>
    <title>Virtual address translation</title>

    <section id="page_tables">
      <title>Page tables</title>

      <para>The HelenOS kernel has two different approaches to the paging
      implementation: <emphasis>4-level page tables</emphasis> and a
      <emphasis>global hash table</emphasis>, which are accessible via a
      generic paging abstraction layer. This split is caused by the major
      architectural differences between the supported platforms. The
      abstraction is implemented with the help of a global structure of
      pointers to the basic mapping functions,
      <emphasis>page_mapping_operations</emphasis>. To achieve different
      functionality of page tables, the corresponding layer must implement
      the functions declared in
      <emphasis>page_mapping_operations</emphasis>.</para>

      <para>Thanks to the abstract paging interface, there is room left for
      further paging implementations (besides the already implemented
      hierarchical page tables and the hash table), for example B-Tree
      based page tables.</para>

      <section>
        <title>Hierarchical 4-level page tables</title>

        <para>Hierarchical 4-level page tables are the generalization of
        the hardware capabilities of most architectures. Each address space
        has its own page tables.<itemizedlist>
          <listitem>
            <para>ia32 uses 2-level page tables, with full hardware
            support.</para>
          </listitem>

          <listitem>
            <!-- ... -->
          </listitem>

          <listitem>
            <para>mips and ppc32 have 2-level tables, with software
            simulated support.</para>
          </listitem>
        </itemizedlist></para>
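
        <para>Schematically, a 4-level walk consumes the virtual page
        number a few bits at a time, one table per level. The sketch below
        assumes 9 address bits per level and omits entry flags and fault
        handling; it is an illustration of the multi-level lookup, not the
        HelenOS implementation:</para>

        <programlisting><![CDATA[
#include <stdint.h>

#define PTL_ENTRIES 512u   /* illustrative: 9 address bits per level */
#define PAGE_SHIFT  12u

typedef struct { uint64_t entries[PTL_ENTRIES]; } page_table_t;

/* Each of the four lookups consumes 9 bits of the virtual page number,
 * mirroring what the hardware (or a software handler) does. */
static uint64_t translate(page_table_t *ptl0, uint64_t vaddr)
{
    page_table_t *t = ptl0;
    for (int level = 3; level > 0; level--) {
        unsigned int idx =
            (unsigned int)(vaddr >> (PAGE_SHIFT + 9u * level))
            & (PTL_ENTRIES - 1);
        t = (page_table_t *)(uintptr_t) t->entries[idx]; /* next table */
    }
    unsigned int idx =
        (unsigned int)(vaddr >> PAGE_SHIFT) & (PTL_ENTRIES - 1);
    uint64_t frame = t->entries[idx];                    /* frame base */
    return frame | (vaddr & ((1u << PAGE_SHIFT) - 1));
}
]]></programlisting>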
      </section>

      <section>
        <indexterm><primary>memory management</primary><secondary>hash table</secondary></indexterm>
        <title>Global hash table</title>

        <para>Implementation of the global hash table was encouraged by the
        ia64 architecture support. One of the major differences between the
        global hash table and the hierarchical tables is that the global
        hash table exists only once in the system, while the hierarchical
        tables are maintained per address space. ASID support is required
        in order to use the global hash table; a generic hash table
        implementation with separate collision chains is used. (Note that
        on ia64, the hardware VHPT walker is currently disabled.)</para>

        <para>Thus, the hash table contains information about all address
        space mappings in the system, so the hash of an entry must be
        computed from both the address space pointer or ID and the virtual
        address of the page. The generic hash table implementation assumes
        that the pointers to the address spaces are likely to lie close to
        one another, so it uses their least significant bits for the hash;
        it also assumes that the virtual page addresses have roughly the
        same probability of occurring, so the least significant bits of the
        VPN compose the hash index. Collisions are resolved by the separate
        collision chains of the generic hash table.</para>
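
        <para>A minimal sketch of such a hash function follows; the exact
        bit widths and the way the two components are combined are
        illustrative assumptions:</para>

        <programlisting><![CDATA[
#include <stdint.h>

#define PAGE_WIDTH 12u
#define HASH_BITS  16u   /* illustrative table of 2^16 buckets */

/* Combine the least significant bits of the address space ID and of the
 * virtual page number into a bucket index, as described above. */
static uint32_t page_hash(uint32_t asid, uint64_t vaddr)
{
    uint64_t vpn = vaddr >> PAGE_WIDTH;
    return (uint32_t)((vpn ^ asid) & ((1u << HASH_BITS) - 1));
}
]]></programlisting>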
      </section>
    </section>

    <section id="tlb">
      <indexterm><primary>memory management</primary><secondary>tlb</secondary></indexterm>
      <title>Translation Lookaside buffer</title>

      <para>Due to the extensive overhead of the page mapping lookup in the
      page tables, all architectures have a fast associative cache memory
      built into the CPU. This memory, called the TLB, stores recently used
      page table
      <!-- ... --></para>

      <para>The operating system is responsible for keeping the TLB
      consistent by invalidating its contents whenever there is some change
      in the page tables. Those changes may occur when a page or a group of
      pages is unmapped, when a mapping is changed, or when the system
      switches the active address space in order to schedule a new task
      (which amounts to a batch unmap of all the address space mappings).
      Moreover, this invalidation operation must be done on all system
      CPUs, because each CPU has its own independent TLB cache. Thus,
      maintaining TLB consistency in an SMP configuration is not as trivial
      a task as it might look at first glance. A naive solution would
      assume remote TLB invalidation, i.e. that the CPU which wants to
      invalidate TLB entries could directly invalidate the TLB caches of
      other CPUs. This is not possible on most architectures, because of
      the simple fact that flushing the TLB is allowed only on the local
      CPU and there is no way to access other CPUs' TLB caches.</para>

      <para>The technique of remote invalidation of TLB entries is called
      "TLB shootdown". HelenOS uses a variation of the algorithm described
      by D. Black et al., "Translation Lookaside Buffer Consistency: A
      Software Approach," Proc. Third Int'l Conf. Architectural Support
      for Programming Languages and Operating Systems, 1989, pp.
      113-122.</para>

      <para>As the situation demands, partial invalidation of the TLB
      caches may be needed. In case of a simple memory mapping change, it
      is necessary to invalidate only one or more adjacent pages. If the
      architecture is aware of ASIDs, then during an address space switch,
      or when the kernel needs to reassign an ASID to another task, only
      the entries belonging to that particular address space are
      invalidated. The final option is the complete TLB cache invalidation,
      which flushes all entries in the TLB.</para>

      <para>TLB shootdown is performed in two phases.</para>

      <formalpara>
        <title>Phase 1.</title>

        <para>First, the initiator locks a global TLB spinlock; then the
        request is put into the local request cache of every other CPU in
        the system, each such cache being protected by its own spinlock. In
        case a cache is full, all requests in that cache are replaced by a
        single request indicating a global TLB flush. The initiator then
        sends an IPI message indicating the TLB shootdown request to the
        rest of the CPUs and actively waits until all of them confirm their
        intention to execute the TLB invalidation by setting up a special
        flag. After setting this flag, each of these CPUs blocks on the TLB
        spinlock held by the initiator.</para>
      </formalpara>

      <formalpara>
        <title>Phase 2.</title>

        <para>All CPUs are now waiting on the TLB spinlock to execute the
        TLB invalidation action and have indicated their intention to the
        initiator. The initiator continues, cleaning up its own TLB and
        releasing the global TLB spinlock. After this, all the other CPUs
        acquire and immediately release the TLB spinlock and perform their
        TLB invalidation actions.</para>
      </formalpara>
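
      <para>The two phases can be modelled with the following toy C
      program, which uses C11 atomics as stand-ins for the kernel's
      spinlock and IPI primitives; the request-queue helpers are assumed
      names, and the real kernel code differs in detail:</para>

      <programlisting><![CDATA[
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_CPUS 8

static atomic_flag tlb_lock = ATOMIC_FLAG_INIT;   /* global TLB spinlock */
static atomic_bool acked[MAX_CPUS];               /* confirmation flags */

static void spin_lock(atomic_flag *l) { while (atomic_flag_test_and_set(l)) ; }
static void spin_unlock(atomic_flag *l) { atomic_flag_clear(l); }

/* Hypothetical helpers: queue a request in a CPU's local cache
 * (collapsing to a flush-all request on overflow) and deliver an IPI. */
extern void enqueue_request_or_collapse_to_flush_all(int cpu);
extern void send_shootdown_ipi(int cpu);
extern void tlb_invalidate_local(void);

void tlb_shootdown_initiator(int self, int ncpus)
{
    spin_lock(&tlb_lock);                         /* Phase 1 */
    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (cpu == self) continue;
        enqueue_request_or_collapse_to_flush_all(cpu);
        send_shootdown_ipi(cpu);
    }
    for (int cpu = 0; cpu < ncpus; cpu++)         /* wait for all flags */
        while (cpu != self && !atomic_load(&acked[cpu])) ;

    tlb_invalidate_local();                       /* Phase 2 */
    spin_unlock(&tlb_lock);
}

void tlb_shootdown_ipi_handler(int self)
{
    atomic_store(&acked[self], true);             /* confirm intention */
    spin_lock(&tlb_lock);                         /* blocks on initiator */
    spin_unlock(&tlb_lock);
    tlb_invalidate_local();
    atomic_store(&acked[self], false);
}
]]></programlisting>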
    </section>
  </section>
</section>

<section>
  <title>Page faults and swapping</title>

  <para>At the moment HelenOS does not support swapping.</para>

  <para>Page faults are used to allocate frames on demand within an address
  space area. On architectures that support it, non-executable pages are
  supported.</para>
</section>
</section>
</chapter>
629 | 552 |