Rev 70 | Rev 72 | Go to most recent revision | Show entire file | Ignore whitespace | Details | Blame | Last modification | View Log | RSS feed
| Rev 70 | Rev 71 | ||
|---|---|---|---|
| Line 42... | Line 42... | ||
| 42 | structure that contains number of references and data used by buddy |
42 | structure that contains number of references and data used by buddy |
| 43 | system.</para> |
43 | system.</para> |
| 44 | </section> |
44 | </section> |
| 45 | 45 | ||
| 46 | <section id="frame_allocator"> |
46 | <section id="frame_allocator"> |
| - | 47 | <indexterm> |
|
| - | 48 | <primary>frame allocator</primary> |
|
| - | 49 | </indexterm> |
|
| - | 50 | ||
| 47 | <title>Frame allocator</title> |
51 | <title>Frame allocator</title> |
| 48 | 52 | ||
| 49 | <para>The frame allocator satisfies kernel requests to allocate |
53 | <para>The frame allocator satisfies kernel requests to allocate |
| 50 | power-of-two sized blocks of physical memory. Because of zonal |
54 | power-of-two sized blocks of physical memory. Because of zonal |
| 51 | organization of physical memory, the frame allocator is always working |
55 | organization of physical memory, the frame allocator is always working |
| Line 81... | Line 85... | ||
| 81 | to take care of deallocation within the zone's buddy system.</para> |
85 | to take care of deallocation within the zone's buddy system.</para> |
| 82 | </formalpara> |
86 | </formalpara> |
| 83 | </section> |
87 | </section> |
| 84 | 88 | ||
| 85 | <section id="buddy_allocator"> |
89 | <section id="buddy_allocator"> |
| - | 90 | <indexterm> |
|
| - | 91 | <primary>buddy system</primary> |
|
| - | 92 | </indexterm> |
|
| - | 93 | ||
| 86 | <title>Buddy allocator</title> |
94 | <title>Buddy allocator</title> |
| 87 | 95 | ||
| 88 | <para>In the buddy system, the memory is broken down into power-of-two |
96 | <para>In the buddy system, the memory is broken down into power-of-two |
| 89 | sized naturally aligned blocks. These blocks are organized in an array |
97 | sized naturally aligned blocks. These blocks are organized in an array |
| 90 | of lists, in which the list with index i contains all unallocated blocks |
98 | of lists, in which the list with index i contains all unallocated blocks |
| Line 155... | Line 163... | ||
| 155 | </formalpara> |
163 | </formalpara> |
| 156 | </section> |
164 | </section> |
| 157 | </section> |
165 | </section> |
| 158 | 166 | ||
| 159 | <section id="slab"> |
167 | <section id="slab"> |
| - | 168 | <indexterm> |
|
| - | 169 | <primary>slab allocator</primary> |
|
| - | 170 | </indexterm> |
|
| - | 171 | ||
| 160 | <title>Slab allocator</title> |
172 | <title>Slab allocator</title> |
| 161 | 173 | ||
| 162 | <para>The majority of memory allocation requests in the kernel is for |
174 | <para>The majority of memory allocation requests in the kernel is for |
| 163 | small, frequently used data structures. The basic idea behind the slab |
175 | small, frequently used data structures. The basic idea behind the slab |
| 164 | allocator is that commonly used objects are preallocated in continuous |
176 | allocator is that commonly used objects are preallocated in continuous |
| Line 172... | Line 184... | ||
| 172 | paragraph.</para> |
184 | paragraph.</para> |
| 173 | </footnote>. Due to the fact that the sizes of the requested and |
185 | </footnote>. Due to the fact that the sizes of the requested and |
| 174 | allocated object match, the slab allocator significantly reduces |
186 | allocated object match, the slab allocator significantly reduces |
| 175 | internal fragmentation.</para> |
187 | internal fragmentation.</para> |
| 176 | 188 | ||
| - | 189 | <indexterm> |
|
| - | 190 | <primary>slab allocator</primary> |
|
| - | 191 | ||
| - | 192 | <secondary>slab cache</secondary> |
|
| - | 193 | </indexterm> |
|
| - | 194 | ||
| 177 | <para>Slabs of one object type are organized in a structure called slab |
195 | <para>Slabs of one object type are organized in a structure called slab |
| 178 | cache. There are ususally more slabs in the slab cache, depending on |
196 | cache. There are ususally more slabs in the slab cache, depending on |
| 179 | previous allocations. If the the slab cache runs out of available slabs, |
197 | previous allocations. If the the slab cache runs out of available slabs, |
| 180 | new slabs are allocated. In order to exploit parallelism and to avoid |
198 | new slabs are allocated. In order to exploit parallelism and to avoid |
| 181 | locking of shared spinlocks, slab caches can have variants of |
199 | locking of shared spinlocks, slab caches can have variants of |
| 182 | processor-private slabs called magazines. On each processor, there is a |
200 | processor-private slabs called magazines. On each processor, there is a |
| 183 | two-magazine cache. Full magazines that are not part of any |
201 | two-magazine cache. Full magazines that are not part of any |
| 184 | per-processor magazine cache are stored in a global list of full |
202 | per-processor magazine cache are stored in a global list of full |
| 185 | magazines.</para> |
203 | magazines.</para> |
| 186 | 204 | ||
| - | 205 | <indexterm> |
|
| - | 206 | <primary>slab allocator</primary> |
|
| - | 207 | ||
| - | 208 | <secondary>magazine</secondary> |
|
| - | 209 | </indexterm> |
|
| - | 210 | ||
| 187 | <para>Each object begins its life in a slab. When it is allocated from |
211 | <para>Each object begins its life in a slab. When it is allocated from |
| 188 | there, the slab allocator calls a constructor that is registered in the |
212 | there, the slab allocator calls a constructor that is registered in the |
| 189 | respective slab cache. The constructor initializes and brings the object |
213 | respective slab cache. The constructor initializes and brings the object |
| 190 | into a known state. The object is then used by the user. When the user |
214 | into a known state. The object is then used by the user. When the user |
| 191 | later frees the object, the slab allocator puts it into a processor |
215 | later frees the object, the slab allocator puts it into a processor |
| - | 216 | private <indexterm> |
|
| - | 217 | <primary>slab allocator</primary> |
|
| - | 218 | ||
| - | 219 | <secondary>magazine</secondary> |
|
| 192 | private magazine cache, from where it can be precedently allocated |
220 | </indexterm>magazine cache, from where it can be precedently allocated |
| 193 | again. Note that allocations satisfied from a magazine are already |
221 | again. Note that allocations satisfied from a magazine are already |
| 194 | initialized by the constructor. When both of the processor cached |
222 | initialized by the constructor. When both of the processor cached |
| 195 | magazines get full, the allocator will move one of the magazines to the |
223 | magazines get full, the allocator will move one of the magazines to the |
| 196 | list of full magazines. Similarily, when allocating from an empty |
224 | list of full magazines. Similarily, when allocating from an empty |
| 197 | processor magazine cache, the kernel will reload only one magazine from |
225 | processor magazine cache, the kernel will reload only one magazine from |
| Line 204... | Line 232... | ||
| 204 | <para>Should HelenOS run short of memory, it would start deallocating |
232 | <para>Should HelenOS run short of memory, it would start deallocating |
| 205 | objects from magazines, calling slab cache destructor on them and |
233 | objects from magazines, calling slab cache destructor on them and |
| 206 | putting them back into slabs. When a slab contanins no allocated object, |
234 | putting them back into slabs. When a slab contanins no allocated object, |
| 207 | it is immediately freed.</para> |
235 | it is immediately freed.</para> |
| 208 | 236 | ||
| - | 237 | <para> |
|
| 209 | <para><figure> |
238 | <figure> |
| 210 | <mediaobject id="slab_alloc"> |
239 | <mediaobject id="slab_alloc"> |
| 211 | <imageobject role="html"> |
240 | <imageobject role="html"> |
| 212 | <imagedata fileref="images/slab_alloc.png" format="PNG" /> |
241 | <imagedata fileref="images/slab_alloc.png" format="PNG" /> |
| 213 | </imageobject> |
242 | </imageobject> |
| 214 | </mediaobject> |
243 | </mediaobject> |
| 215 | 244 | ||
| 216 | <title>Slab allocator scheme.</title> |
245 | <title>Slab allocator scheme.</title> |
| 217 | </figure></para> |
246 | </figure> |
| - | 247 | </para> |
|
| 218 | 248 | ||
| 219 | <section> |
249 | <section> |
| 220 | <title>Implementation</title> |
250 | <title>Implementation</title> |
| 221 | 251 | ||
| 222 | <para>The slab allocator is closely modelled after OpenSolaris slab |
252 | <para>The slab allocator is closely modelled after OpenSolaris slab |
| 223 | allocator by Jeff Bonwick and Jonathan Adams with the following |
253 | allocator by Jeff Bonwick and Jonathan Adams <xref |
| 224 | exceptions:<itemizedlist> |
254 | linkend="Bonwick01" /> with the following exceptions:<itemizedlist> |
| 225 | <listitem> |
- | |
| 226 | empty slabs are immediately deallocated and |
255 | <listitem>empty slabs are immediately deallocated and</listitem> |
| 227 | </listitem> |
- | |
| 228 | 256 | ||
| 229 | <listitem> |
257 | <listitem> |
| 230 | <para>empty magazines are deallocated when not needed.</para> |
258 | <para>empty magazines are deallocated when not needed.</para> |
| 231 | </listitem> |
259 | </listitem> |
| 232 | </itemizedlist>The following features are not currently supported |
260 | </itemizedlist>The following features are not currently supported |
| 233 | but would be easy to do: <itemizedlist> |
261 | but would be easy to do: <itemizedlist> |
| 234 | <listitem> |
- | |
| 235 | cache coloring and |
262 | <listitem>cache coloring and</listitem> |
| 236 | </listitem> |
- | |
| 237 | 263 | ||
| 238 | <listitem> |
264 | <listitem>dynamic magazine grow (different magazine sizes are |
| 239 | dynamic magazine grow (different magazine sizes are already supported, but the allocation strategy would need to be adjusted). |
265 | already supported, but the allocation strategy would need to be |
| 240 | </listitem> |
266 | adjusted).</listitem> |
| 241 | </itemizedlist></para> |
267 | </itemizedlist></para> |
| 242 | 268 | ||
| 243 | <section> |
269 | <section> |
| 244 | <title>Allocation/deallocation</title> |
270 | <title>Allocation/deallocation</title> |
| 245 | 271 | ||
| Line 320... | Line 346... | ||
| 320 | Allowing, in general, to load and execute two programs that are linked on the same address without complicated relocations. |
346 | Allowing, in general, to load and execute two programs that are linked on the same address without complicated relocations. |
| 321 | </listitem> |
347 | </listitem> |
| 322 | </itemizedlist></para> |
348 | </itemizedlist></para> |
| 323 | 349 | ||
| 324 | <para><!-- |
350 | <para><!-- |
| 325 | - | ||
| 326 | TLB shootdown ASID/ASID:PAGE/ALL. |
- | |
| 327 | TLB shootdown requests can come in asynchroniously |
- | |
| 328 | so there is a cache of TLB shootdown requests. Upon cache overflow TLB shootdown ALL is executed |
- | |
| 329 | - | ||
| 330 | - | ||
| 331 | <para> |
351 | <para> |
| 332 | Address spaces. Address space area (B+ tree). Only for uspace. Set of syscalls (shrink/extend etc). |
352 | Address spaces. Address space area (B+ tree). Only for uspace. Set of syscalls (shrink/extend etc). |
| 333 | Special address space area type - device - prohibits shrink/extend syscalls to call on it. |
353 | Special address space area type - device - prohibits shrink/extend syscalls to call on it. |
| 334 | Address space has link to mapping tables (hierarchical - per Address space, hash - global tables). |
354 | Address space has link to mapping tables (hierarchical - per Address space, hash - global tables). |
| 335 | </para> |
355 | </para> |
| 336 | 356 | ||
| 337 | --></para> |
357 | --></para> |
| 338 | </section> |
358 | </section> |
| 339 | 359 | ||
| 340 | <section> |
360 | <section> |
| 341 | <title>Paging</title> |
- | |
| 342 | - | ||
| 343 | <para>Virtual memory is usually using paged memory model, where virtual |
- | |
| 344 | memory address space is divided into the <emphasis>pages</emphasis> |
- | |
| 345 | (usually having size 4096 bytes) and physical memory is divided into the |
- | |
| 346 | frames (same sized as a page, of course). Each page may be mapped to |
- | |
| 347 | some frame and then, upon memory access to the virtual address, CPU |
- | |
| 348 | performs <emphasis>address translation</emphasis> during the instruction |
- | |
| 349 | execution. Non-existing mapping generates page fault exception, calling |
- | |
| 350 | kernel exception handler, thus allowing kernel to manipulate rules of |
- | |
| 351 | memory access. Information for pages mapping is stored by kernel in the |
- | |
| 352 | <link linkend="page_tables">page tables</link></para> |
- | |
| 353 | - | ||
| 354 | <para>The majority of the architectures use multi-level page tables, |
- | |
| 355 | which means need to access physical memory several times before getting |
- | |
| 356 | physical address. This fact would make serios performance overhead in |
- | |
| 357 | virtual memory management. To avoid this <link linkend="tlb">Traslation |
- | |
| 358 | Lookaside Buffer (TLB)</link> is used.</para> |
- | |
| 359 | </section> |
- | |
| 360 | - | ||
| 361 | <section> |
- | |
| 362 | <title>Address spaces</title> |
361 | <title>Address spaces</title> |
| 363 | 362 | ||
| 364 | <section> |
363 | <section> |
| - | 364 | <indexterm> |
|
| - | 365 | <primary>address space</primary> |
|
| - | 366 | ||
| - | 367 | <secondary>area</secondary> |
|
| - | 368 | </indexterm> |
|
| - | 369 | ||
| 365 | <title>Address space areas</title> |
370 | <title>Address space areas</title> |
| 366 | 371 | ||
| 367 | <para>Each address space consists of mutually disjunctive continuous |
372 | <para>Each address space consists of mutually disjunctive continuous |
| 368 | address space areas. Address space area is precisely defined by its |
373 | address space areas. Address space area is precisely defined by its |
| 369 | base address and the number of frames/pages is contains.</para> |
374 | base address and the number of frames/pages is contains.</para> |
| 370 | 375 | ||
| 371 | <para>Address space area , that define behaviour and permissions on |
376 | <para>Address space area , that define behaviour and permissions on |
| 372 | the particular area. <itemizedlist> |
377 | the particular area. <itemizedlist> |
| 373 | <listitem> |
- | |
| 374 | - | ||
| 375 | - | ||
| 376 | <emphasis>AS_AREA_READ</emphasis> |
378 | <listitem><emphasis>AS_AREA_READ</emphasis> flag indicates reading |
| 377 | - | ||
| 378 | flag indicates reading permission. |
- | |
| 379 | </listitem> |
- | |
| 380 | - | ||
| 381 | <listitem> |
- | |
| 382 | - | ||
| 383 | - | ||
| 384 | <emphasis>AS_AREA_WRITE</emphasis> |
- | |
| 385 | - | ||
| 386 | flag indicates writing permission. |
- | |
| 387 | </listitem> |
379 | permission.</listitem> |
| 388 | - | ||
| 389 | <listitem> |
- | |
| 390 | - | ||
| 391 | 380 | ||
| 392 | <emphasis>AS_AREA_EXEC</emphasis> |
381 | <listitem><emphasis>AS_AREA_WRITE</emphasis> flag indicates |
| - | 382 | writing permission.</listitem> |
|
| 393 | 383 | ||
| - | 384 | <listitem><emphasis>AS_AREA_EXEC</emphasis> flag indicates code |
|
| 394 | flag indicates code execution permission. Some architectures do not support execution persmission restriction. In this case this flag has no effect. |
385 | execution permission. Some architectures do not support execution |
| 395 | </listitem> |
386 | persmission restriction. In this case this flag has no |
| 396 | - | ||
| 397 | <listitem> |
387 | effect.</listitem> |
| 398 | - | ||
| 399 | 388 | ||
| 400 | <emphasis>AS_AREA_DEVICE</emphasis> |
389 | <listitem><emphasis>AS_AREA_DEVICE</emphasis> marks area as mapped |
| 401 | - | ||
| 402 | marks area as mapped to the device memory. |
- | |
| 403 | </listitem> |
390 | to the device memory.</listitem> |
| 404 | </itemizedlist></para> |
391 | </itemizedlist></para> |
| 405 | 392 | ||
| 406 | <para>Kernel provides possibility tasks create/expand/shrink/share its |
393 | <para>Kernel provides possibility tasks create/expand/shrink/share its |
| 407 | address space via the set of syscalls.</para> |
394 | address space via the set of syscalls.</para> |
| 408 | </section> |
395 | </section> |
| 409 | 396 | ||
| 410 | <section> |
397 | <section> |
| - | 398 | <indexterm> |
|
| - | 399 | <primary>address space</primary> |
|
| - | 400 | ||
| - | 401 | <secondary>ASID</secondary> |
|
| - | 402 | </indexterm> |
|
| - | 403 | ||
| 411 | <title>Address Space ID (ASID)</title> |
404 | <title>Address Space ID (ASID)</title> |
| 412 | 405 | ||
| 413 | <para>When switching to the different task, kernel also require to |
406 | <para>When switching to the different task, kernel also require to |
| 414 | switch mappings to the different address space. In case TLB cannot |
407 | switch mappings to the different address space. In case TLB cannot |
| 415 | distinguish address space mappings, all mapping information in TLB |
408 | distinguish address space mappings, all mapping information in TLB |
| Line 432... | Line 425... | ||
| 432 | impossible to use it as unique address space identifier for all tasks |
425 | impossible to use it as unique address space identifier for all tasks |
| 433 | running in the system. In such situations special ASID stealing |
426 | running in the system. In such situations special ASID stealing |
| 434 | algoritm is used, which takes ASID from inactive task and assigns it |
427 | algoritm is used, which takes ASID from inactive task and assigns it |
| 435 | to the active task.</para> |
428 | to the active task.</para> |
| 436 | 429 | ||
| - | 430 | <indexterm> |
|
| - | 431 | <primary>address space</primary> |
|
| - | 432 | ||
| - | 433 | <secondary>ASID stealing</secondary> |
|
| - | 434 | </indexterm> |
|
| - | 435 | ||
| - | 436 | <para> |
|
| 437 | <para><classname>ASID stealing algoritm here.</classname></para> |
437 | <classname>ASID stealing algoritm here.</classname> |
| - | 438 | </para> |
|
| 438 | </section> |
439 | </section> |
| 439 | </section> |
440 | </section> |
| 440 | 441 | ||
| 441 | <section> |
442 | <section id="paging"> |
| 442 | <title>Virtual address translation</title> |
443 | <title>Virtual address translation</title> |
| 443 | 444 | ||
| 444 | <section id="page_tables"> |
445 | <section> |
| 445 | <title>Page tables</title> |
446 | <title>Introduction</title> |
| - | 447 | ||
| - | 448 | <para>Virtual memory is usually using paged memory model, where |
|
| - | 449 | virtual memory address space is divided into the |
|
| - | 450 | <emphasis>pages</emphasis> (usually having size 4096 bytes) and |
|
| - | 451 | physical memory is divided into the frames (same sized as a page, of |
|
| - | 452 | course). Each page may be mapped to some frame and then, upon memory |
|
| - | 453 | access to the virtual address, CPU performs <emphasis>address |
|
| - | 454 | translation</emphasis> during the instruction execution. Non-existing |
|
| - | 455 | mapping generates page fault exception, calling kernel exception |
|
| - | 456 | handler, thus allowing kernel to manipulate rules of memory access. |
|
| - | 457 | Information for pages mapping is stored by kernel in the <link |
|
| - | 458 | linkend="page_tables">page tables</link></para> |
|
| - | 459 | ||
| - | 460 | <indexterm> |
|
| - | 461 | <primary>page tables</primary> |
|
| - | 462 | </indexterm> |
|
| - | 463 | ||
| - | 464 | <para>The majority of the architectures use multi-level page tables, |
|
| - | 465 | which means need to access physical memory several times before |
|
| - | 466 | getting physical address. This fact would make serios performance |
|
| - | 467 | overhead in virtual memory management. To avoid this <link |
|
| - | 468 | linkend="tlb">Traslation Lookaside Buffer (TLB)</link> is used.</para> |
|
| 446 | 469 | ||
| 447 | <para>HelenOS kernel has two different approaches to the paging |
470 | <para>HelenOS kernel has two different approaches to the paging |
| 448 | implementation: <emphasis>4 level page tables</emphasis> and |
471 | implementation: <emphasis>4 level page tables</emphasis> and |
| 449 | <emphasis>global hash tables</emphasis>, which are accessible via |
472 | <emphasis>global hash table</emphasis>, which are accessible via |
| 450 | generic paging abstraction layer. Such different functionality was |
473 | generic paging abstraction layer. Such different functionality was |
| 451 | caused by the major architectural differences between supported |
474 | caused by the major architectural differences between supported |
| 452 | platforms. This abstraction is implemented with help of the global |
475 | platforms. This abstraction is implemented with help of the global |
| 453 | structure of pointers to basic mapping functions |
476 | structure of pointers to basic mapping functions |
| 454 | <emphasis>page_mapping_operations</emphasis>. To achieve different |
477 | <emphasis>page_mapping_operations</emphasis>. To achieve different |
| 455 | functionality of page tables, corresponding layer must implement |
478 | functionality of page tables, corresponding layer must implement |
| 456 | functions, declared in |
479 | functions, declared in |
| 457 | <emphasis>page_mapping_operations</emphasis></para> |
480 | <emphasis>page_mapping_operations</emphasis></para> |
| 458 | 481 | ||
| - | 482 | <para>Thanks to the abstract paging interface, there was a place left |
|
| - | 483 | for more paging implementations (besides already implemented |
|
| - | 484 | hieararchical page tables and hash table), for example B-Tree based |
|
| 459 | <formalpara> |
485 | page tables.</para> |
| 460 | <title>4-level page tables</title> |
486 | </section> |
| 461 | 487 | ||
| 462 | <para>4-level page tables are the generalization of the hardware |
- | |
| 463 | capabilities of several architectures.<itemizedlist> |
- | |
| 464 | <listitem> |
- | |
| 465 | ia32 uses 2-level page tables, with full hardware support. |
- | |
| 466 | </listitem> |
488 | <section id="page_tables"> |
| 467 | - | ||
| 468 | <listitem> |
- | |
| 469 | amd64 uses 4-level page tables, also coming with full hardware support. |
- | |
| 470 | </listitem> |
- | |
| 471 | - | ||
| 472 | <listitem> |
489 | <indexterm> |
| 473 | mips and ppc32 have 2-level tables, software simulated support. |
- | |
| 474 | </listitem> |
- | |
| 475 | </itemizedlist></para> |
490 | <primary>page tables</primary> |
| 476 | </formalpara> |
- | |
| 477 | 491 | ||
| 478 | <formalpara> |
492 | <secondary>hierarchical</secondary> |
| 479 | <title>Global hash tables</title> |
493 | </indexterm> |
| 480 | 494 | ||
| - | 495 | <title>Hierarchical 4-level page tables</title> |
|
| - | 496 | ||
| 481 | <para>- global page hash table: existuje jen jedna v celem systemu |
497 | <para>Hierarchical 4-level page tables are the generalization of the |
| 482 | (vyuziva ji ia64), pozn. ia64 ma zatim vypnuty VHPT. Pouziva se |
498 | hardware capabilities of most architectures. Each address space has |
| 483 | genericke hash table s oddelenymi collision chains. ASID support is |
499 | its own page tables.<itemizedlist> |
| 484 | required to use global hash tables.</para> |
500 | <listitem>ia32 uses 2-level page tables, with full hardware |
| 485 | </formalpara> |
501 | support.</listitem> |
| - | 502 | ||
| - | 503 | <listitem>amd64 uses 4-level page tables, also coming with full |
|
| - | 504 | hardware support.</listitem> |
|
| 486 | 505 | ||
| 487 | <para>Thanks to the abstract paging interface, there is possibility |
506 | <listitem>mips and ppc32 have 2-level tables, software simulated |
| 488 | left have more paging implementations, for example B-Tree page |
507 | support.</listitem> |
| 489 | tables.</para> |
508 | </itemizedlist></para> |
| 490 | </section> |
509 | </section> |
| 491 | 510 | ||
| 492 | <section id="tlb"> |
511 | <section> |
| - | 512 | <indexterm> |
|
| 493 | <title>Translation Lookaside buffer</title> |
513 | <primary>page tables</primary> |
| 494 | 514 | ||
| 495 | <para>Due to the extensive overhead during the page mapping lookup in |
- | |
| 496 | the page tables, all architectures has fast assotiative cache memory |
- | |
| 497 | built-in CPU. This memory called TLB stores recently used page table |
515 | <secondary>hashing</secondary> |
| 498 | entries.</para> |
516 | </indexterm> |
| 499 | 517 | ||
| 500 | <section id="tlb_shootdown"> |
518 | <title>Global hash table</title> |
| 501 | <title>TLB consistency. TLB shootdown algorithm.</title> |
- | |
| 502 | 519 | ||
| 503 | <para>Operating system is responsible for keeping TLB consistent by |
520 | <para>Implementation of the global hash table was encouraged by the |
| 504 | invalidating the contents of TLB, whenever there is some change in |
521 | ia64 architecture support. One of the major differences between global |
| 505 | page tables. Those changes may occur when page or group of pages |
522 | hash table and hierarchical tables is that global hash table exists |
| 506 | were unmapped, mapping is changed or system switching active address |
- | |
| 507 | space to schedule a new system task (which is a batch unmap of all |
- | |
| 508 | address space mappings). Moreover, this invalidation operation must |
- | |
| 509 | be done an all system CPUs because each CPU has its own independent |
523 | only once in the system and the hierarchical tables are maintained per |
| 510 | TLB cache. Thus maintaining TLB consistency on SMP configuration as |
- | |
| 511 | not as trivial task as it looks at the first glance. Naive solution |
- | |
| 512 | would assume remote TLB invalidatation, which is not possible on the |
- | |
| 513 | most of the architectures, because of the simple fact - flushing TLB |
- | |
| 514 | is allowed only on the local CPU and there is no possibility to |
- | |
| 515 | access other CPUs' TLB caches.</para> |
524 | address space.</para> |
| 516 | 525 | ||
| 517 | <para>Technique of remote invalidation of TLB entries is called "TLB |
526 | <para>Thus, hash table contains information about all address spaces |
| 518 | shootdown". HelenOS uses a variation of the algorithm described by |
527 | mappings in the system, so, the hash of an entry must contain |
| 519 | D. Black et al., "Translation Lookaside Buffer Consistency: A |
- | |
| 520 | Software Approach," Proc. Third Int'l Conf. Architectural Support |
- | |
| 521 | for Programming Languages and Operating Systems, 1989, pp. |
528 | information of both address space pointer or id and the virtual |
| 522 | 113-122.</para> |
- | |
| 523 | - | ||
| 524 | <para>As the situation demands, you will want partitial invalidation |
- | |
| 525 | of TLB caches. In case of simple memory mapping change it is |
529 | address of the page. Generic hash table implementation assumes that |
| 526 | necessary to invalidate only one or more adjacent pages. In case if |
530 | the addresses of the pointers to the address spaces are likely to be |
| 527 | the architecture is aware of ASIDs, during the address space |
531 | on the close addresses, so it uses least significant bits for hash; |
| 528 | switching, kernel invalidates only entries from this particular |
532 | also it assumes that the virtual page addresses have roughly the same |
| 529 | address space. Final option of the TLB invalidation is the complete |
533 | probability of occurring, so the least significant bits of VPN compose |
| 530 | TLB cache invalidation, which is the operation that flushes all |
- | |
| 531 | entries in TLB.</para> |
534 | the hash index.</para> |
| 532 | 535 | ||
| 533 | <para>TLB shootdown is performed in two phases. First, the initiator |
536 | <para>- global page hash table: existuje jen jedna v celem systemu |
| 534 | process sends an IPI message indicating the TLB shootdown request to |
537 | (vyuziva ji ia64), pozn. ia64 ma zatim vypnuty VHPT. Pouziva se |
| 535 | the rest of the CPUs. Then, it waits until all CPUs confirm TLB |
538 | genericke hash table s oddelenymi collision chains. ASID support is |
| 536 | invalidating action execution.</para> |
539 | required to use global hash tables.</para> |
| 537 | </section> |
- | |
| 538 | </section> |
540 | </section> |
| 539 | </section> |
541 | </section> |
| 540 | 542 | ||
| 541 | <section> |
543 | <section id="tlb"> |
| - | 544 | <indexterm> |
|
| - | 545 | <primary>TLB</primary> |
|
| 542 | <title>---</title> |
546 | </indexterm> |
| - | 547 | ||
| - | 548 | <title>Translation Lookaside buffer</title> |
|
| - | 549 | ||
| - | 550 | <para>Due to the extensive overhead during the page mapping lookup in |
|
| - | 551 | the page tables, all architectures has fast assotiative cache memory |
|
| - | 552 | built-in CPU. This memory called TLB stores recently used page table |
|
| - | 553 | entries.</para> |
|
| - | 554 | ||
| - | 555 | <section id="tlb_shootdown"> |
|
| - | 556 | <indexterm> |
|
| - | 557 | <primary>TLB</primary> |
|
| - | 558 | ||
| - | 559 | <secondary>TLB shootdown</secondary> |
|
| - | 560 | </indexterm> |
|
| - | 561 | ||
| - | 562 | <title>TLB consistency. TLB shootdown algorithm.</title> |
|
| - | 563 | ||
| - | 564 | <para>Operating system is responsible for keeping TLB consistent by |
|
| - | 565 | invalidating the contents of TLB, whenever there is some change in |
|
| - | 566 | page tables. Those changes may occur when page or group of pages were |
|
| - | 567 | unmapped, mapping is changed or system switching active address space |
|
| - | 568 | to schedule a new system task. Moreover, this invalidation operation |
|
| - | 569 | must be done an all system CPUs because each CPU has its own |
|
| - | 570 | independent TLB cache. Thus maintaining TLB consistency on SMP |
|
| - | 571 | configuration as not as trivial task as it looks on the first glance. |
|
| - | 572 | Naive solution would assume that is the CPU which wants to invalidate |
|
| - | 573 | TLB will invalidate TLB caches on other CPUs. It is not possible on |
|
| - | 574 | the most of the architectures, because of the simple fact - flushing |
|
| - | 575 | TLB is allowed only on the local CPU and there is no possibility to |
|
| - | 576 | access other CPUs' TLB caches, thus invalidate TLB remotely.</para> |
|
| - | 577 | ||
| - | 578 | <para>Technique of remote invalidation of TLB entries is called "TLB |
|
| - | 579 | shootdown". HelenOS uses a variation of the algorithm described by D. |
|
| - | 580 | Black et al., "Translation Lookaside Buffer Consistency: A Software |
|
| - | 581 | Approach," Proc. Third Int'l Conf. Architectural Support for |
|
| - | 582 | Programming Languages and Operating Systems, 1989, pp. 113-122. <xref |
|
| - | 583 | linkend="Black89" /></para> |
|
| - | 584 | ||
| - | 585 | <para>As the situation demands, you will want partitial invalidation |
|
| - | 586 | of TLB caches. In case of simple memory mapping change it is necessary |
|
| - | 587 | to invalidate only one or more adjacent pages. In case if the |
|
| - | 588 | architecture is aware of ASIDs, when kernel needs to dump some ASID to |
|
| - | 589 | use by another task, it invalidates only entries from this particular |
|
| - | 590 | address space. Final option of the TLB invalidation is the complete |
|
| - | 591 | TLB cache invalidation, which is the operation that flushes all |
|
| - | 592 | entries in TLB.</para> |
|
| 543 | 593 | ||
| 544 | <para>At the moment HelenOS does not support swapping.</para> |
594 | <para>TLB shootdown is performed in two phases.</para> |
| 545 | 595 | ||
| - | 596 | <formalpara> |
|
| - | 597 | <title>Phase 1.</title> |
|
| - | 598 | ||
| 546 | <para>- pouzivame vypadky stranky k alokaci ramcu on-demand v ramci |
599 | <para>First, initiator locks a global TLB spinlock, then request is |
| - | 600 | being put to the local request cache of every other CPU in the |
|
| - | 601 | system protected by its spinlock. In case the cache is full, all |
|
| 547 | as_area - na architekturach, ktere to podporuji, podporujeme non-exec |
602 | requests in the cache are replaced by one request, indicating global |
| - | 603 | TLB flush. Then the initiator thread sends an IPI message indicating |
|
| - | 604 | the TLB shootdown request to the rest of the CPUs and waits actively |
|
| - | 605 | until all CPUs confirm TLB invalidating action execution by setting |
|
| - | 606 | up a special flag. After setting this flag this thread is blocked on |
|
| - | 607 | the TLB spinlock, held by the initiator.</para> |
|
| - | 608 | </formalpara> |
|
| - | 609 | ||
| - | 610 | <formalpara> |
|
| - | 611 | <title>Phase 2.</title> |
|
| - | 612 | ||
| - | 613 | <para>All CPUs are waiting on the TLB spinlock to execute TLB |
|
| - | 614 | invalidation action and have indicated their intention to the |
|
| - | 615 | initiator. Initiator continues, cleaning up its TLB and releasing |
|
| - | 616 | the global TLB spinlock. After this all other CPUs gain and |
|
| - | 617 | immidiately release TLB spinlock and perform TLB invalidation |
|
| 548 | stranky</para> |
618 | actions.</para> |
| - | 619 | </formalpara> |
|
| - | 620 | </section> |
|
| 549 | </section> |
621 | </section> |
| 550 | </section> |
622 | </section> |
| 551 | </chapter> |
623 | </chapter> |
| 552 | 624 | ||