44,6 → 44,10 |
</section> |
|
<section id="frame_allocator"> |
<indexterm> |
<primary>frame allocator</primary> |
</indexterm> |
|
<title>Frame allocator</title> |
|
<para>The frame allocator satisfies kernel requests to allocate |
83,6 → 87,10 |
</section> |
|
<section id="buddy_allocator"> |
<indexterm> |
<primary>buddy system</primary> |
</indexterm> |
|
<title>Buddy allocator</title> |
|
<para>In the buddy system, the memory is broken down into power-of-two |
157,6 → 165,10 |
</section> |
|
<section id="slab"> |
<indexterm> |
<primary>slab allocator</primary> |
</indexterm> |
|
<title>Slab allocator</title> |
|
<para>The majority of memory allocation requests in the kernel are for
174,6 → 186,12 |
allocated object match, the slab allocator significantly reduces |
internal fragmentation.</para> |
|
<indexterm> |
<primary>slab allocator</primary> |
|
<secondary>slab cache</secondary> |
</indexterm> |
|
<para>Slabs of one object type are organized in a structure called a
slab cache. There are usually several slabs in the slab cache,
depending on previous allocations. If the slab cache runs out of
available slabs,
184,12 → 202,22 |
per-processor magazine cache are stored in a global list of full |
magazines.</para> |
|
<indexterm> |
<primary>slab allocator</primary> |
|
<secondary>magazine</secondary> |
</indexterm> |
|
<para>Each object begins its life in a slab. When it is allocated from |
there, the slab allocator calls a constructor that is registered in the |
respective slab cache. The constructor initializes and brings the object |
into a known state. The object is then used by the user. When the user |
later frees the object, the slab allocator puts it into a processor |
private magazine cache, from where it can be subsequently allocated
again. Note that allocations satisfied from a magazine are already
initialized by the constructor. When both of the per-processor
magazines become full, the allocator will move one of the magazines to the
206,7 → 234,8 |
putting them back into slabs. When a slab contains no allocated
objects, it is immediately freed.</para>
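<para>The lifecycle described above can be sketched in C. The following
is a simplified model, not the actual HelenOS API: the structure
layouts, names and the single fixed-size magazine are illustrative, and
a real slab is modelled here by plain <function>malloc</function>. It
shows why an allocation satisfied from a magazine skips the
constructor:</para>

```c
#include <stddef.h>
#include <stdlib.h>

#define MAGAZINE_SIZE 8

/* Hypothetical, simplified structures -- not the actual HelenOS API. */
typedef struct {
    void *objs[MAGAZINE_SIZE];  /* stack of constructed objects */
    size_t nobjs;               /* number of objects in the magazine */
} magazine_t;

typedef struct {
    size_t object_size;
    void (*constructor)(void *obj);  /* registered with the slab cache */
    magazine_t current;              /* one per-processor magazine */
} slab_cache_t;

/* Allocate an object: prefer the magazine, fall back to a slab. */
void *slab_alloc(slab_cache_t *cache)
{
    if (cache->current.nobjs > 0) {
        /* Magazine hit: the object is already constructed. */
        return cache->current.objs[--cache->current.nobjs];
    }
    /* Magazine empty: take a fresh object from a slab (modelled here
       by malloc) and run the constructor exactly once. */
    void *obj = malloc(cache->object_size);
    if (obj != NULL)
        cache->constructor(obj);
    return obj;
}

/* Free an object: keep it constructed and push it to the magazine. */
void slab_free(slab_cache_t *cache, void *obj)
{
    if (cache->current.nobjs < MAGAZINE_SIZE) {
        cache->current.objs[cache->current.nobjs++] = obj;
        return;
    }
    /* Magazine full: a real allocator would swap magazines or return
       the object to its slab; here we simply release it. */
    free(obj);
}
```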
|
<para> |
<figure> |
<mediaobject id="slab_alloc"> |
<imageobject role="html"> |
<imagedata fileref="images/slab_alloc.png" format="PNG" /> |
214,17 → 243,16 |
</mediaobject> |
|
<title>Slab allocator scheme.</title> |
</figure> |
</para> |
|
<section> |
<title>Implementation</title> |
|
<para>The slab allocator is closely modelled after the OpenSolaris
slab allocator by Jeff Bonwick and Jonathan Adams <xref
linkend="Bonwick01" /> with the following exceptions:<itemizedlist>
    <listitem>empty slabs are immediately deallocated and</listitem>
|
<listitem> |
<para>empty magazines are deallocated when not needed.</para> |
231,13 → 259,11 |
</listitem> |
</itemizedlist>The following features are not currently supported
but would be easy to implement: <itemizedlist>
    <listitem>cache coloring and</listitem>
|
<listitem>dynamic magazine growth (different magazine sizes are
already supported, but the allocation strategy would need to be
adjusted).</listitem>
</itemizedlist></para> |
|
<section> |
322,12 → 348,6 |
</itemizedlist></para> |
|
<para><!-- |
|
TLB shootdown ASID/ASID:PAGE/ALL. |
TLB shootdown requests can come in asynchronously
so there is a cache of TLB shootdown requests. Upon cache overflow TLB shootdown ALL is executed |
|
|
<para> |
Address spaces. Address space area (B+ tree). Only for uspace. Set of syscalls (shrink/extend etc). |
Special address space area type - device - prohibits shrink/extend syscalls to call on it. |
338,30 → 358,15 |
</section> |
|
<section> |
  <title>Address spaces</title>

  <section>
    <indexterm>
      <primary>address space</primary>

      <secondary>area</secondary>
    </indexterm>

    <title>Address space areas</title>
|
<para>Each address space consists of mutually disjoint contiguous
370,37 → 375,19 |
|
<para>Each address space area has a set of flags that define the
behaviour and permissions of the particular area. <itemizedlist>
    <listitem><emphasis>AS_AREA_READ</emphasis> flag indicates read
    permission.</listitem>

    <listitem><emphasis>AS_AREA_WRITE</emphasis> flag indicates write
    permission.</listitem>

    <listitem><emphasis>AS_AREA_EXEC</emphasis> flag indicates code
    execution permission. Some architectures do not support execution
    permission restriction; in that case this flag has no
    effect.</listitem>

    <listitem><emphasis>AS_AREA_DEVICE</emphasis> marks the area as
    mapped to device memory.</listitem>
  </itemizedlist></para>
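<para>These flags naturally map onto a bitmask. In the following
minimal sketch the numeric values and the helper function are
illustrative, not taken from the HelenOS sources:</para>

```c
/* Illustrative flag values; the actual HelenOS constants may differ. */
enum {
    AS_AREA_READ   = 1 << 0,
    AS_AREA_WRITE  = 1 << 1,
    AS_AREA_EXEC   = 1 << 2,
    AS_AREA_DEVICE = 1 << 3
};

/* Check whether a write access is permitted in an area. */
static inline int as_area_write_allowed(unsigned int flags)
{
    return (flags & AS_AREA_WRITE) != 0;
}

/* A typical text segment would be readable and executable only. */
static const unsigned int text_area_flags = AS_AREA_READ | AS_AREA_EXEC;
```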
|
<para>The kernel provides tasks with the possibility to create/expand/shrink/share their
408,6 → 395,12 |
</section> |
|
<section> |
<indexterm> |
<primary>address space</primary> |
|
<secondary>ASID</secondary> |
</indexterm> |
|
<title>Address Space ID (ASID)</title> |
|
<para>When switching to a different task, the kernel also needs to
434,19 → 427,49 |
algorithm is used, which takes an ASID from an inactive task and
assigns it to the active task.</para>
|
<indexterm> |
<primary>address space</primary> |
|
<secondary>ASID stealing</secondary> |
</indexterm> |
|
<para> |
<classname>ASID stealing algorithm here.</classname>
</para> |
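<para>Until that description is written, the core of the idea can be
sketched as follows. This is a deliberately simplified model (the real
kernel keeps inactive address spaces on a list, and stealing must also
be followed by invalidation of TLB entries tagged with the stolen
ASID):</para>

```c
#include <stddef.h>

#define ASIDS_MAX 16   /* architectures provide a fixed, small ASID range */

/* Simplified address space descriptor; not the HelenOS structure. */
typedef struct as {
    int asid;          /* 0 means "no ASID assigned" */
    int active;        /* running on some CPU right now? */
} as_t;

/* Find a victim among inactive address spaces and steal its ASID. */
int asid_steal(as_t *spaces, size_t count, as_t *newcomer)
{
    for (size_t i = 0; i < count; i++) {
        if (!spaces[i].active && spaces[i].asid != 0) {
            newcomer->asid = spaces[i].asid;
            spaces[i].asid = 0;   /* victim must re-acquire one later */
            /* A real kernel would invalidate TLB entries tagged
               with the stolen ASID here. */
            return 1;
        }
    }
    return 0;   /* no inactive ASID available */
}
```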
</section> |
</section> |
|
<section id="paging"> |
<title>Virtual address translation</title> |
|
<section> |
<title>Introduction</title> |
|
<para>Virtual memory is usually implemented using a paged memory
model, where the virtual memory address space is divided into
<emphasis>pages</emphasis> (usually 4096 bytes in size) and physical
memory is divided into frames (of the same size as a page, of
course). Each page may be mapped to some frame, and upon a memory
access to a virtual address, the CPU performs <emphasis>address
translation</emphasis> during the instruction execution. A
non-existing mapping generates a page fault exception, invoking the
kernel exception handler and thus allowing the kernel to enforce its
rules of memory access. Information about page mappings is stored by
the kernel in the <link linkend="page_tables">page
tables</link>.</para>
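<para>With 4096-byte pages the translation splits every virtual
address into two parts: the low 12 bits form the offset within the
page and are copied to the physical address unchanged, while the
remaining bits form the virtual page number used to look up the frame.
A small illustrative sketch:</para>

```c
#include <stdint.h>

#define PAGE_WIDTH 12                      /* 4096-byte pages */
#define PAGE_SIZE  (1 << PAGE_WIDTH)

/* Virtual page number: selects the page table entry. */
static inline uintptr_t vpn(uintptr_t vaddr)
{
    return vaddr >> PAGE_WIDTH;
}

/* Offset within the page: copied unchanged to the physical address. */
static inline uintptr_t page_offset(uintptr_t vaddr)
{
    return vaddr & (PAGE_SIZE - 1);
}

/* Translation combines the frame found in the page tables with the offset. */
static inline uintptr_t translate(uintptr_t frame_base, uintptr_t vaddr)
{
    return frame_base | page_offset(vaddr);
}
```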
|
<indexterm> |
<primary>page tables</primary> |
</indexterm> |
|
<para>The majority of architectures use multi-level page tables,
which means that physical memory must be accessed several times
before the physical address is obtained. This would cause serious
performance overhead in virtual memory management. To avoid it, the
<link linkend="tlb">Translation Lookaside Buffer (TLB)</link> is
used.</para>
|
<para>The HelenOS kernel has two different approaches to the paging
implementation: <emphasis>4-level page tables</emphasis> and a
<emphasis>global hash table</emphasis>, which are accessible via a
generic paging abstraction layer. This dual implementation was
caused by the major architectural differences between the supported
platforms. The abstraction is implemented with the help of the global
456,40 → 479,72 |
functions, declared in
<emphasis>page_mapping_operations</emphasis>.</para>
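<para>Conceptually, each paging implementation fills in a table of
function pointers and the generic layer dispatches through it. The
sketch below is schematic: the signatures are simplified and the toy
array-backed backend stands in for a real implementation; only the
name <emphasis>page_mapping_operations</emphasis> comes from the
text:</para>

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified shapes -- the real HelenOS signatures differ. */
typedef struct {
    void (*mapping_insert)(uintptr_t page, uintptr_t frame);
    uintptr_t (*mapping_find)(uintptr_t page);
} page_mapping_operations_t;

/* A toy backend standing in for one paging implementation. */
#define TOY_SLOTS 64
static uintptr_t toy_pages[TOY_SLOTS], toy_frames[TOY_SLOTS];
static size_t toy_used;

static void toy_insert(uintptr_t page, uintptr_t frame)
{
    toy_pages[toy_used] = page;
    toy_frames[toy_used] = frame;
    toy_used++;
}

static uintptr_t toy_find(uintptr_t page)
{
    for (size_t i = 0; i < toy_used; i++)
        if (toy_pages[i] == page)
            return toy_frames[i];
    return 0;   /* no mapping */
}

/* The generic layer only ever calls through this operations table. */
static const page_mapping_operations_t page_mapping_operations = {
    .mapping_insert = toy_insert,
    .mapping_find = toy_find
};

uintptr_t page_mapping_lookup(uintptr_t page)
{
    return page_mapping_operations.mapping_find(page);
}
```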
|
<para>Thanks to the abstract paging interface, there is room left
for more paging implementations (besides the already implemented
hierarchical page tables and the hash table), for example B-tree
based page tables.</para>
</section>

<section id="page_tables">
  <indexterm>
    <primary>page tables</primary>

    <secondary>hierarchical</secondary>
  </indexterm>

  <title>Hierarchical 4-level page tables</title>

  <para>Hierarchical 4-level page tables are a generalization of the
  hardware capabilities of most architectures. Each address space has
  its own page tables.<itemizedlist>
      <listitem>ia32 uses 2-level page tables, with full hardware
      support.</listitem>

      <listitem>amd64 uses 4-level page tables, also coming with full
      hardware support.</listitem>

      <listitem>mips and ppc32 have 2-level tables, with
      software-simulated support.</listitem>
</itemizedlist></para> |
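<para>On a 64-bit machine a 4-level walk simply peels four index
fields off the virtual address above the page offset. The field widths
in this sketch are illustrative (9 bits per level, as on amd64), not a
description of any particular port:</para>

```c
#include <stdint.h>

#define PAGE_WIDTH 12
#define PTL_BITS   9    /* 512 entries per table level (illustrative) */

/* Index into page table level 'level' (0 = root) for a virtual address. */
static inline unsigned ptl_index(uint64_t vaddr, unsigned level)
{
    unsigned shift = PAGE_WIDTH + PTL_BITS * (3 - level);
    return (unsigned) ((vaddr >> shift) & ((1u << PTL_BITS) - 1));
}
```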
</section> |
|
<section>
  <indexterm>
    <primary>page tables</primary>

    <secondary>hashing</secondary>
  </indexterm>

  <title>Global hash table</title>

  <para>Implementation of the global hash table was encouraged by the
  ia64 architecture support. One of the major differences between the
  global hash table and hierarchical tables is that the global hash
  table exists only once in the system, while the hierarchical tables
  are maintained per address space.</para>

  <para>Thus, the hash table contains information about all address
  space mappings in the system, so the hash of an entry must be
  derived from both the address space pointer or id and the virtual
  address of the page. The generic hash table implementation assumes
  that the pointers to the address spaces are likely to lie at nearby
  addresses, so it uses their least significant bits for the hash; it
  also assumes that the virtual page addresses have roughly the same
  probability of occurring, so the least significant bits of the VPN
  compose the hash index.</para>

  <para>Only one global page hash table exists in the whole system
  (it is used by ia64; note that ia64 currently has the VHPT turned
  off). The generic hash table implementation with separate collision
  chains is used. ASID support is required to use the global hash
  table.</para>
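<para>The hashing scheme just described can be sketched as follows;
the table size and the exact way the two components are mixed are
illustrative, not the formula used by HelenOS:</para>

```c
#include <stdint.h>

#define HASH_BITS    12                  /* illustrative: 4096 buckets */
#define HASH_ENTRIES (1u << HASH_BITS)
#define PAGE_WIDTH   12

/* Hash an (address space, page) pair into a bucket index.  The least
 * significant bits of the address space pointer and of the virtual
 * page number are combined, per the assumptions described above. */
static inline unsigned page_ht_hash(uintptr_t as_ptr, uintptr_t page)
{
    uintptr_t vpn = page >> PAGE_WIDTH;
    return (unsigned) ((as_ptr ^ vpn) & (HASH_ENTRIES - 1));
}
```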
|
</section> |
</section> |
|
<section id="tlb"> |
<indexterm> |
<primary>TLB</primary> |
</indexterm> |
|
<title>Translation Lookaside Buffer</title>
|
<para>Due to the extensive overhead during the page mapping lookup in |
498,54 → 553,71 |
entries.</para> |
|
<section id="tlb_shootdown"> |
<indexterm> |
<primary>TLB</primary> |
|
<secondary>TLB shootdown</secondary> |
</indexterm> |
|
<title>TLB consistency and the TLB shootdown algorithm</title>
|
<para>The operating system is responsible for keeping the TLB
consistent by invalidating its contents whenever there is some
change in the page tables. Those changes may occur when a page or a
group of pages is unmapped, when a mapping is changed, or when the
system switches the active address space in order to schedule a new
task. Moreover, this invalidation must be done on all system CPUs,
because each CPU has its own independent TLB cache. Thus maintaining
TLB consistency in an SMP configuration is not as trivial a task as
it might look at first glance. A naive solution would assume that
the CPU which wants to invalidate the TLB can simply invalidate the
TLB caches of the other CPUs. This is not possible on most
architectures, for the simple reason that flushing the TLB is
allowed only on the local CPU, so there is no way to access other
CPUs' TLB caches and thus invalidate the TLB remotely.</para>
|
<para>The technique of remote invalidation of TLB entries is called
"TLB shootdown". HelenOS uses a variation of the algorithm described
by D. Black et al., "Translation Lookaside Buffer Consistency: A
Software Approach," Proc. Third Int'l Conf. Architectural Support
for Programming Languages and Operating Systems, 1989, pp. 113-122.
<xref linkend="Black89" /></para>
|
<para>As the situation demands, partial invalidation of the TLB
caches may be wanted. In case of a simple memory mapping change, it
is necessary to invalidate only one or more adjacent pages. If the
architecture is aware of ASIDs, then when the kernel needs to
reclaim some ASID for use by another task, it invalidates only the
entries belonging to that particular address space. The final option
is the complete TLB cache invalidation, an operation that flushes
all entries in the TLB.</para>
|
<para>TLB shootdown is performed in two phases.</para>
|
<formalpara> |
<title>Phase 1.</title> |
|
<para>First, the initiator locks a global TLB spinlock; then the
request is put into the local request cache of every other CPU in
the system, each cache being protected by its own spinlock. In case
a cache is full, all requests in it are replaced by a single request
indicating a global TLB flush. The initiator then sends an IPI
message indicating the TLB shootdown request to the rest of the CPUs
and actively waits until all CPUs confirm the reception of the
request by setting a special flag. Having set this flag, each such
CPU blocks on the TLB spinlock held by the initiator.</para>
</formalpara> |
|
<formalpara> |
<title>Phase 2.</title> |
|
<para>All CPUs are now waiting on the TLB spinlock to execute the
TLB invalidation action, having indicated their intention to the
initiator. The initiator continues, cleaning up its own TLB and
releasing the global TLB spinlock. After that, all the other CPUs
acquire and immediately release the TLB spinlock, and perform their
TLB invalidation actions.</para>
</formalpara> |
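<para>The overflow rule from phase 1 (a full request cache collapses
into a single global-flush request) can be sketched in isolation; the
cache size and the names are illustrative:</para>

```c
#include <stddef.h>

#define TLB_CACHE_SIZE 4   /* illustrative per-CPU request cache size */

typedef enum { TLB_REQ_PAGE, TLB_REQ_ALL } tlb_req_type_t;

typedef struct {
    tlb_req_type_t reqs[TLB_CACHE_SIZE];
    size_t count;
} tlb_req_cache_t;

/* Enqueue a shootdown request on one CPU's cache.  On overflow, the
 * whole cache collapses into a single "flush everything" request,
 * which subsumes all the dropped requests. */
void tlb_req_enqueue(tlb_req_cache_t *cache, tlb_req_type_t type)
{
    if (cache->count == TLB_CACHE_SIZE) {
        cache->reqs[0] = TLB_REQ_ALL;
        cache->count = 1;
        return;
    }
    cache->reqs[cache->count++] = type;
}
```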
</section> |
</section> |
</section> |
</chapter> |