</section> |
|
|
<section> |
<title>Address spaces</title> |
|
<section> |
<section> |
<title>Virtual address translation</title> |
|
<section id="page_tables"> |
<title>Page tables</title> |
|
<section> |
<title>Introduction</title> |
|
            <para>Virtual memory usually uses a paged memory model, where
            the virtual address space is divided into
            <emphasis>pages</emphasis> (usually 4096 bytes in size) and
            physical memory is divided into frames (of the same size as a
            page, of course). Each page may be mapped to some frame; upon a
            memory access to a virtual address, the CPU then performs
            <emphasis>address translation</emphasis> during the instruction
            execution. A non-existent mapping generates a page fault
            exception, which invokes the kernel exception handler and thus
            allows the kernel to enforce its memory access rules. The
            kernel stores the mapping information in the <link
            linkend="page_tables">page tables</link>.</para>
|
            <para>The majority of architectures use multi-level page
            tables, which means that physical memory has to be accessed
            several times before the physical address is obtained. This
            would cause a serious performance overhead in virtual memory
            management. To avoid it, a <link linkend="tlb">Translation
            Lookaside Buffer (TLB)</link> is used.</para>
|
            <para>The HelenOS kernel has two different approaches to the
            implementation of paging: <emphasis>4-level page
            tables</emphasis> and a <emphasis>global hash
            table</emphasis>, both accessible via a generic paging
            abstraction layer. This difference in functionality was caused
            by the major architectural differences between the supported
            platforms. The abstraction is implemented with the help of
            functions declared in the global
            <emphasis>page_mapping_operations</emphasis>
            structure.</para>
|
            <para>Thanks to the abstract paging interface, there is room
            left for further paging implementations (besides the already
            implemented hierarchical page tables and hash table), for
            example B-tree based page tables.</para>
          </section>

          <section>
            <title>Hierarchical 4-level page tables</title>

            <para>Hierarchical 4-level page tables are a generalization of
            the hardware capabilities of most architectures. Each address
            space has its own page tables.<itemizedlist>
                <listitem>
                   ia32 uses 2-level page tables, with full hardware
                   support.
                </listitem>

                <listitem>
                   mips and ppc32 have 2-level page tables, with software
                   simulated support.
                </listitem>
              </itemizedlist></para>
</section> |
|
          <section>
            <title>Global hash table</title>
            <para>The implementation of the global hash table was
            encouraged by the ia64 architecture support. One of the major
            differences between the global hash table and the hierarchical
            tables is that the global hash table exists only once in the
            whole system, while the hierarchical tables are maintained per
            address space. ASID support is required in order to use the
            global hash table.</para>
|
            <para>Thus, the hash table contains information about all
            address space mappings in the system, so the hash of an entry
            must be derived from both the address space pointer or id and
            the virtual address of the page. The generic hash table
            implementation assumes that pointers to different address
            spaces are likely to lie close to each other, so it uses their
            least significant bits for the hash; it also assumes that the
            virtual page addresses have roughly the same probability of
            occurring, so the least significant bits of the VPN compose
            the hash index.</para>
|
            <para>Collisions are resolved by separate collision chains:
            all entries that hash to the same index are linked into the
            chain of that index.</para>
</section> |
</section> |
|
<section id="tlb"> |
<title>Translation Lookaside buffer</title> |
          <para>The operating system is responsible for keeping the TLB
          consistent by invalidating its contents whenever there is some
          change in the page tables. Those changes may occur when a page
          or a group of pages is unmapped, when a mapping is changed, or
          when the system switches the active address space in order to
          schedule a new task. Moreover, this invalidation operation must
          be done on all CPUs in the system, because each CPU has its own
          independent TLB cache. Thus, maintaining TLB consistency in an
          SMP configuration is not as trivial a task as it might look at
          first glance. A naive solution would assume that the CPU which
          wants to invalidate the TLB simply invalidates the TLB caches of
          the other CPUs as well. This is not possible on most
          architectures, for one simple reason: flushing the TLB is
          allowed only on the local CPU, and there is no way to access the
          TLB caches of other CPUs and thus to invalidate the TLB
          remotely.</para>
|
          <para>The technique of remote invalidation of TLB entries is
          called "TLB shootdown". HelenOS uses a variation of the TLB
          shootdown algorithm described by Black et al.</para>
          <para>As the situation demands, partial invalidation of the TLB
          caches may be wanted. In the case of a simple change of a memory
          mapping, it is necessary to invalidate only one or several
          adjacent pages. If the architecture is aware of ASIDs and the
          kernel needs to reclaim some ASID for use by another task, it
          invalidates only the entries belonging to that particular
          address space. The final option is the complete TLB cache
          invalidation, an operation that flushes all entries in the
          TLB.</para>
|
<para>TLB shootdown is performed in two phases.</para> |
|
<formalpara> |
<title>Phase 1.</title> |
|
            <para>First, the initiator locks a global TLB spinlock; then a
            request is put into the local request cache of every other CPU
            in the system, each cache being protected by its own spinlock.
            In case a cache is full, all requests in that cache are
            replaced by a single request indicating a global TLB flush.
            The initiator then sends an IPI message indicating the TLB
            shootdown request to the rest of the CPUs and actively waits
            until all of them confirm the TLB invalidating action by
            setting up a special flag. After setting this flag, each such
            CPU blocks on the TLB spinlock, which is held by the
            initiator.</para>
</formalpara> |
|
<para>- pouzivame vypadky stranky k alokaci ramcu on-demand v ramci |
as_area - na architekturach, ktere to podporuji, podporujeme non-exec |
stranky</para> |
<formalpara> |
<title>Phase 2.</title> |
|
            <para>All CPUs are now waiting on the TLB spinlock to execute
            the TLB invalidation action and have indicated their intention
            to the initiator. The initiator continues by cleaning up its
            own TLB and releasing the global TLB spinlock. After that, all
            the other CPUs acquire and immediately release the TLB
            spinlock, then perform their TLB invalidation actions.</para>
</formalpara> |
</section> |
</section> |
</section> |
</section> |
</chapter> |