<?xml version="1.0" encoding="UTF-8"?>
<chapter id="mm">
  <?dbhtml filename="mm.html"?>

  <title>Memory management</title>

  <para>In previous chapters, this book described the scheduling subsystem
  as the subsystem that creates the impression that threads execute in
  parallel. The memory management subsystem, on the other hand, creates the
  impression that there is enough physical memory for the kernel and that
  userspace tasks have the entire address space only for themselves.</para>

  <section>
    <title>Physical memory management</title>

    <section id="zones_and_frames">
      <title>Zones and frames</title>

      <para>HelenOS represents contiguous areas of physical memory in
      structures called frame zones (abbreviated as zones). Each zone
      contains information about the number of allocated and unallocated
      physical memory frames as well as the physical base address of the
      zone and the number of frames contained in it. A zone also contains an
      array of frame structures describing each frame of the zone and, last
      but not least, each zone is equipped with a buddy system that
      facilitates effective allocation of power-of-two sized blocks of
      frames.</para>
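      <para>The bookkeeping described above can be sketched as plain C
      structures. This is an illustrative model only; the field names and
      types are assumptions, not the actual HelenOS definitions.</para>

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch of the zone bookkeeping described above;
 * field names are hypothetical, not HelenOS's actual definitions. */
typedef struct frame {
    uint16_t refcount;      /* number of references to this frame */
    uint8_t  buddy_order;   /* order of the buddy block this frame heads */
} frame_t;

typedef struct zone {
    uintptr_t base;         /* physical base address of the zone */
    size_t    count;        /* total number of frames in the zone */
    size_t    free_count;   /* currently unallocated frames */
    size_t    busy_count;   /* currently allocated frames */
    frame_t  *frames;       /* one descriptor per frame in the zone */
    int       flags;        /* currently unused, reserved for the future */
} zone_t;

/* A frame's physical address follows from the zone base and its index. */
static inline uintptr_t frame_addr(const zone_t *z, size_t idx,
    size_t frame_size)
{
    return z->base + idx * frame_size;
}
```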

      <para>This organization of physical memory provides good
      preconditions for hot-plugging of more zones. There is also one
      currently unused zone attribute: <code>flags</code>. The attribute
      could be used to give a special meaning to some zones in the
      future.</para>

      <para>The zones are linked in a doubly-linked list. This might seem
      inefficient because the zone list is walked every time a frame is
      allocated or deallocated. However, this does not represent a
      significant performance problem as it is expected that the number of
      zones will be rather low. Moreover, most architectures merge all zones
      into one.</para>

      <para>Every physical memory frame in a zone is described by a
      structure that contains the number of references to the frame and
      other data used by the buddy system.</para>
    </section>

    <section id="frame_allocator">
      <indexterm>
        <primary>frame allocator</primary>
      </indexterm>

      <title>Frame allocator</title>
52 | |||
67 | jermar | 53 | <para>The frame allocator satisfies kernel requests to allocate |
54 | power-of-two sized blocks of physical memory. Because of zonal |
||
55 | organization of physical memory, the frame allocator is always working |
||
76 | palkovsky | 56 | within a context of a particular frame zone. In order to carry out the |
67 | jermar | 57 | allocation requests, the frame allocator is tightly integrated with the |
58 | buddy system belonging to the zone. The frame allocator is also |
||
59 | responsible for updating information about the number of free and busy |
||
60 | frames in the zone. <figure> |
||
61 | <mediaobject id="frame_alloc"> |
||
77 | bondari | 62 | <imageobject role="eps"> |
63 | <imagedata fileref="images.vector/frame_alloc.eps" format="EPS" /> |
||
64 | </imageobject> |
||
65 | |||
67 | jermar | 66 | <imageobject role="html"> |
67 | <imagedata fileref="images/frame_alloc.png" format="PNG" /> |
||
68 | </imageobject> |
||
64 | jermar | 69 | |
67 | jermar | 70 | <imageobject role="fop"> |
71 | <imagedata fileref="images.vector/frame_alloc.svg" format="SVG" /> |
||
72 | </imageobject> |
||
73 | </mediaobject> |
||
64 | jermar | 74 | |
67 | jermar | 75 | <title>Frame allocator scheme.</title> |
76 | </figure></para> |
||
64 | jermar | 77 | |
      <formalpara>
        <title>Allocation / deallocation</title>

        <para>Upon an allocation request made via the
        <code>frame_alloc()</code> function, the frame allocator first tries
        to find a zone that can satisfy the request (i.e. has the required
        amount of free frames). Once a suitable zone is found, the frame
        allocator uses the buddy allocator on the zone's buddy system to
        perform the allocation. During deallocation, which is triggered by a
        call to <code>frame_free()</code>, the frame allocator looks up the
        zone that contains the frame being deallocated. Afterwards, it calls
        the buddy allocator again, this time to take care of deallocation
        within the zone's buddy system.</para>
      </formalpara>
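      <para>The zone lookup step of the allocation path can be sketched as a
      simple walk over the zone list. The names, the signature and the
      fixed-size zone array below are illustrative assumptions, not HelenOS
      code.</para>

```c
#include <stddef.h>

/* Hedged sketch of the allocation path described above: walk the zone
 * list for a zone with enough free frames, then delegate to its buddy
 * system. All names and signatures are illustrative, not HelenOS's. */
#define MAX_ZONES 4

typedef struct zone {
    size_t free_count;   /* unallocated frames in this zone */
} zone_t;

static zone_t zones[MAX_ZONES];
static size_t zone_cnt;

/* Find the first zone that can satisfy a request for 2^order frames. */
static zone_t *find_zone(unsigned order)
{
    for (size_t i = 0; i < zone_cnt; i++)
        if (zones[i].free_count >= ((size_t)1 << order))
            return &zones[i];
    return NULL;   /* no zone can satisfy the request */
}
```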
    </section>

    <section id="buddy_allocator">
      <indexterm>
        <primary>buddy system</primary>
      </indexterm>

      <title>Buddy allocator</title>

      <para>In the buddy system, memory is broken down into power-of-two
      sized, naturally aligned blocks. These blocks are organized in an
      array of lists, in which the list with index <emphasis>i</emphasis>
      contains all unallocated blocks of size
      <emphasis>2<superscript>i</superscript></emphasis>. The index
      <emphasis>i</emphasis> is called the order of the block. Should there
      be two adjacent equally sized blocks in the list
      <emphasis>i</emphasis> (i.e. buddies), the buddy allocator would
      coalesce them and put the resulting block in the list <emphasis>i +
      1</emphasis>, provided that the resulting block would be naturally
      aligned. Similarly, when the allocator is asked to allocate a block of
      size <emphasis>2<superscript>i</superscript></emphasis>, it first
      tries to satisfy the request from the list with index
      <emphasis>i</emphasis>. If the request cannot be satisfied (i.e. the
      list <emphasis>i</emphasis> is empty), the buddy allocator will try to
      allocate and split a larger block from the list with index
      <emphasis>i + 1</emphasis>. Both of these algorithms are recursive.
      The recursion ends either when there are no blocks to coalesce in the
      former case or when there are no blocks that can be split in the
      latter case.</para>
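      <para>Because the blocks are naturally aligned, the buddy of a block
      of order <emphasis>i</emphasis> can be located by flipping bit
      <emphasis>i</emphasis> of the block's frame index. The following
      sketch illustrates this standard buddy-system trick; it is not taken
      from the HelenOS sources.</para>

```c
#include <stddef.h>

/* For naturally aligned power-of-two blocks, the index of a block's
 * buddy at order i differs only in bit i: flipping that bit maps a
 * block to its buddy and vice versa. Illustrative, not HelenOS code. */
static inline size_t buddy_index(size_t block_index, unsigned order)
{
    return block_index ^ ((size_t)1 << order);
}

/* After coalescing, the combined block starts at the lower of the two
 * buddy indices, i.e. with bit `order` cleared. */
static inline size_t coalesced_index(size_t block_index, unsigned order)
{
    return block_index & ~((size_t)1 << order);
}
```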

      <para>This approach greatly reduces external fragmentation of memory
      and helps in allocating bigger contiguous blocks of memory aligned to
      their size. On the other hand, the buddy allocator suffers from
      increased internal fragmentation of memory and is not suitable for
      general kernel allocations. This purpose is better addressed by the
      <link linkend="slab">slab allocator</link>.<figure>
        <mediaobject id="buddy_alloc">
          <imageobject role="eps">
            <imagedata fileref="images.vector/buddy_alloc.eps" format="EPS" />
          </imageobject>

          <imageobject role="html">
            <imagedata fileref="images/buddy_alloc.png" format="PNG" />
          </imageobject>

          <imageobject role="fop">
            <imagedata fileref="images.vector/buddy_alloc.svg" format="SVG" />
          </imageobject>
        </mediaobject>

        <title>Buddy system scheme.</title>
      </figure></para>

      <section>
        <title>Implementation</title>

        <para>The buddy allocator is, in fact, an abstract framework which
        can be easily specialized to serve one particular task. It knows
        nothing about the nature of the memory it helps to allocate. To
        compensate for this lack of knowledge, the buddy allocator exports
        an interface that each of its clients is required to implement.
        When supplied with an implementation of this interface, the buddy
        allocator can use specialized external functions to find a buddy
        for a block, split and coalesce blocks, manipulate block order and
        mark blocks busy or available.</para>
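        <para>Such an exported interface could take the form of a structure
        of function pointers supplied by the client. The operation names
        below are hypothetical and only illustrate the idea.</para>

```c
#include <stddef.h>

/* Hypothetical shape of the client interface the buddy framework
 * exports; the real HelenOS operation names may differ. */
typedef struct buddy_ops {
    void *(*find_buddy)(void *block);            /* locate buddy of block */
    void *(*bisect)(void *block);                /* split a block in two */
    void *(*coalesce)(void *block, void *buddy); /* merge two buddies */
    void  (*set_order)(void *block, unsigned order);
    unsigned (*get_order)(void *block);
    void  (*mark_busy)(void *block);
    void  (*mark_available)(void *block);
} buddy_ops_t;

/* A trivial client: blocks that store their order in a struct field. */
typedef struct blk {
    unsigned order;
} blk_t;

static void blk_set_order(void *b, unsigned o)
{
    ((blk_t *)b)->order = o;
}

static unsigned blk_get_order(void *b)
{
    return ((blk_t *)b)->order;
}
```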

        <formalpara>
          <title>Data organization</title>

          <para>Each entity allocable by the buddy allocator is required to
          contain space for storing the block order number and a link
          variable used to interconnect blocks within the same
          order.</para>

          <para>Whatever entities are allocated by the buddy allocator, the
          first entity within a block is used to represent the entire
          block. The first entity keeps the order of the whole block. Other
          entities within the block are assigned the magic value
          <constant>BUDDY_INNER_BLOCK</constant>. This is especially
          important for effective identification of buddies in a
          one-dimensional array because the entity that represents a
          potential buddy cannot be associated with
          <constant>BUDDY_INNER_BLOCK</constant> (i.e. if it is associated
          with <constant>BUDDY_INNER_BLOCK</constant> then it is not a
          buddy).</para>
        </formalpara>
      </section>
    </section>

    <section id="slab">
      <indexterm>
        <primary>slab allocator</primary>
      </indexterm>

      <title>Slab allocator</title>

      <para>The majority of memory allocation requests in the kernel are
      for small, frequently used data structures. The basic idea behind the
      slab allocator is that commonly used objects are preallocated in
      contiguous areas of physical memory called slabs<footnote>
          <para>Slabs are in fact blocks of physical memory frames
          allocated from the frame allocator.</para>
        </footnote>. Whenever an object is to be allocated, the slab
      allocator returns the first available item from a suitable slab
      corresponding to the object type<footnote>
          <para>The mechanism is rather more complicated, see the next
          paragraph.</para>
        </footnote>. Because the sizes of the requested and allocated
      object match, the slab allocator significantly reduces internal
      fragmentation.</para>

      <indexterm>
        <primary>slab allocator</primary>

        <secondary>- slab cache</secondary>
      </indexterm>

      <para>Slabs of one object type are organized in a structure called a
      slab cache. There are usually more slabs in the slab cache, depending
      on previous allocations. If the slab cache runs out of available
      slabs, new slabs are allocated. In order to exploit parallelism and
      to avoid locking of shared spinlocks, slab caches can have variants
      of processor-private slabs called magazines. On each processor, there
      is a two-magazine cache. Full magazines that are not part of any
      per-processor magazine cache are stored in a global list of full
      magazines.</para>

      <indexterm>
        <primary>slab allocator</primary>

        <secondary>- magazine</secondary>
      </indexterm>

      <para>Each object begins its life in a slab. When it is allocated
      from there, the slab allocator calls a constructor that is registered
      in the respective slab cache. The constructor initializes and brings
      the object into a known state. The object is then used by the user.
      When the user later frees the object, the slab allocator puts it into
      a processor-private <indexterm>
          <primary>slab allocator</primary>

          <secondary>- magazine</secondary>
        </indexterm>magazine cache, from where it can be allocated again
      later. Note that allocations satisfied from a magazine are already
      initialized by the constructor. When both of the processor-cached
      magazines get full, the allocator will move one of the magazines to
      the list of full magazines. Similarly, when allocating from an empty
      processor magazine cache, the kernel will reload only one magazine
      from the list of full magazines. In other words, the slab allocator
      tries to keep the processor magazine cache only half-full in order to
      prevent thrashing when allocations and deallocations interleave on
      magazine boundaries. The advantage of this setup is that during most
      allocations, no global spinlock needs to be held.</para>

      <para>Should HelenOS run short of memory, it would start deallocating
      objects from magazines, calling the slab cache destructor on them and
      putting them back into slabs. When a slab contains no allocated
      objects, it is immediately freed.</para>

      <para>
        <figure>
          <mediaobject id="slab_alloc">
            <imageobject role="eps">
              <imagedata fileref="images.vector/slab_alloc.eps" format="EPS" />
            </imageobject>

            <imageobject role="html">
              <imagedata fileref="images/slab_alloc.png" format="PNG" />
            </imageobject>

            <imageobject role="fop">
              <imagedata fileref="images.vector/slab_alloc.svg" format="SVG" />
            </imageobject>
          </mediaobject>

          <title>Slab allocator scheme.</title>
        </figure>
      </para>

      <section>
        <title>Implementation</title>

        <para>The slab allocator is closely modelled after the OpenSolaris
        slab allocator by Jeff Bonwick and Jonathan Adams <xref
        linkend="Bonwick01" /> with the following exceptions:<itemizedlist>
            <listitem>
              <para>empty slabs are immediately deallocated and</para>
            </listitem>

            <listitem>
              <para>empty magazines are deallocated when not needed.</para>
            </listitem>
          </itemizedlist>The following features are not currently supported
        but would be easy to add: <itemizedlist>
            <listitem>cache coloring and</listitem>

            <listitem>dynamic magazine growth (different magazine sizes are
            already supported, but the allocation strategy would need to be
            adjusted).</listitem>
          </itemizedlist></para>

        <section>
          <title>Allocation/deallocation</title>

          <para>The following two paragraphs summarize and complete the
          description of the slab allocator operation (i.e. the
          <code>slab_alloc()</code> and <code>slab_free()</code>
          functions).</para>

          <formalpara>
            <title>Allocation</title>

            <para><emphasis>Step 1.</emphasis> When an allocation request
            comes, the slab allocator checks availability of memory in the
            current magazine of the local processor magazine cache. If an
            object is available, the allocator just pops it from the
            magazine and returns it.</para>

            <para><emphasis>Step 2.</emphasis> If the current magazine in
            the processor magazine cache is empty, the allocator will
            attempt to swap it with the last magazine from the cache and
            return to the first step. If the last magazine is also empty,
            the algorithm will fall through to Step 3.</para>

            <para><emphasis>Step 3.</emphasis> Now the allocator is in the
            situation when both magazines in the processor magazine cache
            are empty. The allocator reloads one magazine from the shared
            list of full magazines. If the reload is successful (i.e. there
            are full magazines in the list), the algorithm continues with
            Step 1.</para>

            <para><emphasis>Step 4.</emphasis> In this fail-safe step, an
            object is allocated from the conventional slab layer and a
            pointer to it is returned.</para>
          </formalpara>
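          <para>The magazine fast path of Step 1 amounts to popping an
          object pointer off a small stack. The following sketch is an
          illustrative model; the type and function names are assumptions,
          not the HelenOS implementation.</para>

```c
#include <stddef.h>

/* Illustrative magazine: a fixed-capacity stack of object pointers.
 * Field and function names are hypothetical, not HelenOS's. */
#define MAG_SIZE 8

typedef struct magazine {
    void  *objs[MAG_SIZE];
    size_t busy;            /* number of objects currently in the magazine */
} magazine_t;

/* Step 1 fast path: pop an (already constructed) object, or NULL to
 * signal that the caller must fall through to the slower steps. */
static void *magazine_pop(magazine_t *mag)
{
    if (mag->busy == 0)
        return NULL;
    return mag->objs[--mag->busy];
}
```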
323 | |||
324 | <formalpara> |
||
325 | <title>Deallocation</title> |
||
326 | |||
70 | jermar | 327 | <para><emphasis>Step 1.</emphasis> During a deallocation request, |
328 | the slab allocator checks if the current magazine of the local |
||
76 | palkovsky | 329 | processor magazine cache is not full. If it is, the pointer to the |
70 | jermar | 330 | objects is just pushed into the magazine and the algorithm |
331 | returns.</para> |
||
64 | jermar | 332 | |
70 | jermar | 333 | <para><emphasis>Step 2.</emphasis> If the current magazine is |
334 | full, the allocator will attempt to swap it with the last magazine |
||
335 | from the cache and return to the first step. If also the last |
||
336 | magazine is empty, the algorithm will fall through to Step |
||
337 | 3.</para> |
||
64 | jermar | 338 | |
70 | jermar | 339 | <para><emphasis>Step 3.</emphasis> Now the allocator is in the |
340 | situation when both magazines in the processor magazine cache are |
||
76 | palkovsky | 341 | full. The allocator tries to allocate a new empty magazine and |
342 | flush one of the full magazines to the shared list of full |
||
343 | magazines. If it is successfull, the algoritm continues with Step |
||
344 | 1.</para> |
||
345 | |||
346 | <para><emphasis>Step 4. </emphasis>In case of low memory condition |
||
347 | when the allocation of empty magazine fails, the object is moved |
||
348 | directly into slab. In the worst case object deallocation does not |
||
349 | need to allocate any additional memory.</para> |
||
64 | jermar | 350 | </formalpara> |
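          <para>Steps 1 and 2 of deallocation can be modelled as a push
          onto a fixed-capacity stack followed by a swap of the two
          per-processor magazines. Again, the names and sizes below are
          illustrative assumptions, not HelenOS code.</para>

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative magazine, as in the allocation sketch above. */
#define MAG_SIZE 8

typedef struct magazine {
    void  *objs[MAG_SIZE];
    size_t busy;
} magazine_t;

/* Step 1 of deallocation: push the object unless the magazine is full;
 * returning false tells the caller to try the swap in Step 2. */
static bool magazine_push(magazine_t *mag, void *obj)
{
    if (mag->busy == MAG_SIZE)
        return false;
    mag->objs[mag->busy++] = obj;
    return true;
}

/* Step 2: swap the current and last magazines of the per-processor
 * cache, hoping the other one still has room. */
static void cache_swap(magazine_t **current, magazine_t **last)
{
    magazine_t *tmp = *current;
    *current = *last;
    *last = tmp;
}
```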
        </section>
      </section>
    </section>
  </section>

  <section>
    <title>Virtual memory management</title>

    <para>Virtual memory is essential for an operating system because it
    makes several things possible. First, it helps to isolate tasks from
    each other by encapsulating them in their private address spaces.
    Second, virtual memory can give tasks the impression of more memory
    being available than is physically present. And third, by using virtual
    memory, there might be multiple copies of the same program, linked to
    the same addresses, running in the system. There are at least two known
    mechanisms for implementing virtual memory: segmentation and paging.
    Even though some processor architectures supported by HelenOS<footnote>
        <para>ia32 has full-fledged segmentation.</para>
      </footnote> provide both mechanisms, the kernel makes use solely of
    paging.</para>

    <section id="paging">
      <title>VAT subsystem</title>

      <para>In paged virtual memory, the entire virtual address space is
      divided into small power-of-two sized, naturally aligned blocks
      called pages. The processor implements a translation mechanism that
      allows the operating system to manage mappings between the set of
      pages and the set of identically sized and identically aligned pieces
      of physical memory called frames. As a result, references to
      contiguous virtual memory areas don't necessarily need to reference
      contiguous areas of physical memory. Supported page sizes usually
      range from several kilobytes to several megabytes. Each page that
      takes part in the mapping is associated with certain attributes that
      further describe the mapping (e.g. access rights, dirty and accessed
      bits and present bit).</para>

      <para>When the processor accesses a page that is not present (i.e.
      its present bit is not set), the operating system is notified through
      a special exception called page fault. It is then up to the operating
      system to service the page fault. In HelenOS, some page faults are
      fatal and result in either task termination or, in the worst case,
      kernel panic<footnote>
          <para>Such a condition would be caused either by a hardware
          failure or by a bug in the kernel.</para>
        </footnote>, while other page faults are used to load memory on
      demand or to notify the kernel about certain events.</para>

      <indexterm>
        <primary>page tables</primary>
      </indexterm>

      <para>The set of all page mappings is stored in a memory structure
      called page tables. Some architectures have no hardware support for
      page tables<footnote>
          <para>On mips32, a TLB-only model is used and the operating
          system is responsible for managing software-defined page
          tables.</para>
        </footnote> while other processor architectures<footnote>
          <para>Like amd64 and ia32.</para>
        </footnote> natively understand the in-memory format of the page
      tables. Despite all the possible differences in page table formats,
      the HelenOS VAT subsystem<footnote>
          <para>Virtual Address Translation subsystem.</para>
        </footnote> unifies the page table operations under one programming
      interface. For all parts of the kernel, three basic functions are
      provided:</para>

      <itemizedlist>
        <listitem>
          <para><code>page_mapping_insert()</code>,</para>
        </listitem>

        <listitem>
          <para><code>page_mapping_find()</code> and</para>
        </listitem>

        <listitem>
          <para><code>page_mapping_remove()</code>.</para>
        </listitem>
      </itemizedlist>

      <para>The <code>page_mapping_insert()</code> function is used to
      introduce a mapping for one virtual memory page belonging to a
      particular address space into the page tables. Once the mapping is in
      the page tables, it can be searched for by
      <code>page_mapping_find()</code> and removed by
      <code>page_mapping_remove()</code>. All of these functions internally
      select the page table mechanism specific functions that carry out the
      operation itself.</para>
438 | |||
439 | <para>There are currently two supported mechanisms: generic 4-level |
||
440 | hierarchical page tables and global page hash table. Both of the |
||
441 | mechanisms are generic as they cover several hardware platforms. For |
||
442 | instance, the 4-level hierarchical page table mechanism is used by |
||
443 | amd64, ia32, mips32 and ppc32, respectively. These architectures have |
||
444 | the following page table format: 4-level, 2-level, TLB-only and hardware |
||
445 | hash table, respectively. On the other hand, the global page hash table |
||
446 | is used on ia64 that can be TLB-only or use a hardware hash table. |
||
447 | Although only two mechanisms are currently implemented, other mechanisms |
||
448 | (e.g. B+tree) can be easily added.</para> |
||
449 | |||
      <section id="page_tables">
        <indexterm>
          <primary>page tables</primary>

          <secondary>- hierarchical</secondary>
        </indexterm>

        <title>Hierarchical 4-level page tables</title>

        <para>Hierarchical 4-level page tables are a generalization of the
        frequently used hierarchical model of page tables. In this
        mechanism, each address space has its own page tables. To avoid
        confusion in terminology used by hardware vendors, in HelenOS, the
        root level page table is called PTL0, the two middle levels are
        called PTL1 and PTL2, and, finally, the leaf level is called PTL3.
        All architectures using this mechanism are required to use PTL0 and
        PTL3. However, the middle levels can be left out, depending on the
        hardware hierarchy or the structure of software-only page tables.
        The genericity is achieved through a set of macros that define
        transitions from one level to another. Unused levels are optimised
        out by the compiler.</para>
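        <para>As an illustration of the level transitions, a virtual
        address can be split into per-level indices by shifting and
        masking. The 9-bit index and 12-bit offset widths below are an
        amd64-style assumption, not HelenOS constants.</para>

```c
#include <stdint.h>

/* Illustrative split of a virtual address into four 9-bit page-table
 * indices plus a 12-bit page offset (an amd64-style layout); the
 * widths are an assumption, not HelenOS constants. */
#define PTL_BITS  9
#define PAGE_BITS 12

static inline unsigned ptl_index(uint64_t va, unsigned level)
{
    /* level 0 selects the PTL0 index, level 3 the PTL3 index */
    unsigned shift = PAGE_BITS + (3 - level) * PTL_BITS;
    return (unsigned)((va >> shift) & ((1u << PTL_BITS) - 1));
}
```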

      </section>

      <section>
        <indexterm>
          <primary>page tables</primary>

          <secondary>- hashing</secondary>
        </indexterm>

        <title>Global page hash table</title>

        <para>Implementation of the global page hash table was encouraged
        by 64-bit architectures that can have rather sparse address spaces.
        The hash table contains valid mappings only. Each entry of the hash
        table contains an address space pointer, a virtual memory page
        number (VPN), a physical memory frame number (PFN) and a set of
        flags. The pair of the address space pointer and the virtual memory
        page number is used as a key for the hash table. One of the major
        differences between the global page hash table and hierarchical
        4-level page tables is that there is only a single global page hash
        table in the system while hierarchical page tables exist per
        address space. Thus, the global page hash table contains
        information about mappings of all address spaces in the
        system.</para>

        <para>The global page hash table mechanism uses the generic hash
        table type as described in the chapter about <link
        linkend="hashtables">data structures</link> earlier in this
        book.</para>
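        <para>A key built from the address space pointer and the VPN can be
        reduced to a bucket index with any reasonable mixing function. The
        function below is an illustrative choice, not the one HelenOS
        actually uses.</para>

```c
#include <stddef.h>
#include <stdint.h>

/* The hash key pairs the address space pointer with the virtual page
 * number; this particular mixing function (Knuth's multiplicative
 * constant) is an illustrative choice, not HelenOS's. */
static size_t page_hash(const void *as, uintptr_t vpn, size_t buckets)
{
    uintptr_t key = (uintptr_t)as ^ (vpn * (uintptr_t)2654435761u);
    return (size_t)(key % buckets);
}
```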

      </section>
    </section>
  </section>

  <section id="tlb">
    <indexterm>
      <primary>TLB</primary>
    </indexterm>

    <title>Translation lookaside buffer</title>

    <para>Due to the extensive overhead of page mapping lookups in the page
    tables, all architectures have a fast associative cache built into the
    CPU. This memory, called the TLB, stores recently used page table
    entries.</para>

    <section id="tlb_shootdown">
      <indexterm>
        <primary>TLB</primary>

        <secondary>- TLB shootdown</secondary>
      </indexterm>

      <title>TLB consistency. TLB shootdown algorithm.</title>

      <para>The operating system is responsible for keeping the TLB
      consistent by invalidating its contents whenever there is a change in
      the page tables. Such changes occur when a page or a group of pages
      is unmapped, when a mapping is changed, or when the system switches
      the active address space in order to schedule a new task. Moreover,
      the invalidation must be done on all CPUs in the system because each
      CPU has its own independent TLB. Maintaining TLB consistency on SMP
      configurations is thus not as trivial a task as it might seem at
      first glance. A naive solution would assume that the CPU which wants
      to invalidate the TLB would simply invalidate the TLBs of the other
      CPUs as well. This is not possible on most architectures for the
      simple reason that flushing the TLB is allowed only on the local CPU;
      there is no way to access other CPUs' TLBs and thus invalidate them
      remotely.</para>

      <para>The technique of remote invalidation of TLB entries is called
      "TLB shootdown". HelenOS uses a variation of the algorithm described
      by D. Black et al., "Translation Lookaside Buffer Consistency: A
      Software Approach," Proc. Third Int'l Conf. Architectural Support for
      Programming Languages and Operating Systems, 1989, pp. 113-122. <xref
      linkend="Black89" /></para>

      <para>As the situation demands, partial invalidation of the TLB may
      be sufficient. When only a single memory mapping changes, it is
      necessary to invalidate just one or a few adjacent pages. On
      architectures that are aware of ASIDs, when the kernel needs to
      reclaim an ASID for use by another task, it invalidates only the
      entries belonging to that particular address space. The final option
      is complete TLB invalidation, which flushes all entries in the
      TLB.</para>

      <para>TLB shootdown is performed in two phases.</para>

      <formalpara>
        <title>Phase 1.</title>

        <para>First, the initiator locks the global TLB spinlock. Then a
        request is put into the local request cache of every other CPU in
        the system, each cache being protected by its own spinlock. In case
        a cache is full, all requests in that cache are replaced by a
        single request indicating a global TLB flush. The initiator then
        sends an IPI message indicating the TLB shootdown request to the
        rest of the CPUs and actively waits until all of them confirm the
        request by setting a special flag. After setting this flag, each
        confirming CPU blocks on the TLB spinlock held by the
        initiator.</para>
      </formalpara>

      <formalpara>
        <title>Phase 2.</title>

        <para>At this point, all CPUs are waiting on the TLB spinlock to
        execute the TLB invalidation action and have indicated their
        intention to the initiator. The initiator continues, cleaning up
        its own TLB and releasing the global TLB spinlock. After this, all
        the other CPUs acquire and immediately release the TLB spinlock and
        perform their TLB invalidation actions.</para>
      </formalpara>
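      <para>The request-cache behaviour of Phase 1, where individual
      invalidation requests collapse into a single global flush once the
      cache overflows, can be modelled for a single CPU as follows. The
      sizes and names are illustrative assumptions, not HelenOS code.</para>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Single-CPU model of the Phase 1 request cache: shootdown requests
 * queue up until the cache overflows, at which point they collapse
 * into one "flush everything" request. Illustrative only. */
#define TLB_CACHE_SIZE 4

typedef struct {
    uintptr_t pages[TLB_CACHE_SIZE];
    size_t    count;
    bool      global_flush;   /* set when the cache has overflowed */
} tlb_cache_t;

static void tlb_request_put(tlb_cache_t *c, uintptr_t page)
{
    if (c->global_flush)
        return;                 /* already degraded to a full flush */
    if (c->count == TLB_CACHE_SIZE) {
        c->count = 0;           /* drop the individual requests... */
        c->global_flush = true; /* ...and flush the whole TLB instead */
        return;
    }
    c->pages[c->count++] = page;
}
```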
    </section>

    <section>
      <title>Address spaces</title>

      <section>
        <indexterm>
          <primary>address space</primary>

          <secondary>- area</secondary>
        </indexterm>

        <title>Address space areas</title>

        <para>Each address space consists of mutually disjoint contiguous
        address space areas. An address space area is precisely defined by
        its base address and the number of frames/pages it
        contains.</para>

        <para>Each address space area has a set of flags that define the
        behaviour of and permissions on the particular area. <itemizedlist>
            <listitem><emphasis>AS_AREA_READ</emphasis> flag indicates
            reading permission.</listitem>

            <listitem><emphasis>AS_AREA_WRITE</emphasis> flag indicates
            writing permission.</listitem>

            <listitem><emphasis>AS_AREA_EXEC</emphasis> flag indicates code
            execution permission. Some architectures do not support
            execution permission restriction. In this case this flag has no
            effect.</listitem>

            <listitem><emphasis>AS_AREA_DEVICE</emphasis> marks the area as
            mapped to device memory.</listitem>
          </itemizedlist></para>
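        <para>The flags above behave as independent permission bits. A
        minimal sketch follows, with assumed bit values that need not match
        the actual HelenOS definitions.</para>

```c
#include <stdbool.h>

/* Illustrative flag bits mirroring the list above; the actual HelenOS
 * values may differ. */
#define AS_AREA_READ   (1 << 0)
#define AS_AREA_WRITE  (1 << 1)
#define AS_AREA_EXEC   (1 << 2)
#define AS_AREA_DEVICE (1 << 3)

/* A write to an area is permitted only when the area carries the
 * write flag. */
static bool area_may_write(int flags)
{
    return (flags & AS_AREA_WRITE) != 0;
}
```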
611 | |||
612 | <para>Kernel provides possibility tasks create/expand/shrink/share its |
||
613 | address space via the set of syscalls.</para> |
||
614 | </section> |
||
615 | |||
616 | <section> |
||
71 | bondari | 617 | <indexterm> |
618 | <primary>address space</primary> |
||
619 | |||
72 | bondari | 620 | <secondary>- ASID</secondary> |
71 | bondari | 621 | </indexterm> |
622 | |||
67 | jermar | 623 | <title>Address Space ID (ASID)</title> |
624 | |||
        <para>Every task in the operating system has its own view of the
        virtual memory. When performing a context switch between different
        tasks, the kernel must switch the address space mapping as well. As
        modern processors perform very aggressive caching of virtual
        mappings, flushing the complete TLB on every context switch would
        be very inefficient. To avoid such a performance penalty, some
        architectures introduce an address space identifier, which allows
        storing several different mappings inside the TLB.</para>

        <para>The HelenOS kernel can take advantage of this hardware
        support through its ASID abstraction. For example, on ia64 the
        kernel ASID is derived from the RID (region identifier) and on
        mips32 the kernel ASID is actually the hardware identifier. As
        expected, this ASID information record is part of the
        <emphasis>as_t</emphasis> structure.</para>

        <para>Due to hardware limitations, the hardware ASID has a limited
        length, ranging from 8 bits on mips32 to 24 bits on ia64, which
        makes it impossible to use it as a unique address space identifier
        for all tasks running in the system. In such situations, a special
        ASID stealing algorithm is used, which takes an ASID from an
        inactive task and assigns it to the active task.</para>

        <indexterm>
          <primary>address space</primary>

          <secondary>- ASID stealing</secondary>
        </indexterm>

        <para>
          <classname>ASID stealing algorithm here.</classname>
        </para>
      </section>
    </section>
  </section>
</chapter>