<?xml version="1.0" encoding="UTF-8"?>
<chapter id="mm">
<?dbhtml filename="mm.html"?>

<title>Memory management</title>

<section>
<title>Virtual memory management</title>

<section>
<title>Introduction</title>

<para>Virtual memory is a special memory management technique used by
the kernel to achieve several mission-critical goals. <itemizedlist>
<listitem>
Isolate each task from the other tasks running on the system at the same time.
</listitem>

<listitem>
Allow tasks to allocate more memory than the actual physical memory size of the machine.
</listitem>

<listitem>
Allow, in general, two programs linked at the same address to be loaded and executed without complicated relocations.
</listitem>
</itemizedlist></para>

<para><!--

TLB shootdown ASID/ASID:PAGE/ALL.
TLB shootdown requests can come in asynchronously
so there is a cache of TLB shootdown requests. Upon cache overflow TLB shootdown ALL is executed


<para>
Address spaces. Address space area (B+ tree). Only for uspace. Set of syscalls (shrink/extend etc).
Special address space area type - device - prohibits shrink/extend syscalls on it.
Address space has link to mapping tables (hierarchical - per address space, hash - global tables).
</para>

--></para>
</section>

<section>
<title>Paging</title>

<para>Virtual memory usually uses the paged memory model, in which the
virtual address space is divided into <emphasis>pages</emphasis>
(usually 4096 bytes in size) and physical memory is divided into
frames of the same size as a page. Each page may be mapped to some
frame; upon a memory access to a virtual address, the CPU then
performs <emphasis>address translation</emphasis> during instruction
execution. A missing mapping generates a page fault exception, which
invokes the kernel exception handler and thus allows the kernel to
control the rules of memory access. The kernel stores the page mapping
information in the <link linkend="page_tables">page tables</link>.</para>

<para>The majority of architectures use multi-level page tables, which
means that physical memory must be accessed several times before the
physical address is obtained. This would impose a serious performance
overhead on virtual memory management. To avoid it, a <link
linkend="tlb">Translation Lookaside Buffer (TLB)</link> is used.</para>
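
<para>The cost of a multi-level lookup is easy to see in a small sketch
of a two-level walk. It assumes an ia32-like split of a 32-bit virtual
address into a 10-bit directory index, a 10-bit table index and a
12-bit offset within a 4096-byte page. The types
<varname>pt_entry_t</varname> and <varname>pt_directory_t</varname> and
the <constant>PTE_PRESENT</constant> bit are hypothetical
simplifications, not HelenOS definitions; a real directory holds
physical addresses rather than C pointers.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed ia32-like layout of a 32-bit virtual address:
 * bits 31..22 directory index, 21..12 table index, 11..0 offset. */
#define PAGE_WIDTH   12
#define PAGE_SIZE    (1u << PAGE_WIDTH)   /* 4096 bytes */
#define PTL_ENTRIES  1024u                /* entries per table level */
#define PTE_PRESENT  0x1u                 /* hypothetical "mapped" bit */

/* Hypothetical second-level entry: page-aligned frame address + flags. */
typedef struct {
    uint32_t frame;
    uint32_t flags;
} pt_entry_t;

/* Hypothetical first level: pointers to second-level tables. */
typedef struct {
    pt_entry_t *tables[PTL_ENTRIES];
} pt_directory_t;

/* Walk both levels. Every level costs one more memory reference, which
 * is the overhead the TLB hides. Returns false where the CPU would
 * raise a page fault. */
static bool translate(const pt_directory_t *dir, uint32_t va, uint32_t *pa)
{
    uint32_t dir_index = va >> 22;
    uint32_t tbl_index = (va >> PAGE_WIDTH) & (PTL_ENTRIES - 1);
    uint32_t offset = va & (PAGE_SIZE - 1);

    assert(dir != NULL && pa != NULL);

    const pt_entry_t *table = dir->tables[dir_index];
    if (table == NULL)
        return false;                /* no second-level table */

    const pt_entry_t *pte = &table[tbl_index];
    if (!(pte->flags & PTE_PRESENT))
        return false;                /* page not mapped */

    *pa = pte->frame | offset;
    return true;
}
```
]]></programlisting>

<para>For example, the virtual address 0x00402123 selects directory
entry 1, table entry 2 and offset 0x123. A missing table or entry is the
point at which the CPU would raise a page fault.</para>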
</section>

<section>
<title>Address spaces</title>

<section>
<title>Address space areas</title>

<para>Each address space consists of mutually disjoint, contiguous
address space areas. An address space area is precisely defined by its
base address and the number of frames/pages it contains.</para>

<para>Each address space area carries flags that define the behaviour
and permissions of that particular area. <itemizedlist>
<listitem>
<emphasis>AS_AREA_READ</emphasis> flag indicates reading permission.
</listitem>

<listitem>
<emphasis>AS_AREA_WRITE</emphasis> flag indicates writing permission.
</listitem>

<listitem>
<emphasis>AS_AREA_EXEC</emphasis> flag indicates code execution permission. Some architectures do not support restricting execution permission; on them this flag has no effect.
</listitem>

<listitem>
<emphasis>AS_AREA_DEVICE</emphasis> flag marks the area as mapped to device memory.
</listitem>
</itemizedlist></para>
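
<para>The following minimal sketch illustrates the two rules above:
areas of one address space must not overlap, and an access is allowed
only if the flags of the area permit it. The flag names come from the
text, but their numeric values, the <varname>area_t</varname>
descriptor and the helper functions are assumptions made for
illustration.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag names from the text; the numeric values are assumptions. */
#define AS_AREA_READ    0x1u
#define AS_AREA_WRITE   0x2u
#define AS_AREA_EXEC    0x4u
#define AS_AREA_DEVICE  0x8u

#define PAGE_SIZE 4096u

/* Hypothetical area descriptor: base address, size in pages, flags. */
typedef struct {
    uintptr_t base;
    size_t pages;
    unsigned int flags;
} area_t;

/* First address past the end of the area. */
static uintptr_t area_end(const area_t *a)
{
    return a->base + a->pages * PAGE_SIZE;
}

/* Areas of one address space must be mutually disjoint. */
static bool areas_overlap(const area_t *a, const area_t *b)
{
    return a->base < area_end(b) && b->base < area_end(a);
}

/* Is an access of the given kind (AS_AREA_READ, _WRITE or _EXEC)
 * to addr permitted by this area? */
static bool area_allows(const area_t *a, uintptr_t addr, unsigned int access)
{
    assert(a != NULL);
    if (addr < a->base || addr >= area_end(a))
        return false;
    return (a->flags & access) == access;
}
```
]]></programlisting>

<para>With these definitions a code area would typically carry
AS_AREA_READ | AS_AREA_EXEC and a data area AS_AREA_READ |
AS_AREA_WRITE; a new area placed over either of them would be rejected
by <function>areas_overlap</function>.</para>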

<para>The kernel provides a set of syscalls that allow tasks to create,
expand, shrink and share the areas of their address spaces.</para>
</section>

<section>
<title>Address Space ID (ASID)</title>

<para>When switching to a different task, the kernel also has to
switch the mappings to a different address space. If the TLB cannot
distinguish between the mappings of different address spaces, all
mapping information in the TLB that belongs to the old address space
must be flushed, which creates unnecessary overhead during task
switching. To avoid this, some architectures can segregate different
address spaces at the hardware level by making an address space
identifier part of each TLB record; the identifier tells the address
translation unit to which address space the record applies.</para>

<para>The HelenOS kernel can take advantage of this hardware-supported
identifier through an ASID abstraction that is related to the
corresponding architecture identifier. For example, on ia64 the kernel
ASID is derived from the RID (region identifier), while on mips32 the
kernel ASID is the hardware identifier itself. As expected, the ASID
information is part of the <emphasis>as_t</emphasis> structure.</para>

<para>Because of hardware limitations, the hardware ASID has a limited
length, ranging from 8 bits on mips32 to 24 bits on ia64, which makes it
impossible to use it as a unique address space identifier for all tasks
running in the system. In such situations a special ASID stealing
algorithm is used, which takes an ASID away from an inactive task and
assigns it to the active task.</para>

<para><classname>ASID stealing algorithm here.</classname></para>
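
<para>The idea of ASID stealing can be illustrated with a minimal
sketch; it is not the HelenOS algorithm. It assumes a tiny hardware
ASID space of four identifiers, a simplified <varname>space_t</varname>
standing in for <emphasis>as_t</emphasis>, and an
<varname>active</varname> flag meaning that the address space is
currently running on some CPU. A free ASID is preferred; otherwise one
is taken from an inactive owner, and the hypothetical counter
<varname>tlb_flushes</varname> marks the point where the stale TLB
entries tagged with that ASID would have to be invalidated.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ASIDS        4      /* tiny hardware ASID space, for illustration */
#define ASID_INVALID (-1)

/* Simplified stand-in for the kernel's address space structure. */
typedef struct {
    int asid;       /* ASID_INVALID while none is assigned */
    bool active;    /* currently running on some CPU */
} space_t;

static space_t *owner[ASIDS];   /* current holder of each ASID */
static int tlb_flushes;         /* ASIDs invalidated because of stealing */

/* Return an ASID for `as`: reuse its own, else take a free one, else
 * steal one from an inactive address space. Returns ASID_INVALID only
 * if every ASID is held by an active address space. */
static int asid_get(space_t *as)
{
    assert(as != NULL);
    if (as->asid != ASID_INVALID)
        return as->asid;

    for (int i = 0; i < ASIDS; i++) {
        if (owner[i] == NULL) {           /* a free ASID is available */
            owner[i] = as;
            as->asid = i;
            return i;
        }
    }

    for (int i = 0; i < ASIDS; i++) {
        space_t *victim = owner[i];
        if (!victim->active) {
            /* The victim gets a new ASID when it runs again. TLB entries
             * tagged with i are stale and must be invalidated (on SMP,
             * by a TLB shootdown) before i is reused. */
            victim->asid = ASID_INVALID;
            tlb_flushes++;
            owner[i] = as;
            as->asid = i;
            return i;
        }
    }
    return ASID_INVALID;
}
```
]]></programlisting>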
</section>
</section>

<section>
<title>Virtual address translation</title>

<section id="page_tables">
<title>Page tables</title>

<para>The HelenOS kernel has two different approaches to the paging
implementation: <emphasis>4-level page tables</emphasis> and
<emphasis>global hash tables</emphasis>, both of which are accessible
via a generic paging abstraction layer. This diversity is caused by
the major architectural differences between the supported platforms.
The abstraction is implemented with the help of a global structure of
pointers to the basic mapping functions,
<emphasis>page_mapping_operations</emphasis>. To provide a different
page table implementation, the corresponding layer must implement the
functions declared in
<emphasis>page_mapping_operations</emphasis>.</para>
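
<para>A minimal sketch of dispatching through such a structure of
function pointers follows. Only the idea of a global operations
structure comes from the text; the member names, the simplified
signatures (no address space or flags arguments) and the toy
flat-table backend are assumptions. Another implementation, such as a
global hash table, would supply its own pair of functions and be
selected by pointing <varname>page_mapping_operations</varname> at
it.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_WIDTH 12
#define NPAGES     16              /* toy address space of 16 pages */

/* Hypothetical operations table: member names and simplified
 * signatures are assumptions. */
typedef struct {
    void (*mapping_insert)(uintptr_t page, uintptr_t frame);
    bool (*mapping_find)(uintptr_t page, uintptr_t *frame);
} page_mapping_operations_t;

/* Toy backend: a flat table indexed by virtual page number. */
static uintptr_t flat_frame[NPAGES];
static bool flat_valid[NPAGES];

static void flat_insert(uintptr_t page, uintptr_t frame)
{
    size_t vpn = page >> PAGE_WIDTH;
    assert(vpn < NPAGES);
    flat_frame[vpn] = frame;
    flat_valid[vpn] = true;
}

static bool flat_find(uintptr_t page, uintptr_t *frame)
{
    size_t vpn = page >> PAGE_WIDTH;
    if (vpn >= NPAGES || !flat_valid[vpn])
        return false;
    *frame = flat_frame[vpn];
    return true;
}

static const page_mapping_operations_t flat_ops = {
    .mapping_insert = flat_insert,
    .mapping_find = flat_find,
};

/* Each architecture points this at its own implementation at boot. */
static const page_mapping_operations_t *page_mapping_operations = &flat_ops;

/* Generic layer: callers never see which implementation is active. */
static void page_mapping_insert(uintptr_t page, uintptr_t frame)
{
    page_mapping_operations->mapping_insert(page, frame);
}

static bool page_mapping_find(uintptr_t page, uintptr_t *frame)
{
    return page_mapping_operations->mapping_find(page, frame);
}
```
]]></programlisting>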

<formalpara>
<title>4-level page tables</title>

<para>4-level page tables are a generalization of the hardware
capabilities of several architectures.<itemizedlist>
<listitem>
ia32 uses 2-level page tables, with full hardware support.
</listitem>

<listitem>
amd64 uses 4-level page tables, also with full hardware support.
</listitem>

<listitem>
mips and ppc32 use 2-level page tables, with support simulated in software.
</listitem>
</itemizedlist></para>
</formalpara>

<formalpara>
<title>Global hash tables</title>

<para>There is only one global page hash table in the whole system (it
is used by ia64; note that ia64 currently has the VHPT disabled). It is
implemented with the generic hash table with separate collision
chains. ASID support is required to use global hash tables.</para>
</formalpara>
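
<para>Because a single table holds the mappings of all address spaces,
each entry has to be keyed by both the ASID and the virtual page, which
is why ASID support is required. The sketch below shows insertion and
lookup with separate collision chains; the hash function, the bucket
count and the entry layout are illustrative assumptions, not the
generic hash table used by the kernel.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_WIDTH 12
#define BUCKETS    64              /* illustrative table size */

/* One global table for every address space: entries are keyed by
 * (ASID, virtual page). */
typedef struct entry {
    int asid;
    uintptr_t page;
    uintptr_t frame;
    struct entry *next;            /* separate collision chain */
} entry_t;

static entry_t *buckets[BUCKETS];

/* Hypothetical hash mixing the page number with the ASID. */
static size_t hash(int asid, uintptr_t page)
{
    assert(asid >= 0);
    return ((page >> PAGE_WIDTH) ^ ((uintptr_t) asid * 31u)) % BUCKETS;
}

static bool ht_insert(int asid, uintptr_t page, uintptr_t frame)
{
    entry_t *e = malloc(sizeof(*e));
    if (e == NULL)
        return false;
    size_t b = hash(asid, page);
    e->asid = asid;
    e->page = page;
    e->frame = frame;
    e->next = buckets[b];          /* push at the head of the chain */
    buckets[b] = e;
    return true;
}

static entry_t *ht_find(int asid, uintptr_t page)
{
    for (entry_t *e = buckets[hash(asid, page)]; e != NULL; e = e->next)
        if (e->asid == asid && e->page == page)
            return e;
    return NULL;                   /* no mapping: page fault */
}
```
]]></programlisting>

<para>The same virtual page 0x5000 can thus be mapped to different
frames in two address spaces, and two keys that hash to the same bucket
simply share one chain.</para>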

<para>Thanks to the abstract paging interface, other paging
implementations, for example B-tree page tables, can be added
later.</para>
</section>

<section id="tlb">
<title>Translation Lookaside Buffer</title>

<para>Because looking up a page mapping in the page tables is
expensive, all architectures have a fast associative cache memory
built into the CPU. This memory, called the TLB, stores recently used
page table entries.</para>
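
<para>A TLB can be thought of as a small table of (ASID, virtual page
number, physical frame number) records. The sketch below models a fully
associative TLB with eight entries, round-robin replacement and
4096-byte pages; these parameters and all names are illustrative
assumptions, since real TLBs differ in size, associativity and
replacement policy. The hardware compares all entries in parallel; the
loop only models the result.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 8
#define PAGE_WIDTH  12

typedef struct {
    bool valid;
    int asid;            /* address space tag, if the CPU supports ASIDs */
    uint32_t vpn;        /* virtual page number */
    uint32_t pfn;        /* physical frame number */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];
static unsigned int next_victim;    /* round-robin replacement pointer */

/* Look up a virtual address; on a miss the page tables must be walked
 * and the result inserted with tlb_insert(). */
static bool tlb_lookup(int asid, uint32_t va, uint32_t *pa)
{
    uint32_t vpn = va >> PAGE_WIDTH;
    assert(pa != 0);
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].asid == asid && tlb[i].vpn == vpn) {
            *pa = (tlb[i].pfn << PAGE_WIDTH) | (va & ((1u << PAGE_WIDTH) - 1));
            return true;
        }
    }
    return false;
}

static void tlb_insert(int asid, uint32_t va, uint32_t pfn)
{
    tlb_entry_t *e = &tlb[next_victim];
    next_victim = (next_victim + 1) % TLB_ENTRIES;
    e->valid = true;
    e->asid = asid;
    e->vpn = va >> PAGE_WIDTH;
    e->pfn = pfn;
}
```
]]></programlisting>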

<section id="tlb_shootdown">
<title>TLB consistency. TLB shootdown algorithm.</title>

<para>The operating system is responsible for keeping the TLB
consistent by invalidating its contents whenever the page tables
change. Such changes occur when a page or a group of pages is
unmapped, when a mapping is changed, or when the system switches the
active address space to schedule a new task (which amounts to a batch
unmap of all address space mappings). Moreover, the invalidation must
be done on all CPUs in the system, because each CPU has its own
independent TLB. Maintaining TLB consistency on an SMP configuration is
therefore not as trivial as it looks at first glance. A naive solution
would assume remote TLB invalidation, which is not possible on most
architectures for a simple reason: a TLB can be flushed only on the
local CPU, and there is no way to access the TLBs of other CPUs.</para>

<para>The technique of remote invalidation of TLB entries is called "TLB
shootdown". HelenOS uses a variation of the algorithm described by
D. Black et al., "Translation Lookaside Buffer Consistency: A
Software Approach," Proc. Third Int'l Conf. Architectural Support
for Programming Languages and Operating Systems, 1989, pp.
113-122.</para>

<para>Depending on the situation, only a partial invalidation of the
TLBs may be needed. After a simple memory mapping change, it is
necessary to invalidate only one or several adjacent pages. If the
architecture is aware of ASIDs, the kernel invalidates, during an
address space switch, only the entries of that particular address
space. The final option is complete TLB invalidation, which flushes all
entries in the TLB.</para>

<para>TLB shootdown is performed in two phases. First, the initiator
sends an IPI message with the TLB shootdown request to the rest of the
CPUs. Then it waits until all CPUs confirm that they have executed the
invalidation.</para>
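
<para>The following single-threaded simulation sketches the two phases
together with the three kinds of invalidation (pages, one ASID, all).
Because requests may arrive asynchronously, it assumes that each CPU
keeps a small cache of pending requests and falls back to a single
request to invalidate everything when that cache overflows. The queue
length, the data structures and the explicit calls to
<function>shootdown_ipi</function> in place of real interrupts are
assumptions; on a real SMP system the acknowledgement counter would be
updated atomically and the initiator would spin on it.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NCPU      4
#define QLEN      2        /* tiny per-CPU request cache, to show overflow */
#define TLB_SIZE  4
#define PAGE_SIZE 4096u

typedef enum { TLB_INVL_PAGES, TLB_INVL_ASID, TLB_INVL_ALL } tlb_invl_t;

typedef struct {
    tlb_invl_t type;
    int asid;
    uintptr_t page;      /* first page, for TLB_INVL_PAGES */
    size_t count;        /* number of adjacent pages */
} tlb_request_t;

typedef struct {
    bool valid;
    int asid;
    uintptr_t page;
} tlb_entry_t;

typedef struct {
    tlb_entry_t tlb[TLB_SIZE];   /* private TLB: only this CPU may flush it */
    tlb_request_t queue[QLEN];   /* pending shootdown requests */
    size_t queued;
    int acks_owed;               /* requests received but not yet confirmed */
} cpu_t;

static cpu_t cpus[NCPU];
static int pending_acks;         /* atomic on a real SMP system */

/* Partial (pages or one ASID) or complete invalidation of the local TLB. */
static void tlb_invalidate_local(cpu_t *cpu, const tlb_request_t *r)
{
    for (int i = 0; i < TLB_SIZE; i++) {
        tlb_entry_t *e = &cpu->tlb[i];
        bool hit = r->type == TLB_INVL_ALL
            || (r->type == TLB_INVL_ASID && e->asid == r->asid)
            || (r->type == TLB_INVL_PAGES && e->asid == r->asid
                && e->page >= r->page
                && e->page < r->page + r->count * PAGE_SIZE);
        if (hit)
            e->valid = false;
    }
}

/* Phase 1: flush locally, then post the request to every other CPU and
 * "send an IPI" (here: record that it owes an acknowledgement). */
static void shootdown_start(int self, tlb_request_t r)
{
    tlb_invalidate_local(&cpus[self], &r);
    for (int c = 0; c < NCPU; c++) {
        if (c == self)
            continue;
        cpu_t *cpu = &cpus[c];
        if (cpu->queued < QLEN) {
            cpu->queue[cpu->queued++] = r;
        } else {
            /* Request cache overflow: replace everything by one ALL. */
            cpu->queue[0] = (tlb_request_t) { .type = TLB_INVL_ALL };
            cpu->queued = 1;
        }
        cpu->acks_owed++;
        pending_acks++;
    }
}

/* IPI handler on CPU c: apply all queued requests, then confirm. */
static void shootdown_ipi(int c)
{
    cpu_t *cpu = &cpus[c];
    for (size_t i = 0; i < cpu->queued; i++)
        tlb_invalidate_local(cpu, &cpu->queue[i]);
    cpu->queued = 0;
    pending_acks -= cpu->acks_owed;
    cpu->acks_owed = 0;
}

/* Phase 2: the initiator waits until this becomes true. */
static bool shootdown_done(void)
{
    assert(pending_acks >= 0);
    return pending_acks == 0;
}
```
]]></programlisting>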
</section>
</section>
</section>

<section>
<title>---</title>

<para>At the moment HelenOS does not support swapping.</para>

<para>Page faults are used to allocate frames on demand within an
address space area (as_area). On architectures that support it,
non-executable pages are supported as well.</para>
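
<para>A minimal sketch of on-demand allocation: the first access to a
page of an area faults, and the handler backs the page with a fresh
frame if the flags of the area permit the access. Everything here is an
illustrative assumption: a flat toy page table, a bump-pointer frame
source instead of the real frame allocator, and flag values chosen for
the example. The non-executable case appears as an execute access that
the handler refuses for an area without
<emphasis>AS_AREA_EXEC</emphasis>.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NPAGES    16                       /* toy address space of 16 pages */

/* Flag names from the text; values and structures are assumptions. */
#define AS_AREA_READ  0x1u
#define AS_AREA_WRITE 0x2u
#define AS_AREA_EXEC  0x4u

typedef struct {
    uintptr_t base;
    size_t pages;
    unsigned int flags;
} area_t;

static uintptr_t page_table[NPAGES];       /* 0 means "not mapped" */
static uintptr_t next_frame = 0x100000;    /* toy frame source */

static uintptr_t toy_frame_alloc(void)
{
    uintptr_t frame = next_frame;
    next_frame += PAGE_SIZE;
    return frame;
}

/* Page fault handler: on the first touch of a page inside the area,
 * back it with a fresh frame, provided the area permits the access.
 * Returns false where the kernel would terminate the task. */
static bool page_fault(const area_t *area, uintptr_t addr, unsigned int access)
{
    if (addr < area->base || addr >= area->base + area->pages * PAGE_SIZE)
        return false;                      /* not inside the area */
    if ((area->flags & access) != access)
        return false;                      /* e.g. execute on a non-exec area */

    size_t vpn = addr / PAGE_SIZE;
    assert(vpn < NPAGES);
    if (page_table[vpn] == 0)
        page_table[vpn] = toy_frame_alloc(); /* allocate on demand */
    return true;
}
```
]]></programlisting>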
</section>
</section>

<!-- End of VM -->

<section>
<!-- Phys mem -->

<title>Physical memory management</title>

<section id="zones_and_frames">
<title>Zones and frames</title>

<para><!--graphic fileref="images/mm2.png" /--><!--graphic fileref="images/buddy_alloc.svg" format="SVG" /--></para>

<para>On some architectures, not all physical memory is available for
conventional use. This limitation requires the kernel to maintain a
table of available and unavailable ranges of physical memory
addresses. The main idea of zones is to create a memory zone entity,
which is a contiguous chunk of memory available for allocation. If
some chunk is not available, we simply do not put it in any
zone.</para>

<para>A zone also serves informational purposes, keeping track of the
number of free and busy frames. Physical memory allocation is also
done inside a particular zone. Allocation of frames within a zone is
organized by the <link linkend="frame_allocator">frame allocator</link>
associated with the zone.</para>

<para>Some architectures (mips32, ppc32) have only one zone, which
covers the whole physical memory, while others (like ia32) may have
multiple zones. Information about the zones of the current machine is
stored in BIOS hardware tables or can be hardcoded into the kernel at
compile time.</para>
</section>

<section id="frame_allocator">
<title>Frame allocator</title>

<para><mediaobject id="frame_alloc">
<imageobject role="html">
<imagedata fileref="images/frame_alloc.png" format="PNG" />
</imageobject>

<imageobject role="fop">
<imagedata fileref="images.vector/frame_alloc.svg" format="SVG" />
</imageobject>
</mediaobject></para>

<formalpara>
<title>Overview</title>

<para>The frame allocator provides physical memory allocation for the
kernel. Because physical memory is organized in zones, the frame
allocator always works in the context of a single zone, so it cannot
allocate a piece of memory that spans two zones. This is not a real
restriction, because two adjacent zones can be merged into one. The
frame allocator is also responsible for updating the number of free
and busy frames in the zone. Physical memory allocation inside one
<link linkend="zones_and_frames">memory zone</link> is handled by an
instance of the <link linkend="buddy_allocator">buddy allocator</link>
tailored to allocate blocks of physical memory frames.</para>
</formalpara>

<formalpara>
<title>Allocation / deallocation</title>

<para>Upon an allocation request, the frame allocator tries to find
the first zone that can satisfy the request (i.e. has the required
number of free frames). During deallocation, the frame allocator needs
to find the zone that contains the deallocated frame. This approach
raises two potential problems: <itemizedlist>
<listitem>
A linear search of zones is not good for performance, but the number of zones is not expected to be high. If it is, the list of zones can be replaced with a more time-efficient B-tree.
</listitem>

<listitem>
The allocator must quickly find out whether a zone contains the required number of frames and whether this chunk of memory is properly aligned. This issue is solved neatly by the buddy allocator.
</listitem>
</itemizedlist></para>
</formalpara>
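
<para>The two linear searches described above can be sketched as
follows. The zone table, its field names and the frame numbers are
hypothetical. Counting free frames stands in for the check that the
buddy allocator really performs, so this sketch ignores whether the
free frames are contiguous and aligned, which is exactly the second
problem listed above.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical zone descriptor; frame numbers, not byte addresses. */
typedef struct {
    uintptr_t base;       /* first frame of the zone */
    size_t count;         /* frames in the zone */
    size_t free_count;
    size_t busy_count;
} zone_t;

#define ZONES 3

/* Gaps between zones are physical memory that is not available. */
static zone_t zones[ZONES] = {
    { 0x000, 0x100, 0x100, 0 },
    { 0x200, 0x040, 0x040, 0 },
    { 0x400, 0x400, 0x400, 0 },
};

/* Allocation: first zone with enough free frames. Picking the actual
 * block inside the zone is the buddy allocator's job. */
static zone_t *frame_alloc_zone(size_t frames)
{
    for (size_t i = 0; i < ZONES; i++) {
        zone_t *z = &zones[i];
        if (z->free_count >= frames) {
            z->free_count -= frames;
            z->busy_count += frames;
            return z;
        }
    }
    return NULL;
}

/* Deallocation: find the zone that contains the frame. */
static zone_t *zone_containing(uintptr_t pfn)
{
    for (size_t i = 0; i < ZONES; i++)
        if (pfn >= zones[i].base && pfn < zones[i].base + zones[i].count)
            return &zones[i];
    return NULL;                  /* the frame lies in a gap */
}

static void frame_free_zone(uintptr_t pfn, size_t frames)
{
    zone_t *z = zone_containing(pfn);
    assert(z != NULL && z->busy_count >= frames);
    z->busy_count -= frames;
    z->free_count += frames;
}
```
]]></programlisting>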
</section>

<section id="buddy_allocator">
<title>Buddy allocator</title>

<section>
<title>Overview</title>

<para><mediaobject id="buddy_alloc">
<imageobject role="html">
<imagedata fileref="images/buddy_alloc.png" format="PNG" />
</imageobject>

<imageobject role="fop">
<imagedata fileref="images.vector/buddy_alloc.svg" format="SVG" />
</imageobject>
</mediaobject></para>

<para>In the buddy allocator, memory is broken down into power-of-two
sized, naturally aligned blocks. These blocks are organized in an array
of lists, in which the list with index i contains all unallocated
blocks of size <mathphrase>2<superscript>i</superscript></mathphrase>.
The index i is called the order of the block. Should there be two
adjacent, equally sized blocks in the list i (i.e. buddies), the buddy
allocator coalesces them and puts the resulting block in the list
<mathphrase>i + 1</mathphrase>, provided that the resulting block is
naturally aligned. Similarly, when the allocator is asked to allocate a
block of size <mathphrase>2<superscript>i</superscript></mathphrase>,
it first tries to satisfy the request from the list with index i. If
the request cannot be satisfied (i.e. the list i is empty), the buddy
allocator tries to allocate and split a larger block from the list
with index <mathphrase>i + 1</mathphrase>. Both of these algorithms
are recursive. The recursion ends either when there are no blocks to
coalesce in the former case or when there are no blocks that can be
split in the latter case.</para>
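
<para>The arithmetic behind buddies and natural alignment is short
enough to show directly. In this sketch a block is identified by the
index of its first frame, and the function names are hypothetical. For
example, the order-2 blocks at 8 and 12 are buddies and merge into the
naturally aligned order-3 block at 8, whereas the adjacent order-2
blocks at 4 and 8 are not buddies, because a merged block starting at
4 would not be aligned to its size of 8 frames.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Smallest order i such that a 2^i frame block covers the request. */
static unsigned int order_for(size_t frames)
{
    unsigned int order = 0;
    while (((size_t) 1 << order) < frames)
        order++;
    return order;
}

/* A block of order i starting at frame `index` is naturally aligned
 * when index is a multiple of 2^i. */
static bool naturally_aligned(size_t index, unsigned int order)
{
    return (index & (((size_t) 1 << order) - 1)) == 0;
}

/* The buddy of an aligned order-i block differs from it only in bit i,
 * so it is found in constant time. */
static size_t buddy_of(size_t index, unsigned int order)
{
    assert(naturally_aligned(index, order));
    return index ^ ((size_t) 1 << order);
}

/* Merged buddies start at the lower of the two indices, which is
 * automatically aligned for order i + 1. */
static size_t merged_start(size_t index, unsigned int order)
{
    return index & ~((size_t) 1 << order);
}
```
]]></programlisting>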

<!--graphic fileref="images/mm1.png" format="EPS" /-->

<para>This approach greatly reduces external fragmentation of memory
and helps in allocating bigger contiguous blocks of memory aligned to
their size. On the other hand, the buddy allocator suffers from
increased internal fragmentation of memory and is not suitable for
general kernel allocations. This purpose is better addressed by the
<link linkend="slab">slab allocator</link>.</para>
</section>

<section>
<title>Implementation</title>

<para>The buddy allocator is, in fact, an abstract framework which can
be easily specialized to serve one particular task. It knows nothing
about the nature of the memory it helps to allocate. To compensate for
this lack of knowledge, the buddy allocator exports an interface that
each of its clients is required to implement. When supplied with an
implementation of this interface, the buddy allocator can use
specialized external functions to find the buddy of a block, split and
coalesce blocks, manipulate block order and mark blocks busy or
available. For precise documentation of this interface, refer to
<emphasis>"HelenOS Generic Kernel Reference Manual"</emphasis>.</para>

<formalpara>
<title>Data organization</title>

<para>Each entity allocable by the buddy allocator is required to
contain space for storing the block order number and a link variable
used to interconnect blocks within the same order.</para>

<para>Whatever entities are allocated by the buddy allocator, the
first entity within a block is used to represent the entire block.
The first entity keeps the order of the whole block. Other entities
within the block are assigned the magic value
<constant>BUDDY_INNER_BLOCK</constant>. This is especially important
for effective identification of buddies in a one-dimensional array,
because the entity that represents a potential buddy cannot be
associated with <constant>BUDDY_INNER_BLOCK</constant> (i.e. if it
is associated with <constant>BUDDY_INNER_BLOCK</constant>, then it is
not a buddy).</para>

<para>The buddy allocator always uses the first frame to represent
the frame block. This frame contains the <varname>buddy_order</varname>
variable, which provides information about the size of the block it
actually represents (a block of
<mathphrase>2<superscript>buddy_order</superscript></mathphrase>
frames). Other frames in the block have this value set to the magic
<constant>BUDDY_INNER_BLOCK</constant>, which is much greater than the
buddy allocator's <varname>max_order</varname> value.</para>

<para>Each <varname>frame_t</varname> also contains a pointer member
that links the frame structure into the list of its order.</para>
</formalpara>

<formalpara>
<title>Allocation algorithm</title>

<para>Upon a request to allocate a block of
<mathphrase>2<superscript>i</superscript></mathphrase> frames, the
allocator checks whether any blocks are available in the order list
<varname>i</varname>. If so, it removes a block from the order list
and returns its address. If not, it recursively allocates a block of
<mathphrase>2<superscript>i+1</superscript></mathphrase> frames and
splits it into two blocks of
<mathphrase>2<superscript>i</superscript></mathphrase> frames. It then
adds one of the blocks to the order list <varname>i</varname> and
returns the address of the other.</para>
</formalpara>
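
<para>The allocation algorithm can be sketched over frame indices.
Plain arrays stand in for the kernel's linked order lists, and the
managed region of 16 frames, the constant <constant>NONE</constant>
and the function names are assumptions of this sketch. When a block is
split, the lower half is returned and the upper half goes to the list
<varname>i</varname>; which half is kept is an arbitrary choice
here.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER 4                        /* 2^4 = 16 frames managed */
#define NFRAMES   (1u << MAX_ORDER)
#define NONE      ((size_t) -1)

/* free_list[i] holds the first frame index of each free 2^i block;
 * plain arrays stand in for the kernel's linked order lists. */
static size_t free_list[MAX_ORDER + 1][NFRAMES];
static size_t free_len[MAX_ORDER + 1];

static void push_free(unsigned int order, size_t index)
{
    assert(free_len[order] < NFRAMES);
    free_list[order][free_len[order]++] = index;
}

/* Start with the whole region as one free block of maximal order. */
static void buddy_init(void)
{
    for (unsigned int i = 0; i <= MAX_ORDER; i++)
        free_len[i] = 0;
    push_free(MAX_ORDER, 0);
}

/* Allocate a block of 2^order frames; returns its first index or NONE. */
static size_t buddy_alloc(unsigned int order)
{
    if (order > MAX_ORDER)
        return NONE;                       /* nothing left to split */
    if (free_len[order] > 0)               /* list i is not empty */
        return free_list[order][--free_len[order]];

    size_t block = buddy_alloc(order + 1); /* get a 2^(i+1) block */
    if (block == NONE)
        return NONE;
    /* Split it: the upper half goes to list i, the lower half is used. */
    push_free(order, block + ((size_t) 1 << order));
    return block;
}
```
]]></programlisting>

<para>Starting from one free block of 16 frames, the first single-frame
request splits it four times (16 into 8 + 8, 8 into 4 + 4, 4 into 2 + 2,
2 into 1 + 1) and leaves one free block in each of the lists 0 to 3;
the next single-frame request is served directly from list 0.</para>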

<formalpara>
<title>Deallocation algorithm</title>

<para>When a block is freed, the allocator checks whether it has a
so-called buddy (another free
<mathphrase>2<superscript>i</superscript></mathphrase> frame block
that can be joined with the freed block into a
<mathphrase>2<superscript>i+1</superscript></mathphrase> block).
Technically, counting blocks of order <varname>i</varname> from the
start of the managed memory, the buddy of an even-numbered block is
the following odd-numbered block and vice versa. In addition, we
require that the resulting block be aligned to its size. This
requirement guarantees natural alignment of the blocks coming out of
the allocation system.</para>

<para>Using direct pointer arithmetic and the
<varname>frame_t::ref_count</varname> and
<varname>frame_t::buddy_order</varname> variables, the buddy is found
in constant time.</para>
</formalpara>
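
<para>A sketch of deallocation with coalescing follows, using a
simplified <varname>frame_t</varname> that keeps only
<varname>buddy_order</varname> and a <varname>free</varname> flag. The
flag replaces both membership in the order lists and the reference
count mentioned above; these simplifications, the value chosen for
<constant>BUDDY_INNER_BLOCK</constant> and the 16-frame region are
assumptions. The buddy index is computed in constant time by flipping
bit <varname>i</varname> of the block index.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ORDER         4              /* 16 frames managed */
#define NFRAMES           (1u << MAX_ORDER)
#define BUDDY_INNER_BLOCK 0xffu          /* assumed value, far above MAX_ORDER */

/* Simplified frame_t: `free` replaces order-list membership. */
typedef struct {
    unsigned int buddy_order;            /* block order or BUDDY_INNER_BLOCK */
    bool free;
} frame_t;

static frame_t frames[NFRAMES];

/* Record frames [index, index + 2^order) as one block: the first frame
 * represents it, the others are marked as inner frames. */
static void mark_block(size_t index, unsigned int order, bool is_free)
{
    frames[index].buddy_order = order;
    frames[index].free = is_free;
    for (size_t i = index + 1; i < index + ((size_t) 1 << order); i++) {
        frames[i].buddy_order = BUDDY_INNER_BLOCK;
        frames[i].free = false;
    }
}

/* Free the block that starts at `index`, coalescing with free buddies. */
static void buddy_free(size_t index)
{
    unsigned int order = frames[index].buddy_order;
    assert(order <= MAX_ORDER && !frames[index].free);

    while (order < MAX_ORDER) {
        size_t buddy = index ^ ((size_t) 1 << order);  /* O(1) lookup */
        /* A busy block, an inner frame or a block of another order
         * cannot be coalesced with this one. */
        if (!frames[buddy].free || frames[buddy].buddy_order != order)
            break;
        index &= ~((size_t) 1 << order);   /* merged block starts lower */
        order++;
    }
    mark_block(index, order, true);
}
```
]]></programlisting>

<para>For example, with all frames allocated individually, freeing
frames 1, 0, 2 and 3 in that order first leaves frame 1 alone (its
buddy 0 is busy), then merges frames 0 and 1 into an order-1 block, and
finally merges everything into one order-2 block at frame 0.</para>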
</section>
</section>

<section id="slab">
<title>Slab allocator</title>

<section>
<title>Overview</title>

<para><termdef><glossterm>Slab</glossterm> represents a contiguous
piece of memory, usually made of several physically contiguous
pages.</termdef> <termdef><glossterm>Slab cache</glossterm> consists
of one or more slabs.</termdef></para>

<para>The majority of memory allocation requests in the kernel are for
small, frequently used data structures. The slab allocator is a
perfect solution for this purpose. The basic idea behind the slab
allocator is to keep lists of commonly used objects available, packed
into pages. This avoids the overhead of allocating and destroying
commonly used types of objects such as threads, virtual memory
structures, etc. Also, because the allocated size matches the object
size exactly, slab allocation eliminates the internal fragmentation
that rounding requests up to a power of two would cause.</para>
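
<para>A minimal, single-threaded sketch of one object cache follows.
Objects of one fixed size are carved out of a slab, and free objects
are linked through their own first word. As simplifications, each slab
here is a single 4096-byte buffer obtained with
<function>malloc</function> rather than from the frame allocator, and
there are no magazines, no cache coloring and no object constructors;
all names are hypothetical. Following the policy described in the
implementation section below, a slab is released as soon as it becomes
empty.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SLAB_SIZE 4096u          /* one page per slab, for illustration */

typedef struct slab {
    struct slab *next;           /* next slab of the same cache */
    void *free_list;             /* free objects, linked through their first word */
    size_t in_use;               /* allocated objects in this slab */
    unsigned char *mem;          /* SLAB_SIZE bytes of object storage */
} slab_t;

typedef struct {
    size_t size;                 /* object size, a multiple of sizeof(void *) */
    slab_t *slabs;
} slab_cache_t;

/* Create a slab and thread all of its object slots onto a free list. */
static slab_t *slab_create(slab_cache_t *c)
{
    assert(c->size >= sizeof(void *) && c->size % sizeof(void *) == 0);
    slab_t *s = malloc(sizeof(*s));
    if (s == NULL)
        return NULL;
    s->mem = malloc(SLAB_SIZE);
    if (s->mem == NULL) {
        free(s);
        return NULL;
    }
    s->free_list = NULL;
    s->in_use = 0;
    for (size_t off = 0; off + c->size <= SLAB_SIZE; off += c->size) {
        void **obj = (void **) (s->mem + off);
        *obj = s->free_list;
        s->free_list = obj;
    }
    s->next = c->slabs;
    c->slabs = s;
    return s;
}

static void *slab_alloc(slab_cache_t *c)
{
    slab_t *s = c->slabs;
    while (s != NULL && s->free_list == NULL)   /* skip full slabs */
        s = s->next;
    if (s == NULL && (s = slab_create(c)) == NULL)
        return NULL;
    void **obj = s->free_list;
    s->free_list = *obj;
    s->in_use++;
    return obj;
}

static void slab_free(slab_cache_t *c, void *p)
{
    uintptr_t addr = (uintptr_t) p;
    for (slab_t **link = &c->slabs; *link != NULL; link = &(*link)->next) {
        slab_t *s = *link;
        uintptr_t start = (uintptr_t) s->mem;
        if (addr < start || addr >= start + SLAB_SIZE)
            continue;
        *(void **) p = s->free_list;            /* back on the free list */
        s->free_list = p;
        if (--s->in_use == 0) {                 /* empty slab: release at once */
            *link = s->next;
            free(s->mem);
            free(s);
        }
        return;
    }
    assert(!"object does not belong to this cache");
}

static size_t slab_count(const slab_cache_t *c)
{
    size_t n = 0;
    for (const slab_t *s = c->slabs; s != NULL; s = s->next)
        n++;
    return n;
}
```
]]></programlisting>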
</section>

<section>
<title>Implementation</title>

<para><mediaobject id="slab_alloc">
<imageobject role="html">
<imagedata fileref="images/slab_alloc.png" format="PNG" />
</imageobject>

<imageobject role="fop">
<imagedata fileref="images.vector/slab_alloc.svg" format="SVG" />
</imageobject>
</mediaobject></para>

<para>The SLAB allocator is closely modelled after the <ulink
url="http://www.usenix.org/events/usenix01/full_papers/bonwick/bonwick_html/">
OpenSolaris SLAB allocator by Jeff Bonwick and Jonathan Adams</ulink>
with the following exceptions: <itemizedlist>
<listitem>
empty SLABs are deallocated immediately (in Linux they are kept in a linked list, in Solaris ???)
</listitem>

<listitem>
empty magazines are deallocated when not needed (in Solaris they are held in a linked list in the slab cache)
</listitem>
</itemizedlist> The following features are not currently supported
but would be easy to add: <itemizedlist>
<listitem>
cache coloring
</listitem>

<listitem>
dynamic magazine growth (different magazine sizes are already supported, but the allocation strategy would need to be adjusted)
</listitem>
</itemizedlist></para>

<section>
<title>Magazine layer</title>

<para>On SMP architectures, the global SLAB locking mechanism is a
severe bottleneck, because it serializes the processing of all slab
allocation requests. Therefore, a new layer was introduced to the
classic slab allocator design: the slab allocator was extended to
support per-CPU caches called 'magazines' to achieve good SMP scaling.
<termdef>The slab SMP performance bottleneck was resolved by
introducing a per-CPU caching scheme called the <glossterm>magazine
layer</glossterm></termdef>.</para>

<para>A magazine is an N-element cache of objects, so each magazine can
satisfy N allocations. A magazine behaves like the magazine of an
automatic weapon (LIFO, a stack), so allocation and deallocation become
simple pointer pop and push operations. The trick is that the CPU does
not access global slab allocator data while allocating from its
magazine, which makes parallel allocations on different CPUs
possible.</para>

<para>The implementation also requires another feature: the CPU-bound
magazine is actually a pair of magazines, which avoids thrashing when
a single item is repeatedly allocated and deallocated at the magazine
size boundary. LIFO order is enforced, which should avoid fragmentation
as much as possible.</para>

<para>Another important entity of the magazine layer is the common full
magazine list (also called a depot), which stores full magazines that
may be used by any of the CPU magazine caches to reload the active CPU
magazine. This list of magazines could be pre-filled with full
magazines during initialization, but in the current implementation it
is filled during object deallocation, when a CPU magazine becomes
full.</para>

<para>Slab allocator control structures are allocated from special
slabs that are marked by a special flag indicating that they should
not use the slab magazine layer. This is done to avoid possible
infinite recursion and deadlock during conventional slab allocation
requests.</para>
</section>

<section>
<title>Allocation/deallocation</title>

<para>Every cache contains a list of full slabs and a list of partially
full slabs. Empty slabs are freed immediately (thrashing is avoided
thanks to the magazines).</para>

<para>The SLAB allocator allocates a lot of space and does not free
it. When the frame allocator fails to allocate a frame, it calls
slab_reclaim(). This function tries a 'light reclaim' first, then a
brutal reclaim. The light reclaim releases slabs from the CPU-shared
magazine list until at least one slab is deallocated in each cache
(this algorithm should probably change). The brutal reclaim removes
all cached objects, even from CPU-bound magazines.</para>

<formalpara>
<title>Allocation</title>

<para><emphasis>Step 1.</emphasis> When an allocation request arrives,
the slab allocator first checks whether an object is available in the
local CPU-bound magazine. If it is, the allocator simply pops the CPU
magazine and returns the pointer to the object.</para>

<para><emphasis>Step 2.</emphasis> If the CPU-bound magazine is
empty, the allocator attempts to reload it by swapping it with the
second CPU magazine and returns to the first step.</para>

<para><emphasis>Step 3.</emphasis> Now both CPU-bound magazines are
empty, so the allocator accesses the shared depot of full magazines to
reload the CPU-bound magazines. If the reload is successful (meaning
there are full magazines in the depot), the algorithm continues at
Step 1.</para>

<para><emphasis>Step 4.</emphasis> In the final step of the
allocation, the object is allocated from the conventional slab layer
and its pointer is returned.</para>
</formalpara>
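
<para>The four allocation steps can be sketched for a single CPU. The
magazine size of four, the bounded depot array, the type and function
names, and the use of <function>malloc</function> as a stand-in for
the conventional slab layer are assumptions of this sketch; a real
implementation would also protect the depot with a lock and keep one
pair of magazines per CPU. In step 3 the empty current magazine is
released and replaced by a full one from the depot, in line with the
policy of deallocating empty magazines when they are not needed.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAG_SIZE  4               /* objects per magazine (illustrative) */
#define DEPOT_MAX 8

typedef struct {
    size_t count;                 /* objects currently cached */
    void *objs[MAG_SIZE];         /* LIFO stack of object pointers */
} magazine_t;

typedef struct {
    magazine_t *current;          /* the pair of CPU-bound magazines */
    magazine_t *last;
} cpu_cache_t;

static magazine_t *depot[DEPOT_MAX];   /* shared full magazines; needs a lock */
static size_t depot_len;
static size_t slab_layer_allocs;       /* how often Step 4 was reached */

/* Stand-in for the conventional slab layer. */
static void *slab_layer_alloc(void)
{
    slab_layer_allocs++;
    return malloc(32);
}

static void *magazine_alloc(cpu_cache_t *cpu)
{
    assert(cpu != NULL);
    for (;;) {
        /* Step 1: pop from the current magazine, no shared data touched. */
        if (cpu->current->count > 0)
            return cpu->current->objs[--cpu->current->count];

        /* Step 2: swap with the second magazine if it is not empty. */
        if (cpu->last->count > 0) {
            magazine_t *tmp = cpu->current;
            cpu->current = cpu->last;
            cpu->last = tmp;
            continue;
        }

        /* Step 3: both are empty; release the empty current magazine
         * and load a full one from the depot. */
        if (depot_len > 0) {
            free(cpu->current);
            cpu->current = depot[--depot_len];
            continue;
        }

        /* Step 4: fall back to the slab layer. */
        return slab_layer_alloc();
    }
}
```
]]></programlisting>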

<formalpara>
<title>Deallocation</title>

<para><emphasis>Step 1.</emphasis> On a deallocation request, the
slab allocator checks whether the local CPU-bound magazine is full. If
it is not, the allocator simply pushes the pointer onto this
magazine.</para>

<para><emphasis>Step 2.</emphasis> If the CPU-bound magazine is
full, the allocator attempts to reload it by swapping it with the
second CPU magazine and returns to the first step.</para>

<para><emphasis>Step 3.</emphasis> Now both CPU-bound magazines are
full, so the allocator accesses the shared depot of full magazines,
puts one of the magazines into the depot and creates a new empty
magazine. The algorithm continues at Step 1.</para>
</formalpara>
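
<para>The deallocation steps can be sketched for a single CPU as well.
The magazine size of four, the bounded depot array and all type and
function names are assumptions of this sketch. Which full magazine goes
to the depot in step 3 is a design choice; here the second (older)
magazine is moved to the depot, the current one becomes the second,
and a fresh empty magazine becomes current, so the most recently freed
objects stay on the CPU. If the bounded depot is full or no empty
magazine can be allocated, <function>magazine_free</function> returns
false, and the caller would hand the object back to the slab
layer.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAG_SIZE  4               /* objects per magazine (illustrative) */
#define DEPOT_MAX 8

typedef struct {
    size_t count;                 /* objects currently cached */
    void *objs[MAG_SIZE];         /* LIFO stack of object pointers */
} magazine_t;

typedef struct {
    magazine_t *current;          /* the pair of CPU-bound magazines */
    magazine_t *last;
} cpu_cache_t;

static magazine_t *depot[DEPOT_MAX];   /* shared full magazines; needs a lock */
static size_t depot_len;

/* Returns false if the object could not be cached. */
static bool magazine_free(cpu_cache_t *cpu, void *obj)
{
    assert(cpu != NULL);
    for (;;) {
        /* Step 1: push onto the current magazine if it is not full. */
        if (cpu->current->count < MAG_SIZE) {
            cpu->current->objs[cpu->current->count++] = obj;
            return true;
        }

        /* Step 2: swap with the second magazine if it is not full. */
        if (cpu->last->count < MAG_SIZE) {
            magazine_t *tmp = cpu->current;
            cpu->current = cpu->last;
            cpu->last = tmp;
            continue;
        }

        /* Step 3: both are full; move one to the depot and start a new,
         * empty current magazine, then retry Step 1. */
        if (depot_len == DEPOT_MAX)
            return false;
        magazine_t *empty = calloc(1, sizeof(*empty));
        if (empty == NULL)
            return false;
        depot[depot_len++] = cpu->last;
        cpu->last = cpu->current;
        cpu->current = empty;
    }
}
```
]]></programlisting>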
</section>
</section>
</section>

<!-- End of Physmem -->
</section>

<section>
<title>Memory sharing</title>

<para>Not implemented yet (?)</para>
</section>
</chapter>
