Rev 27 | Rev 35 | Go to most recent revision | Show entire file | Ignore whitespace | Details | Blame | Last modification | View Log | RSS feed
Rev 27 | Rev 34 | ||
---|---|---|---|
Line 3... | Line 3... | ||
3 | <?dbhtml filename="mm.html"?> |
3 | <?dbhtml filename="mm.html"?> |
4 | 4 | ||
5 | <title>Memory management</title> |
5 | <title>Memory management</title> |
6 | 6 | ||
7 | <section> |
7 | <section> |
8 | <!-- VM --> |
- | |
9 | - | ||
10 | <title>Virtual memory management</title> |
8 | <title>Virtual memory management</title> |
11 | 9 | ||
12 | <section> |
10 | <section> |
13 | <title>Address spaces</title> |
11 | <title>Address spaces</title> |
14 | 12 | ||
15 | <para></para> |
13 | <para /> |
16 | </section> |
14 | </section> |
17 | 15 | ||
18 | <section> |
16 | <section> |
19 | <title>Virtual address translation</title> |
17 | <title>Virtual address translation</title> |
20 | 18 | ||
21 | <para></para> |
19 | <para /> |
22 | </section> |
20 | </section> |
- | 21 | ||
- | 22 | <para>Page tables. 4 level hierarchical and hash directly supported. B+ |
|
- | 23 | Tree can be implemented.</para> |
|
- | 24 | ||
- | 25 | <para>For paging there is an abstract layer</para> |
|
- | 26 | ||
- | 27 | <para>TLB shootdown implementation (update TLB upon mapping |
|
- | 28 | update/remove). TLB shootdown ASID/ASID:PAGE/ALL. TLB shootdown requests |
|
- | 29 | can come in asynchroniously so there is a cache of TLB shootdown requests. |
|
- | 30 | Upon cache overflow TLB shootdown ALL is executed</para> |
|
- | 31 | ||
- | 32 | <para>Address spaces. Address space area (B+ tree). Only for uspace. Set |
|
- | 33 | of syscalls (shrink/extend etc). Special address space area type - device |
|
- | 34 | - prohibits shrink/extend syscalls to call on it. Address space has link |
|
- | 35 | to mapping tables (hierarchical - per Address space, hash - global |
|
- | 36 | tables).</para> |
|
23 | </section> |
37 | </section> |
24 | 38 | ||
25 | <!-- End of VM --> |
39 | <!-- End of VM --> |
26 | 40 | ||
27 | <section> |
41 | <section> |
Line 30... | Line 44... | ||
30 | <title>Physical memory management</title> |
44 | <title>Physical memory management</title> |
31 | 45 | ||
32 | <section id="zones_and_frames"> |
46 | <section id="zones_and_frames"> |
33 | <title>Zones and frames</title> |
47 | <title>Zones and frames</title> |
34 | 48 | ||
35 | <para> |
- | |
36 | <!--graphic fileref="images/mm2.png" /--> |
- | |
37 | - | ||
38 | <!--graphic fileref="images/buddy_alloc.svg" format="SVG" /--> |
49 | <para><!--graphic fileref="images/mm2.png" /--><!--graphic fileref="images/buddy_alloc.svg" format="SVG" /--></para> |
39 | - | ||
40 | - | ||
41 | </para> |
- | |
42 | 50 | ||
43 | <para>On some architectures not whole physical memory is available for |
51 | <para>On some architectures not whole physical memory is available for |
44 | conventional usage. This limitations require from kernel to maintain a |
52 | conventional usage. This limitations require from kernel to maintain a |
45 | table of available and unavailable ranges of physical memory addresses. |
53 | table of available and unavailable ranges of physical memory addresses. |
46 | Main idea of zones is in creating memory zone entity, that is a |
54 | Main idea of zones is in creating memory zone entity, that is a |
Line 95... | Line 103... | ||
95 | Quickly find out if zone contains required number of frames to allocate and if this chunk of memory is properly aligned. This issue is perfectly solved bu the buddy allocator. |
103 | Quickly find out if zone contains required number of frames to allocate and if this chunk of memory is properly aligned. This issue is perfectly solved bu the buddy allocator. |
96 | </listitem> |
104 | </listitem> |
97 | </itemizedlist></para> |
105 | </itemizedlist></para> |
98 | </formalpara> |
106 | </formalpara> |
99 | </section> |
107 | </section> |
100 | </section> |
- | |
101 | - | ||
102 | <section id="buddy_allocator"> |
- | |
103 | <title>Buddy allocator</title> |
- | |
104 | - | ||
105 | <section> |
- | |
106 | <title>Overview</title> |
- | |
107 | - | ||
108 | <para>In buddy allocator, memory is broken down into power-of-two sized |
- | |
109 | naturally aligned blocks. These blocks are organized in an array of |
- | |
110 | lists in which list with index i contains all unallocated blocks of the |
- | |
111 | size <mathphrase>2<superscript>i</superscript></mathphrase>. The index i |
- | |
112 | is called the order of block. Should there be two adjacent equally sized |
- | |
113 | blocks in list <mathphrase>i</mathphrase> (i.e. buddies), the buddy |
- | |
114 | allocator would coalesce them and put the resulting block in list |
- | |
115 | <mathphrase>i + 1</mathphrase>, provided that the resulting block would |
- | |
116 | be naturally aligned. Similarily, when the allocator is asked to |
- | |
117 | allocate a block of size |
- | |
118 | <mathphrase>2<superscript>i</superscript></mathphrase>, it first tries |
- | |
119 | to satisfy the request from list with index i. If the request cannot be |
- | |
120 | satisfied (i.e. the list i is empty), the buddy allocator will try to |
- | |
121 | allocate and split larger block from list with index i + 1. Both of |
- | |
122 | these algorithms are recursive. The recursion ends either when there are |
- | |
123 | no blocks to coalesce in the former case or when there are no blocks |
- | |
124 | that can be split in the latter case.</para> |
- | |
125 | - | ||
126 | <!--graphic fileref="images/mm1.png" format="EPS" /--> |
- | |
127 | - | ||
128 | <para>This approach greatly reduces external fragmentation of memory and |
- | |
129 | helps in allocating bigger continuous blocks of memory aligned to their |
- | |
130 | size. On the other hand, the buddy allocator suffers increased internal |
- | |
131 | fragmentation of memory and is not suitable for general kernel |
- | |
132 | allocations. This purpose is better addressed by the <link |
- | |
133 | linkend="slab">slab allocator</link>.</para> |
- | |
134 | </section> |
- | |
135 | - | ||
136 | <section> |
- | |
137 | <title>Implementation</title> |
- | |
138 | - | ||
139 | <para>The buddy allocator is, in fact, an abstract framework wich can be |
- | |
140 | easily specialized to serve one particular task. It knows nothing about |
- | |
141 | the nature of memory it helps to allocate. In order to beat the lack of |
- | |
142 | this knowledge, the buddy allocator exports an interface that each of |
- | |
143 | its clients is required to implement. When supplied an implementation of |
- | |
144 | this interface, the buddy allocator can use specialized external |
- | |
145 | functions to find buddy for a block, split and coalesce blocks, |
- | |
146 | manipulate block order and mark blocks busy or available. For precize |
- | |
147 | documentation of this interface, refer to <link linkend="???">HelenOS |
- | |
148 | Generic Kernel Reference Manual</link>.</para> |
- | |
149 | - | ||
150 | <formalpara> |
- | |
151 | <title>Data organization</title> |
- | |
152 | - | ||
153 | <para>Each entity allocable by the buddy allocator is required to |
- | |
154 | contain space for storing block order number and a link variable used |
- | |
155 | to interconnect blocks within the same order.</para> |
- | |
156 | - | ||
157 | <para>Whatever entities are allocated by the buddy allocator, the |
- | |
158 | first entity within a block is used to represent the entire block. The |
- | |
159 | first entity keeps the order of the whole block. Other entities within |
- | |
160 | the block are assigned the magic value |
- | |
161 | <constant>BUDDY_INNER_BLOCK</constant>. This is especially important |
- | |
162 | for effective identification of buddies in one-dimensional array |
- | |
163 | because the entity that represents a potential buddy cannot be |
- | |
164 | associated with <constant>BUDDY_INNER_BLOCK</constant> (i.e. if it is |
- | |
165 | associated with <constant>BUDDY_INNER_BLOCK</constant> then it is not |
- | |
166 | a buddy).</para> |
- | |
167 | </formalpara> |
- | |
168 | - | ||
169 | <formalpara> |
- | |
170 | <title>Data organization</title> |
- | |
171 | - | ||
172 | <para>Buddy allocator always uses first frame to represent frame |
- | |
173 | block. This frame contains <varname>buddy_order</varname> variable to |
- | |
174 | provide information about the block size it actually represents ( |
- | |
175 | <mathphrase>2<superscript>buddy_order</superscript></mathphrase> |
- | |
176 | frames block). Other frames in block have this value set to magic |
- | |
177 | <constant>BUDDY_INNER_BLOCK</constant> that is much greater than buddy |
- | |
178 | <varname>max_order</varname> value.</para> |
- | |
179 | 108 | ||
180 | <para>Each <varname>frame_t</varname> also contains pointer member to |
- | |
181 | hold frame structure in the linked list inside one order.</para> |
109 | <section id="buddy_allocator"> |
182 | </formalpara> |
110 | <title>Buddy allocator</title> |
183 | 111 | ||
184 | <formalpara> |
112 | <section> |
185 | <title>Allocation algorithm</title> |
113 | <title>Overview</title> |
186 | 114 | ||
- | 115 | <para>In buddy allocator, memory is broken down into power-of-two |
|
- | 116 | sized naturally aligned blocks. These blocks are organized in an array |
|
- | 117 | of lists in which list with index i contains all unallocated blocks of |
|
187 | <para>Upon <mathphrase>2<superscript>i</superscript></mathphrase> |
118 | the size <mathphrase>2<superscript>i</superscript></mathphrase>. The |
188 | frames block allocation request, allocator checks if there are any |
119 | index i is called the order of block. Should there be two adjacent |
189 | blocks available at the order list <varname>i</varname>. If yes, |
120 | equally sized blocks in list <mathphrase>i</mathphrase> (i.e. |
- | 121 | buddies), the buddy allocator would coalesce them and put the |
|
- | 122 | resulting block in list <mathphrase>i + 1</mathphrase>, provided that |
|
190 | removes block from order list and returns its address. If no, |
123 | the resulting block would be naturally aligned. Similarily, when the |
191 | recursively allocates |
124 | allocator is asked to allocate a block of size |
192 | <mathphrase>2<superscript>i+1</superscript></mathphrase> frame block, |
125 | <mathphrase>2<superscript>i</superscript></mathphrase>, it first tries |
- | 126 | to satisfy the request from list with index i. If the request cannot |
|
- | 127 | be satisfied (i.e. the list i is empty), the buddy allocator will try |
|
- | 128 | to allocate and split larger block from list with index i + 1. Both of |
|
- | 129 | these algorithms are recursive. The recursion ends either when there |
|
- | 130 | are no blocks to coalesce in the former case or when there are no |
|
193 | splits it into two |
131 | blocks that can be split in the latter case.</para> |
- | 132 | ||
- | 133 | <!--graphic fileref="images/mm1.png" format="EPS" /--> |
|
- | 134 | ||
194 | <mathphrase>2<superscript>i</superscript></mathphrase> frame blocks. |
135 | <para>This approach greatly reduces external fragmentation of memory |
- | 136 | and helps in allocating bigger continuous blocks of memory aligned to |
|
195 | Then adds one of the blocks to the <varname>i</varname> order list and |
137 | their size. On the other hand, the buddy allocator suffers increased |
- | 138 | internal fragmentation of memory and is not suitable for general |
|
- | 139 | kernel allocations. This purpose is better addressed by the <link |
|
196 | returns address of another.</para> |
140 | linkend="slab">slab allocator</link>.</para> |
197 | </formalpara> |
141 | </section> |
198 | 142 | ||
199 | <formalpara> |
143 | <section> |
200 | <title>Deallocation algorithm</title> |
144 | <title>Implementation</title> |
201 | 145 | ||
- | 146 | <para>The buddy allocator is, in fact, an abstract framework wich can |
|
- | 147 | be easily specialized to serve one particular task. It knows nothing |
|
- | 148 | about the nature of memory it helps to allocate. In order to beat the |
|
- | 149 | lack of this knowledge, the buddy allocator exports an interface that |
|
- | 150 | each of its clients is required to implement. When supplied an |
|
- | 151 | implementation of this interface, the buddy allocator can use |
|
- | 152 | specialized external functions to find buddy for a block, split and |
|
- | 153 | coalesce blocks, manipulate block order and mark blocks busy or |
|
- | 154 | available. For precize documentation of this interface, refer to <link |
|
- | 155 | linkend="???">HelenOS Generic Kernel Reference Manual</link>.</para> |
|
- | 156 | ||
- | 157 | <formalpara> |
|
- | 158 | <title>Data organization</title> |
|
- | 159 | ||
- | 160 | <para>Each entity allocable by the buddy allocator is required to |
|
- | 161 | contain space for storing block order number and a link variable |
|
- | 162 | used to interconnect blocks within the same order.</para> |
|
- | 163 | ||
- | 164 | <para>Whatever entities are allocated by the buddy allocator, the |
|
- | 165 | first entity within a block is used to represent the entire block. |
|
- | 166 | The first entity keeps the order of the whole block. Other entities |
|
- | 167 | within the block are assigned the magic value |
|
- | 168 | <constant>BUDDY_INNER_BLOCK</constant>. This is especially important |
|
- | 169 | for effective identification of buddies in one-dimensional array |
|
- | 170 | because the entity that represents a potential buddy cannot be |
|
- | 171 | associated with <constant>BUDDY_INNER_BLOCK</constant> (i.e. if it |
|
- | 172 | is associated with <constant>BUDDY_INNER_BLOCK</constant> then it is |
|
- | 173 | not a buddy).</para> |
|
- | 174 | ||
- | 175 | <para>Buddy allocator always uses first frame to represent frame |
|
- | 176 | block. This frame contains <varname>buddy_order</varname> variable |
|
- | 177 | to provide information about the block size it actually represents ( |
|
- | 178 | <mathphrase>2<superscript>buddy_order</superscript></mathphrase> |
|
- | 179 | frames block). Other frames in block have this value set to magic |
|
- | 180 | <constant>BUDDY_INNER_BLOCK</constant> that is much greater than |
|
- | 181 | buddy <varname>max_order</varname> value.</para> |
|
- | 182 | ||
- | 183 | <para>Each <varname>frame_t</varname> also contains pointer member |
|
- | 184 | to hold frame structure in the linked list inside one order.</para> |
|
- | 185 | </formalpara> |
|
- | 186 | ||
- | 187 | <formalpara> |
|
- | 188 | <title>Allocation algorithm</title> |
|
- | 189 | ||
- | 190 | <para>Upon <mathphrase>2<superscript>i</superscript></mathphrase> |
|
- | 191 | frames block allocation request, allocator checks if there are any |
|
- | 192 | blocks available at the order list <varname>i</varname>. If yes, |
|
- | 193 | removes block from order list and returns its address. If no, |
|
- | 194 | recursively allocates |
|
- | 195 | <mathphrase>2<superscript>i+1</superscript></mathphrase> frame |
|
- | 196 | block, splits it into two |
|
- | 197 | <mathphrase>2<superscript>i</superscript></mathphrase> frame blocks. |
|
- | 198 | Then adds one of the blocks to the <varname>i</varname> order list |
|
- | 199 | and returns address of another.</para> |
|
- | 200 | </formalpara> |
|
- | 201 | ||
- | 202 | <formalpara> |
|
- | 203 | <title>Deallocation algorithm</title> |
|
- | 204 | ||
202 | <para>Check if block has so called buddy (another free |
205 | <para>Check if block has so called buddy (another free |
203 | <mathphrase>2<superscript>i</superscript></mathphrase> frame block |
206 | <mathphrase>2<superscript>i</superscript></mathphrase> frame block |
204 | that can be linked with freed block into the |
207 | that can be linked with freed block into the |
205 | <mathphrase>2<superscript>i+1</superscript></mathphrase> block). |
208 | <mathphrase>2<superscript>i+1</superscript></mathphrase> block). |
206 | Technically, buddy is a odd/even block for even/odd block |
209 | Technically, buddy is a odd/even block for even/odd block |
207 | respectively. Plus we can put an extra requirement, that resulting |
210 | respectively. Plus we can put an extra requirement, that resulting |
208 | block must be aligned to its size. This requirement guarantees natural |
211 | block must be aligned to its size. This requirement guarantees |
209 | block alignment for the blocks coming out the allocation |
212 | natural block alignment for the blocks coming out the allocation |
210 | system.</para> |
213 | system.</para> |
211 | 214 | ||
212 | <para>Using direct pointer arithmetics, |
215 | <para>Using direct pointer arithmetics, |
213 | <varname>frame_t::ref_count</varname> and |
216 | <varname>frame_t::ref_count</varname> and |
214 | <varname>frame_t::buddy_order</varname> variables, finding buddy is |
217 | <varname>frame_t::buddy_order</varname> variables, finding buddy is |
215 | done at constant time.</para> |
218 | done at constant time.</para> |
216 | </formalpara> |
219 | </formalpara> |
- | 220 | </section> |
|
217 | </section> |
221 | </section> |
218 | 222 | ||
219 | <section id="slab"> |
223 | <section id="slab"> |
220 | <title>Slab allocator</title> |
224 | <title>Slab allocator</title> |
221 | 225 | ||
222 | <section> |
226 | <section> |
223 | <title>Introduction</title> |
227 | <title>Overview</title> |
- | 228 | ||
- | 229 | <para><termdef><glossterm>Slab</glossterm> represents a contiguous |
|
- | 230 | piece of memory, usually made of several physically contiguous |
|
- | 231 | pages.</termdef> <termdef><glossterm>Slab cache</glossterm> consists |
|
- | 232 | of one or more slabs.</termdef></para> |
|
224 | 233 | ||
225 | <para>The majority of memory allocation requests in the kernel are for |
234 | <para>The majority of memory allocation requests in the kernel are for |
226 | small, frequently used data structures. For this purpose the slab |
235 | small, frequently used data structures. For this purpose the slab |
227 | allocator is a perfect solution. The basic idea behind a slab |
236 | allocator is a perfect solution. The basic idea behind the slab |
228 | allocator is to have lists of commonly used objects available packed |
237 | allocator is to have lists of commonly used objects available packed |
229 | into pages. This avoids the overhead of allocating and destroying |
238 | into pages. This avoids the overhead of allocating and destroying |
230 | commonly used types of objects such as inodes, threads, virtual memory |
239 | commonly used types of objects such threads, virtual memory structures |
231 | structures etc.</para> |
- | |
232 | - | ||
233 | <para>Original slab allocator locking mechanism has become a |
240 | etc. Also due to the exact allocated size matching, slab allocation |
234 | significant preformance bottleneck on SMP architectures. <termdef>Slab |
- | |
235 | SMP perfromance bottleneck was resolved by introducing a per-CPU |
- | |
236 | caching scheme called as <glossterm>magazine |
- | |
237 | layer</glossterm></termdef>.</para> |
241 | completely eliminates internal fragmentation issue.</para> |
238 | </section> |
242 | </section> |
239 | 243 | ||
240 | <section> |
244 | <section> |
241 | <title>Implementation details (needs revision)</title> |
245 | <title>Implementation</title> |
242 | 246 | ||
243 | <para>The SLAB allocator is closely modelled after <ulink |
247 | <para>The SLAB allocator is closely modelled after <ulink |
244 | url="http://www.usenix.org/events/usenix01/full_papers/bonwick/bonwick_html/"> |
248 | url="http://www.usenix.org/events/usenix01/full_papers/bonwick/bonwick_html/"> |
245 | OpenSolaris SLAB allocator by Jeff Bonwick and Jonathan Adams </ulink> |
249 | OpenSolaris SLAB allocator by Jeff Bonwick and Jonathan Adams </ulink> |
246 | with the following exceptions: <itemizedlist> |
250 | with the following exceptions: <itemizedlist> |
Line 256... | Line 260... | ||
256 | <listitem> |
260 | <listitem> |
257 | - cache coloring |
261 | - cache coloring |
258 | </listitem> |
262 | </listitem> |
259 | 263 | ||
260 | <listitem> |
264 | <listitem> |
261 | - dynamic magazine growing (different magazine sizes are already supported, but we would need to adjust allocation strategy) |
265 | - dynamic magazine grow (different magazine sizes are already supported, but we would need to adjust allocation strategy) |
262 | </listitem> |
266 | </listitem> |
263 | </itemizedlist></para> |
267 | </itemizedlist></para> |
264 | 268 | ||
265 | <para>The SLAB allocator supports per-CPU caches ('magazines') to |
- | |
266 | facilitate good SMP scaling.</para> |
- | |
267 | - | ||
268 | <para>When a new object is being allocated, it is first checked, if it |
- | |
269 | is available in CPU-bound magazine. If it is not found there, it is |
- | |
270 | allocated from CPU-shared SLAB - if partial full is found, it is used, |
- | |
271 | otherwise a new one is allocated.</para> |
- | |
272 | - | ||
273 | <para>When an object is being deallocated, it is put to CPU-bound |
- | |
274 | magazine. If there is no such magazine, new one is allocated (if it |
- | |
275 | fails, the object is deallocated into SLAB). If the magazine is full, |
- | |
276 | it is put into cpu-shared list of magazines and new one is |
- | |
277 | allocated.</para> |
- | |
278 | - | ||
279 | <para>The CPU-bound magazine is actually a pair of magazines to avoid |
- | |
280 | thrashing when somebody is allocating/deallocating 1 item at the |
- | |
281 | magazine size boundary. LIFO order is enforced, which should avoid |
- | |
282 | fragmentation as much as possible.</para> |
- | |
283 | - | ||
284 | <para>Every cache contains list of full slabs and list of partialy |
- | |
285 | full slabs. Empty SLABS are immediately freed (thrashing will be |
- | |
286 | avoided because of magazines).</para> |
- | |
287 | - | ||
288 | <para>The SLAB information structure is kept inside the data area, if |
- | |
289 | possible. The cache can be marked that it should not use magazines. |
- | |
290 | This is used only for SLAB related caches to avoid deadlocks and |
- | |
291 | infinite recursion (the SLAB allocator uses itself for allocating all |
- | |
292 | it's control structures).</para> |
- | |
293 | - | ||
294 | <para>The SLAB allocator allocates lots of space and does not free it. |
- | |
295 | When frame allocator fails to allocate the frame, it calls |
- | |
296 | slab_reclaim(). It tries 'light reclaim' first, then brutal reclaim. |
- | |
297 | The light reclaim releases slabs from cpu-shared magazine-list, until |
- | |
298 | at least 1 slab is deallocated in each cache (this algorithm should |
- | |
299 | probably change). The brutal reclaim removes all cached objects, even |
- | |
300 | from CPU-bound magazines.</para> |
- | |
301 | - | ||
302 | <para>TODO: <itemizedlist> |
- | |
303 | <listitem> |
269 | <section> |
304 | For better CPU-scaling the magazine allocation strategy should be extended. Currently, if the cache does not have magazine, it asks for non-cpu cached magazine cache to provide one. It might be feasible to add cpu-cached magazine cache (which would allocate it's magazines from non-cpu-cached mag. cache). This would provide a nice per-cpu buffer. The other possibility is to use the per-cache 'empty-magazine-list', which decreases competing for 1 per-system magazine cache. |
270 | <title>Magazine layer</title> |
305 | </listitem> |
- | |
306 | 271 | ||
- | 272 | <para>Due to the extensive bottleneck on SMP architures, caused by |
|
- | 273 | global SLAB locking mechanism, making processing of all slab |
|
- | 274 | allocation requests serialized, a new layer was introduced to the |
|
- | 275 | classic slab allocator design. Slab allocator was extended to |
|
- | 276 | support per-CPU caches 'magazines' to achieve good SMP scaling. |
|
- | 277 | <termdef>Slab SMP perfromance bottleneck was resolved by introducing |
|
- | 278 | a per-CPU caching scheme called as <glossterm>magazine |
|
- | 279 | layer</glossterm></termdef>.</para> |
|
- | 280 | ||
- | 281 | <para>Magazine is a N-element cache of objects, so each magazine can |
|
- | 282 | satisfy N allocations. Magazine behaves like a automatic weapon |
|
- | 283 | magazine (LIFO, stack), so the allocation/deallocation become simple |
|
- | 284 | push/pop pointer operation. Trick is that CPU does not access global |
|
- | 285 | slab allocator data during the allocation from its magazine, thus |
|
- | 286 | making possible parallel allocations between CPUs.</para> |
|
- | 287 | ||
- | 288 | <para>Implementation also requires adding another feature as the |
|
- | 289 | CPU-bound magazine is actually a pair of magazines to avoid |
|
- | 290 | thrashing when during allocation/deallocatiion of 1 item at the |
|
- | 291 | magazine size boundary. LIFO order is enforced, which should avoid |
|
- | 292 | fragmentation as much as possible.</para> |
|
- | 293 | ||
- | 294 | <para>Another important entity of magazine layer is a full magazine |
|
- | 295 | depot, that stores full magazines which are used by any of the CPU |
|
- | 296 | magazine caches to reload active CPU magazine. Magazine depot can be |
|
- | 297 | pre-filled with full magazines during initialization, but in current |
|
- | 298 | implementation it is filled during object deallocation, when CPU |
|
- | 299 | magazine becomes full.</para> |
|
- | 300 | ||
- | 301 | <para>Slab allocator control structures are allocated from special |
|
- | 302 | slabs, that are marked by special flag, indicating that it should |
|
- | 303 | not be used for slab magazine layer. This is done to avoid possible |
|
- | 304 | infinite recursions and deadlock during conventional slab allocaiton |
|
- | 305 | requests.</para> |
|
- | 306 | </section> |
|
- | 307 | ||
- | 308 | <section> |
|
- | 309 | <title>Allocation/deallocation</title> |
|
- | 310 | ||
- | 311 | <para>Every cache contains list of full slabs and list of partialy |
|
- | 312 | full slabs. Empty slabs are immediately freed (thrashing will be |
|
- | 313 | avoided because of magazines).</para> |
|
- | 314 | ||
- | 315 | <para>The SLAB allocator allocates lots of space and does not free |
|
- | 316 | it. When frame allocator fails to allocate the frame, it calls |
|
- | 317 | slab_reclaim(). It tries 'light reclaim' first, then brutal reclaim. |
|
- | 318 | The light reclaim releases slabs from cpu-shared magazine-list, |
|
- | 319 | until at least 1 slab is deallocated in each cache (this algorithm |
|
- | 320 | should probably change). The brutal reclaim removes all cached |
|
- | 321 | objects, even from CPU-bound magazines.</para> |
|
- | 322 | ||
307 | <listitem> |
323 | <formalpara> |
- | 324 | <title>Allocation</title> |
|
- | 325 | ||
- | 326 | <para><emphasis>Step 1.</emphasis> When it comes to the allocation |
|
- | 327 | request, slab allocator first of all checks availability of memory |
|
- | 328 | in local CPU-bound magazine. If it is there, we would just "pop" |
|
- | 329 | the CPU magazine and return the pointer to object.</para> |
|
- | 330 | ||
- | 331 | <para><emphasis>Step 2.</emphasis> If the CPU-bound magazine is |
|
- | 332 | empty, allocator will attempt to reload magazin, swapping it with |
|
- | 333 | second CPU magazine and returns to the first step.</para> |
|
- | 334 | ||
- | 335 | <para><emphasis>Step 3.</emphasis> Now we are in the situation |
|
- | 336 | when both CPU-bound magazines are empty, which makes allocator to |
|
- | 337 | access shared full-magazines depot to reload CPU-bound magazines. |
|
- | 338 | If reload is succesful (meaning there are full magazines in depot) |
|
- | 339 | algoritm continues at Step 1.</para> |
|
- | 340 | ||
- | 341 | <para><emphasis>Step 4.</emphasis> Final step of the allocation. |
|
308 | - it might be good to add granularity of locks even to slab level, we could then try_spinlock over all partial slabs and thus improve scalability even on slab level |
342 | In this step object is allocated from the conventional slab layer |
- | 343 | and pointer is returned.</para> |
|
309 | </listitem> |
344 | </formalpara> |
- | 345 | ||
- | 346 | <formalpara> |
|
- | 347 | <title>Deallocation</title> |
|
- | 348 | ||
- | 349 | <para><emphasis>Step 1.</emphasis> During deallocation request, |
|
- | 350 | slab allocator will check if the local CPU-bound magazine is not |
|
- | 351 | full. In this case we will just push the pointer to this |
|
310 | </itemizedlist></para> |
352 | magazine.</para> |
- | 353 | ||
- | 354 | <para><emphasis>Step 2.</emphasis> If the CPU-bound magazine is |
|
- | 355 | full, allocator will attempt to reload magazin, swapping it with |
|
- | 356 | second CPU magazine and returns to the first step.</para> |
|
- | 357 | ||
- | 358 | <para><emphasis>Step 3.</emphasis> Now we are in the situation |
|
- | 359 | when both CPU-bound magazines are full, which makes allocator to |
|
- | 360 | access shared full-magazines depot to put one of the magazines to |
|
- | 361 | the depot and creating new empty magazine. Algoritm continues at |
|
- | 362 | Step 1.</para> |
|
- | 363 | </formalpara> |
|
- | 364 | </section> |
|
311 | </section> |
365 | </section> |
312 | </section> |
366 | </section> |
313 | 367 | ||
314 | <!-- End of Physmem --> |
368 | <!-- End of Physmem --> |
315 | </section> |
369 | </section> |