<?xml version="1.0" encoding="UTF-8"?>
<chapter id="mm">
  <?dbhtml filename="mm.html"?>

  <title>Memory management</title>

  <section>
    <!-- VM -->

    <title>Virtual memory management</title>

    <section>
      <title>Address spaces</title>

      <para />
    </section>

    <section>
      <title>Virtual address translation</title>

      <para />
    </section>

    <para>Page tables. 4-level hierarchical and hash page tables are directly
    supported. A B+ tree based mapping can also be implemented.</para>

    <para>For paging, there is an abstract layer.</para>

    <para>TLB shootdown implementation (update the TLB upon mapping
    update/removal). TLB shootdown by ASID, ASID:PAGE or ALL. TLB shootdown
    requests can come in asynchronously, so there is a cache of TLB shootdown
    requests. Upon cache overflow, TLB shootdown ALL is executed.</para>
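
    <para>A minimal sketch of the shootdown request cache idea, assuming a
    fixed-size per-CPU queue; the type and constant names below are
    illustrative, not the actual HelenOS definitions:</para>

    <programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch only -- not the actual HelenOS definitions. */
#define TLB_MESSAGE_QUEUE_LEN  10

typedef enum {
	TLB_INVL_ALL,       /* invalidate the whole TLB */
	TLB_INVL_ASID,      /* invalidate all entries of one address space */
	TLB_INVL_PAGES      /* invalidate a range of pages of one ASID */
} tlb_invalidate_type_t;

typedef struct {
	tlb_invalidate_type_t type;
	uint32_t asid;      /* address space identifier, if applicable */
	uintptr_t page;     /* first page, if applicable */
	size_t count;       /* number of pages, if applicable */
} tlb_shootdown_msg_t;

/* Per-CPU cache of pending shootdown requests. */
typedef struct {
	tlb_shootdown_msg_t messages[TLB_MESSAGE_QUEUE_LEN];
	size_t count;
} tlb_shootdown_cache_t;

/* Enqueue a request; on overflow, degrade to a single "invalidate all". */
void tlb_shootdown_enqueue(tlb_shootdown_cache_t *cache,
    tlb_shootdown_msg_t msg)
{
	if (cache->count == TLB_MESSAGE_QUEUE_LEN) {
		cache->messages[0].type = TLB_INVL_ALL;
		cache->count = 1;
		return;
	}
	cache->messages[cache->count++] = msg;
}
]]></programlisting>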

    <para>Address spaces. Address space areas (B+ tree). Only for uspace. Set
    of syscalls (shrink/extend etc.). A special address space area type -
    device - prohibits the shrink/extend syscalls from being called on it.
    The address space has a link to the mapping tables (hierarchical - per
    address space, hash - global tables).</para>
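
    <para>The address space and address space area relationship might be
    pictured roughly as follows; this is only an illustrative sketch, not the
    actual HelenOS structures:</para>

    <programlisting><![CDATA[
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch only. */
typedef enum {
	AS_AREA_ANONYMOUS,
	AS_AREA_DEVICE          /* device areas refuse the resize syscalls */
} as_area_type_t;

typedef struct {
	as_area_type_t type;
	uintptr_t base;         /* first virtual address of the area */
	size_t pages;           /* area size in pages */
} as_area_t;

typedef struct {
	void *areas;            /* B+ tree of as_area_t, keyed by base address */
	void *page_table;       /* per-address-space table when hierarchical
	                         * mapping is used; hash tables are global */
} as_t;

/* The shrink/extend syscalls are prohibited on device areas. */
bool as_area_can_resize(const as_area_t *area)
{
	return area->type != AS_AREA_DEVICE;
}
]]></programlisting>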
  </section>

  <!-- End of VM -->

  <section>
    <!-- Phys mem -->

    <title>Physical memory management</title>

    <section id="zones_and_frames">
      <title>Zones and frames</title>

      <para><!--graphic fileref="images/mm2.png" /--><!--graphic fileref="images/buddy_alloc.svg" format="SVG" /--></para>

      <para>On some architectures, not the whole physical memory is available
      for conventional use. This limitation requires the kernel to maintain a
      table of available and unavailable ranges of physical memory addresses.
      The main idea behind zones is to create a memory zone entity, which is
      a contiguous chunk of memory available for allocation. If some chunk is
      not available, we simply do not put it in any zone.</para>

      <para>A zone also serves informational purposes, containing information
      about the number of free and busy frames. Physical memory allocation is
      always done inside a certain zone. Allocation of zone frames must be
      organized by the <link linkend="frame_allocator">frame
      allocator</link> associated with the zone.</para>

      <para>Some architectures (mips32, ppc32) have only one zone that covers
      the whole physical memory, while others (like ia32) may have multiple
      zones. Information about the zones on the current machine is stored in
      BIOS hardware tables or can be hardcoded into the kernel at compile
      time.</para>
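
      <para>A zone descriptor along these lines could look like the following
      sketch (field names are illustrative only):</para>

      <programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch of a zone descriptor, not the HelenOS definition. */
typedef struct {
	uintptr_t base;         /* physical address of the first frame */
	size_t count;           /* total number of frames in the zone */
	size_t free_count;      /* informational: number of free frames */
	size_t busy_count;      /* informational: number of busy frames */
	void *frames;           /* per-frame metadata (array of frame_t) */
	void *buddy_system;     /* buddy allocator instance of this zone */
} zone_t;
]]></programlisting>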
    </section>

    <section id="frame_allocator">
      <title>Frame allocator</title>

      <formalpara>
        <title>Overview</title>

        <para>The frame allocator provides physical memory allocation for the
        kernel. Because of the zonal organization of physical memory, the
        frame allocator always works in the context of some zone, thus making
        it impossible to allocate a piece of memory that spans two different
        zones (which, however, cannot happen, because two adjacent zones can
        be merged into one). The frame allocator is also responsible for
        updating the information on the number of free/busy frames in the
        zone. Physical memory allocation inside one <link
        linkend="zones_and_frames">memory zone</link> is handled by an
        instance of the <link linkend="buddy_allocator">buddy
        allocator</link> tailored to allocate blocks of physical memory
        frames.</para>
      </formalpara>

      <formalpara>
        <title>Allocation / deallocation</title>

        <para>Upon an allocation request, the frame allocator tries to find
        the first zone that can satisfy the incoming request (i.e. has the
        required amount of free frames), as sketched below. During
        deallocation, the frame allocator needs to find the zone that
        contains the deallocated frame. This approach could bring up two
        potential problems: <itemizedlist>
            <listitem>
               Linear search of zones does not do performance any good, but the number of zones is not expected to be high. Should it grow, the list of zones can be replaced with a more time-efficient B-tree.
            </listitem>

            <listitem>
               It must be possible to quickly find out whether a zone contains the required number of frames to allocate and whether this chunk of memory is properly aligned. This issue is perfectly solved by the buddy allocator.
            </listitem>
          </itemizedlist></para>
      </formalpara>
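
      <para>The zone search mentioned above can be sketched as a simple
      linear scan; the helper zone_can_alloc() is hypothetical and stands in
      for the query that the zone's buddy allocator answers:</para>

      <programlisting><![CDATA[
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch only; locking and interrupt handling are omitted. */
typedef struct zone zone_t;

extern zone_t *zones[];
extern size_t zone_count;

/* Hypothetical helper: can this zone provide a naturally aligned block
 * of 2^order free frames? In practice the zone's buddy allocator
 * answers this question. */
extern bool zone_can_alloc(zone_t *zone, uint8_t order);

zone_t *find_zone_for_alloc(uint8_t order)
{
	/* Linear search: acceptable as long as the number of zones is small. */
	for (size_t i = 0; i < zone_count; i++) {
		if (zone_can_alloc(zones[i], order))
			return zones[i];
	}
	return NULL;    /* no zone can satisfy the request */
}
]]></programlisting>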
    </section>

    <section id="buddy_allocator">
      <title>Buddy allocator</title>

      <section>
        <title>Overview</title>

        <para>In the buddy allocator, memory is broken down into power-of-two
        sized naturally aligned blocks. These blocks are organized in an
        array of lists in which the list with index i contains all
        unallocated blocks of size
        <mathphrase>2<superscript>i</superscript></mathphrase>. The index i
        is called the order of the block. Should there be two adjacent
        equally sized blocks in the list <mathphrase>i</mathphrase> (i.e.
        buddies), the buddy allocator would coalesce them and put the
        resulting block in list <mathphrase>i + 1</mathphrase>, provided that
        the resulting block would be naturally aligned. Similarly, when the
        allocator is asked to allocate a block of size
        <mathphrase>2<superscript>i</superscript></mathphrase>, it first
        tries to satisfy the request from the list with index i. If the
        request cannot be satisfied (i.e. the list i is empty), the buddy
        allocator will try to allocate and split a larger block from the list
        with index i + 1. Both of these algorithms are recursive. The
        recursion ends either when there are no blocks to coalesce in the
        former case or when there are no blocks that can be split in the
        latter case.</para>

        <!--graphic fileref="images/mm1.png" format="EPS" /-->

        <para>This approach greatly reduces external fragmentation of memory
        and helps in allocating bigger contiguous blocks of memory aligned to
        their size. On the other hand, the buddy allocator suffers from
        increased internal fragmentation of memory and is not suitable for
        general kernel allocations. This purpose is better addressed by the
        <link linkend="slab">slab allocator</link>.</para>
      </section>

      <section>
        <title>Implementation</title>

        <para>The buddy allocator is, in fact, an abstract framework which
        can be easily specialized to serve one particular task. It knows
        nothing about the nature of the memory it helps to allocate. To make
        up for this lack of knowledge, the buddy allocator exports an
        interface that each of its clients is required to implement. When
        supplied with an implementation of this interface, the buddy
        allocator can use specialized external functions to find the buddy of
        a block, split and coalesce blocks, manipulate block order and mark
        blocks busy or available. For precise documentation of this
        interface, refer to the <link linkend="???">HelenOS Generic Kernel
        Reference Manual</link>.</para>

        <formalpara>
          <title>Data organization</title>

          <para>Each entity allocable by the buddy allocator is required to
          contain space for storing its block order number and a link
          variable used to interconnect blocks within the same order.</para>

          <para>Whatever entities are allocated by the buddy allocator, the
          first entity within a block is used to represent the entire block.
          The first entity keeps the order of the whole block. Other entities
          within the block are assigned the magic value
          <constant>BUDDY_INNER_BLOCK</constant>. This is especially
          important for effective identification of buddies in a
          one-dimensional array, because the entity that represents a
          potential buddy cannot be associated with
          <constant>BUDDY_INNER_BLOCK</constant> (i.e. if it is associated
          with <constant>BUDDY_INNER_BLOCK</constant> then it is not a
          buddy).</para>

          <para>The buddy allocator always uses the first frame of a block to
          represent the frame block. This frame contains the
          <varname>buddy_order</varname> variable to provide information
          about the block size it actually represents (a block of
          <mathphrase>2<superscript>buddy_order</superscript></mathphrase>
          frames). Other frames in the block have this value set to the magic
          constant <constant>BUDDY_INNER_BLOCK</constant>, which is much
          greater than the buddy <varname>max_order</varname> value.</para>

          <para>Each <varname>frame_t</varname> also contains a pointer
          member used to link the frame structure into the list of free
          blocks of the same order.</para>
        </formalpara>
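
        <para>Put together, the per-frame data needed by the buddy framework
        might look like this simplified sketch:</para>

        <programlisting><![CDATA[
#include <stdint.h>

/* Magic order value for frames inside a block; chosen to be much
 * greater than any valid max_order. Illustrative sketch only. */
#define BUDDY_INNER_BLOCK  0xff

typedef struct link {
	struct link *prev;
	struct link *next;
} link_t;

typedef struct {
	uint16_t ref_count;     /* 0 means the frame is free */
	uint8_t buddy_order;    /* order of the block this frame starts,
	                         * or BUDDY_INNER_BLOCK for inner frames */
	link_t buddy_link;      /* link in the free list of the same order */
	void *parent;           /* back-pointer to the owning zone */
} frame_t;
]]></programlisting>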

        <formalpara>
          <title>Allocation algorithm</title>

          <para>Upon a request to allocate a block of
          <mathphrase>2<superscript>i</superscript></mathphrase> frames, the
          allocator checks whether there are any blocks available in the
          order list <varname>i</varname>. If yes, it removes a block from
          that list and returns its address. If not, it recursively allocates
          a <mathphrase>2<superscript>i+1</superscript></mathphrase> frame
          block and splits it into two
          <mathphrase>2<superscript>i</superscript></mathphrase> frame
          blocks. It then adds one of the blocks to the order list
          <varname>i</varname> and returns the address of the other.</para>
        </formalpara>
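
        <para>In pseudo-C the allocation algorithm reads roughly as follows;
        the helpers are hypothetical and error handling is reduced to
        returning NULL:</para>

        <programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

typedef struct link link_t;
typedef struct buddy_system buddy_system_t;

/* Hypothetical helpers assumed for this sketch. */
extern link_t *order_list_pop(buddy_system_t *b, uint8_t order); /* NULL if empty */
extern void order_list_push(buddy_system_t *b, uint8_t order, link_t *block);
extern link_t *bisect(buddy_system_t *b, link_t *block);         /* returns 2nd half */
extern uint8_t max_order(buddy_system_t *b);

link_t *buddy_alloc(buddy_system_t *b, uint8_t i)
{
	/* Try to satisfy the request directly from the order-i list. */
	link_t *block = order_list_pop(b, i);
	if (block != NULL)
		return block;

	/* Nothing bigger left to split: the recursion ends here. */
	if (i == max_order(b))
		return NULL;

	/* Recursively allocate a 2^(i+1) block and split it. */
	block = buddy_alloc(b, i + 1);
	if (block == NULL)
		return NULL;

	order_list_push(b, i, bisect(b, block));  /* keep one half for later */
	return block;                             /* return the other half */
}
]]></programlisting>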

        <formalpara>
          <title>Deallocation algorithm</title>

          <para>Check whether the block has a so-called buddy (another free
          <mathphrase>2<superscript>i</superscript></mathphrase> frame block
          that can be linked with the freed block into a
          <mathphrase>2<superscript>i+1</superscript></mathphrase> block).
          Technically, the buddy is an odd/even block for an even/odd block
          respectively. In addition, we can impose an extra requirement that
          the resulting block must be aligned to its size. This requirement
          guarantees natural block alignment for the blocks coming out of the
          allocation system.</para>

          <para>Using direct pointer arithmetic and the
          <varname>frame_t::ref_count</varname> and
          <varname>frame_t::buddy_order</varname> variables, finding the
          buddy is done in constant time.</para>
        </formalpara>
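
        <para>The constant-time buddy lookup boils down to flipping a single
        bit of the frame index, as in this sketch (the real code additionally
        checks the candidate's ref_count and buddy_order):</para>

        <programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch: all frames of a zone form one array, and blocks
 * are naturally aligned, so the buddy of the block starting at 'index'
 * with order 'order' is found by flipping one bit of the index. */
size_t buddy_index(size_t index, uint8_t order)
{
	return index ^ ((size_t) 1 << order);
}

/* The candidate is a usable buddy only if it is free (ref_count == 0),
 * has the same buddy_order and is not marked BUDDY_INNER_BLOCK. */
]]></programlisting>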
      </section>
    </section>

    <section id="slab">
      <title>Slab allocator</title>

      <section>
        <title>Overview</title>

        <para><termdef><glossterm>Slab</glossterm> represents a contiguous
        piece of memory, usually made of several physically contiguous
        pages.</termdef> <termdef><glossterm>Slab cache</glossterm> consists
        of one or more slabs.</termdef></para>

        <para>The majority of memory allocation requests in the kernel are
        for small, frequently used data structures. For this purpose the slab
        allocator is a perfect solution. The basic idea behind the slab
        allocator is to have lists of commonly used objects available packed
        into pages. This avoids the overhead of allocating and destroying
        commonly used types of objects such as threads, virtual memory
        structures etc. Also, because the allocated size matches the object
        size exactly, slab allocation completely eliminates the internal
        fragmentation issue.</para>
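
        <para>The relationship between caches, slabs and objects can be
        pictured as in the following sketch; the structures are illustrative,
        not the actual HelenOS declarations:</para>

        <programlisting><![CDATA[
#include <stddef.h>

/* Illustrative sketch of the slab / slab cache relationship. */
typedef struct slab {
	struct slab *next;      /* link in the cache's full or partial list */
	void *start;            /* first byte of the physically contiguous pages */
	void *free_list;        /* free objects within this slab */
	size_t in_use;          /* number of objects currently allocated */
} slab_t;

typedef struct slab_cache {
	size_t object_size;     /* every object in this cache has this exact size */
	size_t objects_per_slab;
	slab_t *full_slabs;     /* slabs with no free object left */
	slab_t *partial_slabs;  /* slabs with at least one free object */
	/* constructor/destructor callbacks, magazine data, ... */
} slab_cache_t;
]]></programlisting>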
      </section>

      <section>
        <title>Implementation</title>

        <para>The SLAB allocator is closely modelled after the <ulink
        url="http://www.usenix.org/events/usenix01/full_papers/bonwick/bonwick_html/">
        OpenSolaris SLAB allocator by Jeff Bonwick and Jonathan Adams</ulink>
        with the following exceptions: <itemizedlist>
            <listitem>
               empty slabs are deallocated immediately (in Linux they are kept in a linked list, in Solaris ???)
            </listitem>

            <listitem>
               empty magazines are deallocated when not needed (in Solaris they are held in a linked list in the slab cache)
            </listitem>
          </itemizedlist> The following features are not currently supported
        but would be easy to do: <itemizedlist>
            <listitem>
               cache coloring
            </listitem>

            <listitem>
               dynamic magazine growing (different magazine sizes are already supported, but we would need to adjust the allocation strategy)
            </listitem>
          </itemizedlist></para>

        <section>
          <title>Magazine layer</title>

          <para>Due to the extensive bottleneck on SMP architectures caused
          by the global slab locking mechanism, which serialized processing
          of all slab allocation requests, a new layer was introduced into
          the classic slab allocator design. The slab allocator was extended
          to support per-CPU caches ('magazines') to achieve good SMP
          scaling. <termdef>The slab SMP performance bottleneck was resolved
          by introducing a per-CPU caching scheme called the
          <glossterm>magazine layer</glossterm></termdef>.</para>

          <para>A magazine is an N-element cache of objects, so each magazine
          can satisfy N allocations. A magazine behaves like an automatic
          weapon magazine (LIFO, stack), so allocation and deallocation
          become simple push/pop pointer operations. The trick is that the
          CPU does not access global slab allocator data during allocation
          from its magazine, thus making parallel allocations between CPUs
          possible.</para>
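
          <para>A magazine can be pictured as a small stack of object
          pointers; on the owning CPU, allocation and deallocation then
          reduce to the push and pop below (an illustrative sketch, locking
          omitted):</para>

          <programlisting><![CDATA[
#include <stddef.h>

/* Illustrative sketch of a magazine: a LIFO stack of object pointers. */
typedef struct {
	size_t size;            /* capacity N */
	size_t busy;            /* number of objects currently cached */
	void *objs[];           /* the cached object pointers */
} slab_magazine_t;

/* Pop one object; returns NULL when the magazine is empty. */
void *magazine_pop(slab_magazine_t *mag)
{
	return (mag->busy > 0) ? mag->objs[--mag->busy] : NULL;
}

/* Push one object; returns 0 on success, -1 when the magazine is full. */
int magazine_push(slab_magazine_t *mag, void *obj)
{
	if (mag->busy == mag->size)
		return -1;
	mag->objs[mag->busy++] = obj;
	return 0;
}
]]></programlisting>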

          <para>The implementation also requires another feature: the
          CPU-bound magazine is actually a pair of magazines, to avoid
          thrashing when a single item is repeatedly allocated and
          deallocated at the magazine size boundary. LIFO order is enforced,
          which should avoid fragmentation as much as possible.</para>

          <para>Another important entity of the magazine layer is the full
          magazine depot, which stores full magazines that may be used by any
          of the CPU magazine caches to reload the active CPU magazine. The
          magazine depot can be pre-filled with full magazines during
          initialization, but in the current implementation it is filled
          during object deallocation, when a CPU magazine becomes
          full.</para>

          <para>Slab allocator control structures are allocated from special
          slabs that are marked by a special flag indicating that they should
          not use the slab magazine layer. This is done to avoid possible
          infinite recursion and deadlock during conventional slab allocation
          requests.</para>
        </section>

        <section>
          <title>Allocation/deallocation</title>

          <para>Every cache contains a list of full slabs and a list of
          partially full slabs. Empty slabs are freed immediately (thrashing
          will be avoided because of the magazines).</para>

          <para>The slab allocator allocates a lot of space and does not free
          it. When the frame allocator fails to allocate a frame, it calls
          slab_reclaim(). It tries a 'light reclaim' first, then a brutal
          reclaim. The light reclaim releases slabs from the cpu-shared
          magazine list until at least one slab is deallocated in each cache
          (this algorithm should probably change). The brutal reclaim removes
          all cached objects, even from the CPU-bound magazines.</para>

          <formalpara>
            <title>Allocation</title>

            <para><emphasis>Step 1.</emphasis> When an allocation request
            comes, the slab allocator first of all checks the availability of
            memory in the local CPU-bound magazine. If the memory is there,
            we just "pop" the CPU magazine and return the pointer to the
            object.</para>

            <para><emphasis>Step 2.</emphasis> If the CPU-bound magazine is
            empty, the allocator attempts to reload the magazine by swapping
            it with the second CPU magazine and returns to the first
            step.</para>

            <para><emphasis>Step 3.</emphasis> Now we are in a situation
            where both CPU-bound magazines are empty, which makes the
            allocator access the shared full-magazines depot to reload the
            CPU-bound magazines. If the reload is successful (meaning there
            are full magazines in the depot), the algorithm continues at
            Step 1.</para>

            <para><emphasis>Step 4.</emphasis> This is the final step of the
            allocation. In this step the object is allocated from the
            conventional slab layer and a pointer to it is returned.</para>
          </formalpara>
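
          <para>The four steps translate into a control flow of roughly this
          shape; the helper names are hypothetical and locking/preemption
          handling is omitted:</para>

          <programlisting><![CDATA[
#include <stddef.h>

typedef struct slab_cache slab_cache_t;
typedef struct slab_magazine slab_magazine_t;

/* Hypothetical helpers assumed for this sketch. */
extern slab_magazine_t *cpu_magazine(slab_cache_t *cache); /* active CPU magazine */
extern void *magazine_pop(slab_magazine_t *mag);           /* NULL when empty */
extern int cpu_magazine_swap(slab_cache_t *cache);   /* 0 only if the 2nd magazine has objects */
extern int cpu_magazine_reload(slab_cache_t *cache); /* 0 only if a full magazine came from the depot */
extern void *slab_obj_create(slab_cache_t *cache);   /* conventional slab layer */

void *slab_alloc_sketch(slab_cache_t *cache)
{
	for (;;) {
		/* Step 1: try the active CPU-bound magazine. */
		void *obj = magazine_pop(cpu_magazine(cache));
		if (obj != NULL)
			return obj;

		/* Step 2: swap with the second CPU magazine and retry. */
		if (cpu_magazine_swap(cache) == 0)
			continue;

		/* Step 3: reload a full magazine from the shared depot. */
		if (cpu_magazine_reload(cache) == 0)
			continue;

		/* Step 4: fall back to the conventional slab layer. */
		return slab_obj_create(cache);
	}
}
]]></programlisting>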

          <formalpara>
            <title>Deallocation</title>

            <para><emphasis>Step 1.</emphasis> During a deallocation request,
            the slab allocator checks whether the local CPU-bound magazine is
            full. If it is not, we just push the pointer into this
            magazine.</para>

            <para><emphasis>Step 2.</emphasis> If the CPU-bound magazine is
            full, the allocator attempts to reload the magazine by swapping
            it with the second CPU magazine and returns to the first
            step.</para>

            <para><emphasis>Step 3.</emphasis> Now we are in a situation
            where both CPU-bound magazines are full, which makes the
            allocator access the shared full-magazines depot, put one of the
            magazines into the depot and allocate a new empty magazine. The
            algorithm continues at Step 1.</para>
          </formalpara>
        </section>
      </section>
    </section>

    <!-- End of Physmem -->
  </section>

  <section>
    <title>Memory sharing</title>

    <para>Not implemented yet(?)</para>
  </section>
</chapter>