<?xml version="1.0" encoding="UTF-8"?>
<chapter id="mm">
  <?dbhtml filename="mm.html"?>

  <title>Memory management</title>

  <para>In previous chapters, this book described the scheduling subsystem as
  the creator of the impression that threads execute in parallel. The memory
  management subsystem, on the other hand, creates the impression that there
  is enough physical memory for the kernel and that userspace tasks have the
  entire address space only for themselves.</para>

  <section>
    <title>Physical memory management</title>

    <section id="zones_and_frames">
      <title>Zones and frames</title>

      <para>HelenOS represents contiguous areas of physical memory in
      structures called frame zones (abbreviated as zones). Each zone contains
      information about the number of allocated and unallocated physical
      memory frames as well as the physical base address of the zone and the
      number of frames contained in it. A zone also contains an array of frame
      structures describing each frame of the zone and, last but not least,
      each zone is equipped with a buddy system that facilitates effective
      allocation of power-of-two sized blocks of frames.</para>
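      <para>To make the bookkeeping concrete, a zone can be sketched as a C
      structure. This is an illustrative sketch only; the field names are
      assumptions for the purpose of this example, not the actual HelenOS
      declarations.</para>

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch of a frame zone; the names are hypothetical,
 * not the actual HelenOS declarations. */
typedef struct frame {
    unsigned int refcount;    /* number of references to this frame */
    uint8_t buddy_order;      /* order kept for the zone's buddy system */
} frame_t;

typedef struct zone {
    struct zone *prev, *next; /* doubly-linked list of zones */
    uintptr_t base;           /* physical base address of the zone */
    size_t count;             /* total number of frames in the zone */
    size_t free_count;        /* unallocated frames */
    size_t busy_count;        /* allocated frames */
    int flags;                /* currently unused zone attribute */
    frame_t *frames;          /* one frame_t per physical frame */
} zone_t;

/* Physical address of the n-th frame, assuming 4 KiB frames. */
static uintptr_t zone_frame_addr(const zone_t *z, size_t n)
{
    return z->base + n * 4096;
}
```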

      <para>This organization of physical memory provides good preconditions
      for hot-plugging of more zones. There is also one currently unused zone
      attribute: <code>flags</code>. The attribute could be used to give a
      special meaning to some zones in the future.</para>

      <para>The zones are linked in a doubly-linked list. This might seem a
      bit ineffective because the zone list is walked every time a frame is
      allocated or deallocated. However, this does not represent a significant
      performance problem as it is expected that the number of zones will be
      rather low. Moreover, most architectures merge all zones into
      one.</para>

      <para>Every physical memory frame in a zone is described by a structure
      that contains the number of references and other data used by the buddy
      system.</para>
    </section>

    <section id="frame_allocator">
      <indexterm>
        <primary>frame allocator</primary>
      </indexterm>

      <title>Frame allocator</title>

      <para>The frame allocator satisfies kernel requests to allocate
      power-of-two sized blocks of physical memory. Because of the zonal
      organization of physical memory, the frame allocator always works
      within the context of a particular frame zone. In order to carry out
      allocation requests, the frame allocator is tightly integrated with the
      buddy system belonging to the zone. The frame allocator is also
      responsible for updating information about the number of free and busy
      frames in the zone. <figure>
          <mediaobject id="frame_alloc">
            <imageobject role="eps">
              <imagedata fileref="images.vector/frame_alloc.eps" format="EPS" />
            </imageobject>

            <imageobject role="html">
              <imagedata fileref="images/frame_alloc.png" format="PNG" />
            </imageobject>

            <imageobject role="fop">
              <imagedata fileref="images.vector/frame_alloc.svg" format="SVG" />
            </imageobject>
          </mediaobject>

          <title>Frame allocator scheme.</title>
        </figure></para>

      <formalpara>
        <title>Allocation / deallocation</title>

        <para>Upon an allocation request via the <code>frame_alloc()</code>
        function, the frame allocator first tries to find a zone that can
        satisfy the request (i.e. has the required number of free frames).
        Once a suitable zone is found, the frame allocator uses the buddy
        allocator on the zone's buddy system to perform the allocation.
        During deallocation, which is triggered by a call to
        <code>frame_free()</code>, the frame allocator looks up the
        respective zone that contains the frame being deallocated.
        Afterwards, it calls the buddy allocator again, this time to take
        care of deallocation within the zone's buddy system.</para>
      </formalpara>
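      <para>The zone walk performed on allocation can be sketched as follows.
      This is a hedged simplification: the zone list is modelled as an array,
      and the per-zone buddy allocation is stubbed out.</para>

```c
#include <stddef.h>

/* Minimal sketch of the frame_alloc() zone walk; the real kernel
 * delegates to the zone's buddy system, which is stubbed out here. */
#define ZONES 2

typedef struct {
    size_t free_count;
} zone_t;

static zone_t zones[ZONES];

/* Stand-in for the per-zone buddy allocation (hypothetical). */
static int buddy_alloc_from(zone_t *z, size_t frames)
{
    z->free_count -= frames;
    return 1;
}

/* Allocate 2^order frames: walk the zone list until one can
 * satisfy the request, then use that zone's buddy system. */
static int frame_alloc(unsigned order)
{
    size_t frames = (size_t)1 << order;
    for (int i = 0; i < ZONES; i++) {
        if (zones[i].free_count >= frames)
            return buddy_alloc_from(&zones[i], frames);
    }
    return 0; /* no zone can satisfy the request */
}
```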
    </section>

    <section id="buddy_allocator">
      <indexterm>
        <primary>buddy system</primary>
      </indexterm>

      <title>Buddy allocator</title>

      <para>In the buddy system, the memory is broken down into power-of-two
      sized naturally aligned blocks. These blocks are organized in an array
      of lists, in which the list with index <emphasis>i</emphasis> contains
      all unallocated blocks of size
      <emphasis>2<superscript>i</superscript></emphasis>. The index
      <emphasis>i</emphasis> is called the order of the block. Should there be
      two adjacent equally sized blocks in the list <emphasis>i</emphasis>
      (i.e. buddies), the buddy allocator would coalesce them and put the
      resulting block in the list <emphasis>i + 1</emphasis>, provided that
      the resulting block would be naturally aligned. Similarly, when the
      allocator is asked to allocate a block of size
      <emphasis>2<superscript>i</superscript></emphasis>, it first tries to
      satisfy the request from the list with index <emphasis>i</emphasis>. If
      the request cannot be satisfied (i.e. the list <emphasis>i</emphasis> is
      empty), the buddy allocator will try to allocate and split a larger
      block from the list with index <emphasis>i + 1</emphasis>. Both of these
      algorithms are recursive. The recursion ends either when there are no
      blocks to coalesce in the former case or when there are no blocks that
      can be split in the latter case.</para>
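      <para>The recursive split described above can be sketched with a tiny
      simulation that tracks only the number of free blocks per order. This
      is an illustration of the algorithm, not the kernel's
      implementation.</para>

```c
#include <stddef.h>

#define MAX_ORDER 4

/* Number of free blocks of each order; a stand-in for the
 * array of free lists described in the text. */
static size_t free_blocks[MAX_ORDER + 1];

/* Recursive allocation: take a block from list i, or obtain a
 * block from list i + 1 and split it into two order-i halves. */
static int buddy_alloc(unsigned i)
{
    if (i > MAX_ORDER)
        return 0;                 /* nothing left to split */
    if (free_blocks[i] > 0) {
        free_blocks[i]--;         /* list i can satisfy the request */
        return 1;
    }
    if (!buddy_alloc(i + 1))      /* recurse into the next order */
        return 0;
    free_blocks[i]++;             /* keep one half on list i ... */
    return 1;                     /* ... and hand out the other */
}
```

A single free block of order 3 can thus serve an order-0 request, leaving one free block each of orders 0, 1 and 2 behind.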

      <para>This approach greatly reduces external fragmentation of memory and
      helps in allocating bigger contiguous blocks of memory aligned to their
      size. On the other hand, the buddy allocator suffers from increased
      internal fragmentation of memory and is not suitable for general kernel
      allocations. This purpose is better addressed by the <link
      linkend="slab">slab allocator</link>.<figure>
          <mediaobject id="buddy_alloc">
            <imageobject role="eps">
              <imagedata fileref="images.vector/buddy_alloc.eps" format="EPS" />
            </imageobject>

            <imageobject role="html">
              <imagedata fileref="images/buddy_alloc.png" format="PNG" />
            </imageobject>

            <imageobject role="fop">
              <imagedata fileref="images.vector/buddy_alloc.svg" format="SVG" />
            </imageobject>
          </mediaobject>

          <title>Buddy system scheme.</title>
        </figure></para>

      <section>
        <title>Implementation</title>

        <para>The buddy allocator is, in fact, an abstract framework which can
        easily be specialized to serve one particular task. It knows nothing
        about the nature of the memory it helps to allocate. To compensate for
        this lack of knowledge, the buddy allocator exports an interface that
        each of its clients is required to implement. When supplied with an
        implementation of this interface, the buddy allocator can use
        specialized external functions to find a buddy for a block, split and
        coalesce blocks, manipulate block order and mark blocks busy or
        available.</para>
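        <para>Such a client interface is naturally expressed as a table of
        function pointers. The sketch below is an assumption about its shape;
        the operation names are illustrative and need not match the actual
        HelenOS declarations.</para>

```c
#include <stddef.h>

/* Sketch of the client interface the abstract buddy framework
 * expects; operation names are illustrative. */
typedef struct buddy_ops {
    void *(*find_buddy)(void *block);
    void *(*bisect)(void *block);                 /* split into halves */
    void *(*coalesce)(void *block, void *buddy);  /* merge buddies */
    void (*set_order)(void *block, unsigned order);
    unsigned (*get_order)(void *block);
    void (*mark_busy)(void *block);
    void (*mark_available)(void *block);
} buddy_ops_t;

/* The framework only ever touches blocks through the ops table. */
static unsigned block_order(const buddy_ops_t *ops, void *block)
{
    return ops->get_order(block);
}

/* A trivial client whose blocks are just unsigned order values. */
static unsigned uint_get_order(void *block)
{
    return *(unsigned *)block;
}
```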

        <formalpara>
          <title>Data organization</title>

          <para>Each entity allocable by the buddy allocator is required to
          contain space for storing the block order number and a link variable
          used to interconnect blocks within the same order.</para>

          <para>Whatever entities are allocated by the buddy allocator, the
          first entity within a block is used to represent the entire block.
          The first entity keeps the order of the whole block. Other entities
          within the block are assigned the magic value
          <constant>BUDDY_INNER_BLOCK</constant>. This is especially important
          for effective identification of buddies in a one-dimensional array
          because the entity that represents a potential buddy cannot be
          associated with <constant>BUDDY_INNER_BLOCK</constant> (i.e. if it
          is associated with <constant>BUDDY_INNER_BLOCK</constant> then it is
          not a buddy).</para>
        </formalpara>
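        <para>Because blocks are naturally aligned, the buddy of the block
        starting at index <emphasis>idx</emphasis> with order
        <emphasis>o</emphasis> starts at <emphasis>idx</emphasis> XOR
        2<superscript>o</superscript>, and the
        <constant>BUDDY_INNER_BLOCK</constant> mark rules out entities that
        lie inside a larger block. A hedged sketch (the magic value is chosen
        arbitrarily here):</para>

```c
#include <stddef.h>

#define BUDDY_INNER_BLOCK 0xff   /* illustrative magic order value */

/* In a one-dimensional array of entities, the buddy of the block
 * starting at idx with order o starts at idx XOR 2^o, thanks to
 * natural alignment. */
static size_t buddy_index(size_t idx, unsigned order)
{
    return idx ^ ((size_t)1 << order);
}

/* A candidate is a real buddy only if its first entity is not an
 * inner entity of a larger block and its order matches. */
static int is_buddy(const unsigned char *orders, size_t idx, unsigned order)
{
    size_t b = buddy_index(idx, order);
    return orders[b] != BUDDY_INNER_BLOCK && orders[b] == order;
}
```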
      </section>
    </section>

    <section id="slab">
      <indexterm>
        <primary>slab allocator</primary>
      </indexterm>

      <title>Slab allocator</title>

      <para>The majority of memory allocation requests in the kernel are for
      small, frequently used data structures. The basic idea behind the slab
      allocator is that commonly used objects are preallocated in contiguous
      areas of physical memory called slabs<footnote>
          <para>Slabs are in fact blocks of physical memory frames allocated
          from the frame allocator.</para>
        </footnote>. Whenever an object is to be allocated, the slab allocator
      returns the first available item from a suitable slab corresponding to
      the object type<footnote>
          <para>The mechanism is rather more complicated, see the next
          paragraph.</para>
        </footnote>. Because the sizes of the requested and allocated objects
      match, the slab allocator significantly reduces internal
      fragmentation.</para>

      <indexterm>
        <primary>slab allocator</primary>

        <secondary>- slab cache</secondary>
      </indexterm>

      <para>Slabs of one object type are organized in a structure called a
      slab cache. There are usually more slabs in the slab cache, depending on
      previous allocations. If the slab cache runs out of available slabs,
      new slabs are allocated. In order to exploit parallelism and to avoid
      locking of shared spinlocks, slab caches can have variants of
      processor-private slabs called magazines. On each processor, there is a
      two-magazine cache. Full magazines that are not part of any
      per-processor magazine cache are stored in a global list of full
      magazines.</para>
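      <para>The relationship between a slab cache, the per-processor
      two-magazine caches and the global list of full magazines can be
      sketched as follows. The layout and names are assumptions made for
      illustration, not the actual HelenOS structures.</para>

```c
#include <stddef.h>

#define MAG_SIZE 8

/* Illustrative layout of the magazine machinery; the names are
 * assumptions, not the actual HelenOS declarations. */
typedef struct magazine {
    struct magazine *next;      /* link in the global full-magazine list */
    size_t busy;                /* objects currently held in the magazine */
    void *objs[MAG_SIZE];
} magazine_t;

typedef struct {
    magazine_t *current;        /* magazine served first */
    magazine_t *last;           /* swapped in when current runs dry/full */
} cpu_mag_cache_t;

typedef struct {
    cpu_mag_cache_t cpu_cache[4];  /* one two-magazine cache per CPU */
    magazine_t *full_list;         /* global list of full magazines */
} slab_cache_t;

static int magazine_full(const magazine_t *m)  { return m->busy == MAG_SIZE; }
static int magazine_empty(const magazine_t *m) { return m->busy == 0; }
```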

      <indexterm>
        <primary>slab allocator</primary>

        <secondary>- magazine</secondary>
      </indexterm>

      <para>Each object begins its life in a slab. When it is allocated from
      there, the slab allocator calls a constructor that is registered in the
      respective slab cache. The constructor initializes and brings the object
      into a known state. The object is then used by the user. When the user
      later frees the object, the slab allocator puts it into a processor
      private <indexterm>
          <primary>slab allocator</primary>

          <secondary>- magazine</secondary>
        </indexterm>magazine cache, from where it can subsequently be
      allocated again. Note that allocations satisfied from a magazine are
      already initialized by the constructor. When both of the processor
      cached magazines get full, the allocator will move one of the magazines
      to the list of full magazines. Similarly, when allocating from an empty
      processor magazine cache, the kernel will reload only one magazine from
      the list of full magazines. In other words, the slab allocator tries to
      keep the processor magazine cache only half-full in order to prevent
      thrashing when allocations and deallocations interleave on magazine
      boundaries. The advantage of this setup is that during most
      allocations, no global spinlock needs to be held.</para>
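      <para>The deallocation side of this half-full policy can be sketched as
      follows: push into the current magazine, and when both magazines are
      full, retire one to the global list. Locking is omitted and the names
      are hypothetical; this illustrates the policy, not the actual kernel
      code.</para>

```c
#include <stddef.h>

#define MAG_SIZE 4

typedef struct {
    size_t busy;
    void *objs[MAG_SIZE];
} magazine_t;

typedef struct {
    magazine_t *current, *last;
} cpu_cache_t;

static size_t full_list_len;           /* stand-in for the global list */

/* Free an object into the per-processor magazine pair, swapping or
 * retiring magazines so the pair stays roughly half-full. */
static void mag_free(cpu_cache_t *c, void *obj, magazine_t *fresh)
{
    if (c->current->busy == MAG_SIZE) {
        if (c->last->busy == MAG_SIZE) {
            /* Both full: retire one magazine to the global list
             * and install an empty one in its place. */
            full_list_len++;
            fresh->busy = 0;
            c->last = fresh;
        }
        magazine_t *tmp = c->current;  /* swap current and last */
        c->current = c->last;
        c->last = tmp;
    }
    c->current->objs[c->current->busy++] = obj;
}
```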

      <para>Should HelenOS run short of memory, it would start deallocating
      objects from magazines, calling the slab cache destructor on them and
      putting them back into slabs. When a slab contains no allocated objects,
      it is immediately freed.</para>

      <para>
        <figure>
          <mediaobject id="slab_alloc">
            <imageobject role="eps">
              <imagedata fileref="images.vector/slab_alloc.eps" format="EPS" />
            </imageobject>

            <imageobject role="html">
              <imagedata fileref="images/slab_alloc.png" format="PNG" />
            </imageobject>

            <imageobject role="fop">
              <imagedata fileref="images.vector/slab_alloc.svg" format="SVG" />
            </imageobject>
          </mediaobject>

          <title>Slab allocator scheme.</title>
        </figure>
      </para>

      <section>
        <title>Implementation</title>

        <para>The slab allocator is closely modelled after <xref
        linkend="Bonwick01" /> with the following exceptions:<itemizedlist>
            <listitem>
              <para>empty slabs are immediately deallocated and</para>
            </listitem>

            <listitem>
              <para>empty magazines are deallocated when not needed.</para>
            </listitem>
          </itemizedlist>The following features are not currently supported
        but would be easy to add: <itemizedlist>
            <listitem>cache coloring and</listitem>

            <listitem>dynamic magazine growth (different magazine sizes are
            already supported, but the allocation strategy would need to be
            adjusted).</listitem>
          </itemizedlist></para>

        <section>
          <title>Allocation/deallocation</title>

          <para>The following two paragraphs summarize and complete the
          description of the slab allocator operation (i.e. the
          <code>slab_alloc()</code> and <code>slab_free()</code>
          functions).</para>
296
 
295
 
297
          <formalpara>
296
          <formalpara>
298
            <title>Allocation</title>
297
            <title>Allocation</title>
299
 
298
 
300
            <para><emphasis>Step 1.</emphasis> When an allocation request
299
            <para><emphasis>Step 1.</emphasis> When an allocation request
            comes, the slab allocator checks availability of memory in the
            current magazine of the local processor magazine cache. If the
            memory is available, the allocator just pops the object from the
            magazine and returns it.</para>

            <para><emphasis>Step 2.</emphasis> If the current magazine in the
            processor magazine cache is empty, the allocator will attempt to
            swap it with the last magazine from the cache and return to the
            first step. If the last magazine is also empty, the algorithm
            falls through to Step 3.</para>

            <para><emphasis>Step 3.</emphasis> Now the allocator is in the
            situation when both magazines in the processor magazine cache are
            empty. The allocator reloads one magazine from the shared list of
            full magazines. If the reload is successful (i.e. there are full
            magazines in the list), the algorithm continues with Step
            1.</para>

            <para><emphasis>Step 4.</emphasis> In this fail-safe step, an
            object is allocated from the conventional slab layer and a
            pointer to it is returned.</para>
          </formalpara>
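          <para>The fast path of Steps 1 and 2 can be sketched in C. The
          types and names below (<code>magazine_t</code>,
          <code>cpu_cache_t</code>, <code>magazine_obj_get()</code>) are
          simplified illustrations for this book, not the actual HelenOS
          definitions.</para>

```c
#include <assert.h>
#include <stddef.h>

#define MAGAZINE_SIZE 8

/* Simplified magazine: a stack of object pointers. */
typedef struct {
    void *objs[MAGAZINE_SIZE];
    size_t nobjs;
} magazine_t;

/* Per-processor magazine cache: current and last magazine. */
typedef struct {
    magazine_t *current;
    magazine_t *last;
} cpu_cache_t;

/* Steps 1 and 2 of the allocation algorithm: pop from the current
 * magazine, or swap current with last and retry once.  Returns NULL
 * when both magazines are empty (Steps 3 and 4 would then reload
 * from the shared list of full magazines or fall back to the
 * conventional slab layer). */
void *magazine_obj_get(cpu_cache_t *cache)
{
    for (int attempt = 0; attempt < 2; attempt++) {
        magazine_t *mag = cache->current;
        if (mag != NULL && mag->nobjs > 0)
            return mag->objs[--mag->nobjs];   /* Step 1: pop */
        /* Step 2: swap current and last, try once more. */
        cache->current = cache->last;
        cache->last = mag;
    }
    return NULL;
}
```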

          <formalpara>
            <title>Deallocation</title>

            <para><emphasis>Step 1.</emphasis> During a deallocation request,
            the slab allocator checks if the current magazine of the local
            processor magazine cache is full. If it is not, the pointer to
            the object is simply pushed into the magazine and the algorithm
            returns.</para>

            <para><emphasis>Step 2.</emphasis> If the current magazine is
            full, the allocator will attempt to swap it with the last
            magazine from the cache and return to the first step. If the last
            magazine is also full, the algorithm falls through to Step
            3.</para>

            <para><emphasis>Step 3.</emphasis> Now the allocator is in the
            situation when both magazines in the processor magazine cache are
            full. The allocator tries to allocate a new empty magazine and
            flush one of the full magazines to the shared list of full
            magazines. If it is successful, the algorithm continues with Step
            1.</para>

            <para><emphasis>Step 4.</emphasis> In case of a low memory
            condition, when the allocation of an empty magazine fails, the
            object is moved directly into the slab. In the worst case, object
            deallocation thus does not need to allocate any additional
            memory.</para>
          </formalpara>
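          <para>The deallocation fast path (Steps 1 and 2) mirrors the
          allocation sketch; as above, the types and names are simplified
          illustrations rather than the actual HelenOS code.</para>

```c
#include <assert.h>
#include <stddef.h>

#define MAGAZINE_SIZE 8

/* Simplified magazine and per-processor cache, mirroring the
 * allocation sketch; not the actual HelenOS definitions. */
typedef struct {
    void *objs[MAGAZINE_SIZE];
    size_t nobjs;
} magazine_t;

typedef struct {
    magazine_t *current;
    magazine_t *last;
} cpu_cache_t;

/* Steps 1 and 2 of the deallocation algorithm: push into the current
 * magazine, or swap current with last and retry once.  Returns 0 on
 * success and -1 when both magazines are full (Steps 3 and 4 would
 * then flush a full magazine to the shared list, or free directly
 * into the slab). */
int magazine_obj_put(cpu_cache_t *cache, void *obj)
{
    for (int attempt = 0; attempt < 2; attempt++) {
        magazine_t *mag = cache->current;
        if (mag != NULL && mag->nobjs < MAGAZINE_SIZE) {
            mag->objs[mag->nobjs++] = obj;    /* Step 1: push */
            return 0;
        }
        /* Step 2: swap current and last, try once more. */
        cache->current = cache->last;
        cache->last = mag;
    }
    return -1;
}
```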
        </section>
      </section>
    </section>
  </section>

  <section>
    <title>Virtual memory management</title>

    <para>Virtual memory is essential for an operating system because it makes
    several things possible. First, it helps to isolate tasks from each other
    by encapsulating them in their private address spaces. Second, virtual
    memory can give tasks the illusion of more memory than is physically
    available. And third, by using virtual memory, there might be multiple
    copies of the same program, linked to the same addresses, running in the
    system. There are at least two known mechanisms for implementing virtual
    memory: segmentation and paging. Even though some processor architectures
    supported by HelenOS<footnote>
        <para>ia32 has full-fledged segmentation.</para>
      </footnote> provide both mechanisms, the kernel makes use solely of
    paging.</para>

    <section id="paging">
      <title>VAT subsystem</title>

      <para>In paged virtual memory, the entire virtual address space is
      divided into small power-of-two sized, naturally aligned blocks called
      pages. The processor implements a translation mechanism that allows the
      operating system to manage mappings between sets of pages and sets of
      identically sized and identically aligned pieces of physical memory
      called frames. As a result, references to continuous virtual memory
      areas don't necessarily need to reference continuous areas of physical
      memory. Supported page sizes usually range from several kilobytes to
      several megabytes. Each page that takes part in the mapping is
      associated with certain attributes that further describe the mapping
      (e.g. access rights, dirty and accessed bits and present bit).</para>
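      <para>The split of a virtual address into a page number and a page
      offset can be illustrated with a few lines of C; the 4 KiB page size
      below is just a common example configuration.</para>

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: assume 4 KiB pages, a common configuration. */
#define PAGE_WIDTH  12
#define PAGE_SIZE   (1 << PAGE_WIDTH)

/* Virtual page number: which page the address belongs to. */
static inline uintptr_t addr_to_vpn(uintptr_t addr)
{
    return addr >> PAGE_WIDTH;
}

/* Offset within the page; preserved by the translation. */
static inline uintptr_t addr_to_offset(uintptr_t addr)
{
    return addr & (PAGE_SIZE - 1);
}

/* Physical address = base of the mapped frame + unchanged offset. */
static inline uintptr_t translate(uintptr_t frame_base, uintptr_t addr)
{
    return frame_base + addr_to_offset(addr);
}
```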

      <para>When the processor accesses a page that is not present (i.e. its
      present bit is not set), the operating system is notified through a
      special exception called a page fault. It is then up to the operating
      system to service the page fault. In HelenOS, some page faults are fatal
      and result in either task termination or, in the worst case, kernel
      panic<footnote>
          <para>Such a condition would be either caused by a hardware failure
          or a bug in the kernel.</para>
        </footnote>, while other page faults are used to load memory on demand
      or to notify the kernel about certain events.</para>

      <indexterm>
        <primary>page tables</primary>
      </indexterm>

      <para>The set of all page mappings is stored in a memory structure
      called page tables. Some architectures have no hardware support for page
      tables<footnote>
          <para>On mips32, a TLB-only model is used and the operating system
          is responsible for managing software defined page tables.</para>
        </footnote> while other processor architectures<footnote>
          <para>Like amd64 and ia32.</para>
        </footnote> understand their whole memory format. Despite all the
      possible differences in page table formats, the HelenOS VAT
      subsystem<footnote>
          <para>Virtual Address Translation subsystem.</para>
        </footnote> unifies the page table operations under one programming
      interface. For all parts of the kernel, three basic functions are
      provided:</para>

      <itemizedlist>
        <listitem>
          <para><code>page_mapping_insert()</code>,</para>
        </listitem>

        <listitem>
          <para><code>page_mapping_find()</code> and</para>
        </listitem>

        <listitem>
          <para><code>page_mapping_remove()</code>.</para>
        </listitem>
      </itemizedlist>

      <para>The <code>page_mapping_insert()</code> function is used to
      introduce a mapping for one virtual memory page belonging to a
      particular address space into the page tables. Once the mapping is in
      the page tables, it can be searched by <code>page_mapping_find()</code>
      and removed by <code>page_mapping_remove()</code>. All of these
      functions internally select the page table mechanism specific functions
      that carry out the operation itself.</para>
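      <para>The internal selection of mechanism-specific functions can be
      pictured as dispatch through a table of operations. The sketch below is
      a deliberately simplified illustration: the type and signatures
      (<code>page_ops_t</code>, a single toy mechanism, no address space
      argument or flags) are made up for the example and do not match the
      actual HelenOS interface.</para>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operations table: each page table mechanism
 * (hierarchical page tables, global page hash table, ...) fills in
 * its own implementation of the three basic functions. */
typedef struct {
    void (*mapping_insert)(uintptr_t page, uintptr_t frame);
    uintptr_t (*mapping_find)(uintptr_t page);
    void (*mapping_remove)(uintptr_t page);
} page_ops_t;

/* Toy "mechanism": remembers one mapping, enough to show dispatch. */
static uintptr_t stored_page, stored_frame;
static int present;

static void toy_insert(uintptr_t page, uintptr_t frame)
{
    stored_page = page;
    stored_frame = frame;
    present = 1;
}

static uintptr_t toy_find(uintptr_t page)
{
    return (present && page == stored_page) ? stored_frame : 0;
}

static void toy_remove(uintptr_t page)
{
    if (page == stored_page)
        present = 0;
}

/* The active mechanism is chosen once; generic wrappers dispatch. */
static const page_ops_t toy_ops = { toy_insert, toy_find, toy_remove };
static const page_ops_t *page_ops = &toy_ops;

void page_mapping_insert(uintptr_t page, uintptr_t frame)
{
    page_ops->mapping_insert(page, frame);
}

uintptr_t page_mapping_find(uintptr_t page)
{
    return page_ops->mapping_find(page);
}

void page_mapping_remove(uintptr_t page)
{
    page_ops->mapping_remove(page);
}
```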

      <para>There are currently two supported mechanisms: generic 4-level
      hierarchical page tables and the global page hash table. Both of the
      mechanisms are generic as they cover several hardware platforms. For
      instance, the 4-level hierarchical page table mechanism is used by
      amd64, ia32, mips32 and ppc32. These architectures have the following
      native page table formats: 4-level, 2-level, TLB-only and hardware
      hash table, respectively. On the other hand, the global page hash table
      is used on ia64, which can be TLB-only or use a hardware hash table.
      Although only two mechanisms are currently implemented, other mechanisms
      (e.g. B+tree) can be easily added.</para>

      <section id="page_tables">
        <indexterm>
          <primary>page tables</primary>

          <secondary>- hierarchical</secondary>
        </indexterm>

        <title>Hierarchical 4-level page tables</title>

        <para>Hierarchical 4-level page tables are a generalization of the
        frequently used hierarchical model of page tables. In this mechanism,
        each address space has its own page tables. To avoid confusion with
        the terminology used by hardware vendors, in HelenOS, the root level
        page table is called PTL0, the two middle levels are called PTL1 and
        PTL2, and, finally, the leaf level is called PTL3. All architectures
        using this mechanism are required to use PTL0 and PTL3. However, the
        middle levels can be left out, depending on the hardware hierarchy or
        the structure of software-only page tables. The genericity is achieved
        through a set of macros that define transitions from one level to
        another. Unused levels are optimised out by the compiler.</para>
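        <para>How the level-transition macros can collapse unused middle
        levels is sketched below. The PTL0-PTL3 naming follows the text, but
        the concrete index layout is an illustrative assumption for a 32-bit
        architecture with a 2-level hardware page table format.</para>

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative macros for a 32-bit architecture with a 2-level
 * hardware page table: only PTL0 and PTL3 carry real indices; the
 * middle levels PTL1 and PTL2 always resolve to index 0 and are
 * therefore optimised out by the compiler. */
#define PTL0_INDEX(vaddr)  (((vaddr) >> 22) & 0x3ff)  /* top 10 bits  */
#define PTL1_INDEX(vaddr)  0                          /* unused level */
#define PTL2_INDEX(vaddr)  0                          /* unused level */
#define PTL3_INDEX(vaddr)  (((vaddr) >> 12) & 0x3ff)  /* next 10 bits */
```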
      </section>

      <section>
        <indexterm>
          <primary>page tables</primary>

          <secondary>- hashing</secondary>
        </indexterm>

        <title>Global page hash table</title>

        <para>Implementation of the global page hash table was encouraged by
        64-bit architectures that can have rather sparse address spaces. The
        hash table contains valid mappings only. Each entry of the hash table
        contains an address space pointer, virtual memory page number (VPN),
        physical memory frame number (PFN) and a set of flags. The pair of
        the address space pointer and the virtual memory page number is used
        as a key for the hash table. One of the major differences between the
        global page hash table and hierarchical 4-level page tables is that
        there is only a single global page hash table in the system while
        hierarchical page tables exist per address space. Thus, the global
        page hash table contains information about mappings of all address
        spaces in the system.</para>
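        <para>The use of the (address space pointer, VPN) pair as a hash key
        can be illustrated with a toy hash function; the bucket count and the
        mixing constant below are illustrative and not the ones used by
        HelenOS.</para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_BUCKETS 1024  /* illustrative table size, power of two */

/* One global table keyed by (address space, virtual page number):
 * mappings of all address spaces share the same buckets, so the
 * address space pointer must be part of the key. */
static size_t page_hash(const void *as, uintptr_t vpn)
{
    /* Knuth's multiplicative constant, used here just for mixing. */
    uintptr_t key = (uintptr_t) as ^ (vpn * 2654435761u);
    return key & (HASH_BUCKETS - 1);
}
```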

        <para>The global page hash table mechanism uses the generic hash table
        type as described in the chapter dedicated to <link
        linkend="hashtables">data structures</link> earlier in this
        book.</para>
      </section>
    </section>
  </section>

  <section id="tlb">
    <indexterm>
      <primary>TLB</primary>
    </indexterm>

    <title>Translation Lookaside Buffer</title>

    <para>Due to the extensive overhead of the several extra memory accesses
    needed to walk the page tables on every instruction, modern architectures
    deploy a fast associative cache of recently used page mappings. This
    cache is called TLB - Translation Lookaside Buffer - and is present on
    every processor in the system. As has already been pointed out, the TLB
    is the only page translation mechanism on some architectures.</para>

    <section id="tlb_shootdown">
      <indexterm>
        <primary>TLB</primary>

        <secondary>- TLB shootdown</secondary>
      </indexterm>

      <title>TLB consistency</title>

      <para>The operating system is responsible for keeping the TLB consistent
      with the page tables. Whenever mappings are modified or purged from the
      page tables, or when an address space identifier is reused, the kernel
      needs to invalidate the respective contents of the TLB. Some TLB types
      support partial invalidation of their content (e.g. ranges of pages or
      address spaces) while other types can be invalidated only entirely. The
      invalidation must be done on all processors because there is one TLB
      per processor. Maintaining TLB consistency on multiprocessor
      configurations is not as trivial as it might look at first
      glance.</para>

      <para>The remote TLB invalidation is called TLB shootdown. HelenOS uses
      a simplified variant of the algorithm described in <xref
      linkend="Black89" />.</para>

      <para>TLB shootdown is performed in three phases.</para>

      <formalpara>
        <title>Phase 1.</title>

        <para>The initiator clears its TLB flag and locks the global TLB
        spinlock. The request is then enqueued into all other processors' TLB
        shootdown message queues. When the TLB shootdown message queue is full
        on any processor, the queue is purged and a single request to
        invalidate the entire TLB is stored there. Once all the TLB shootdown
        messages have been dispatched, the initiator sends all other
        processors an interrupt to notify them about the incoming TLB
        shootdown message. It then spins until all processors accept the
        interrupt and clear their TLB flags.</para>
      </formalpara>

      <formalpara>
        <title>Phase 2.</title>

        <para>Except for the initiator, all other processors are spinning on
        the TLB spinlock. The initiator is now free to modify the page tables
        and purge its own TLB. The initiator then unlocks the global TLB
        spinlock and sets its TLB flag.</para>
      </formalpara>

      <formalpara>
        <title>Phase 3.</title>

        <para>When the spinlock is unlocked by the initiator, other processors
        are sequentially granted the spinlock. However, once they manage to
        lock it, they immediately release it. Each processor invalidates its
        TLB according to messages found in its TLB shootdown message queue. In
        the end, each processor sets its TLB flag and resumes its previous
        operation.</para>
      </formalpara>
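      <para>The queue handling in Phase 1 can be sketched as follows; the
      types, names and queue length are illustrative assumptions, not the
      actual HelenOS definitions.</para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified per-processor TLB shootdown message queue. */
#define QUEUE_LEN 4

typedef enum {
    TLB_INVL_PAGE,  /* invalidate one page */
    TLB_INVL_ALL    /* invalidate the entire TLB */
} tlb_invalidate_type_t;

typedef struct {
    tlb_invalidate_type_t type;
    uintptr_t page;
} tlb_shootdown_msg_t;

typedef struct {
    tlb_shootdown_msg_t msg[QUEUE_LEN];
    size_t count;
} tlb_queue_t;

/* Enqueue a shootdown request; on overflow, purge the queue and
 * leave a single request to invalidate the entire TLB, as described
 * in Phase 1 above. */
void tlb_shootdown_enqueue(tlb_queue_t *q, uintptr_t page)
{
    if (q->count == QUEUE_LEN) {
        q->msg[0].type = TLB_INVL_ALL;
        q->count = 1;
        return;
    }
    q->msg[q->count].type = TLB_INVL_PAGE;
    q->msg[q->count].page = page;
    q->count++;
}
```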
    </section>

    <section>
      <title>Address spaces</title>

      <section>
        <indexterm>
          <primary>address space</primary>

          <secondary>- area</secondary>
        </indexterm>

        <title>Address space areas</title>

        <para>Each address space consists of mutually disjoint continuous
        address space areas. An address space area is precisely defined by
        its base address and the number of frames/pages it contains.</para>

        <para>Each address space area has a set of flags that define the
        behaviour and permissions on the particular area. <itemizedlist>
            <listitem><emphasis>AS_AREA_READ</emphasis> flag indicates
            reading permission.</listitem>

            <listitem><emphasis>AS_AREA_WRITE</emphasis> flag indicates
            writing permission.</listitem>

            <listitem><emphasis>AS_AREA_EXEC</emphasis> flag indicates code
            execution permission. Some architectures do not support execution
            permission restriction. In this case this flag has no
            effect.</listitem>

            <listitem><emphasis>AS_AREA_DEVICE</emphasis> marks the area as
            mapped to device memory.</listitem>
          </itemizedlist></para>
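        <para>As an illustration of how such flags combine, consider the
        following sketch; the concrete bit values and the helper function are
        made-up examples, not HelenOS code.</para>

```c
#include <assert.h>

/* The flag names match the text above; the concrete bit values are
 * illustrative, chosen for this example only. */
#define AS_AREA_READ    (1 << 0)
#define AS_AREA_WRITE   (1 << 1)
#define AS_AREA_EXEC    (1 << 2)
#define AS_AREA_DEVICE  (1 << 3)

/* Sketch of the test a page fault handler could perform: every bit
 * of the requested access must be granted by the area's flags. */
static int area_access_permitted(unsigned int area_flags,
    unsigned int access)
{
    return (area_flags & access) == access;
}
```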

        <para>The kernel provides tasks with the possibility to create,
        expand, shrink and share address space areas via a set of
        syscalls.</para>
      </section>

      <section>
        <indexterm>
          <primary>address space</primary>

          <secondary>- ASID</secondary>
        </indexterm>

        <title>Address Space ID (ASID)</title>

        <para>Every task in the operating system has its own view of the
        virtual memory. When performing a context switch between different
        tasks, the kernel must switch the address space mapping as well. As
        modern processors perform very aggressive caching of virtual mappings,
        flushing the complete TLB on every context switch would be very
        inefficient. To avoid such a performance penalty, some architectures
        introduce an address space identifier, which allows storing several
        different mappings inside the TLB.</para>
633
 
630
 
634
        <para>The HelenOS kernel can take advantage of this hardware support
        through its ASID abstraction. For example, on ia64 the kernel ASID is
        derived from the RID (region identifier), while on mips32 the kernel
        ASID is directly the hardware identifier. As expected, this ASID
        information is recorded as part of the <emphasis>as_t</emphasis>
        structure.</para>

        <para>Due to hardware limitations, the hardware ASID has a limited
        length, ranging from 8 bits on mips32 to 24 bits on ia64, which makes
        it impossible to use it as a unique address space identifier for all
        tasks running in the system. In such situations, a special ASID
        stealing algorithm is used, which takes an ASID away from an inactive
        task and assigns it to the active task.</para>

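The stealing scheme described above can be sketched with a small self-contained model. All identifiers here (asid_get, tlb_invalidate_asid, the as_t fields) are hypothetical illustrations, not the actual HelenOS interfaces:

```c
#include <stddef.h>

/* Hypothetical sketch of ASID allocation with stealing. */

typedef unsigned asid_t;

#define ASID_MAX      8     /* tiny ASID space to make stealing visible */
#define ASID_INVALID  0

typedef struct as {
    asid_t asid;            /* ASID_INVALID if none assigned */
    int active;             /* currently running on some CPU? */
} as_t;

static asid_t asids_allocated = 0;

/* Placeholder for the real TLB shootdown primitive: stale translations
 * tagged with the stolen ASID must be purged before reuse. */
static void tlb_invalidate_asid(asid_t asid)
{
    (void) asid;
}

/* Assign an ASID to 'as'; when the ASID space is exhausted, steal one
 * from an inactive address space in 'all' (of length n). */
static void asid_get(as_t *as, as_t *all, size_t n)
{
    if (asids_allocated < ASID_MAX) {
        as->asid = ++asids_allocated;       /* fresh ASID still available */
        return;
    }
    for (size_t i = 0; i < n; i++) {
        if (!all[i].active && all[i].asid != ASID_INVALID) {
            as->asid = all[i].asid;         /* steal the victim's ASID */
            all[i].asid = ASID_INVALID;     /* victim reallocates later */
            tlb_invalidate_asid(as->asid);
            return;
        }
    }
    /* No inactive victim: a real kernel would have to block here
     * or fall back to flushing the whole TLB. */
}
```

The key invariants are that only inactive address spaces are eligible victims, and that the TLB entries tagged with the stolen ASID are invalidated before the new owner starts using it.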
        <indexterm>
          <primary>address space</primary>

          <secondary>- ASID stealing</secondary>
        </indexterm>

        <para>
          <classname>ASID stealing algorithm here.</classname>
        </para>
      </section>
    </section>
  </section>
</chapter>