<?xml version="1.0" encoding="UTF-8"?>
<chapter id="mm">
  <?dbhtml filename="mm.html"?>

  <title>Memory management</title>

  <para>In previous chapters, this book described the scheduling subsystem as
  the creator of the impression that threads execute in parallel. The memory
  management subsystem, on the other hand, creates the impression that there
  is enough physical memory for the kernel and that userspace tasks have the
  entire address space only for themselves.</para>

  <section>
    <title>Physical memory management</title>

    <section id="zones_and_frames">
      <title>Zones and frames</title>

      <para>HelenOS represents continuous areas of physical memory in
      structures called frame zones (abbreviated as zones). Each zone contains
      information about the number of allocated and unallocated physical
      memory frames as well as the physical base address of the zone and the
      number of frames contained in it. A zone also contains an array of frame
      structures describing each frame of the zone and, last but not least,
      each zone is equipped with a buddy system that facilitates effective
      allocation of power-of-two sized blocks of frames.</para>

      <para>This organization of physical memory provides good preconditions
      for hot-plugging of more zones. There is also one currently unused zone
      attribute: <code>flags</code>. The attribute could be used to give a
      special meaning to some zones in the future.</para>

      <para>The zones are linked in a doubly-linked list. This might seem a
      bit ineffective because the zone list is walked every time a frame is
      allocated or deallocated. However, this does not represent a significant
      performance problem as it is expected that the number of zones will be
      rather low. Moreover, most architectures merge all zones into
      one.</para>

      <para>Every physical memory frame in a zone is described by a structure
      that contains the number of references and other data used by the buddy
      system.</para>
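The layout described above can be pictured with a simplified C sketch. All field names and the frame size are illustrative assumptions, not the actual HelenOS definitions:

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_SIZE 4096  /* illustrative frame size; architecture-dependent */

/* Per-frame descriptor: reference count plus data for the buddy system. */
typedef struct {
    uint16_t refcount;     /* number of references to the frame */
    uint8_t  buddy_order;  /* data used by the zone's buddy system */
} frame_t;

/* Zone descriptor: base address, frame counts and the frame array. */
typedef struct {
    uintptr_t base;        /* physical base address of the zone */
    size_t    count;       /* number of frames in the zone */
    size_t    free_count;  /* unallocated frames */
    size_t    busy_count;  /* allocated frames */
    int       flags;       /* currently unused zone attribute */
    frame_t  *frames;      /* one descriptor per physical frame */
} zone_t;

/* Translate a physical address within the zone to its frame descriptor. */
frame_t *zone_frame(zone_t *zone, uintptr_t pa)
{
    return &zone->frames[(pa - zone->base) / FRAME_SIZE];
}
```

The frame array makes the address-to-descriptor translation a constant-time index computation.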
    </section>

    <section id="frame_allocator">
      <indexterm>
        <primary>frame allocator</primary>
      </indexterm>

      <title>Frame allocator</title>

      <para>The frame allocator satisfies kernel requests to allocate
      power-of-two sized blocks of physical memory. Because of the zonal
      organization of physical memory, the frame allocator always works
      within the context of a particular frame zone. In order to carry out
      allocation requests, the frame allocator is tightly integrated with the
      buddy system belonging to the zone. The frame allocator is also
      responsible for updating information about the number of free and busy
      frames in the zone. <figure>
          <mediaobject id="frame_alloc">
            <imageobject role="eps">
              <imagedata fileref="images.vector/frame_alloc.eps" format="EPS" />
            </imageobject>

            <imageobject role="html">
              <imagedata fileref="images/frame_alloc.png" format="PNG" />
            </imageobject>

            <imageobject role="fop">
              <imagedata fileref="images.vector/frame_alloc.svg" format="SVG" />
            </imageobject>
          </mediaobject>

          <title>Frame allocator scheme.</title>
        </figure></para>

      <formalpara>
        <title>Allocation / deallocation</title>

        <para>Upon an allocation request via the function
        <code>frame_alloc()</code>, the frame allocator first tries to find a
        zone that can satisfy the request (i.e. has the required amount of
        free frames). Once a suitable zone is found, the frame allocator uses
        the buddy allocator on the zone's buddy system to perform the
        allocation. During deallocation, which is triggered by a call to
        <code>frame_free()</code>, the frame allocator looks up the
        respective zone that contains the frame being deallocated.
        Afterwards, it calls the buddy allocator again, this time to take
        care of deallocation within the zone's buddy system.</para>
      </formalpara>
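The zone lookup at the start of the allocation path can be sketched as a walk over the zone list. This is a simplified illustration; the `zone_t` fields and list layout here are assumptions, not the actual HelenOS code:

```c
#include <stddef.h>

/* Minimal zone descriptor for this sketch; the real structure holds
 * more bookkeeping (base address, frame array, buddy system). */
typedef struct zone {
    size_t free_count;   /* number of unallocated frames */
    struct zone *next;   /* next zone in the list */
} zone_t;

/* Return the first zone able to satisfy a request for 2^order frames,
 * or NULL when no zone qualifies. */
zone_t *zone_find_suitable(zone_t *head, unsigned order)
{
    for (zone_t *z = head; z != NULL; z = z->next) {
        if (z->free_count >= ((size_t) 1 << order))
            return z;
    }
    return NULL;
}
```

As noted earlier, this linear walk is acceptable because the number of zones is expected to stay low.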
    </section>

    <section id="buddy_allocator">
      <indexterm>
        <primary>buddy system</primary>
      </indexterm>

      <title>Buddy allocator</title>

      <para>In the buddy system, the memory is broken down into power-of-two
      sized naturally aligned blocks. These blocks are organized in an array
      of lists, in which the list with index <emphasis>i</emphasis> contains
      all unallocated blocks of size
      <emphasis>2<superscript>i</superscript></emphasis>. The index
      <emphasis>i</emphasis> is called the order of the block. Should there be
      two adjacent equally sized blocks in the list <emphasis>i</emphasis>
      (i.e. buddies), the buddy allocator would coalesce them and put the
      resulting block in the list <emphasis>i + 1</emphasis>, provided that
      the resulting block would be naturally aligned. Similarly, when the
      allocator is asked to allocate a block of size
      <emphasis>2<superscript>i</superscript></emphasis>, it first tries to
      satisfy the request from the list with index <emphasis>i</emphasis>. If
      the request cannot be satisfied (i.e. the list <emphasis>i</emphasis> is
      empty), the buddy allocator will try to allocate and split a larger
      block from the list with index <emphasis>i + 1</emphasis>. Both of these
      algorithms are recursive. The recursion ends either when there are no
      blocks to coalesce in the former case or when there are no blocks that
      can be split in the latter case.</para>
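For naturally aligned power-of-two blocks, the buddy of a block can be found by flipping a single bit of its index. The following sketch captures this arithmetic; it is illustrative, not the HelenOS implementation:

```c
#include <stddef.h>

/* Index of the buddy of the block that starts at frame index `idx` and
 * has order `order` (i.e. spans 2^order frames). */
size_t buddy_index(size_t idx, unsigned order)
{
    return idx ^ ((size_t) 1 << order);
}

/* A block of a given order is naturally aligned when its index is a
 * multiple of its size. */
int naturally_aligned(size_t idx, unsigned order)
{
    return (idx & (((size_t) 1 << order) - 1)) == 0;
}
```

Coalescing the blocks at `idx` and `buddy_index(idx, order)` produces a block of order `order + 1` starting at the lower of the two indices, which is naturally aligned by construction.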
      <para>This approach greatly reduces external fragmentation of memory and
      helps in allocating bigger continuous blocks of memory aligned to their
      size. On the other hand, the buddy allocator suffers from increased
      internal fragmentation of memory and is not suitable for general kernel
      allocations. This purpose is better addressed by the <link
      linkend="slab">slab allocator</link>.<figure>
          <mediaobject id="buddy_alloc">
            <imageobject role="eps">
              <imagedata fileref="images.vector/buddy_alloc.eps" format="EPS" />
            </imageobject>

            <imageobject role="html">
              <imagedata fileref="images/buddy_alloc.png" format="PNG" />
            </imageobject>

            <imageobject role="fop">
              <imagedata fileref="images.vector/buddy_alloc.svg" format="SVG" />
            </imageobject>
          </mediaobject>

          <title>Buddy system scheme.</title>
        </figure></para>

      <section>
        <title>Implementation</title>

        <para>The buddy allocator is, in fact, an abstract framework which can
        be easily specialized to serve one particular task. It knows nothing
        about the nature of the memory it helps to allocate. To overcome this
        lack of knowledge, the buddy allocator exports an interface that each
        of its clients is required to implement. When supplied with an
        implementation of this interface, the buddy allocator can use
        specialized external functions to find a buddy for a block, split and
        coalesce blocks, manipulate block order and mark blocks busy or
        available.</para>

        <formalpara>
          <title>Data organization</title>

          <para>Each entity allocable by the buddy allocator is required to
          contain space for storing the block order number and a link variable
          used to interconnect blocks within the same order.</para>

          <para>Whatever entities are allocated by the buddy allocator, the
          first entity within a block is used to represent the entire block.
          The first entity keeps the order of the whole block. Other entities
          within the block are assigned the magic value
          <constant>BUDDY_INNER_BLOCK</constant>. This is especially important
          for effective identification of buddies in a one-dimensional array
          because the entity that represents a potential buddy cannot be
          associated with <constant>BUDDY_INNER_BLOCK</constant> (i.e. if it
          is associated with <constant>BUDDY_INNER_BLOCK</constant> then it is
          not a buddy).</para>
        </formalpara>
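The buddy identification rule above reduces to one comparison against the magic value. A minimal sketch, assuming an illustrative entity layout and magic value (not the HelenOS definitions):

```c
#include <stdint.h>

#define BUDDY_INNER_BLOCK 0xffU  /* illustrative magic order value */

/* Sketch of an allocable entity; only the field relevant to buddy
 * identification is shown. */
typedef struct {
    uint8_t order;  /* order of the block, or BUDDY_INNER_BLOCK */
} block_t;

/* A candidate entity can be the buddy of a block of order `order` only
 * if it itself heads a block of that order; entities inside a larger
 * block carry BUDDY_INNER_BLOCK and are rejected immediately. */
int block_is_buddy(const block_t *candidate, uint8_t order)
{
    if (candidate->order == BUDDY_INNER_BLOCK)
        return 0;
    return candidate->order == order;
}
```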
      </section>
    </section>

    <section id="slab">
      <indexterm>
        <primary>slab allocator</primary>
      </indexterm>

      <title>Slab allocator</title>

      <para>The majority of memory allocation requests in the kernel are for
      small, frequently used data structures. The basic idea behind the slab
      allocator is that commonly used objects are preallocated in continuous
      areas of physical memory called slabs<footnote>
          <para>Slabs are in fact blocks of physical memory frames allocated
          from the frame allocator.</para>
        </footnote>. Whenever an object is to be allocated, the slab allocator
      returns the first available item from a suitable slab corresponding to
      the object type<footnote>
          <para>The mechanism is rather more complicated, see the next
          paragraph.</para>
        </footnote>. Due to the fact that the sizes of the requested and
      allocated object match, the slab allocator significantly reduces
      internal fragmentation.</para>

      <indexterm>
        <primary>slab allocator</primary>

        <secondary>- slab cache</secondary>
      </indexterm>

      <para>Slabs of one object type are organized in a structure called a
      slab cache. There are usually more slabs in the slab cache, depending on
      previous allocations. If the slab cache runs out of available slabs,
      new slabs are allocated. In order to exploit parallelism and to avoid
      locking of shared spinlocks, slab caches can have variants of
      processor-private slabs called magazines. On each processor, there is a
      two-magazine cache. Full magazines that are not part of any
      per-processor magazine cache are stored in a global list of full
      magazines.</para>

      <indexterm>
        <primary>slab allocator</primary>

        <secondary>- magazine</secondary>
      </indexterm>

      <para>Each object begins its life in a slab. When it is allocated from
      there, the slab allocator calls a constructor that is registered in the
      respective slab cache. The constructor initializes and brings the object
      into a known state. The object is then used by the user. When the user
      later frees the object, the slab allocator puts it into a processor
      private <indexterm>
          <primary>slab allocator</primary>

          <secondary>- magazine</secondary>
        </indexterm>magazine cache, from where it can later be allocated
      again. Note that allocations satisfied from a magazine are already
      initialized by the constructor. When both of the processor cached
      magazines get full, the allocator will move one of the magazines to the
      list of full magazines. Similarly, when allocating from an empty
      processor magazine cache, the kernel will reload only one magazine from
      the list of full magazines. In other words, the slab allocator tries to
      keep the processor magazine cache only half-full in order to prevent
      thrashing when allocations and deallocations interleave on magazine
      boundaries. The advantage of this setup is that during most of the
      allocations, no global spinlock needs to be held.</para>

      <para>Should HelenOS run short of memory, it would start deallocating
      objects from magazines, calling the slab cache destructor on them and
      putting them back into slabs. When a slab contains no allocated
      objects, it is immediately freed.</para>

      <para>
        <figure>
          <mediaobject id="slab_alloc">
            <imageobject role="eps">
              <imagedata fileref="images.vector/slab_alloc.eps" format="EPS" />
            </imageobject>

            <imageobject role="html">
              <imagedata fileref="images/slab_alloc.png" format="PNG" />
            </imageobject>

            <imageobject role="fop">
              <imagedata fileref="images.vector/slab_alloc.svg" format="SVG" />
            </imageobject>
          </mediaobject>

          <title>Slab allocator scheme.</title>
        </figure>
      </para>
      <section>
        <title>Implementation</title>

        <para>The slab allocator is closely modelled after <xref
        linkend="Bonwick01" /> with the following exceptions:<itemizedlist>
            <listitem>
              <para>empty slabs are immediately deallocated and</para>
            </listitem>

            <listitem>
              <para>empty magazines are deallocated when not needed.</para>
            </listitem>
          </itemizedlist>The following features are not currently supported
        but would be easy to add: <itemizedlist>
            <listitem>cache coloring and</listitem>

            <listitem>dynamic magazine growth (different magazine sizes are
            already supported, but the allocation strategy would need to be
            adjusted).</listitem>
          </itemizedlist></para>

        <section>
          <title>Allocation/deallocation</title>

          <para>The following two paragraphs summarize and complete the
          description of the slab allocator operation (i.e. the
          <code>slab_alloc()</code> and <code>slab_free()</code>
          functions).</para>

          <formalpara>
            <title>Allocation</title>

            <para><emphasis>Step 1.</emphasis> When an allocation request
            comes, the slab allocator checks availability of memory in the
            current magazine of the local processor magazine cache. If an
            object is available, the allocator just pops it from the magazine
            and returns it.</para>

            <para><emphasis>Step 2.</emphasis> If the current magazine in the
            processor magazine cache is empty, the allocator will attempt to
            swap it with the last magazine from the cache and return to the
            first step. If the last magazine is also empty, the algorithm will
            fall through to Step 3.</para>

            <para><emphasis>Step 3.</emphasis> Now the allocator is in the
            situation when both magazines in the processor magazine cache are
            empty. The allocator reloads one magazine from the shared list of
            full magazines. If the reload is successful (i.e. there are full
            magazines in the list), the algorithm continues with Step
            1.</para>

            <para><emphasis>Step 4.</emphasis> In this fail-safe step, an
            object is allocated from the conventional slab layer and a pointer
            to it is returned.</para>
          </formalpara>

          <formalpara>
            <title>Deallocation</title>

            <para><emphasis>Step 1.</emphasis> During a deallocation request,
            the slab allocator checks whether the current magazine of the
            local processor magazine cache has room for another object. If it
            does, the pointer to the object is just pushed into the magazine
            and the algorithm returns.</para>

            <para><emphasis>Step 2.</emphasis> If the current magazine is
            full, the allocator will attempt to swap it with the last magazine
            from the cache and return to the first step. If the last magazine
            is also full, the algorithm will fall through to Step 3.</para>

            <para><emphasis>Step 3.</emphasis> Now the allocator is in the
            situation when both magazines in the processor magazine cache are
            full. The allocator tries to allocate a new empty magazine and
            flush one of the full magazines to the shared list of full
            magazines. If it is successful, the algorithm continues with Step
            1.</para>

            <para><emphasis>Step 4.</emphasis> In the case of a low memory
            condition when the allocation of an empty magazine fails, the
            object is moved directly into its slab. Thus, in the worst case,
            object deallocation does not need to allocate any additional
            memory.</para>
          </formalpara>
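Steps 1 and 2 of both paths reduce to a pop or push on the current magazine, with at most one swap before falling through to the slower steps. A simplified, single-threaded sketch; the names, the magazine capacity and the stack representation are illustrative assumptions:

```c
#include <stddef.h>

#define MAG_SIZE 8  /* illustrative magazine capacity */

typedef struct {
    void *objs[MAG_SIZE];  /* stack of initialized object pointers */
    size_t busy;           /* number of objects currently stored */
} magazine_t;

typedef struct {
    magazine_t *current;   /* magazine tried first */
    magazine_t *last;      /* backup magazine */
} mag_cache_t;

static void swap_mags(mag_cache_t *cache)
{
    magazine_t *tmp = cache->current;
    cache->current = cache->last;
    cache->last = tmp;
}

/* Allocation Steps 1-2: pop from the current magazine, swapping once if
 * it is empty; NULL means fall through to Steps 3-4. */
void *magazine_obj_get(mag_cache_t *cache)
{
    if (cache->current->busy == 0) {
        if (cache->last->busy == 0)
            return NULL;
        swap_mags(cache);
    }
    return cache->current->objs[--cache->current->busy];
}

/* Deallocation Steps 1-2: push into the current magazine, swapping once
 * if it is full; 0 means fall through to Steps 3-4. */
int magazine_obj_put(mag_cache_t *cache, void *obj)
{
    if (cache->current->busy == MAG_SIZE) {
        if (cache->last->busy == MAG_SIZE)
            return 0;
        swap_mags(cache);
    }
    cache->current->objs[cache->current->busy++] = obj;
    return 1;
}
```

Because objects in magazines were already initialized by the constructor, the fast path never touches the slab layer or any global spinlock.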
        </section>
      </section>
    </section>
  </section>

  <section>
    <title>Virtual memory management</title>

    <para>Virtual memory is essential for an operating system because it makes
    several things possible. First, it helps to isolate tasks from each other
    by encapsulating them in their private address spaces. Second, virtual
    memory can give tasks the illusion of more memory than is actually
    available. And third, by using virtual memory, there might be multiple
    copies of the same program, linked to the same addresses, running in the
    system. There are at least two known mechanisms for implementing virtual
    memory: segmentation and paging. Even though some processor architectures
    supported by HelenOS<footnote>
        <para>ia32 has full-fledged segmentation.</para>
      </footnote> provide both mechanisms, the kernel makes use solely of
    paging.</para>

    <section id="paging">
      <title>VAT subsystem</title>

      <para>In a paged virtual memory, the entire virtual address space is
      divided into small power-of-two sized naturally aligned blocks called
      pages. The processor implements a translation mechanism that allows the
      operating system to manage mappings between the set of pages and the set
      of identically sized and identically aligned pieces of physical memory
      called frames. As a result, references to continuous virtual memory
      areas don't necessarily need to reference continuous areas of physical
      memory. Supported page sizes usually range from several kilobytes to
      several megabytes. Each page that takes part in the mapping is
      associated with certain attributes that further describe the mapping
      (e.g. access rights, dirty and accessed bits and the present
      bit).</para>

      <para>When the processor accesses a page that is not present (i.e. its
      present bit is not set), the operating system is notified through a
      special exception called a page fault. It is then up to the operating
      system to service the page fault. In HelenOS, some page faults are fatal
      and result in either task termination or, in the worst case, kernel
      panic<footnote>
          <para>Such a condition would be either caused by a hardware failure
          or a bug in the kernel.</para>
        </footnote>, while other page faults are used to load memory on demand
      or to notify the kernel about certain events.</para>

      <indexterm>
        <primary>page tables</primary>
      </indexterm>

      <para>The set of all page mappings is stored in a memory structure
      called page tables. Some architectures have no hardware support for page
      tables<footnote>
          <para>On mips32, a TLB-only model is used and the operating system
          is responsible for managing software defined page tables.</para>
        </footnote> while other processor architectures<footnote>
          <para>Like amd64 and ia32.</para>
        </footnote> understand their whole in-memory format. Despite all
      the possible differences in page table formats, the HelenOS VAT
      subsystem<footnote>
          <para>Virtual Address Translation subsystem.</para>
        </footnote> unifies the page table operations under one programming
      interface. For all parts of the kernel, three basic functions are
      provided:</para>

      <itemizedlist>
        <listitem>
          <para><code>page_mapping_insert()</code>,</para>
        </listitem>

        <listitem>
          <para><code>page_mapping_find()</code> and</para>
        </listitem>

        <listitem>
          <para><code>page_mapping_remove()</code>.</para>
        </listitem>
      </itemizedlist>

      <para>The <code>page_mapping_insert()</code> function is used to
      introduce a mapping for one virtual memory page belonging to a
      particular address space into the page tables. Once the mapping is in
      the page tables, it can be searched for by
      <code>page_mapping_find()</code> and removed by
      <code>page_mapping_remove()</code>. All of these functions internally
      select the page table mechanism specific functions that carry out the
      actual operation.</para>
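The dispatch behind this unified interface can be pictured as a table of function pointers selected at boot time. The sketch below is hypothetical: the toy single-entry mechanism and all type and variable names are illustrative, not the HelenOS definitions:

```c
#include <stdint.h>

/* Interface each page table mechanism implements (illustrative). */
typedef struct {
    void (*mapping_insert)(unsigned as, uintptr_t page, uintptr_t frame);
    uintptr_t (*mapping_find)(unsigned as, uintptr_t page);
    void (*mapping_remove)(unsigned as, uintptr_t page);
} page_mapping_operations_t;

/* Toy mechanism remembering a single mapping, for demonstration only. */
static uintptr_t toy_page, toy_frame;

static void toy_insert(unsigned as, uintptr_t page, uintptr_t frame)
{
    (void) as;
    toy_page = page;
    toy_frame = frame;
}

static uintptr_t toy_find(unsigned as, uintptr_t page)
{
    (void) as;
    return (page == toy_page) ? toy_frame : 0;
}

static void toy_remove(unsigned as, uintptr_t page)
{
    (void) as;
    if (page == toy_page)
        toy_frame = 0;
}

static const page_mapping_operations_t toy_ops = {
    toy_insert, toy_find, toy_remove
};

/* Selected according to the architecture; here hardwired to the toy. */
static const page_mapping_operations_t *page_mapping_ops = &toy_ops;

/* The generic functions merely delegate to the selected mechanism. */
void page_mapping_insert(unsigned as, uintptr_t page, uintptr_t frame)
{
    page_mapping_ops->mapping_insert(as, page, frame);
}

uintptr_t page_mapping_find(unsigned as, uintptr_t page)
{
    return page_mapping_ops->mapping_find(as, page);
}

void page_mapping_remove(unsigned as, uintptr_t page)
{
    page_mapping_ops->mapping_remove(as, page);
}
```

Adding a new page table mechanism then amounts to supplying another operations table, without touching any caller of the generic functions.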
      <para>There are currently two supported mechanisms: generic 4-level
      hierarchical page tables and the global page hash table. Both of the
      mechanisms are generic as they cover several hardware platforms. For
      instance, the 4-level hierarchical page table mechanism is used by
      amd64, ia32, mips32 and ppc32. These architectures have the following
      page table formats: 4-level, 2-level, TLB-only and hardware hash table,
      respectively. On the other hand, the global page hash table is used on
      ia64, which can be TLB-only or use a hardware hash table. Although only
      two mechanisms are currently implemented, other mechanisms (e.g.
      B+tree) can be easily added.</para>

      <section id="page_tables">
        <indexterm>
          <primary>page tables</primary>

          <secondary>- hierarchical</secondary>
        </indexterm>

        <title>Hierarchical 4-level page tables</title>

        <para>Hierarchical 4-level page tables are a generalization of the
        frequently used hierarchical model of page tables. In this mechanism,
        each address space has its own page tables. To avoid confusion in
        terminology used by hardware vendors, in HelenOS, the root level page
        table is called PTL0, the two middle levels are called PTL1 and PTL2,
        and, finally, the leaf level is called PTL3. All architectures using
        this mechanism are required to use PTL0 and PTL3. However, the middle
        levels can be left out, depending on the hardware hierarchy or the
        structure of software-only page tables. The genericity is achieved
        through a set of macros that define transitions from one level to
        another. Unused levels are optimised out by the compiler.</para>
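The level transitions can be pictured as a set of index macros. The shifts below assume a hypothetical 4-level layout with 4 KiB pages and 512-entry tables (amd64-like); they are not the actual HelenOS macro definitions:

```c
#include <stdint.h>

/* Extract the table index used at each level for a virtual address.
 * An architecture with fewer levels would define the unused transitions
 * as identity mappings, which the compiler optimises out. */
#define PTL0_INDEX(va)  (((va) >> 39) & 0x1ffULL)
#define PTL1_INDEX(va)  (((va) >> 30) & 0x1ffULL)
#define PTL2_INDEX(va)  (((va) >> 21) & 0x1ffULL)
#define PTL3_INDEX(va)  (((va) >> 12) & 0x1ffULL)
```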
      </section>

      <section>
        <indexterm>
          <primary>page tables</primary>

          <secondary>- hashing</secondary>
        </indexterm>

        <title>Global page hash table</title>

        <para>Implementation of the global page hash table was encouraged by
        64-bit architectures that can have rather sparse address spaces. The
        hash table contains valid mappings only. Each entry of the hash table
        contains an address space pointer, virtual memory page number (VPN),
        physical memory frame number (PFN) and a set of flags. The pair of the
        address space pointer and the virtual memory page number is used as a
        key for the hash table. One of the major differences between the
        global page hash table and hierarchical 4-level page tables is that
        there is only a single global page hash table in the system while
        hierarchical page tables exist per address space. Thus, the global
        page hash table contains information about mappings of all address
        spaces in the system.</para>
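Hashing the composite (address space, VPN) key might look as follows; the mixing constant and bucket count are illustrative assumptions, and the actual HelenOS hash function may differ:

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_TABLE_BUCKETS 1024  /* illustrative; a power of two here */

/* Derive a bucket index from the (address space, VPN) pair; both parts
 * of the key must participate because the table is shared by all
 * address spaces. */
size_t page_hash(const void *as, uintptr_t vpn)
{
    uintptr_t key = (uintptr_t) as ^ (vpn * 2654435761u);
    return (size_t) (key & (HASH_TABLE_BUCKETS - 1));
}
```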
        <para>The global page hash table mechanism uses the generic hash table
        type as described in the chapter dedicated to <link
        linkend="hashtables">data structures</link> earlier in this
        book.</para>
      </section>
    </section>
  </section>

  <section id="tlb">
    <indexterm>
      <primary>TLB</primary>
    </indexterm>

    <title>Translation Lookaside Buffer</title>

    <para>Due to the considerable overhead of several extra memory accesses
    during page table lookup that are necessary on every instruction, modern
    architectures deploy a fast associative cache of recently used page
    mappings. This cache is called the TLB (Translation Lookaside Buffer) and
    is present on every processor in the system. As has already been pointed
    out, the TLB is the only page translation mechanism for some
    architectures.</para>

    <section id="tlb_shootdown">
      <indexterm>
        <primary>TLB</primary>

        <secondary>- TLB shootdown</secondary>
      </indexterm>

      <title>TLB consistency</title>

      <para>The operating system is responsible for keeping the TLB consistent
      with the page tables. Whenever mappings are modified or purged from the
      page tables, or when an address space identifier is reused, the kernel
      needs to invalidate the respective contents of the TLB. Some TLB types
      support partial invalidation of their contents (e.g. ranges of pages or
      address spaces) while other types can be invalidated only entirely. The
      invalidation must be done on all processors because there is one TLB per
      processor. Maintaining TLB consistency on multiprocessor configurations
      is not as trivial as it might seem at first glance.</para>

      <para>The remote TLB invalidation is called TLB shootdown. HelenOS uses
      a simplified variant of the algorithm described in <xref
      linkend="Black89" />.</para>

      <para>TLB shootdown is performed in three phases.</para>

      <formalpara>
        <title>Phase 1.</title>

        <para>The initiator clears its TLB flag and locks the global TLB
        spinlock. The request is then enqueued into all other processors' TLB
        shootdown message queues. When the TLB shootdown message queue is full
        on any processor, the queue is purged and a single request to
        invalidate the entire TLB is stored there. Once all the TLB shootdown
        messages have been dispatched, the initiator sends all other
        processors an interrupt to notify them about the incoming TLB
        shootdown message. It then spins until all processors accept the
        interrupt and clear their TLB flags.</para>
      </formalpara>

      <formalpara>
        <title>Phase 2.</title>

        <para>Except for the initiator, all other processors are spinning on
        the TLB spinlock. The initiator is now free to modify the page tables
        and purge its own TLB. The initiator then unlocks the global TLB
        spinlock and sets its TLB flag.</para>
      </formalpara>

      <formalpara>
        <title>Phase 3.</title>

        <para>When the spinlock is unlocked by the initiator, other processors
        are sequentially granted the spinlock. However, once they manage to
        lock it, they immediately release it. Each processor invalidates its
        TLB according to messages found in its TLB shootdown message queue. In
        the end, each processor sets its TLB flag and resumes its previous
        operation.</para>
      </formalpara>
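The queue-purging rule from Phase 1 can be sketched in isolation. The types and the queue length below are illustrative assumptions, not the HelenOS definitions:

```c
#include <stddef.h>

#define TLB_MESSAGE_QUEUE_LEN 8  /* illustrative queue capacity */

typedef enum { TLB_INVL_PAGES, TLB_INVL_ALL } tlb_invalidate_type_t;

typedef struct {
    tlb_invalidate_type_t type;  /* what to invalidate */
} tlb_shootdown_msg_t;

typedef struct {
    tlb_shootdown_msg_t msgs[TLB_MESSAGE_QUEUE_LEN];
    size_t count;
} tlb_msg_queue_t;

/* Enqueue a shootdown message; when the queue is full, purge it and
 * store a single invalidate-everything request instead. */
void tlb_shootdown_enqueue(tlb_msg_queue_t *q, tlb_shootdown_msg_t msg)
{
    if (q->count == TLB_MESSAGE_QUEUE_LEN) {
        q->msgs[0].type = TLB_INVL_ALL;
        q->count = 1;
        return;
    }
    q->msgs[q->count++] = msg;
}
```

Collapsing an overflowing queue into one invalidate-all request trades precision for boundedness: the queue never grows, at the cost of a full TLB flush on the receiving processor.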
    </section>
  </section>

  <section>
    <title>Address spaces</title>

    <para>This section covers the remaining details of address space
    management, most notably address space identifiers.</para>

    <section>
      <indexterm>
        <primary>address space</primary>

        <secondary>- ASID</secondary>
      </indexterm>

      <title>Address Space ID (ASID)</title>

      <para>Modern processor architectures optimize TLB utilization by
      associating TLB entries with address spaces through assigning
      identification numbers to them. In HelenOS, the term ASID, originally
      taken from the mips32 terminology, is used to refer to the address space
      identification number. The advantage of having ASIDs is that the TLB
      does not have to be invalidated on thread context switch as long as
      ASIDs are unique. Unfortunately, the architectures supported by HelenOS
      all use different widths of ASID numbers<footnote>
          <para>amd64 and ia32 don't use a similar abstraction at all, mips32
          has 8-bit ASIDs and ia64 can have ASIDs between 18 and 24 bits
          wide.</para>
        </footnote> out of which none is sufficient. The amd64 and ia32
      architectures cannot make use of ASIDs as their TLB doesn't recognize
      such an abstraction. Other architectures have support for ASIDs, but for
      instance ppc32 doesn't make use of them in the current version of
      HelenOS. The rest of the architectures do use ASIDs. However, even on
      the ia64 architecture, the minimal supported width of ASIDs<footnote>
          <para>RID in ia64 terminology.</para>
        </footnote> is insufficient to provide a unique integer identifier to
      all address spaces that might hypothetically coexist in the running
      system. The situation on mips32 is even worse: the architecture has only
      256 unique identifiers.</para>

      <indexterm>
        <primary>address space</primary>

        <secondary>- ASID stealing</secondary>
      </indexterm>

      <para>To mitigate the shortage of ASIDs, HelenOS uses the following
      strategy. When the system initializes, a FIFO queue<footnote>
          <para>Note that architecture-specific measures are taken to avoid
          an overly large FIFO queue. For instance, seven consecutive ia64
          RIDs are grouped to form one HelenOS ASID.</para>
        </footnote> is created and filled with all available ASIDs. Moreover,
      every address space remembers the number of processors on which it is
      active. Address spaces that have a valid ASID and that are not active on
      any processor are appended to the list of inactive address spaces with
      valid ASIDs. When an address space needs to be assigned a valid ASID, it
      first checks the FIFO queue. If it contains at least one ASID, that ASID
      is allocated. If the queue is empty, an ASID is simply stolen from the
      first address space in the list. In that case, the address space that
      loses the ASID in favor of another address space is removed from the
      list. After the stolen ASID is purged from all TLBs, it can be used by
      the new address space. Note that this approach works due to the fact
      that the number of ASIDs is greater than the maximal number of
      processors supported by HelenOS and that there can be only one active
      address space per processor. In other words, when the FIFO queue is
      empty, there must be address spaces that are not active on any
      processor.</para>
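The allocation strategy condenses into the following sketch. The FIFO representation and the steal callback (standing in for taking the ASID from the first inactive address space and purging it from all TLBs) are illustrative assumptions, not the actual HelenOS code:

```c
#include <stddef.h>

#define ASIDS_ALLOCABLE 256  /* illustrative pool size */

typedef struct {
    unsigned asids[ASIDS_ALLOCABLE];  /* circular buffer of free ASIDs */
    size_t head, tail, count;
} asid_fifo_t;

static unsigned fifo_pop(asid_fifo_t *fifo)
{
    unsigned asid = fifo->asids[fifo->head];
    fifo->head = (fifo->head + 1) % ASIDS_ALLOCABLE;
    fifo->count--;
    return asid;
}

/* Demonstration steal routine; the real code would take the ASID from
 * the first inactive address space and shoot it down from all TLBs. */
unsigned demo_steal(void)
{
    return 42;
}

/* Take an ASID from the FIFO if one is available, otherwise steal one
 * from an inactive address space. The steal path is guaranteed to find
 * a victim because there are more ASIDs than processors. */
unsigned asid_get(asid_fifo_t *fifo, unsigned (*steal)(void))
{
    if (fifo->count > 0)
        return fifo_pop(fifo);
    return steal();
}
```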
    </section>
  </section>
</chapter>