| Rev | Author | Line No. | Line |
|---|---|---|---|
| 122 | jermar | 1 | <?xml version="1.0" encoding="UTF-8"?> |
| 126 | jermar | 2 | <appendix id="archspecs"> |
| 130 | palkovsky | 3 | <?dbhtml filename="arch.html"?> |
| 4 | |||
| 133 | jermar | 5 | <title>Architecture Specific Notes</title> |
| 122 | jermar | 6 | |
| 7 | <section> |
||
| 130 | palkovsky | 8 | <title>AMD64/Intel EM64T</title> |
| 122 | jermar | 9 | |
| 133 | jermar | 10 | <para>The amd64 architecture is a 64-bit extension of the older ia32 |
| 130 | palkovsky | 11 | architecture. Only 64-bit applications are supported. Creating this port |
| 133 | jermar | 12 | was relatively easy because it shares a lot of common code with the ia32 |
| 130 | palkovsky | 13 | platform. However, the 64-bit extension has some specifics, which made the |
| 14 | porting interesting.</para> |
||
| 15 | |||
| 16 | <section> |
||
| 17 | <title>Virtual Memory</title> |
||
| 18 | |||
| 133 | jermar | 19 | <para>The amd64 architecture uses the standard, processor-defined 4-level |
| 130 | palkovsky | 20 | page mapping of 4KB pages. The NX (no-execute) flag on individual pages |
| 21 | is fully supported.</para> |
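<para>As an illustration of the 4-level mapping, the following C sketch splits a virtual address into its four table indices and the page offset. It assumes the standard amd64 layout for 4KB pages (9 index bits per level, 12 offset bits). The names <function>pt_index</function>, <function>page_offset</function> and <function>pt_index_selfcheck</function> are hypothetical and are not taken from the HelenOS sources.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET_BITS 12   /* 4KB pages */
#define INDEX_BITS       9    /* 512 entries per table */

/* Index into the table at the given level for a virtual address.
 * Level 0 is the lowest-level page table, level 3 the top-level table. */
static inline unsigned pt_index(uint64_t vaddr, unsigned level)
{
    unsigned shift = PAGE_OFFSET_BITS + INDEX_BITS * level;
    return (unsigned) ((vaddr >> shift) & ((1u << INDEX_BITS) - 1));
}

/* Offset of the address within its 4KB page. */
static inline unsigned page_offset(uint64_t vaddr)
{
    return (unsigned) (vaddr & ((1u << PAGE_OFFSET_BITS) - 1));
}

/* Self-check using the kernel base address 0xffffffff80000000. */
void pt_index_selfcheck(void)
{
    uint64_t va = UINT64_C(0xffffffff80000000);
    assert(pt_index(va, 3) == 511);
    assert(pt_index(va, 2) == 510);
    assert(pt_index(va, 1) == 0);
    assert(pt_index(va, 0) == 0);
    assert(page_offset(va) == 0);
}
```
]]></programlisting>

<para>In this sketch, the address 0xffffffff80000000 selects entry 511 of the top-level table and entry 510 one level below, with all lower indices equal to zero.</para>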
||
| 22 | </section> |
||
| 23 | |||
| 24 | <section> |
||
| 25 | <title>TLB-only Paging</title> |
||
| 26 | |||
| 133 | jermar | 27 | <para>All memory on the amd64 architecture is memory mapped; if the |
| 130 | palkovsky | 28 | kernel needs to access physical memory, a mapping must be created. |
| 29 | During the boot process, the boot loader maps the first 20MB |
| 30 | of physical memory. To correctly initialize the page mapping system, an |
| 31 | identity mapping of the whole physical memory must be created. However, |
| 32 | creating the mapping requires allocating new, possibly unmapped, |
| 33 | frames from the frame allocator. The ia32 port solves this by mapping the first 2GB |
| 34 | of memory during the boot process. The same solution is infeasible on a 64-bit platform |
| 35 | because of the size of the possible address space.</para> |
||
| 36 | |||
| 37 | <para>As soon as the exception routines are initialized, a special page |
||
| 38 | fault exception handler is installed which provides a complete view of |
||
| 39 | physical memory until the real page mapping system is initialized. It |
||
| 40 | dynamically changes the page tables to always contain a mapping for exactly the |
| 41 | faulting address. The page then becomes cached in the TLB and on the |
||
| 42 | next page fault the same tables can be utilized to handle another |
||
| 43 | mapping.</para> |
||
| 44 | </section> |
||
| 45 | |||
| 46 | <section> |
||
| 47 | <title>Mapping of Physical Memory</title> |
||
| 48 | |||
| 133 | jermar | 49 | <para>The amd64 ABI document describes several modes of program layout. |
| 130 | palkovsky | 50 | The operating system kernel should be compiled in the |
| 51 | <emphasis>kernel</emphasis> mode: the kernel is located in the negative |
| 52 | 2 gigabytes (0xffffffff80000000-0xffffffffffffffff) and can access data |
| 53 | anywhere in the 64-bit space. On its own, this would not allow the kernel to directly see |
| 54 | more than 2GB of physical memory. HelenOS therefore duplicates the virtual mapping |
| 55 | of the physical memory starting at 0xffff800000000000 and accesses all |
| 56 | external references using this address range.</para> |
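<para>A minimal C sketch of the address arithmetic this implies, assuming that the duplicated mapping is a plain linear offset from 0xffff800000000000. The helper names <function>pa_to_ka</function>, <function>ka_to_pa</function> and <function>physmem_selfcheck</function> are hypothetical and only serve the illustration.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Base of the duplicated mapping of physical memory (value from the text). */
#define PHYSMEM_BASE UINT64_C(0xffff800000000000)

/* Assumption: the mapping is linear, so conversion is a fixed offset. */
static inline uint64_t pa_to_ka(uint64_t paddr)
{
    return PHYSMEM_BASE + paddr;
}

static inline uint64_t ka_to_pa(uint64_t kaddr)
{
    return kaddr - PHYSMEM_BASE;
}

/* Self-check: physical 1MB maps to base + 1MB, and the conversion round-trips. */
void physmem_selfcheck(void)
{
    assert(pa_to_ka(UINT64_C(0x100000)) == UINT64_C(0xffff800000100000));
    assert(ka_to_pa(pa_to_ka(UINT64_C(0x12345000))) == UINT64_C(0x12345000));
}
```
]]></programlisting>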
||
| 57 | </section> |
||
| 58 | |||
| 59 | <section> |
||
| 60 | <title>Thread Local Storage</title> |
||
| 61 | |||
| 62 | <para>The code accessing thread local storage uses the FS segment register |
| 63 | as a base. The thread local storage address is stored in the hidden 64-bit part |
| 64 | of the FS register, which must be written using privileged machine-specific |
| 65 | instructions. A special syscall to change this register is |
| 66 | provided to user applications. The TLS address for this platform is |
| 137 | palkovsky | 67 | expected to point just after the end of the thread local data. The |
| 68 | application sometimes needs to get the real address of the thread local |
| 69 | data in its address space, but it is impossible to read the base of the |
| 70 | FS segment register. The solution is to add a self-reference |
| 71 | address to the end of the thread local data, so that the application can |
| 146 | palkovsky | 72 | read the address as %fs:0.</para> |
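<para>The self-reference layout can be sketched in portable C as follows. This is only a model of the memory layout: the <type>tcb_t</type> structure and the <function>tls_make</function>, <function>tls_data_start</function>, <function>tls_free</function> and <function>tls_selfcheck</function> helpers are hypothetical, and ordinary heap memory stands in for the block that the segment base would point into.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical control block placed right after the thread local data. */
typedef struct tcb {
    void *self;   /* self-reference, readable at offset 0 from the segment base */
} tcb_t;

/* Round the data size up so that the control block stays pointer-aligned. */
static size_t tls_padded(size_t tls_size)
{
    size_t a = sizeof(tcb_t);
    return (tls_size + a - 1) / a * a;
}

/* Allocate thread local data followed by the control block.
 * Returns the address that would become the segment base, or NULL. */
static tcb_t *tls_make(size_t tls_size)
{
    size_t padded = tls_padded(tls_size);
    uint8_t *block = calloc(1, padded + sizeof(tcb_t));
    if (block == NULL)
        return NULL;
    tcb_t *tcb = (tcb_t *) (block + padded);
    tcb->self = tcb;
    return tcb;
}

/* Recover the start of the thread local data from the self-reference. */
static void *tls_data_start(const tcb_t *tcb, size_t tls_size)
{
    return (uint8_t *) tcb->self - tls_size;
}

static void tls_free(tcb_t *tcb, size_t tls_size)
{
    free((uint8_t *) tcb - tls_padded(tls_size));
}

/* Self-check: the data ends exactly where the control block begins. */
void tls_selfcheck(void)
{
    tcb_t *tcb = tls_make(100);
    assert(tcb != NULL);
    assert(tcb->self == tcb);
    assert((uint8_t *) tls_data_start(tcb, 100) + 100 == (uint8_t *) tcb);
    tls_free(tcb, 100);
}
```
]]></programlisting>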
| 137 | palkovsky | 73 | |
| 74 | <figure float="1"> |
||
| 75 | <title>IA32 &amp; AMD64</title> |
||
| 76 | |||
| 77 | <mediaobject id="tldia32"> |
||
| 78 | <imageobject role="pdf"> |
||
| 79 | <imagedata fileref="images/tld_ia32.pdf" format="PDF" /> |
||
| 80 | </imageobject> |
||
| 81 | |||
| 82 | <imageobject role="html"> |
||
| 83 | <imagedata fileref="images/tld_ia32.png" format="PNG" /> |
||
| 84 | </imageobject> |
||
| 85 | |||
| 86 | <imageobject role="fop"> |
||
| 87 | <imagedata fileref="images/tld_ia32.svg" format="SVG" /> |
||
| 88 | </imageobject> |
||
| 89 | </mediaobject> |
||
| 90 | </figure> |
||
| 130 | palkovsky | 91 | </section> |
| 92 | |||
| 93 | <section> |
||
| 94 | <title>Fast SYSCALL/SYSRET Support</title> |
||
| 95 | |||
| 96 | <para>The entry point for system calls was traditionally a performance bottleneck |
| 133 | jermar | 97 | on the ia32 architecture. The amd64 architecture supports the SYSCALL/SYSRET |
| 98 | instructions. Upon encountering the SYSCALL instruction, the processor |
| 99 | changes privilege mode and transfers control to an address stored in a |
| 100 | machine-specific register. Unlike other similar instructions, it does not |
| 101 | switch to a known kernel stack; this must be done by the syscall |
| 102 | entry routine. A hidden part of the GS register is provided to supply the |
| 103 | entry routine with the data needed for switching to the kernel stack.</para> |
||
| 130 | palkovsky | 104 | </section> |
| 105 | |||
| 106 | <section> |
||
| 107 | <title>Debugging Support</title> |
||
| 108 | |||
| 109 | <para>To give developers tools for finding bugs, hardware breakpoints |
| 110 | and watchpoints are supported. The kernel also supports self-debugging: |
| 111 | it sets watchpoints on certain data and upon every modification |
| 112 | automatically checks whether a correct value was written. It is |
| 113 | worth mentioning that, since this feature was implemented, the |
| 114 | watchpoint has never fired.</para> |
||
| 115 | </section> |
||
| 122 | jermar | 116 | </section> |
| 130 | palkovsky | 117 | |
| 118 | <section> |
||
| 133 | jermar | 119 | <title>Intel IA-32</title> |
| 130 | palkovsky | 120 | |
| 133 | jermar | 121 | <para>The ia32 architecture uses 4K pages and processor-supported 2-level |
| 122 | page tables. Along with amd64, it is one of the two architectures that fully |
| 123 | support SMP configurations. The architecture is mostly similar to amd64 and |
| 124 | even shares a lot of code with it. The debugging support is the same as on |
| 125 | amd64. The thread local storage uses the GS register.</para> |
||
| 130 | palkovsky | 126 | </section> |
| 127 | |||
| 128 | <section> |
||
| 133 | jermar | 129 | <title>32-bit MIPS</title> |
| 130 | palkovsky | 130 | |
| 131 | <para>Both little-endian and big-endian kernels are supported. In order to test |
| 133 | jermar | 132 | different page sizes, the mips32 page size was set to 16K. The mips32 |
| 133 | architecture is TLB-only; the kernel simulates 2-level page tables. On |
||
| 134 | processors that support it, lazy FPU context switching is |
||
| 135 | implemented.</para> |
||
| 130 | palkovsky | 136 | |
| 137 | <section> |
||
| 138 | <title>Thread Local Storage</title> |
||
| 139 | |||
| 140 | <para>Thread local storage support in compilers is a relatively |
| 133 | jermar | 141 | recent phenomenon. The standardization of such support for the mips32 |
| 142 | platform is very new, and even the newest versions of GCC cannot generate |
| 143 | 100% correct code. Because of some unusual MIPS processor variants, it was |
| 130 | palkovsky | 144 | decided that the TLS pointer would not be read from one of the free |
| 145 | registers; instead, a special instruction was devised, which the kernel is |
| 146 | supposed to emulate. HelenOS expects the TLS pointer to be in the |
| 147 | K1 register. Upon encountering the reserved instruction exception and |
| 148 | checking that the application is requesting a TLS pointer, the kernel returns |
| 149 | the contents of the K1 register. The K1 register is expected to point |
| 150 | 0x7000 bytes after the beginning of the thread local data.</para> |
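<para>The relation between the thread pointer and the thread local data can be written down as a short C sketch. Only the 0x7000 offset comes from the text; the names <function>tls_to_tp</function>, <function>tp_to_tls</function> and <function>tp_selfcheck</function> are hypothetical.</para>

<programlisting language="c"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Distance between the start of the thread local data and the thread pointer. */
#define TP_OFFSET 0x7000u

/* Value the thread pointer register should hold for data starting at tls_start. */
static inline uintptr_t tls_to_tp(uintptr_t tls_start)
{
    return tls_start + TP_OFFSET;
}

/* Start of the thread local data, recovered from the thread pointer. */
static inline uintptr_t tp_to_tls(uintptr_t tp)
{
    return tp - TP_OFFSET;
}

/* Self-check: data at 0x10000 gives a thread pointer of 0x17000. */
void tp_selfcheck(void)
{
    assert(tls_to_tp(0x10000u) == 0x17000u);
    assert(tp_to_tls(tls_to_tp(0x10000u)) == 0x10000u);
}
```
]]></programlisting>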
||
| 137 | palkovsky | 151 | |
| 152 | <figure float="1"> |
||
| 153 | <title>MIPS &amp; PPC</title> |
||
| 154 | |||
| 155 | <mediaobject id="tldmips"> |
||
| 156 | <imageobject role="pdf"> |
||
| 157 | <imagedata fileref="images/tld_mips.pdf" format="PDF" /> |
||
| 158 | </imageobject> |
||
| 159 | |||
| 160 | <imageobject role="html"> |
||
| 161 | <imagedata fileref="images/tld_mips.png" format="PNG" /> |
||
| 162 | </imageobject> |
||
| 163 | |||
| 164 | <imageobject role="fop"> |
||
| 165 | <imagedata fileref="images/tld_mips.svg" format="SVG" /> |
||
| 166 | </imageobject> |
||
| 167 | </mediaobject> |
||
| 168 | </figure> |
||
| 130 | palkovsky | 169 | </section> |
| 146 | palkovsky | 170 | |
| 171 | <section> |
||
| 172 | <title>Lazy FPU Context Switching</title> |
||
| 173 | |||
| 174 | <para>Implementing lazy FPU switching on the MIPS architecture is |
| 175 | straightforward. When coprocessor CP1 is disabled, any FPU instruction |
| 176 | raises a Coprocessor Unusable exception. The generic lazy FPU context |
| 177 | switch routine is then called, which takes care of the correct context |
| 178 | save/restore.</para> |
||
| 179 | </section> |
||
| 130 | palkovsky | 180 | </section> |
| 181 | |||
| 182 | <section> |
||
| 183 | <title>PowerPC</title> |
||
| 184 | |||
| 146 | palkovsky | 185 | <para>PowerPC allows the kernel to enable a mode in which data and instruction |
| 186 | memory reads are not translated through the virtual memory mapping |
| 187 | (<emphasis>real mode</emphasis>). Real mode is automatically enabled |
| 188 | when an exception occurs. However, the kernel uses the same memory |
| 189 | layout as on other 32-bit platforms: physical memory is mapped into |
| 190 | the top 2GB, while userspace memory is available in the bottom half of the |
| 191 | 32-bit address space.</para> |
||
| 137 | palkovsky | 192 | |
| 193 | <section> |
||
| 146 | palkovsky | 194 | <title>OpenFirmware Boot</title> |
| 195 | |||
| 196 | <para>OpenFirmware loads an image of the HelenOS operating system and |
| 197 | passes control to the HelenOS-specific boot loader. The boot loader then |
| 198 | performs the following tasks:</para> |
||
| 199 | |||
| 200 | <itemizedlist> |
||
| 201 | <listitem> |
||
| 202 | <para>Fetches information from OpenFirmware regarding the memory |
| 203 | structure, devices, etc.</para> |
||
| 204 | </listitem> |
||
| 205 | |||
| 206 | <listitem> |
||
| 207 | <para>Switches memory mapping to the real mode.</para> |
||
| 208 | </listitem> |
||
| 209 | |||
| 210 | <listitem> |
||
| 211 | <para>Copies the kernel to the proper physical address.</para> |
||
| 212 | </listitem> |
||
| 213 | |||
| 214 | <listitem> |
||
| 215 | <para>Creates a basic memory mapping and switches to the new kernel |
||
| 216 | mapping, in which the kernel can run.</para> |
||
| 217 | </listitem> |
||
| 218 | |||
| 219 | <listitem> |
||
| 220 | <para>Passes control to the kernel <function>main_bsp</function> |
||
| 221 | function.</para> |
||
| 222 | </listitem> |
||
| 223 | </itemizedlist> |
||
| 224 | </section> |
||
| 225 | |||
| 226 | <section> |
||
| 137 | palkovsky | 227 | <title>Thread Local Storage</title> |
| 228 | |||
| 229 | <para>The PowerPC thread local storage uses the R2 register to hold an |
| 230 | address that is 0x7000 bytes after the beginning of the thread local |
| 231 | data. Overall, it is the same as on the MIPS architecture.</para> |
||
| 232 | </section> |
||
| 130 | palkovsky | 233 | </section> |
| 234 | |||
| 235 | <section> |
||
| 151 | jermar | 236 | <title>IA-64</title> |
| 130 | palkovsky | 237 | |
| 151 | jermar | 238 | <para>The ia64 kernel uses 16K pages.</para> |
| 137 | palkovsky | 239 | |
| 146 | palkovsky | 240 | <section> |
| 151 | jermar | 241 | <title>Two IA-64 Stacks</title> |
| 242 | |||
| 243 | <para>The architecture makes use of a pair of stacks. One stack is the |
||
| 244 | ordinary memory stack while the other is a special register stack. This |
||
| 245 | makes the ia64 architecture unique. HelenOS on ia64 handles this |
| 246 | by allocating two physical memory frames for thread and scheduler |
| 247 | stacks. The upper frame is used by the register stack, while the first (lower) |
| 248 | frame is used by the conventional memory stack. The generic kernel and |
||
| 249 | userspace code had to be adjusted to cope with the possibility of |
||
| 250 | allocating more frames for the stack.</para> |
||
| 251 | </section> |
||
| 252 | |||
| 253 | <section> |
||
| 146 | palkovsky | 254 | <title>Thread Local Storage</title> |
| 137 | palkovsky | 255 | |
| 146 | palkovsky | 256 | <para>Although thread local storage is not officially supported in |
| 257 | statically linked binaries, GCC supports it without any major obstacles. |
||
| 151 | jermar | 258 | The r13 register is used as a thread pointer; the thread local data |
| 259 | section starts at address r13+16.</para> |
||
| 137 | palkovsky | 260 | |
| 146 | palkovsky | 261 | <para><figure float="1"> |
| 262 | <title>IA64</title> |
||
| 137 | palkovsky | 263 | |
| 146 | palkovsky | 264 | <mediaobject id="tldia64"> |
| 265 | <imageobject role="pdf"> |
||
| 266 | <imagedata fileref="images/tld_ia64.pdf" format="PDF" /> |
||
| 267 | </imageobject> |
||
| 268 | |||
| 269 | <imageobject role="html"> |
||
| 270 | <imagedata fileref="images/tld_ia64.png" format="PNG" /> |
||
| 271 | </imageobject> |
||
| 272 | |||
| 273 | <imageobject role="fop"> |
||
| 274 | <imagedata fileref="images/tld_ia64.svg" format="SVG" /> |
||
| 275 | </imageobject> |
||
| 276 | </mediaobject> |
||
| 277 | </figure></para> |
||
| 278 | </section> |
||
| 130 | palkovsky | 279 | </section> |
| 122 | jermar | 280 | </appendix> |