Rev 157 | Rev 169 | ||
---|---|---|---|
1 | <?xml version="1.0" encoding="UTF-8"?> |
1 | <?xml version="1.0" encoding="UTF-8"?> |
2 | <appendix id="archspecs"> |
2 | <appendix id="archspecs"> |
3 | <?dbhtml filename="arch.html"?> |
3 | <?dbhtml filename="arch.html"?> |
4 | 4 | ||
5 | <title>Architecture Specific Notes</title> |
5 | <title>Architecture Specific Notes</title> |
6 | 6 | ||
7 | <section> |
7 | <section> |
8 | <title>AMD64/Intel EM64T</title> |
8 | <title>AMD64/Intel EM64T</title> |
9 | 9 | ||
10 | <para>The amd64 architecture is a 64-bit extension of the older ia32 |
10 | <para>The amd64 architecture is a 64-bit extension of the older ia32 |
11 | architecture. Only 64-bit applications are supported. Creating this port |
11 | architecture. Only 64-bit applications are supported. Creating this port |
12 | was relatively easy, because it shares a lot of common code with ia32 |
12 | was relatively easy, because it shares a lot of common code with ia32 |
13 | platform. However, the 64-bit extension has some specifics, which made the |
13 | platform. However, the 64-bit extension has some specifics, which made the |
14 | porting interesting.</para> |
14 | porting interesting.</para> |
15 | 15 | ||
16 | <section> |
16 | <section> |
17 | <title>Virtual Memory</title> |
17 | <title>Virtual Memory</title> |
18 | 18 | ||
19 | <para>The amd64 architecture uses standard processor defined 4-level |
19 | <para>The amd64 architecture uses standard processor defined 4-level |
20 | page mapping of 4KB pages. The NX(no-execute) flag on individual pages |
20 | page mapping of 4KB pages. The NX(no-execute) flag on individual pages |
21 | is fully supported.</para> |
21 | is fully supported.</para> |
22 | </section> |
22 | </section> |
23 | 23 | ||
24 | <section> |
24 | <section> |
25 | <title>TLB-only Paging</title> |
25 | <title>TLB-only Paging</title> |
26 | 26 | ||
27 | <para>All memory on the amd64 architecture is memory mapped, if the |
27 | <para>All memory on the amd64 architecture is memory mapped, if the |
28 | kernel needs to access physical memory, a mapping must be created. |
28 | kernel needs to access physical memory, a mapping must be created. |
29 | During boot process the boot loader creates mapping for the first 20MB |
29 | During boot process the boot loader creates mapping for the first 20MB |
30 | of physical memory. To correctly initialize the page mapping system, an |
30 | of physical memory. To correctly initialize the page mapping system, an |
31 | identity mapping of whole physical memory must be created. However, to |
31 | identity mapping of whole physical memory must be created. However, to |
32 | create the mapping it is unavoidable to allocate new - possibly unmapped |
32 | create the mapping it is unavoidable to allocate new - possibly unmapped |
33 | - frames from frame allocator. The ia32 solves it by mapping first 2GB |
33 | - frames from frame allocator. The ia32 solves it by mapping first 2GB |
34 | memory during boot process. The same solution on 64-bit platform becomes |
34 | memory during boot process. The same solution on 64-bit platform becomes |
35 | unfeasible because of the size of the possible address space.</para> |
35 | unfeasible because of the size of the possible address space.</para> |
36 | 36 | ||
37 | <para>As soon as the exception routines are initialized, a special page |
37 | <para>As soon as the exception routines are initialized, a special page |
38 | fault exception handler is installed which provides a complete view of |
38 | fault exception handler is installed which provides a complete view of |
39 | physical memory until the real page mapping system is initialized. It |
39 | physical memory until the real page mapping system is initialized. It |
40 | dynamically changes the page tables to always contain exactly the |
40 | dynamically changes the page tables to always contain exactly the |
41 | faulting address. The page then becomes cached in the TLB and on the |
41 | faulting address. The page then becomes cached in the TLB and on the |
42 | next page fault the same tables can be utilized to handle another |
42 | next page fault the same tables can be utilized to handle another |
43 | mapping.</para> |
43 | mapping.</para> |
44 | </section> |
44 | </section> |
45 | 45 | ||
46 | <section> |
46 | <section> |
47 | <title>Mapping of Physical Memory</title> |
47 | <title>Mapping of Physical Memory</title> |
48 | 48 | ||
49 | <para>The amd64 ABI document describes several modes of program layout. |
49 | <para>The amd64 ABI document describes several modes of program layout. |
50 | The operating system kernel should be compiled in a |
50 | The operating system kernel should be compiled in a |
51 | <emphasis>kernel</emphasis> mode - the kernel is located in the negative |
51 | <emphasis>kernel</emphasis> mode - the kernel is located in the negative |
52 | 2 gigabytes (0xffffffff80000000-0xfffffffffffffffff) and can access data |
52 | 2 gigabytes (0xffffffff80000000-0xfffffffffffffffff) and can access data |
53 | anywhere in the 64-bit space. This wouldn't allow kernel to see directly |
53 | anywhere in the 64-bit space. This wouldn't allow kernel to see directly |
54 | more than 2GB of physical memory. HelenOS duplicates the virtual mapping |
54 | more than 2GB of physical memory. HelenOS duplicates the virtual mapping |
55 | of the physical memory starting at 0xffff800000000000 and accesses all |
55 | of the physical memory starting at 0xffff800000000000 and accesses all |
56 | external references using this address range.</para> |
56 | external references using this address range.</para> |
57 | </section> |
57 | </section> |
58 | 58 | ||
59 | <section> |
59 | <section> |
60 | <title>Thread Local Storage</title> |
60 | <title>Thread Local Storage</title> |
61 | 61 | ||
62 | <para>The code accessing thread local storage uses a segment register FS |
62 | <para>The code accessing thread local storage uses a segment register FS |
63 | as a base. The thread local storage is stored in the hidden 64-bit part |
63 | as a base. The thread local storage is stored in the hidden 64-bit part |
64 | of the FS register which must be written using privileged machine |
64 | of the FS register which must be written using privileged machine |
65 | specific instructions. Special syscall to change this register is |
65 | specific instructions. Special syscall to change this register is |
66 | provided to user applications. The TLS address for this platform is |
66 | provided to user applications. The TLS address for this platform is |
67 | expected to point just after the end of the thread local data. The |
67 | expected to point just after the end of the thread local data. The |
68 | application sometimes need to get a real address of the thread local |
68 | application sometimes need to get a real address of the thread local |
69 | data in its address space but it is impossible to read the base of the |
69 | data in its address space but it is impossible to read the base of the |
70 | FS segmentation register. The solution is to add the self-reference |
70 | FS segmentation register. The solution is to add the self-reference |
71 | address to the end of thread local data, so that the application can |
71 | address to the end of thread local data, so that the application can |
72 | read the address as %gs:0.</para> |
72 | read the address as %gs:0.</para> |
73 | 73 | ||
74 | <figure float="1"> |
74 | <figure float="1"> |
75 | <title>IA-32 &amp; AMD64 TLD</title> |
75 | <title>IA-32 &amp; AMD64 TLD</title> |
76 | 76 | ||
77 | <mediaobject id="tldia32"> |
77 | <mediaobject id="tldia32"> |
78 | <imageobject role="pdf"> |
78 | <imageobject role="pdf"> |
79 | <imagedata fileref="images/tld_ia32.pdf" format="PDF" /> |
79 | <imagedata fileref="images/tld_ia32.pdf" format="PDF" /> |
80 | </imageobject> |
80 | </imageobject> |
81 | 81 | ||
82 | <imageobject role="html"> |
82 | <imageobject role="html"> |
83 | <imagedata fileref="images/tld_ia32.png" format="PNG" /> |
83 | <imagedata fileref="images/tld_ia32.png" format="PNG" /> |
84 | </imageobject> |
84 | </imageobject> |
85 | 85 | ||
86 | <imageobject role="fop"> |
86 | <imageobject role="fop"> |
87 | <imagedata fileref="images/tld_ia32.svg" format="SVG" /> |
87 | <imagedata fileref="images/tld_ia32.svg" format="SVG" /> |
88 | </imageobject> |
88 | </imageobject> |
89 | </mediaobject> |
89 | </mediaobject> |
90 | </figure> |
90 | </figure> |
91 | </section> |
91 | </section> |
92 | 92 | ||
93 | <section> |
93 | <section> |
94 | <title>Fast SYSCALL/SYSRET Support</title> |
94 | <title>Fast SYSCALL/SYSRET Support</title> |
95 | 95 | ||
96 | <para>The entry point for system calls was traditionally a speed problem |
96 | <para>The entry point for system calls was traditionally a speed problem |
97 | on the ia32 architecture. The amd64 supports SYSCALL/SYSRET |
97 | on the ia32 architecture. The amd64 supports SYSCALL/SYSRET |
98 | instructions. Upon encountering the SYSCALL instruction, the processor |
98 | instructions. Upon encountering the SYSCALL instruction, the processor |
99 | changes privilege mode and transfers control to an address stored in |
99 | changes privilege mode and transfers control to an address stored in |
100 | machine specific register. Unlike other similar instructions it does not |
100 | machine specific register. Unlike other similar instructions it does not |
101 | change stack to a known kernel stack, which must be done by the syscall |
101 | change stack to a known kernel stack, which must be done by the syscall |
102 | entry routine. A hidden part of a GS register is provided to support the |
102 | entry routine. A hidden part of a GS register is provided to support the |
103 | entry routine with data needed for switching to kernel stack.</para> |
103 | entry routine with data needed for switching to kernel stack.</para> |
104 | </section> |
104 | </section> |
105 | 105 | ||
106 | <section> |
106 | <section> |
107 | <title>Debugging Support</title> |
107 | <title>Debugging Support</title> |
108 | 108 | ||
109 | <para>To provide developers tools for finding bugs, hardware breakpoints |
109 | <para>To provide developers tools for finding bugs, hardware breakpoints |
110 | and watchpoints are supported. The kernel also supports self-debugging - |
110 | and watchpoints are supported. The kernel also supports self-debugging - |
111 | it sets watchpoints on certain data and upon every modification |
111 | it sets watchpoints on certain data and upon every modification |
112 | automatically checks whether a correct value was written. It is |
112 | automatically checks whether a correct value was written. It is |
113 | worthwhile to mention, that since this feature was implemented, the |
113 | worthwhile to mention, that since this feature was implemented, the |
114 | watchpoint was never fired.</para> |
114 | watchpoint was never fired.</para> |
115 | </section> |
115 | </section> |
116 | </section> |
116 | </section> |
117 | 117 | ||
118 | <section> |
118 | <section> |
119 | <title>Intel IA-32</title> |
119 | <title>Intel IA-32</title> |
120 | 120 | ||
121 | <para>The ia32 architecture uses 4K pages and processor supported 2-level |
121 | <para>The ia32 architecture uses 4K pages and processor supported 2-level |
122 | page tables. Along with amd64 It is one of the 2 architectures that fully |
122 | page tables. Along with amd64, it is one of the two architectures that fully |
123 | supports SMP configurations. The architecture is mostly similar to amd64, |
123 | support SMP configurations. The architecture is mostly similar to amd64, |
124 | it even shares a lot of code. The debugging support is the same as with |
124 | it even shares a lot of code. The debugging support is the same as with |
125 | amd64. The thread local storage uses GS register.</para> |
125 | amd64. The thread local storage uses GS register.</para> |
126 | </section> |
126 | </section> |
127 | 127 | ||
128 | <section> |
128 | <section> |
129 | <title>32-bit MIPS</title> |
129 | <title>32-bit MIPS</title> |
130 | 130 | ||
131 | <para>Both little and big endian kernels are supported. In order to test |
131 | <para>Both little and big endian kernels are supported. In order to test |
132 | different page sizes, the mips32 page size was set to 16K. The mips32 |
132 | different page sizes, the mips32 page size was set to 16K. The mips32 |
133 | architecture is TLB-only, the kernel simulates 2-level page tables. On |
133 | architecture is TLB-only, the kernel simulates 2-level page tables. On |
134 | processors that support it, lazy FPU context switching is |
134 | processors that support it, lazy FPU context switching is |
135 | implemented.</para> |
135 | implemented.</para> |
136 | 136 | ||
137 | <section> |
137 | <section> |
138 | <title>Thread Local Storage</title> |
138 | <title>Thread Local Storage</title> |
139 | 139 | ||
140 | <para>The thread local storage support in compilers is a relatively |
140 | <para>The thread local storage support in compilers is a relatively |
141 | recent phenomenon. The standardization of such support for the mips32 |
141 | recent phenomenon. The standardization of such support for the mips32 |
142 | platform is very new and even the newest versions of GCC cannot generate |
142 | platform is very new and even the newest versions of GCC cannot generate |
143 | 100% correct code. Because of some weird MIPS processor variants, it was |
143 | 100% correct code. Because of some weird MIPS processor variants, it was |
144 | decided, that the TLS pointer will be gathered not from some of the free |
144 | decided, that the TLS pointer will be gathered not from some of the free |
145 | registers, but a special instruction was devised and the kernel is |
145 | registers, but a special instruction was devised and the kernel is |
146 | supposed to emulate it. HelenOS expects that the TLS pointer is in the |
146 | supposed to emulate it. HelenOS expects that the TLS pointer is in the |
147 | K1 register. Upon encountering the reserved instruction exception and |
147 | K1 register. Upon encountering the reserved instruction exception and |
148 | checking that the application is requesting a TLS pointer, it returns |
148 | checking that the application is requesting a TLS pointer, it returns |
149 | the contents of the K1 register. The K1 register is expected to point |
149 | the contents of the K1 register. The K1 register is expected to point |
150 | 0x7000 bytes after the beginning of the thread local data.</para> |
150 | 0x7000 bytes after the beginning of the thread local data.</para> |
151 | 151 | ||
152 | <figure float="1"> |
152 | <figure float="1"> |
153 | <title>MIPS &amp; PowerPC TLD</title> |
153 | <title>MIPS &amp; PowerPC TLD</title> |
154 | 154 | ||
155 | <mediaobject id="tldmips"> |
155 | <mediaobject id="tldmips"> |
156 | <imageobject role="pdf"> |
156 | <imageobject role="pdf"> |
157 | <imagedata fileref="images/tld_mips.pdf" format="PDF" /> |
157 | <imagedata fileref="images/tld_mips.pdf" format="PDF" /> |
158 | </imageobject> |
158 | </imageobject> |
159 | 159 | ||
160 | <imageobject role="html"> |
160 | <imageobject role="html"> |
161 | <imagedata fileref="images/tld_mips.png" format="PNG" /> |
161 | <imagedata fileref="images/tld_mips.png" format="PNG" /> |
162 | </imageobject> |
162 | </imageobject> |
163 | 163 | ||
164 | <imageobject role="fop"> |
164 | <imageobject role="fop"> |
165 | <imagedata fileref="images/tld_mips.svg" format="SVG" /> |
165 | <imagedata fileref="images/tld_mips.svg" format="SVG" /> |
166 | </imageobject> |
166 | </imageobject> |
167 | </mediaobject> |
167 | </mediaobject> |
168 | </figure> |
168 | </figure> |
169 | </section> |
169 | </section> |
170 | 170 | ||
171 | <section> |
171 | <section> |
172 | <title>Lazy FPU Context Switching</title> |
172 | <title>Lazy FPU Context Switching</title> |
173 | 173 | ||
174 | <para>Implementing lazy FPU switching on MIPS architecture is |
174 | <para>Implementing lazy FPU switching on MIPS architecture is |
175 | straightforward. When coprocessor CP1 is disabled, any FPU instruction |
175 | straightforward. When coprocessor CP1 is disabled, any FPU instruction |
176 | raises a Coprocessor Unusable exception. The generic lazy FPU context |
176 | raises a Coprocessor Unusable exception. The generic lazy FPU context |
177 | switch is then called that takes care of the correct context |
177 | switch is then called that takes care of the correct context |
178 | save/restore.</para> |
178 | save/restore.</para> |
179 | </section> |
179 | </section> |
180 | </section> |
180 | </section> |
181 | 181 | ||
182 | <section> |
182 | <section> |
183 | <title>Power PC</title> |
183 | <title>Power PC</title> |
184 | 184 | ||
185 | <para>PowerPC allows kernel to enable mode, where data and instruction |
185 | <para>PowerPC allows kernel to enable mode, where data and instruction |
186 | memory reads are not translated through virtual memory mapping |
186 | memory reads are not translated through virtual memory mapping |
187 | (<emphasis>real mode</emphasis>). The real mode is automatically enabled |
187 | (<emphasis>real mode</emphasis>). The real mode is automatically enabled |
188 | when an exception occurs. However, the kernel uses the same memory |
188 | when an exception occurs. However, the kernel uses the same memory |
189 | structure as on other 32-bit platforms - physical memory is mapped into |
189 | structure as on other 32-bit platforms - physical memory is mapped into |
190 | the top 2GB, userspace memory is available in the bottom half of the |
190 | the top 2GB, userspace memory is available in the bottom half of the |
191 | 32-bit address space.</para> |
191 | 32-bit address space.</para> |
192 | 192 | ||
193 | <section> |
193 | <section> |
194 | <title>OpenFirmware Boot</title> |
194 | <title>OpenFirmware Boot</title> |
195 | 195 | ||
196 | <para>The OpenFirmware loads an image of HelenOS operating system and |
196 | <para>The OpenFirmware loads an image of HelenOS operating system and |
197 | passes control to the HelenOS specific boot loader. The boot loader then |
197 | passes control to the HelenOS specific boot loader. The boot loader then |
198 | performs following tasks:</para> |
198 | performs following tasks:</para> |
199 | 199 | ||
200 | <itemizedlist> |
200 | <itemizedlist> |
201 | <listitem> |
201 | <listitem> |
202 | <para>Fetches information from OpenFirmware regarding memory |
202 | <para>Fetches information from OpenFirmware regarding memory |
203 | structure, device information etc.</para> |
203 | structure, device information etc.</para> |
204 | </listitem> |
204 | </listitem> |
205 | 205 | ||
206 | <listitem> |
206 | <listitem> |
207 | <para>Switches memory mapping to the real mode.</para> |
207 | <para>Switches memory mapping to the real mode.</para> |
208 | </listitem> |
208 | </listitem> |
209 | 209 | ||
210 | <listitem> |
210 | <listitem> |
211 | <para>Copies the kernel to proper physical address.</para> |
211 | <para>Copies the kernel to proper physical address.</para> |
212 | </listitem> |
212 | </listitem> |
213 | 213 | ||
214 | <listitem> |
214 | <listitem> |
215 | <para>Creates basic memory mapping and switches to the new kernel |
215 | <para>Creates basic memory mapping and switches to the new kernel |
216 | mapping, in which the kernel can run.</para> |
216 | mapping, in which the kernel can run.</para> |
217 | </listitem> |
217 | </listitem> |
218 | 218 | ||
219 | <listitem> |
219 | <listitem> |
220 | <para>Passes control to the kernel <function>main_bsp</function> |
220 | <para>Passes control to the kernel <function>main_bsp</function> |
221 | function.</para> |
221 | function.</para> |
222 | </listitem> |
222 | </listitem> |
223 | </itemizedlist> |
223 | </itemizedlist> |
224 | </section> |
224 | </section> |
225 | 225 | ||
226 | <section> |
226 | <section> |
227 | <title>Thread Local Storage</title> |
227 | <title>Thread Local Storage</title> |
228 | 228 | ||
229 | <para>The Power PC thread local storage uses R2 register to hold an |
229 | <para>The Power PC thread local storage uses R2 register to hold an |
230 | address, that is 0x7000 bytes after the beginning of the thread local |
230 | address, that is 0x7000 bytes after the beginning of the thread local |
231 | data. Overally it is the same as on the MIPS architecture.</para> |
231 | data. Overall, it is the same as on the MIPS architecture.</para> |
231 | data. Overall, it is the same as on the MIPS architecture.</para> |
232 | </section> |
233 | </section> |
233 | </section> |
234 | 234 | ||
235 | <section> |
235 | <section> |
236 | <title>IA-64</title> |
236 | <title>IA-64</title> |
237 | 237 | ||
238 | <para>The ia64 kernel uses 16K pages.</para> |
238 | <para>The ia64 kernel uses 16K pages.</para> |
239 | 239 | ||
240 | <section> |
240 | <section> |
241 | <title>Two IA-64 Stacks</title> |
241 | <title>Two IA-64 Stacks</title> |
242 | 242 | ||
243 | <para>The architecture makes use of a pair of stacks. One stack is the |
243 | <para>The architecture makes use of a pair of stacks. One stack is the |
244 | ordinary memory stack while the other is a special register stack. This |
244 | ordinary memory stack while the other is a special register stack. This |
245 | makes the ia64 architecture unique. HelenOS on ia64 solves the problem |
245 | makes the ia64 architecture unique. HelenOS on ia64 solves the problem |
246 | by allocating two physical memory frames for thread and scheduler |
246 | by allocating two physical memory frames for thread and scheduler |
247 | stacks. The upper frame is used by the register stack while the first |
247 | stacks. The upper frame is used by the register stack while the first |
248 | frame is used by the conventional memory stack. The generic kernel and |
248 | frame is used by the conventional memory stack. The generic kernel and |
249 | userspace code had to be adjusted to cope with the possibility of |
249 | userspace code had to be adjusted to cope with the possibility of |
250 | allocating more frames for the stack.</para> |
250 | allocating more frames for the stack.</para> |
251 | </section> |
251 | </section> |
252 | 252 | ||
253 | <section> |
253 | <section> |
254 | <title>Thread Local Storage</title> |
254 | <title>Thread Local Storage</title> |
255 | 255 | ||
256 | <para>Although thread local storage is not officially supported in |
256 | <para>Although thread local storage is not officially supported in |
257 | statically linked binaries, GCC supports it without any major obstacles. |
257 | statically linked binaries, GCC supports it without any major obstacles. |
258 | The r13 register is used as a thread pointer, the thread local data |
258 | The r13 register is used as a thread pointer, the thread local data |
259 | section starts at address r13+16.</para> |
259 | section starts at address r13+16.</para> |
260 | 260 | ||
261 | <para><figure float="1"> |
261 | <para><figure float="1"> |
262 | <title>IA-64 TLD</title> |
262 | <title>IA-64 TLD</title> |
263 | 263 | ||
264 | <mediaobject id="tldia64"> |
264 | <mediaobject id="tldia64"> |
265 | <imageobject role="pdf"> |
265 | <imageobject role="pdf"> |
266 | <imagedata fileref="images/tld_ia64.pdf" format="PDF" /> |
266 | <imagedata fileref="images/tld_ia64.pdf" format="PDF" /> |
267 | </imageobject> |
267 | </imageobject> |
268 | 268 | ||
269 | <imageobject role="html"> |
269 | <imageobject role="html"> |
270 | <imagedata fileref="images/tld_ia64.png" format="PNG" /> |
270 | <imagedata fileref="images/tld_ia64.png" format="PNG" /> |
271 | </imageobject> |
271 | </imageobject> |
272 | 272 | ||
273 | <imageobject role="fop"> |
273 | <imageobject role="fop"> |
274 | <imagedata fileref="images/tld_ia64.svg" format="SVG" /> |
274 | <imagedata fileref="images/tld_ia64.svg" format="SVG" /> |
275 | </imageobject> |
275 | </imageobject> |
276 | </mediaobject> |
276 | </mediaobject> |
277 | </figure></para> |
277 | </figure></para> |
278 | </section> |
278 | </section> |
279 | </section> |
279 | </section> |
280 | </appendix> |
280 | </appendix> |
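
The "Mapping of Physical Memory" section in the file above says that physical memory is mapped a second time starting at 0xffff800000000000. A minimal C sketch of that address arithmetic follows. The names `PHYS_MAP_BASE`, `PHYS_MAP_SIZE`, `phys_to_kernel` and `kernel_to_phys` are hypothetical and are not taken from the HelenOS sources. The 2^47-byte window size is likewise an assumption made only so the sketch can bounds-check its input.

```c
#include <assert.h>
#include <stdint.h>

/* Base of the duplicated physical-memory mapping, as stated in the text. */
#define PHYS_MAP_BASE UINT64_C(0xffff800000000000)

/* Assumed upper bound on the mapped window (hypothetical, for checking only). */
#define PHYS_MAP_SIZE UINT64_C(0x0000800000000000)

/* Translate a physical address to its kernel virtual alias (hypothetical helper). */
static inline uint64_t phys_to_kernel(uint64_t pa)
{
    assert(pa < PHYS_MAP_SIZE);
    return PHYS_MAP_BASE + pa;
}

/* Translate a kernel virtual alias back to the physical address (hypothetical helper). */
static inline uint64_t kernel_to_phys(uint64_t ka)
{
    assert(ka >= PHYS_MAP_BASE);
    return ka - PHYS_MAP_BASE;
}
```

Under these assumptions, physical address 0x1000 is reachable at kernel address 0xffff800000001000, and the two helpers are inverses of each other.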