WebSVN – HelenOS-doc – Blame – /papers/isarcs10/isarcs10.tex

Rev	Author	Line No.	Line
179	decky	1	\documentclass{llncs}
		2	\usepackage{graphicx}
		3
		4	\title{A Road to a Formally Verified General-Purpose Operating System}
		5	\author{Martin D\v{e}ck\'{y}}
180	decky	6	\institute{Department of Distributed and Dependable Systems\\
179	decky	7	Faculty of Mathematics and Physics, Charles University\\
		8	Malostransk\'{e} n\'{a}m\v{e}st\'{i} 25, Prague 1, 118~00, Czech Republic\\
180	decky	9	\email{martin.decky@d3s.mff.cuni.cz}}
179	decky	10
		11	\begin{document}
		12	\maketitle
		13
		14	\begin{abstract}
		15	Methods of formal description and verification represent a viable way for achieving
		16	fundamentally bug-free software. However, in reality only a small subset of the existing operating
		17	systems were ever formally verified, despite the fact that an operating system is a critical part
180	decky	18	of almost any other software system. This paper points out several key design choices which
		19	should make the formal verification of an operating system easier and presents a work-in-progress
		20	and initial experiences with formal verification of HelenOS, a state-of-the-art microkernel-based
		21	operating system, which, however, was not designed specifically with formal verification in mind,
		22	as this is mostly prohibitive due to time and budget constrains.
179	decky	23
		24	The contribution of this paper is the shift of focus from attempts to use a single ``silver-bullet''
		25	formal verification method which would be able to verify everything to a combination of multiple
180	decky	26	formalisms and techniques to successfully cover various aspects of the operating system.
		27	A reliable and dependable operating system is the emerging property of the combination,
		28	thanks to the suitable architecture of the operating system.
179	decky	29	\end{abstract}
		30
		31	\section{Introduction}
		32	\label{introduction}
180	decky	33	Operating systems (OSes for short) have a somewhat special position among all software.
		34	OSes are usually designed to run on bare hardware. This means that they do not require
		35	any special assumptions on the environment except the assumptions on the properties and
		36	behavior of hardware. In many cases it is perfectly valid to consider the hardware
		37	as \emph{idealized hardware} (zero mathematical probability of failure, perfect compiance
		38	with the specifications, etc.). This means that it is solely the OS that defines the
		39	environment for other software.
179	decky	40
180	decky	41	OSes represent the lowest software layer and provide services to essentially all other
		42	software. Considering the principle of recursion, the properties of an OS form the
		43	assumptions for the upper layers of software. Thus the dependability of end-user and
		44	enterprise software systems is always limited by the dependability of the OS.
179	decky	45
180	decky	46	Finally, OSes are non-trivial software on their own. The way they are generally designed
		47	and programmed (spanning both the kernel and user mode, manipulating execution contexts
		48	and concurrency, handling critical hardware-related operations) represent significant
		49	and interesting challenges for software analysis.
179	decky	50
		51	\medskip
		52
180	decky	53	These are probably the most important reasons that led to several research initiatives
		54	in the recent years which target the creation of a formally verified OSes from scratch
		55	(e.g. \cite{seL4}). Methods of formal description and verification provide fundamentally
		56	better guarantees of desirable properties than non-exhaustive engineering methods such
		57	as testing.
179	decky	58
180	decky	59	However, 98~\%\footnote{98~\% of client computers connected to the Internet as of January
		60	2010~\cite{marketshare}.} of the market share of general-purpose OSes is taken
		61	by Windows, Mac~OS~X and Linux. These systems were clearly not designed with formal
		62	verification in mind from the very beginning. The situation on the embedded, real-time
		63	and special-purpose OSes market is probably different, but it is unlikely that the
		64	segmentation of the desktop and server OSes market is going to change very rapidly
		65	in the near future.
179	decky	66
180	decky	67	The architecture of these major desktop and server OSes is monolithic, which makes
		68	any attempts to do formal verification on them extremely challenging due to the large
		69	state space. Fortunatelly we can observe that aspects of several novel approaches from
		70	the OS research from the late 1980s and early 1990s (microkernel design, user space
		71	file system and device drivers, etc.) are slowly emerging in these originally purely
		72	monolithic implementations.
179	decky	73
180	decky	74	\medskip
179	decky	75
180	decky	76	In this paper we show how specific design choices can markedly improve the feasibility
		77	of verification of an OS, even if the particular OS was not designed
		78	specifically with formal verification in mind. These design choices can be gradually
		79	introduced (and in fact some of them have already been introduced) to mainstream
		80	general-purpose OSes.
179	decky	81
180	decky	82	Our approach is not based on using a single ``silver-bullet'' formalism, methodology or
		83	tool, but on combining various enginering, semi-formal and formal approaches.
		84	While the lesser formal approaches give lesser guarantees, they can complement
		85	the formal approaches on their boundaries and increase the coverage of the set of
		86	all hypothetical interesting properties of the system.
179	decky	87
180	decky	88	We also demonstrate work-in-progress case study of an general-purpose research OS
		89	that was not created specifically with formal verification in mind from the very
		90	beginning, but that was designed according to state-of-the-art OS principles.
179	decky	91
		92	\medskip
		93
180	decky	94	\noindent\textbf{Structure of the Paper.} In Section \ref{design} we introduce
		95	the design choices and our case study in more detail. In Section \ref{analysis} we
		96	discuss our approach of the combination of methods and tools. In Section \ref{evaluation}
		97	we present a preliminary evaluation of our efforts and propose the imminent next steps
		98	to take. Finally, in Section \ref{conclusion} we present the conclusion of the paper.
179	decky	99
180	decky	100	\section{Operating Systems Design}
		101	\label{design}
		102	Two very common schemes of OS design are \emph{monolithic design} and \emph{microkernel design}.
		103	Without going into much detail of any specific implementation, we can define the monolithic design as
		104	a preference to put numerous aspects of the core OS functionality into the kernel, while microkernel
		105	design is a preference to keep the kernel small, with just a minimal set of features.
179	decky	106
180	decky	107	The features which are missing from the kernel in the microkernel design are implemented
		108	in user space, usually by means of libraries, servers (e.g. processes/tasks) and/or software components.
179	decky	109
		110	\subsection{HelenOS}
		111	\label{helenos}
180	decky	112	\emph{HelenOS} is a general-purpose research OS which is being developed at Charles
179	decky	113	University in Prague. The source code is available under the BSD open source license and can be
		114	freely downloaded from the project web site~\cite{helenos}. The authors of the code base are
		115	both from the academia and from the open source community (several contributors are employed
180	decky	116	as Solaris kernel developers and many are freelance IT professionals).
179	decky	117
		118	HelenOS uses a preemptive priority-feedback scheduler, it supports SMP hardware and it is
		119	designed to be highly portable. Currently it runs on 7 distinct hardware architectures, including the
		120	most common IA-32, x86-64 (AMD64), IA-64, SPARC~v9 and PowerPC. It also runs on ARMv7 and MIPS,
		121	but currently only in simulators and not on physical hardware.
		122
		123	Although HelenOS is still far from being an everyday replacement for Linux or Windows due to the lack
		124	of end-user applications (whose development is extremely time-consuming, but unfortunately of
		125	no scientific value), the essential foundations such as file system support and TCP/IP networking
		126	are already in place.
		127
180	decky	128	HelenOS does not currently target embedded devices (although the ARMv7 port can be very easily
		129	modified to run on various embedded boards) and does not implement real-time features.
		130	Many development projects such as task snapshoting and migration, support for MMU-less
		131	platforms and performance monitoring are currently underway.
179	decky	132
		133	\medskip
		134
180	decky	135	HelenOS can be briefly described as microkernel multiserver design. However, the actual design
		136	guiding principles of the HelenOS are more elaborate:
179	decky	137
180	decky	138	\begin{description}
		139	\item[Microkernel principle] Every functionality of the OS that does not
179	decky	140	have to be necessary implemented in the kernel should be implemented in user space. This
		141	implies that subsystems such as the file system, device drivers (except those which are
		142	essential for the basic kernel functionality), naming and trading services, networking,
		143	human interface and similar features should be implemented in user space.
180	decky	144	\item[Full-fledged principle] Features which need to be placed in kernel should
		145	be implemented by full-fledged algorithms and data structures. In contrast
		146	to several other microkernel OSes, where the authors have deliberately chosen
179	decky	147	the most simplistic approach (static memory allocation, na\"{\i}ve algorithms, simple data
		148	structures), HelenOS microkernel tries to use the most advanced and suitable means available.
		149	It contains features such as AVL and B+ trees, hashing tables, SLAB memory allocator, multiple
		150	in-kernel synchronization primitives, fine-grained locking and so on.
180	decky	151	\item[Multiserver principle] Subsystems in user space should be decomposed with the smallest
179	decky	152	reasonable granularity. Each unit of decomposition should be encapsulated in a separate task.
		153	The tasks represent software components with isolated address spaces. From the design point of
		154	view the kernel can be seen as a separate component, too.
180	decky	155	\item[Split of mechanism and policy] The kernel should only provide low-level mechanisms,
179	decky	156	while the high-level policies which are built upon these mechanisms should be defined in
		157	user space. This also implies that the policies should be easily replaceable while keeping
		158	the low-level mechanisms intact.
180	decky	159	\item[Encapsulation principle] The communication between the tasks/components should be
179	decky	160	implemented only via a set of well-defined interfaces. In the user-to-user case the preferred
		161	communication mechanism is HelenOS IPC, which provides reasonable mix of abstraction and
		162	performance (RPC-like primitives combined with implicit memory sharing for large data
		163	transfers). In case of synchronous user-to-kernel communication the usual syscalls are used.
		164	HelenOS IPC is used again for asynchronous kernel-to-user communication.
180	decky	165	\item[Portability principle] The design and implementation should always maintain a high
179	decky	166	level of platform neutrality and portability. Platform-specific code in the kernel, core
180	decky	167	libraries and tasks implementing device drivers should be clearly separated from the
		168	generic code (either by component decomposition or at least by internal compile-time APIs).
		169	\end{description}
179	decky	170
180	decky	171	In Section \ref{analysis} we argue that several of these design principles significantly improve
		172	the feasibility of formal verification of the entire system. On the other hand, other design principles
		173	induce new interesting challenges for formal description and verification.
179	decky	174
		175	The run-time architecture of HelenOS is inherently dynamic. The bindings between the components are
180	decky	176	not created at compile-time, but during bootstrap and can be modified to a large degree also during
		177	normal operation mode of the system (via human interaction and external events).
179	decky	178
180	decky	179	The design of the ubiquitous HelenOS IPC mechanism and the associated threading model present
		180	the possibility to significantly reduce the size of the state space which needs to be explored
		181	by formal verification tools, but at the same time it is quite hard to express these
		182	constrains in many formalisms. The IPC is inherently asynchronous with constant message buffers
		183	in the kernel and dynamic buffers in user space. It uses the notions of uni-directional bindings,
		184	mandatory pairing of requests and replies, binding establishment and abolishment handshakes,
		185	memory sharing and fast message forwarding.
179	decky	186
		187	For easier management of the asynchronous messages and the possibility to process multiple
		188	messages from different peers without the usual kernel threading overhead, the core user space
180	decky	189	library manages the execution flow by so-called \emph{fibrils}. A fibril is a user-space-managed
		190	thread with cooperative scheduling. A different fibril is scheduled every time the current fibril
		191	is about to be blocked while sending out IPC requests (because the kernel buffers of the addressee
179	decky	192	are full) or while waiting on an IPC reply. This allows different execution flows within the
180	decky	193	same thread to process multiple requests and replies. To safeguard proper sequencing of IPC
		194	messages and provide synchronization, special fibril-aware synchronization primitives
179	decky	195	(mutexes, condition variables, etc.) are available.
		196
		197	Because of the cooperative nature of fibrils, they might cause severe performance under-utilization
		198	in SMP configurations and system-wide bottlenecks. As multicore processors are more and more
		199	common nowadays, that would be a substantial design flaw. Therefore the fibrils can be also freely
		200	(and to some degree even automatically) combined with the usual kernel threads, which provide
		201	preemptive scheduling and true parallelism on SMP machines. Needless to say, this combination is
		202	also a grand challenge for the formal reasoning.
		203
		204	\medskip
		205
180	decky	206	Incidentally, the \emph{full-fledged principle} causes that the size of the HelenOS microkernel is
		207	considerably larger compared to other ``scrupulous'' microkernel implementations. The average
		208	footprint of the kernel on IA-32 ranges from 569~KiB when all logging messages, asserts, symbol
		209	resolution and the debugging kernel console are compiled in, down to 198~KiB for a non-debugging
		210	production build. But we do not believe that the raw size of the microkernel is a relevant quality
		211	criterion per se, without taking the actual feature set into account.
		212
		213	\medskip
		214
179	decky	215	To sum up, the choice of HelenOS as our case study is based on the fact that it was not designed
180	decky	216	in advance with formal verification in mind (some of the design principles might be beneficial,
		217	but others might be disadvantageous), but the design of HelenOS is also non-trivial and not obsolete.
179	decky	218
		219	\subsection{The C Programming Language}
180	decky	220	A large majority of OSes is coded in the C programming language (HelenOS is no exception
		221	to this). The choice of C in the case of kernel is usually well-motivated, since the C language was designed
		222	specifically for implementing system software~\cite{c}: It is reasonably low-level in the sense that it allows
		223	to access the memory and other hardware resources with similar effectiveness as from assembler;
179	decky	224	It also requires almost no run-time support and it exports many features of the von Neumann hardware
		225	architecture to the programmer in a very straightforward, but still relatively portable way.
		226
		227	However, what is the biggest advantage of C in terms of run-time performance is also the biggest weakness
		228	for formal reasoning. The permissive memory access model of C, the lack of any reference safety
		229	enforcement, the weak type system and generally little semantic information in the code -- all these
180	decky	230	properties that do not allow to make many general assumptions about the code.
179	decky	231
		232	Programming languages which target controlled environments such as Java or C\(\sharp\) are
		233	generally easier for formal reasoning because they provide a well-known set of primitives
		234	and language constructs for object ownership, threading and synchronization. Many non-imperative
		235	programming languages can be even considered to be a form of ``executable specification'' and thus
		236	very suitable for formal reasoning. In C, almost everything is left to the programmer who
		237	is free to set the rules.
		238
		239	\medskip
		240
180	decky	241	The reasons for frequent use of C in the user space of many established OSes (and HelenOS) is
		242	probably much more questionable. In the case of HelenOS, except for the core libraries and tasks
		243	(such as device drivers), C might be easily replaced by any high-level and perhaps even
		244	non-imperative programming language. The reasons for using C in this context are mostly historical.
179	decky	245
180	decky	246	However, as we have stated in Section \ref{introduction}, the way general-purpose OSes
179	decky	247	are implemented changes only slowly and therefore any propositions which require radical modification
		248	of the existing code base before committing to the formal verification are not realistic.
		249
		250	\section{Analysis}
		251	\label{analysis}
		252
		253	\begin{figure}[t]
		254	\begin{center}
182	decky	255	\resizebox*{125mm}{!}{\includegraphics{diag}}
179	decky	256	\caption{Overview of the formalisms and tools proposed.}
		257	\label{fig:diag}
		258	\end{center}
		259	\end{figure}
		260
		261	In this section, we analyze the properties we would like to check in a general-purpose
		262	operating system. Each set of properties usually requires a specific verification method,
		263	tool or toolchain.
		264
180	decky	265	Our approach will be mostly bottom-up, or, in other words, from the lower levels of abstraction
179	decky	266	to the higher levels of abstraction. If the verification fails on a lower level, it usually
		267	does not make much sense to continue with the higher levels of abstraction until the issues
180	decky	268	are tackled. A structured overview of the formalisms, methods and tools can be seen on
		269	Figure \ref{fig:diag}.
179	decky	270
180	decky	271	\medskip
		272
		273	Some of the proposed methods cannot be called ``formal methods'' in the rigorous understanding
		274	of the term. However, even methods which are based on semi-formal reasoning and non-exhaustive
		275	testing provide some limited guarantees in their specific context. A valued property
		276	of the formal methods is to preserve these limited guarantees even on the higher levels
		277	of abstraction, thus complementing the formal guarantees where the formal methods do not provide
		278	any feasible verification so far. This increases the coverage of the set of all hypothetical
		279	interesting properties of the system (although it is probably impossible to formally define
		280	this set).
		281
		282	\medskip
		283
179	decky	284	Please note that the titles of the following sections do not follow any particular established
180	decky	285	taxonomy. We have simply chosen the names to be intuitivelly descriptive.
179	decky	286
		287	\subsection{C Language Compiler and Continuous Integration Tool}
		288	\label{clang}
181	decky	289	The initial levels of abstraction do not go far from the C source code and common engineering
		290	approaches. First, we would certainly like to know whether our code base is compliant with the
		291	programming language specification and passes only the basic semantic checks (proper number
		292	and types of arguments passed to functions, etc.). It is perhaps not very surprising that
182	decky	293	these decisions can be made by any plain C compiler. However, with the current implementation
181	decky	294	of HelenOS even this is not quite trivial.
		295
179	decky	296	Besides the requirement to support 7 hardware platforms, the system's compile-time configuration
181	decky	297	can be also affected by approximately 65 configuration options, most of which are booleans,
		298	the rest are enumerated types.
179	decky	299
		300	These configuration options are bound by logical propositions in conjunctive or disjunctive
		301	normal forms and while some options are freely configurable, the value of others gets inferred
181	decky	302	by the build system of HelenOS. The overall number of distinct configurations in which
179	decky	303	HelenOS can be compiled is at least one order of magnitude larger than the plain number
		304	of supported hardware platforms.
		305
		306	Various configuration options affect conditional compilation and linking. The programmers
		307	are used to make sure that the source code compiles and links fine with respect to the
		308	most common and obvious configurations, but the unforeseen interaction of the less common
		309	configuration options might cause linking or even compilation errors.
		310
		311	\medskip
		312
		313	A straightforward solution is to generate all distinct configurations, starting from the
		314	open variables and inferring the others. This can be part of the continuous integration
		315	process which would try to compile and link the sources in all distinct configurations.
		316
181	decky	317	If we want to be really pedantic, we should also make sure that we run all higher
179	decky	318	level verification methods on all configurations generated by this step. That would certainly
		319	require to multiply the time required by the verification methods at least by the number
		320	of the distinct configurations. Constraining the set of configurations to just the most
181	decky	321	representative ones is perhaps a reasonable compromise to make the verification realistic.
179	decky	322
		323	\subsection{Regression and Unit Tests}
181	decky	324	Running regression and unit tests which are part of HelenOS code base in the continuous
		325	integration process is fairly easy. The only complication lies in the technicalities:
179	decky	326	We need to setup an automated network of physical machines and simulators which can run the
		327	appropriate compilation outputs for the specific platforms. We need to be able to reboot
		328	them remotely and distribute the boot images to them. And last but not least, we need to be
		329	able to gather the results from them.
181	decky	330
		331	Testing is always non-exhaustive, thus the guarantees provided by tests are strictly limited
		332	to the use cases and contexts which are being explicitly tested. However, it is arguably
		333	easier to express many common use cases in the primary programming language than in some
		334	different formalism. As we follow the bottom-up approach, filtering out the most obvious
		335	bugs by testing can save us a lot of valuable time which would be otherwise waisted by
		336	a futile verification by more formal (and more time-consuming) methods.
179	decky	337
		338	\subsection{Instrumentation}
		339	Instrumentation tools for detecting memory leaks, performance bottlenecks and soft-deadlocks
181	decky	340	are also not usually considered to be formal verification tools (since it is hard to define
		341	exact formal properties which are being verified by the non-exhaustive nature of these tools).
		342	They are also rarely utilized on regular basis as part of the continuous integration process.
		343	But again, it might be helpful to just mention them in the big picture.
179	decky	344
		345	If some regression or unit tests fail, they sometimes do not give sufficient information to
		346	tell immediately what is the root cause of the issue. In that case running the faulting tests
		347	on manually or automatically instrumented executable code might provide more data and point
		348	more directly to the actual bug.
		349
		350	\subsection{Verifying C Language Compiler}
181	decky	351	C language compilers are traditionally also not considered to be formal verification tools.
		352	Many people just say that C compilers are good at generating executable code, but do not
		353	care much about the semantics of the source code (on the other hand, formal verification
		354	tools usually do not generate any executable code at all). However, with recent development
		355	in the compiler domain, the old paradigms are shifting.
179	decky	356
		357	As the optimization passes and general maturity of the compilers improve over time,
		358	the compilers try to extract and use more and more semantic information from the source code.
		359	The C language is quite poor on explicit semantic information, but the verifying compilers
		360	try to rely on vendor-specific language extensions and on the fact that some semantic information
		361	can be added to the source code without changing the resulting executable code.
		362
		363	The checks done by the verifying compilers cannot result in fatal errors in the usual cases (they
		364	are just warnings). Firstly, the compilers still need to successfully compile a well-formed C source
		365	code compliant to some older standard (e.g. C89) even when it is not up with the current quality
		366	expectations. Old legacy source code should still pass the compilation as it did decades ago.
		367
181	decky	368	Secondly, the checks run by the verifying compilers are usually not based on abstract interpretation.
		369	They are mostly realized as abstract syntax tree transformations much in the line with the supporting
		370	routines of the compilation process (data and control flow graph analysis, dead code elimination,
		371	register allocation, etc.) and the evaluation function is basically the matching of antipatterns
		372	of common programming bugs.
179	decky	373
		374	The checks are usually conservative. The verifying compilers identify code constructs which are suspicious,
		375	which might arise out of programmer's bad intuition and so on, but even these code snippets cannot be
		376	tagged as definitive bugs (since the programmer can be simply in a position where he/she really wants to
		377	do something very strange, but nevertheless legitimate). It is upon the programmer
		378	to examine the root cause of the compiler warning, tell whether it is really a bug or just a false
		379	positive and fix the issue by either amending some additional semantic information (e.g. adding an
		380	explicit typecast or a vendor-specific language extension) or rewriting the code.
181	decky	381
		382	Although this level of abstraction is coarse-grained and conservative, it can be called semi-formal,
		383	since the properties which are being verified can be actually defined quite exactly and they
		384	are reasonably general. They do not deal with single traces of methods, runs and use
		385	cases like tests, but they deal with all possible contexts in which the code can run.
179	decky	386
		387	\subsection{Static Analyzer}
		388	Static analyzers try to go deeper than verifying compilers. Besides detecting common antipatterns of
		389	bugs, they also use techniques such as abstract interpretation to check for more complex properties.
		390
		391	Most commercial static analyzers come with a predefined set of properties which cannot be easily changed.
		392	They are coupled with the commonly used semantics of the environment and generate domain-specific models
		393	of the software based not only on the syntax of the source code, but also based on the assumptions derived
		394	from the memory access model, allocation and deallocation rules, tracking of references and tracking of
		395	concurrency locks.
		396
		397	The biggest advantage of static analyzers is that they can be easily included in the development or
		398	continuous integration process as an additional automated step, very similar to the verifying compilers.
181	decky	399	No manual definition of code-specific properties is needed and false positives can be relatively easily
		400	eliminated by amending some explicit additional information to the source code within the boundaries
		401	of the programming language.
179	decky	402
181	decky	403	The authors of static analyzers claim large quantities of bugs detected or prevented~\cite{billion},
		404	but static analyzers are still relatively limited by the kind of bugs they are designed to detect.
		405	They are usually good at pointing out common issues with security implications (specific types of
		406	buffer and stack overruns, usage of well-known functions in an unsafe way, clear cases of forgotten
179	decky	407	deallocation of resources and release of locks, etc.). Unfortunately, many static analyzers
		408	only analyze a single-threaded control flow and are thus unable to detect concurrency issues
		409	such as deadlocks.
		410
		411	\subsection{Static Verifier}
		412	There is one key difference between a static analyzer and a static verifier: Static verifiers
		413	allow the user to specify one's own properties, in terms of preconditions, postconditions and
		414	invariants in the code. Many static verifiers also target true multithreaded usage patterns
		415	and have the capability to check proper locking order, hand-over-hand locking and even liveliness.
		416
		417	In the context of an operating system kernel and core libraries two kinds of properties are
		418	common:
		419
181	decky	420	\begin{description}
		421	\item[Consistency constrains] These properties define the correct way how data is supposed
179	decky	422	to be manipulated by some related set of subroutines. Checking for these
		423	properties ensures that data structures and internal states will not get corrupt due
		424	to bugs in the functions and methods which are designed to manipulate them.
181	decky	425	\item[Interface enforcements] These properties define the correct semantics by which
179	decky	426	a set of subroutines should be used by the rest of the code. Checking for these properties
		427	ensures that some API is always used by the rest of the code in a specified way
181	decky	428	and all reported exceptions are handled by the client code.
		429	\end{description}
179	decky	430
		431	\subsection{Model Checker}
		432	\label{modelcheck}
		433	On the first sight it does not seem to be reasonable to consider general model checkers as
		434	relevant independent tools for formal verification of an existing operating system. While many
		435	different tools use model checkers as their backends, verifying a complete model of the entire
		436	system created by hand seems to be infeasible both in the sense of time required for the model
		437	creation and resources required by the checker to finish the exhaustive traversal of the model's
		438	address space.
		439
		440	Nevertheless, model checkers on their own can still serve a good job verifying abstract
		441	properties of key algorithms without dealing with the technical details of the implementation.
		442	Various properties of synchronization algorithms, data structures and communication protocols
		443	can be verified in the most generic conditions by model checkers, answering the
		444	question whether they are designed properly in theory.
		445
		446	If the implementation of these algorithms and protocols do not behave correctly, we can be sure
		447	that the root cause is in the non-compliance between the design and implementation and not a
		448	fundamental flaw of the design itself.
		449
181	decky	450	\subsection{Architecture and Behavior Checker}
		451	All previously mentioned verification methods were targeting internal properties of the OS
		452	components. If we are moving to a higher-level abstraction in order to specify correct
		453	interaction of the encapsulated components in terms of interface compatibility and communication,
		454	we can utilize \emph{Behavior Protocols}~\cite{bp} or some other formalism describing correct
		455	interaction between software components.
179	decky	456
181	decky	457	To gain the knowledge about the architecture of the whole OS in terms of software
		458	component composition and bindings, we can use \emph{Architecture Description Language}~\cite{adl}
		459	as the specification of the architecture of the system. This language has the possibility to capture
		460	interface types (with method signatures), primitive components (in terms of provided and required
		461	interfaces), composite components (an architectural compositions of primitive components) and the
		462	bindings between the respective interfaces of the components.
179	decky	463
181	decky	464	It is extremely important to define the right role of the behavior and architecture description.
		465	A flawed approach would be to reverse-engineer this description from the source code (either manually
		466	or via some sophisticated tool) and then verify the compliance between the desciption and
		467	the implementation. However, different directions can give more interesting results:
		468
		469	\begin{description}
		470	\item[Description as specification] Behavior and architecture description created independently
		471	on the source code serves the role of specification. This has the following primary
		472	goals of formal verification:
		473	\begin{description}
		474	\item[Horizontal compliance] Also called \emph{compatibility}. The goal is to check
		475	whether the specifications of components that are bound together are semantically
		476	compatible. All required interfaces need to be bound to provided interfaces and
		477	the communication between the components cannot lead to \emph{no activity} (a deadlock),
		478	\emph{bad activity} (a livelock) or other communication and synchronization errors.
		479	\item[Vertical compliance] Also called \emph{substituability}. The goal is to check whether
		480	it is possible to replace a set of primitive components that are nested inside a composite
		481	component by the composite component itself. In other words, this compliance can answer the
		482	questions whether the architecture description of the system is sound with respect to the hierarchical
		483	composition of the components.
		484	\item[Compliance between the specification and the implementation] Using various means
		485	for generating artificial environments for an isolated component the checker is able to
		486	partially answer the question whether the implementation of the component is an instantiation
		487	of the component specification.
		488	\end{description}
		489	\item[Description as abstraction] Generating the behavior and architecture description from the
		490	source code by means of abstract interpretation can serve the purpose of verifying various
		491	properties of the implementation such as invariants, preconditions and postconditions.
		492	This is similar to static verification, but on the level of component interfaces.
		493	\end{description}
		494
		495	Unfortunately, most of the behavior and architecture formalisms are static, which is not quite suitable
		496	for the dynamic of most OSes. This limitation can be circumvented by considering a relevant
179	decky	497	snapshot of the dynamic run-time architecture. This snapshot fixed in time is equivalent to
		498	a statically defined architecture.
		499
181	decky	500	\medskip
179	decky	501
181	decky	502	The key features of software systems with respect to behavior and architecture checkers are the granularity
		503	of the individual primitive components, the level of isolation and complexity of the communication mechanism
		504	between them. Large monolithic OSes created in sematic-poor C present a severe challenge because the
		505	isolation of the individual components is vague and the communication between them is basically unlimited
		506	(function calls, shared resources, etc.).
179	decky	507
181	decky	508	OSes with explicit component architecture and fine-grained components (such as microkernel multiserver
		509	systems) make the feasibility of the verification much easier, since the degrees of freedom (and thus
		510	the state space) is limited.
		511
179	decky	512	Horizontal and vertical compliance checking can be done exhaustively. This is a fundamental property
181	decky	513	which allows the reasoning about the dependability of the entire component-based OS.
179	decky	514	Assuming that the lower-level verification methods (described in Sections \ref{clang} to \ref{modelcheck})
		515	prove some specific properties of the primitive components, we can be sure that the composition of
181	decky	516	the primitive components into composite components and ultimately into the whole OS
179	decky	517	does not break these properties.
		518
		519	The feasibility of many lower-level verification methods from Sections \ref{clang} to \ref{modelcheck}
181	decky	520	depends largely on the size and complexity of the code under verification. If the entire OS
		521	is decomposed into primitive components with a fine granularity, it is more likely that the
179	decky	522	individual primitive components can be verified against a large number of properties. Thanks to the
		523	recursive component composition we can then be sure that these properties also hold for the entire system.
		524
		525	\medskip
		526
		527	The compliance between the behavior specification and the actual behavior of the implementation is, unfortunately,
		528	the missing link in the chain. This compliance cannot be easily verified in an exhaustive manner. If there is
		529	a discrepancy between the specified and the actual behavior of the components, we cannot conclude anything about
		530	the properties holding in the entire system.
		531
		532	However, there is one way how to improve the situation: \emph{code generation}. If we generate implementation
		533	from the specification, the compliance between them is axiomatic. If we are able to generate enough
		534	code from the specification to run into the hand-written ``business code'' where we check for
		535	the lower-level properties, the conclusions about the component composition are going to hold.
		536
		537	\subsection{Behavior Description Generator}
		538	To conclude our path towards higher abstractions we can utilize tools that can
		539	generate the behavior descriptions from \emph{textual use cases} written in a domain-constrained English.
		540	These generated artifacts can be then compared (e.g. via vertical compliance checking) with the formal
		541	specification. Although the comparison might not provide clean-cut results, it can still be
		542	helpful to confront the more-or-less informal user expectations on the system with the exact formal description.
		543
		544	\subsection{Summary}
		545	\label{missing}
181	decky	546	So far, we have proposed a compact combination of engineering, semi-formal and formal methods which
		547	start at the level of C programming language, offer the possibility to check for the presence of various
		548	common antipatterns, check for generic algorithm-related properties, consistency constrains, interface
		549	enforcements and conclude with a framework to make these properties hold even in the case of a large
		550	OS composed from many components of compliant behavior.
179	decky	551
181	decky	552	We do not claim that there are no missing pieces in the big picture or that the semi-formal verifications
		553	might provide more guarantees in this setup. However, state-of-the-art OS design guidelines can push
		554	further the boundaries of practical feasibility of the presented methods. The limited guarantees
		555	of the low-level methods hold even in the composition and the high-level formal methods do not have
		556	to deal with unlimited degrees of freedom of the primitive component implementation.
179	decky	557
		558	\medskip
		559
		560	We have spoken only about the functional properties. In general, we cannot apply the same formalisms
		561	and methods on extra-functional properties (e.g. timing properties, performance properties, etc.).
		562	And although it probably does make a good sense to reason about component composition for the extra-functional
		563	properties, the exact relation might be different compared to the functional properties.
		564
		565	The extra-functional properties need to be tackled by our future work.
		566
		567	\section{Evaluation}
		568	\label{evaluation}
		569	This section copies the structure of the previous Section \ref{analysis} and adds HelenOS-specific
		570	evaluation of the the proposed formalisms and tools. As this is still largely a work-in-progress,
		571	in many cases just the initial observations can be made.
		572
181	decky	573	The choice of the specific methods, tools and formalisms in this initial phase is mostly motivated
		574	by their perceived commonality and author's claims about fitness for the given purpose. An important
		575	part of further evaluation would certainly be to compare multiple particular approaches, tools
		576	and formalisms to find the optimal combination.
		577
179	decky	578	\subsection{Verifying C Language Compiler and Continuous Integration Tool}
		579	The primary C compiler used by HelenOS is \emph{GNU GCC 4.4.3} (all platforms)~\cite{gcc} and \emph{Clang 2.6.0}
		580	(IA-32)~\cite{clang}. We have taken some effort to support also \emph{ICC} and \emph{Sun Studio} C compilers,
		581	but the compatibility with these compilers in not guaranteed.
		582
		583	The whole code base is compiled with the \texttt{-Wall} and \texttt{-Wextra} compilation options. These options turn on
		584	most of the verification checks of the compilers. The compilers trip on common bug antipatterns such
		585	as implicit typecasting of pointer types, comparison of signed and unsigned integer values (danger
		586	of unchecked overflows), the usage of uninitialized variables, the presence of unused local variables,
		587	NULL-pointer dereferencing (determined by conservative local control flow analysis), functions
		588	with non-void return typed that do not return any value and so on. We treat all compilation warnings
		589	as fatal errors, thus the code base must pass without any warnings.
		590
		591	We also turn on several more specific and strict checks. These checks helped to discover several
		592	latent bugs in the source code:
		593
181	decky	594	\begin{description}
		595	\item[\texttt{-Wfloat-equal}] Check for exact equality comparison between floating point values. The
179	decky	596	usage of equal comparator on floats is usually misguided due to the inherent computational errors
		597	of floats.
181	decky	598	\item[\texttt{-Wcast-align}] Check for code which casts pointers to a type with a stricter alignment
179	decky	599	requirement. On many RISC-based platforms this can cause run-time unaligned access exceptions.
181	decky	600	\item[\texttt{-Wconversion}] Check for code where the implicit type conversion (e.g. from float to integer,
179	decky	601	between signed and unsigned integers or between integers with different number of bits) can
		602	cause the actual value to change.
181	decky	603	\end{description}
179	decky	604
		605	To enhance the semantic information in the source code, we use GCC-specific language extensions to annotate
		606	some particular kernel and core library routines:
		607
181	decky	608	\begin{description}
		609	\item[\texttt{\_\_attribute\_\_((noreturn))}] Functions marked in this way never finish from the point of view
179	decky	610	of the current sequential execution flow. The most common case are the routines which restore previously saved
		611	execution context.
181	decky	612	\item[\texttt{\_\_attribute\_\_((returns\_twice))}] Functions marked in this way may return multiple times from
179	decky	613	the point of view of the current sequential execution flow. This is the case of routines which save the current
		614	execution context (first the function returns as usual, but the function can eventually ``return again''
		615	when the context is being restored).
181	decky	616	\end{description}
179	decky	617
		618	The use of these extensions has pointed out to several hard-to-debug bugs on the IA-64 platform.
		619
		620	\medskip
		621
		622	The automated continuous integration building system is currently work-in-progress. Thus, we do not
		623	test all possible configurations of HelenOS with each changeset yet. Currently only
		624	a representative set of 14 configurations (at least one for each supported platform) is tested by hand
		625	by the developers before committing any non-trivial changeset.
		626
		627	From occasional tests of other configurations by hand and the frequency of compilation, linkage and
		628	even run-time problems we conclude that the automated testing of all feasible configurations will
		629	be very beneficial.
		630
		631	\subsection{Regression and Unit Tests}
		632	As already stated in the previous section, the continuous integration building system has not been finished
		633	yet. Therefore regression and unit tests are executed occasionally by hand, which is time consuming
		634	and prone to human omissions. An automated approach is definitively going to be very helpful.
		635
		636	\subsection{Instrumentation}
		637	We are in the process of implementing our own code instrumentation framework which is motivated mainly
		638	by the need to support MMU-less architectures in the future. But this framework might be also very helpful
181	decky	639	in detecting memory and generic resource leaks. We have not tried \emph{Valgrind}~\cite{valgrind} or any similar
		640	existing tool because of the estimated complexity to adopt it for the usage in HelenOS.
179	decky	641
181	decky	642	HelenOS was also scanned by \emph{Coverity}~\cite{coverity} in 2006 when no errors were detected. However, since
		643	that time the code base has not been analyzed by Coverity.
179	decky	644
		645	\subsection{Static Analyzer}
		646	The integration of various static analyzers into the HelenOS continuous integration process is underway.
		647	For the initial evaluation we have used \emph{Stanse}~\cite{stanse} and \emph{Clang Analyzer}~\cite{clanganalyzer}.
		648	Both of them showed to be moderately helpful to point out instances of unreachable dead code, use of language
		649	constructs which have ambiguous semantics in C and one case of possible NULL-pointer dereference.
		650
		651	The open framework of Clang seems to be very promising for implementing domain-specific checks (and at
		652	the same time it is also a very promising compiler framework). Our mid-term goal is to incorporate some of the features
		653	of Stanse and VCC (see Section \ref{staticverifier2}) into Clang Analyzer.
		654
		655	\subsection{Static Verifier}
		656	\label{staticverifier2}
		657	We have started to extend the source code of HelenOS kernel with annotations understood
		658	by \emph{Frama-C}~\cite{framac} and \emph{VCC}~\cite{vcc}. Initially we have targeted simple kernel data structures
		659	(doubly-linked circular lists) and basic locking operations. Currently we are evaluating the initial experiences
		660	and we are trying to identify the most suitable methodology, but we expect quite promising results.
		661
		662	As the VCC is based on the Microsoft C++ Compiler, which does not support many GCC extensions, we have been
		663	faced with the requirement to preprocess the source code to be syntactically accepted by VCC. This turned out
		664	to be feasible.
		665
		666	\subsection{Model Checker}
		667	We are in the process of creating models of kernel wait queues (basic HelenOS kernel synchronization
		668	primitive) and futexes (basic user space thread synchronization primitive) using \emph{Promela} and
		669	verify several formal properties (deadlock freedom, fairness) in \emph{Spin}~\cite{spin}. As both the Promela language
		670	and the Spin model checker are mature and commonly used tools for such purposes, we expect no major problems
		671	with this approach. Because both synchronization primitives are relatively complex, utilizing a model checker
		672	should provide a much more trustworthy proof of the required properties than ``paper and pencil''.
181	decky	673
		674	The initial choice of Spin is motivated by its suitability to model threads, their interaction and verify
		675	properties related to race conditions and deadlocks (which is the common sources of OS-related bugs). Other
		676	modeling formalisms might be more suitable for different goals.
179	decky	677
181	decky	678	\subsection{Architecture and Behavior Checker}
		679	We have created an architecture description in ADL language derived from \emph{SOFA ADL}~\cite{adl} for the
		680	majority of the HelenOS components and created the Behavior Protocol specification of these components.
		681	Both descriptions were created independently, not by reverse-engineering the existing source code.
		682	The architecture is a snapshot of the dynamic architecture just after a successful bootstrap of HelenOS.
179	decky	683
		684	Both the architecture and behavior description is readily available as part of the source code repository
		685	of HelenOS, including tools which can preprocess the Behavior Protocols according to the architecture description
		686	and create an output suitable for \emph{bp2promela} checker~\cite{bp}.
		687
		688	As the resulting complexity of the description is larger than any of the previously published case studies
181	decky	689	on Behavior Protocols (compare to~\cite{cocome}), our current work-in-progress is to optimize and fine-tune
		690	the bp2promela checker to process the input.
179	decky	691
		692	\medskip
		693
181	decky	694	We have not started to generate code from the architecture description so far because of time constrains.
		695	However, we believe that this is a very promising way to go and provide reasonable guarantees about
		696	the compliance between the specification and the implementation.
179	decky	697
		698	\subsection{Behavior Description Generator}
181	decky	699	We have not tackled the issue of behavior description generation yet, although tools such as
		700	\emph{Procasor}~\cite{procasor} are readily available. We do not consider it our priority at this time.
179	decky	701
		702	\section{Conclusion}
		703	\label{conclusion}
181	decky	704	In this paper we propose a complex combination of various verification methods and tools
		705	to achieve the verification of an entire general-purpose operating system. The proposed
		706	approach generally follows a bottom-up route, starting with low-level checks using state-of-the-art
		707	verifying C language compilers, following by static analyzers and static verifiers.
		708	In specific contexts regression and unit tests, code instrumentation and model checkers
		709	for the sake of verification of key algorithms are utilized.
179	decky	710
		711	Thanks to the properties of state-of-the-art microkernel multiserver operating
181	decky	712	system design (e.g. software component encapsulation and composition, fine-grained isolated
		713	components), we demonstrate that it should be feasible to successfully verify larger and more
		714	complex operating systems than in the case of monolithic designs. We use formal component
		715	architecture and behavior description for the closure. The final goal -- a formally verified
		716	operating system -- is the emerging property of the combination of the various methods.
179	decky	717
		718	\medskip
		719
181	decky	720	The contribution of this paper is the shift of focus from attempts to use a single
		721	``silver-bullet'' method for formal verification of an operating system to a combination
		722	of multiple methods supported by a suitable architecture of the operating system.
		723	The main benefit is a much larger coverage of the set of all hypothetical properties.
179	decky	724
		725	We also argue that the approach should be suitable for the mainstream
		726	general-purpose operating systems in the near future, because they are gradually
		727	incorporating more microkernel-based features and fine-grained software components.
		728
		729	Although the evaluation of the proposed approach on HelenOS is still work-in-progress, the
		730	preliminary results and estimates are promising.
		731
		732	\medskip
		733
181	decky	734	\noindent\textbf{Acknowledgments.} The author would like to express his gratitude to all contributors of
		735	the HelenOS project. Without their vision and dedication the work on this paper would be almost impossible
179	decky	736
		737	This work was partially supported by the Ministry of Education of the Czech Republic
		738	(grant MSM\-0021620838).
		739
		740	\begin{thebibliography}{99}
		741	\bibitem{billion}Bessey A., Block K., Chelf B., Chou A., Fulton B., Hallem S., Henri-Gros C., Kamsky A., McPeak S., Engler D.: \emph{A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World}, Communications of the ACM, Vol. 53 No. 2, pp 66-75, 2010
		742	\bibitem{bp}Kofron J.: \emph{Checking Software Component Behavior Using Behavior Protocols and Spin}, in proceedings of Applied Computing 2007, Seoul, Korea, pp. 1513-1517, 2007
		743	\bibitem{gcc}GCC, the GNU Compiler Collection, \texttt{http://gcc.gnu.org/}
		744	\bibitem{clang}Clang: a C language family frontend for LLVM, \texttt{http://clang.llvm.org/}
		745	\bibitem{clanganalyzer}Clang Static Analyzer, \texttt{http://clang-analyzer.llvm.org/}
		746	\bibitem{cocome}Bulej L., Bures T., Coupaye T., Decky M., Jezek P., Parizek P., Plasil F., Poch T., Rivierre N., Sery O., Tuma P.: \emph{CoCoME in Fractal}, chapter in The Common Component Modeling Example: Comparing Software Component Models, Springer-Verlag, LNCS 5153, 2008
		747	\bibitem{coverity}Coverity, \texttt{http://scan.coverity.com/}
		748	\bibitem{procasor}Mencl V.: \emph{Deriving Behavior Specifications from Textual Use Cases}, in Proceedings of Workshop on Intelligent Technologies for Software Engineering (WITSE04, Sep 21, 2004, part of ASE 2004), Linz, Austria, pp. 331 - 341, Oesterreichische Computer Gesellschaft, 2004
		749	\bibitem{framac}Frama-C, \texttt{http://frama-c.cea.fr/}
		750	\bibitem{c}Lawlis P. K: \emph{Guidelines for Choosing a Computer Language: Support for the Visionary Organization}, Ada Information Clearinghouse, \texttt{http://archive.adaic.com/docs/reports/lawlis/k.htm}, 1998
		751	\bibitem{helenos}HelenOS Project, \texttt{http://www.helenos.org/}
		752	\bibitem{adl}Oplustil T.: \emph{Inheritance in Architecture Description Languages}, reviewed section of Proceedings of the Week of Doctoral Students 2003 conference (WDS 2003), Charles University, Prague, Czech Republic, 2003, pp. 124 - 131, 2003
		753	\bibitem{marketshare}Operating System Market Share, \texttt{http://marketshare.hitslink.com/report\-.aspx?qprid=8\&qptimeframe=M\&qpsp=132}, retrieved on 2010-02-28
		754	\bibitem{seL4}Klein G., Elphinstone K., Heiser G., Andronick J., Cock D., Derrin P., Elkaduwe D., Engelhardt K., Kolanski R., Norrish M., Sewell T., Tuch H., Winwood S.: \emph{seL4: Formal verification of an OS kernel}, Proceedings of the 22nd ACM Symposium on Operating Systems Principles, Big Sky, MT, USA, 2009
		755	\bibitem{spin}Spin, \texttt{http://spinroot.com/}
		756	\bibitem{stanse}Stanse: Static Analysis Framework for C code, \texttt{http://stanse.fi.muni.cz/}
		757	\bibitem{valgrind}Valgrind, \texttt{http://valgrind.org/}
		758	\bibitem{vcc}VCC, \texttt{http://vcc.codeplex.com/}
		759	\end{thebibliography}
		760	\end{document}

Subversion Repositories HelenOS-doc

(root)/papers/isarcs10/isarcs10.tex @ 184 – Rev 182