WebSVN – HelenOS-doc – Blame – /papers/isarcs10/isarcs10.tex

Rev	Author	Line No.	Line
179	decky	1	\documentclass{llncs}
		2	\usepackage{graphicx}
		3
		4	\title{A Road to a Formally Verified General-Purpose Operating System}
		5	\author{Martin D\v{e}ck\'{y}}
		6	\institute{Department of Software Engineering\\
		7	Faculty of Mathematics and Physics, Charles University\\
		8	Malostransk\'{e} n\'{a}m\v{e}st\'{i} 25, Prague 1, 118~00, Czech Republic\\
		9	\email{martin.decky@dsrg.mff.cuni.cz}}
		10
		11	\begin{document}
		12	\maketitle
		13
		14	\begin{abstract}
		15	Methods of formal description and verification represent a viable way for achieving
		16	fundamentally bug-free software. However, in reality only a small subset of the existing operating
		17	systems were ever formally verified, despite the fact that an operating system is a critical part
		18	of almost any other software system. This paper summarizes the challenges involved in formal verification
		19	of operating systems, points out several key design choices which should make the formal verification
		20	easier and presents a work-in-progress and initial experiences with formal verification of HelenOS,
		21	a state-of-the-art microkernel-based operating system, which, however, was not designed specifically
		22	with formal verification in mind, as this is mostly prohibitive due to time and budget constrains.
		23
		24	The contribution of this paper is the shift of focus from attempts to use a single ``silver-bullet''
		25	formal verification method which would be able to verify everything to a combination of multiple
		26	formalisms and techniques to cover various aspects of the verification process. The formally verified
		27	operating system is the emerging property of the combination.
		28	\end{abstract}
		29
		30	\section{Introduction}
		31	\label{introduction}
		32	In the context of formal verification of software, it is always important to model the outer
		33	environment with a proper level of abstraction. Weak assumptions on the environment make the formal
		34	verification less feasible, because they yield many degrees of freedom which render
		35	the properties which we want to verify overly complex. But strong assumptions on the environment are likewise
		36	not without a price: Any practical usability of the verification results depends on the question
		37	whether we are really able to create a physical environment which satisfies the assumptions.
		38
		39	There are some assumptions which are universal for any formal verification method. For the sake of
		40	simplicity let us call them \emph{idealized hardware}. Let us also cover the computational model
		41	and basic mathematical principles by this term.
		42
		43	What is \emph{idealized hardware}? We usually assume that the software is executed by a ``perfect'' hardware
		44	which works exactly according to its specification. Every divergence between the intended and actual
		45	behavior of software counts as a software bug and never as an issue in some hardware component. We
		46	implicitly assume that the hardware is perfect in the sense of physical realization and all probabilities
		47	of accidental mechanical and electrical failures are expected to be zero.
		48
		49	It is trivial to see that the \emph{idealized hardware} assumptions can never hold in real world.
		50	Accidental mechanical and electrical failures are inevitable in principle and their probability can
		51	never be a mathematical zero. Does this mean that any formal verification method is ultimately futile
		52	since we are unable to ensure validity of the most elementary assumptions? Probably not. We just need
		53	to be aware about the inherent and fundamental limitations of the formal methods which result from
		54	the shortcomings of the physical world.
		55
		56	The more precisely we model the software environment of the software under discussion, the more precisely
		57	we can calculate or estimate the resulting impact of the imperfections of the physical world on our software
		58	system. This is the most important motivation for formal reasoning about the correctness of the operating system.
		59
		60	\medskip
		61
		62	In more detail, operating systems have a somewhat special position among all software:
		63
		64	\begin{itemize}
		65	\item \emph{Operating systems are usually designed to run on bare hardware.} This means that except
		66	the \emph{idealized hardware} assumptions we do not have to take any or almost any extra assumptions
		67	into account.
		68	\item \emph{Operating systems create the lowest software layer and provide services to essentially
		69	all other software.} Considering the principle of recursion, the properties of the
		70	operating systems which we prove or disprove form the assumptions for the upper layers
		71	of software. Thus the dependability of end-user and enterprise software systems is limited
		72	by the dependability of the operating system.
		73	\item \emph{Operating systems are non-trivial software on their own.} The way
		74	they are generally designed and programmed (spanning both the kernel and user mode,
		75	manipulating execution contexts and concurrency, handling critical hardware-related
		76	operations) represent significant and interesting challenges for the formal verification
		77	methods and tools.
		78	\end{itemize}
		79
		80	Even in the informal understanding, the dependability of an operating system greatly determines the perceived
		81	dependability of the entire software stack. This led to several research initiatives in the recent years
		82	which target the creation of a formally verified operating systems from scratch~\cite{seL4}.
		83
		84	However, 98~\%\footnote{98~\% of client computers connected to the Internet as of January 2010~\cite{marketshare}.} of
		85	the market share of general-purpose operating systems is taken by Windows, Mac~OS~X and Linux.
		86	These systems were clearly not designed with formal verification in mind from the very beginning.
		87	The situation on the embedded, real-time and special-purpose operating systems market is probably different,
		88	but it is unlikely that the situation in the large domain of desktops and servers is going to change
		89	very rapidly in the near future.
		90
		91	Therefore we need to face the challenges of applying formal methods on an existing code base in the domain
		92	of general-purpose operating systems. Fortunately, the software design qualities of the general-purpose
		93	operating systems gradually improve over time. We can see then novel approaches in the operating systems
		94	research from the late 1980s and 1990s (microkernel design, user space file system and device drivers, etc.)
		95	to slowly emerge in the originally monolithic operating systems. We can also see better code quality thanks
		96	to improved software engineering (code review, proper escalation management, etc.).
		97
		98	\medskip
		99
		100	This paper proposes an approach and presents a work-in-progress case study of formal verification
		101	of an general-purpose research operating system, which was also not created specifically with formal
		102	verification in mind from the very beginning, but it was designed according to state-of-the-art operating systems
		103	principles.
		104
		105	\medskip
		106
		107	\noindent\textbf{Structure of the Paper.} In Section \ref{context} we introduce the case study in more detail and explain
		108	why we believe it is relevant. In Section \ref{analysis} we discuss our proposal of the combination of formal methods
		109	and tools. In Section \ref{evaluation} we present the preliminary evaluation of our efforts and estimate the complexity
		110	of the imminent next steps we want to take according to our proposal. Finally, in Section \ref{conclusion}
		111	we present the conclusion of the paper.
		112
		113	\section{Context}
		114	\label{context}
		115	Two very common schemes of operating system design are \emph{monolithic design} and \emph{microkernel design}.
		116	Without going into much detail of any specific operating system, we can define the monolithic design as
		117	a preference to put numerous aspects of the core operating system functionality into the kernel,
		118	while microkernel design is a preference to keep the kernel small, with just a minimal set of features.
		119
		120	The features which are missing from the kernel in the microkernel design are implemented in user space, usually
		121	by means of libraries, servers and/or software components.
		122
		123	\subsection{HelenOS}
		124	\label{helenos}
		125	\emph{HelenOS} is a general-purpose research operating system which is being developed at Charles
		126	University in Prague. The source code is available under the BSD open source license and can be
		127	freely downloaded from the project web site~\cite{helenos}. The authors of the code base are
		128	both from the academia and from the open source community (several contributors are employed
		129	as Solaris kernel developers and many are freelance IT professionals). We consistently strive to support
		130	the research and also the practical motivations for developing the system.
		131
		132	HelenOS uses a preemptive priority-feedback scheduler, it supports SMP hardware and it is
		133	designed to be highly portable. Currently it runs on 7 distinct hardware architectures, including the
		134	most common IA-32, x86-64 (AMD64), IA-64, SPARC~v9 and PowerPC. It also runs on ARMv7 and MIPS,
		135	but currently only in simulators and not on physical hardware.
		136
		137	Although HelenOS is still far from being an everyday replacement for Linux or Windows due to the lack
		138	of end-user applications (whose development is extremely time-consuming, but unfortunately of
		139	no scientific value), the essential foundations such as file system support and TCP/IP networking
		140	are already in place.
		141
		142	HelenOS does not currently target embedded devices (although the ARMv7 port can be easily modified to
		143	run on various embedded boards) and does not implement real-time scheduling and synchronization.
		144	Many development projects such as task snapshoting and migration, generic device driver
		145	framework, support for MMU-less platforms and performance monitoring are currently underway.
		146
		147	\medskip
		148
		149	HelenOS has a microkernel multiserver design, but the guiding principles of the HelenOS design are
		150	actually more elaborate:
		151
		152	\begin{itemize}
		153	\item \emph{(Microkernel principle)} Every functionality of the operating system that does not
		154	have to be necessary implemented in the kernel should be implemented in user space. This
		155	implies that subsystems such as the file system, device drivers (except those which are
		156	essential for the basic kernel functionality), naming and trading services, networking,
		157	human interface and similar features should be implemented in user space.
		158	\item \emph{(Full-fledged principle)} Features which need to be implemented in kernel should
		159	be designed with full-fledged algorithms and data structures. In contrast
		160	to several other microkernel operating systems, where the authors have deliberately chosen
		161	the most simplistic approach (static memory allocation, na\"{\i}ve algorithms, simple data
		162	structures), HelenOS microkernel tries to use the most advanced and suitable means available.
		163	It contains features such as AVL and B+ trees, hashing tables, SLAB memory allocator, multiple
		164	in-kernel synchronization primitives, fine-grained locking and so on.
		165	\item \emph{(Multiserver principle)} Subsystems in user space should be decomposed with the smallest
		166	reasonable granularity. Each unit of decomposition should be encapsulated in a separate task.
		167	The tasks represent software components with isolated address spaces. From the design point of
		168	view the kernel can be seen as a separate component, too.
		169	\item \emph{(Split of mechanism and policy)} The kernel should only provide low-level mechanisms,
		170	while the high-level policies which are built upon these mechanisms should be defined in
		171	user space. This also implies that the policies should be easily replaceable while keeping
		172	the low-level mechanisms intact.
		173	\item \emph{(Encapsulation principle)} The communication between the tasks/components should be
		174	implemented only via a set of well-defined interfaces. In the user-to-user case the preferred
		175	communication mechanism is HelenOS IPC, which provides reasonable mix of abstraction and
		176	performance (RPC-like primitives combined with implicit memory sharing for large data
		177	transfers). In case of synchronous user-to-kernel communication the usual syscalls are used.
		178	HelenOS IPC is used again for asynchronous kernel-to-user communication.
		179	\item \emph{(Portability principle)} The design and implementation should always maintain a high
		180	level of platform neutrality and portability. Platform-specific code in the kernel, core
		181	libraries and tasks implementing low-level functionality (e.g. device drivers) should be
		182	clearly separated from the generic code (either by component decomposition, naming and trading,
		183	or at least by internal compile-time APIs).
		184	\end{itemize}
		185
		186	As these design guiding principles suggest, the size of the HelenOS microkernel is considerably larger
		187	compared to ``scrupulous'' microkernel implementations. The average footprint of the kernel ranges from
		188	569~KiB when all logging messages, asserts, symbol resolution and debugging kernel console are compiled
		189	in, down to 198~KiB for a non-debugging production build. We do not believe that the raw size
		190	of the microkernel is a relevant quality criterion per se, without taking the actual feature set
		191	into account.
		192
		193	\medskip
		194
		195	The run-time architecture of HelenOS is inherently dynamic. The bindings between the components are
		196	not created at compile-time, but during the bootstrap process and can be modified to a large degree
		197	also during normal operation mode of the system (via human interaction and external events). This
		198	creates particularly interesting challenges for describing the design of the system by many formalisms.
		199
		200	\medskip
		201
		202	Yet another set of obstacles for reasoning about the properties of HelenOS lies in the design of
		203	the ubiquitous HelenOS IPC mechanism and the associated threading model. The IPC is inherently
		204	asynchronous with constant message buffers in the kernel and dynamic buffers in user space.
		205	It uses the notions of uni-directional bindings, mandatory pairing of requests and replies,
		206	binding establishment and abolishment handshakes, memory sharing and fast message forwarding.
		207
		208	For easier management of the asynchronous messages and the possibility to process multiple
		209	messages from different peers without the usual kernel threading overhead, the core user space
		210	library manages the execution flow by so-called \emph{fibrils}. A fibril is a user-space-managed thread with
		211	cooperative scheduling. A different fibril is scheduled every time the current fibril is
		212	about to be blocked while sending out IPC requests (because the kernel buffers of the addressee
		213	are full) or while waiting on an IPC reply. This allows different execution flows within the
		214	same thread to process multiple requests and replies. To safeguard proper sequencing
		215	of IPC messages and provide synchronization, special fibril-aware synchronization primitives
		216	(mutexes, condition variables, etc.) are available.
		217
		218	Because of the cooperative nature of fibrils, they might cause severe performance under-utilization
		219	in SMP configurations and system-wide bottlenecks. As multicore processors are more and more
		220	common nowadays, that would be a substantial design flaw. Therefore the fibrils can be also freely
		221	(and to some degree even automatically) combined with the usual kernel threads, which provide
		222	preemptive scheduling and true parallelism on SMP machines. Needless to say, this combination is
		223	also a grand challenge for the formal reasoning.
		224
		225	\medskip
		226
		227	To sum up, the choice of HelenOS as our case study is based on the fact that it was not designed
		228	in advance with formal verification in mind. This is similar to most general-purpose operating
		229	systems in common use. At the same time, it does not have an obsolete design and is non-trivial.
		230
		231	\subsection{The C Programming Language}
		232	A large majority of operating systems is coded in the C programming language. HelenOS is no exception
		233	to this. The choice of C in the case of kernel is usually well-motivated -- the language was designed
		234	specifically for implementing system software~\cite{c}. It is reasonably low-level in the sense that it allows
		235	to access the memory and other hardware resources with similar effectiveness as from assembler.
		236	It also requires almost no run-time support and it exports many features of the von Neumann hardware
		237	architecture to the programmer in a very straightforward, but still relatively portable way.
		238
		239	However, what is the biggest advantage of C in terms of run-time performance is also the biggest weakness
		240	for formal reasoning. The permissive memory access model of C, the lack of any reference safety
		241	enforcement, the weak type system and generally little semantic information in the code -- all these
		242	properties do not allow to make many general assumptions about the code.
		243
		244	Programming languages which target controlled environments such as Java or C\(\sharp\) are
		245	generally easier for formal reasoning because they provide a well-known set of primitives
		246	and language constructs for object ownership, threading and synchronization. Many non-imperative
		247	programming languages can be even considered to be a form of ``executable specification'' and thus
		248	very suitable for formal reasoning. In C, almost everything is left to the programmer who
		249	is free to set the rules.
		250
		251	\medskip
		252
		253	The reasons for using C in the user space of HelenOS (and other established operating systems) is
		254	probably much more questionable. Except for the core libraries and services (such as device drivers),
		255	C might be easily replaced by any high-level and perhaps even non-imperative programming language.
		256	The reasons for using C in this context is mostly historical.
		257
		258	However, as we have stated in Section \ref{introduction}, the way general-purpose operating systems
		259	are implemented changes only slowly and therefore any propositions which require radical modification
		260	of the existing code base before committing to the formal verification are not realistic.
		261
		262	\section{Analysis}
		263	\label{analysis}
		264
		265	\begin{figure}[t]
		266	\begin{center}
		267	\resizebox*{120mm}{!}{\includegraphics{diag}}
		268	\caption{Overview of the formalisms and tools proposed.}
		269	\label{fig:diag}
		270	\end{center}
		271	\end{figure}
		272
		273	In this section, we analyze the properties we would like to check in a general-purpose
		274	operating system. Each set of properties usually requires a specific verification method,
		275	tool or toolchain.
		276
		277	Our approach will be mostly bottom-to-top, or, in other words, from the lower levels of abstraction
		278	to the higher levels of abstraction. If the verification fails on a lower level, it usually
		279	does not make much sense to continue with the higher levels of abstraction until the issues
		280	are tackled. A structured overview of the formalisms and tools can be seen on Figure \ref{fig:diag}.
		281
		282	Please note that the titles of the following sections do not follow any particular established
		283	taxonomy of verification methods. We have simply chosen the names to be descriptive.
		284
		285	\subsection{C Language Compiler and Continuous Integration Tool}
		286	\label{clang}
		287	The initial levels of abstraction do not go far from the C source code. First, we would certainly like to
		288	know whether our code base is compliant with the programming language specification and passes
		289	only the basic semantic checks (proper number and types of arguments passed to functions, etc.).
		290	It is perhaps not very surprising that these decisions can be make by any plain C compiler.
		291	However, with the current implementation of HelenOS even this is not quite trivial.
		292	Besides the requirement to support 7 hardware platforms, the system's compile-time configuration
		293	can be also affected by approximately 65 configuration options, most of which are booleans, the rest
		294	are enumerated types.
		295
		296	These configuration options are bound by logical propositions in conjunctive or disjunctive
		297	normal forms and while some options are freely configurable, the value of others gets inferred
		298	by the build system of HelenOS. Thus, the overall number of distinct configurations in which
		299	HelenOS can be compiled is at least one order of magnitude larger than the plain number
		300	of supported hardware platforms.
		301
		302	Various configuration options affect conditional compilation and linking. The programmers
		303	are used to make sure that the source code compiles and links fine with respect to the
		304	most common and obvious configurations, but the unforeseen interaction of the less common
		305	configuration options might cause linking or even compilation errors.
		306
		307	\medskip
		308
		309	A straightforward solution is to generate all distinct configurations, starting from the
		310	open variables and inferring the others. This can be part of the continuous integration
		311	process which would try to compile and link the sources in all distinct configurations.
		312
		313	If we want to be really pedantic, we should also make sure that we run all the relevant higher
		314	level verification methods on all configurations generated by this step. That would certainly
		315	require to multiply the time required by the verification methods at least by the number
		316	of the distinct configurations. Constraining the set of configurations to just the most
		317	representative ones is perhaps a reasonable compromise.
		318
		319	\subsection{Regression and Unit Tests}
		320	Although some people would argue whether testing is a formal verification method, we still include
		321	it into the big picture. Running regression and unit tests which are part of HelenOS code base
		322	in the continuous integration process is fairly easy. The only complication lies in the technicalities:
		323	We need to setup an automated network of physical machines and simulators which can run the
		324	appropriate compilation outputs for the specific platforms. We need to be able to reboot
		325	them remotely and distribute the boot images to them. And last but not least, we need to be
		326	able to gather the results from them.
		327
		328	\subsection{Instrumentation}
		329	Instrumentation tools for detecting memory leaks, performance bottlenecks and soft-deadlocks
		330	are also not usually considered to be formal verification tools. They are also rarely utilized
		331	on regular basis as part of the continuous integration process. But again, it might be helpful
		332	to just mention them in the context of regression and unit tests.
		333
		334	If some regression or unit tests fail, they sometimes do not give sufficient information to
		335	tell immediately what is the root cause of the issue. In that case running the faulting tests
		336	on manually or automatically instrumented executable code might provide more data and point
		337	more directly to the actual bug.
		338
		339	\subsection{Verifying C Language Compiler}
		340	C language compilers are traditionally not considered to be formal verification tools. Many people
		341	just say that C compilers are good at generating executable code, but do not care much about the semantics
		342	of the source code (on the other hand, formal verification tools usually do not generate any executable code
		343	at all). However, with recent development in the compiler domain, the old paradigms are shifting.
		344
		345	As the optimization passes and general maturity of the compilers improve over time,
		346	the compilers try to extract and use more and more semantic information from the source code.
		347	The C language is quite poor on explicit semantic information, but the verifying compilers
		348	try to rely on vendor-specific language extensions and on the fact that some semantic information
		349	can be added to the source code without changing the resulting executable code.
		350
		351	The checks done by the verifying compilers cannot result in fatal errors in the usual cases (they
		352	are just warnings). Firstly, the compilers still need to successfully compile a well-formed C source
		353	code compliant to some older standard (e.g. C89) even when it is not up with the current quality
		354	expectations. Old legacy source code should still pass the compilation as it did decades ago.
		355
		356	Secondly, the checks run by the verifying compilers are usually not based on abstract interpretation
		357	or exhaustive traversal of a model state space. They are mostly realized as abstract syntax tree
		358	transformations much in the line with the supporting routines of the compilation process (data
		359	and control flow graph analysis, dead code elimination, register allocation, etc.) and the evaluation
		360	function is basically the matching of antipatterns of common programming bugs.
		361
		362	The checks are usually conservative. The verifying compilers identify code constructs which are suspicious,
		363	which might arise out of programmer's bad intuition and so on, but even these code snippets cannot be
		364	tagged as definitive bugs (since the programmer can be simply in a position where he/she really wants to
		365	do something very strange, but nevertheless legitimate). It is upon the programmer
		366	to examine the root cause of the compiler warning, tell whether it is really a bug or just a false
		367	positive and fix the issue by either amending some additional semantic information (e.g. adding an
		368	explicit typecast or a vendor-specific language extension) or rewriting the code.
		369
		370	\subsection{Static Analyzer}
		371	Static analyzers try to go deeper than verifying compilers. Besides detecting common antipatterns of
		372	bugs, they also use techniques such as abstract interpretation to check for more complex properties.
		373
		374	Most commercial static analyzers come with a predefined set of properties which cannot be easily changed.
		375	They are coupled with the commonly used semantics of the environment and generate domain-specific models
		376	of the software based not only on the syntax of the source code, but also based on the assumptions derived
		377	from the memory access model, allocation and deallocation rules, tracking of references and tracking of
		378	concurrency locks.
		379
		380	The biggest advantage of static analyzers is that they can be easily included in the development or
		381	continuous integration process as an additional automated step, very similar to the verifying compilers.
		382	No definition of properties is needed and false positives can be relatively easily eliminated by amending
		383	some explicit additional information to the source code within the boundaries of the programming language.
		384
		385	The authors of static analyzers claim large quantities of bugs detected or prevented~\cite{billion}, but static
		386	analyzers are still relatively limited by the kind of bugs they are designed to detect. They
		387	are usually good at pointing out common issues with security implications (specific types of buffer
		388	and stack overruns, usage of well-known functions in an unsafe way, clear cases of forgotten
		389	deallocation of resources and release of locks, etc.). Unfortunately, many static analyzers
		390	only analyze a single-threaded control flow and are thus unable to detect concurrency issues
		391	such as deadlocks.
		392
		393	\subsection{Static Verifier}
		394	There is one key difference between a static analyzer and a static verifier: Static verifiers
		395	allow the user to specify one's own properties, in terms of preconditions, postconditions and
		396	invariants in the code. Many static verifiers also target true multithreaded usage patterns
		397	and have the capability to check proper locking order, hand-over-hand locking and even liveliness.
		398
		399	In the context of an operating system kernel and core libraries two kinds of properties are
		400	common:
		401
		402	\begin{itemize}
		403	\item \emph{Consistency constrains:} These properties define the correct way how data is supposed
		404	to be manipulated by some related set of subroutines. Checking for these
		405	properties ensures that data structures and internal states will not get corrupt due
		406	to bugs in the functions and methods which are designed to manipulate them.
		407	\item \emph{Interface enforcement:} These properties define the correct semantics by which
		408	a set of subroutines should be used by the rest of the code. Checking for these properties
		409	ensures that some API is always used by the rest of the code in a specified way
		410	and all possible error states are detected and reported.
		411	\end{itemize}
		412
		413	\subsection{Model Checker}
		414	\label{modelcheck}
		415	On the first sight it does not seem to be reasonable to consider general model checkers as
		416	relevant independent tools for formal verification of an existing operating system. While many
		417	different tools use model checkers as their backends, verifying a complete model of the entire
		418	system created by hand seems to be infeasible both in the sense of time required for the model
		419	creation and resources required by the checker to finish the exhaustive traversal of the model's
		420	address space.
		421
		422	Nevertheless, model checkers on their own can still serve a good job verifying abstract
		423	properties of key algorithms without dealing with the technical details of the implementation.
		424	Various properties of synchronization algorithms, data structures and communication protocols
		425	can be verified in the most generic conditions by model checkers, answering the
		426	question whether they are designed properly in theory.
		427
		428	If the implementation of these algorithms and protocols do not behave correctly, we can be sure
		429	that the root cause is in the non-compliance between the design and implementation and not a
		430	fundamental flaw of the design itself.
		431
		432	\subsection{Behavior Checker}
		433	All previously mentioned verification methods were targeting internal properties of the operating system
		434	components. If we are moving to a higher-level abstraction in order to specify correct interaction of the encapsulated
		435	components in terms of interface compatibility and communication, we can utilize \emph{Behavior Protocols}~\cite{bp}.
		436
		437	To gain the knowledge about the architecture of the whole operating system in terms of software
		438	component composition and bindings, we can use \emph{Architecture Description Language}~\cite{adl}.
		439	This language has the possibility to capture interface types (with method signatures), primitive
		440	components (in terms of provided and required interfaces), composite components (an architectural
		441	compositions of primitive components) and the bindings between the respective interfaces of the
		442	components.
		443
		444	Unfortunately, the description is usually static, which is not quite suitable for the dynamic
		445	nature of HelenOS and other operating systems. This limitation can be circumvented by considering a relevant
		446	snapshot of the dynamic run-time architecture. This snapshot fixed in time is equivalent to
		447	a statically defined architecture.
		448
		449	Behavior Protocol checkers can target three main goals:
		450
		451	\begin{itemize}
		452	\item \emph{Horizontal compliance:} Also called \emph{compatibility}. The goal is to check
		453	whether the specifications of components that are bound together are semantically
		454	compatible. All required interfaces need to be bound to provided interfaces and
		455	the communication between the components cannot lead to \emph{no activity} (a deadlock)
		456	or a \emph{bad activity} (a livelock).
		457	\item \emph{Vertical compliance:} Also called \emph{substituability}. The goal is to check whether
		458	it is possible to replace a set of primitive components that are nested inside a composite
		459	component by the composite component itself. In other words, this compliance can answer the
		460	questions whether the architecture description of the system is sound with respect to the hierarchical
		461	composition of the components.
		462	\item \emph{Compliance between the specification and the implementation:} Using various means
		463	for generating artificial environments for an isolated component the checker is able to
		464	partially answer the question whether the implementation of the component is an instantiation
		465	of the component specification.
		466	\end{itemize}
		467
		468	Horizontal and vertical compliance checking can be done exhaustively. This is a fundamental property
		469	which allows the reasoning about the dependability of the entire component-based operating system.
		470	Assuming that the lower-level verification methods (described in Sections \ref{clang} to \ref{modelcheck})
		471	prove some specific properties of the primitive components, we can be sure that the composition of
		472	the primitive components into composite components and ultimately into the whole operating system
		473	does not break these properties.
		474
		475	The feasibility of many lower-level verification methods from Sections \ref{clang} to \ref{modelcheck}
		476	depends largely on the size and complexity of the code under verification. If the entire operating
		477	system is decomposed into primitive components with a reasonable granularity, it is more likely that the
		478	individual primitive components can be verified against a large number of properties. Thanks to the
		479	recursive component composition we can then be sure that these properties also hold for the entire system.
		480	In monolithic operating systems without a clear decomposition we cannot do this inference and we
		481	need to verify the lower-level properties over a much larger code base (e.g. the whole monolithic kernel).
		482
		483	\medskip
		484
		485	The compliance between the behavior specification and the actual behavior of the implementation is, unfortunately,
		486	the missing link in the chain. This compliance cannot be easily verified in an exhaustive manner. If there is
		487	a discrepancy between the specified and the actual behavior of the components, we cannot conclude anything about
		488	the properties holding in the entire system.
		489
		490	However, there is one way how to improve the situation: \emph{code generation}. If we generate implementation
		491	from the specification, the compliance between them is axiomatic. If we are able to generate enough
		492	code from the specification to run into the hand-written ``business code'' where we check for
		493	the lower-level properties, the conclusions about the component composition are going to hold.
		494
		495	\subsection{Behavior Description Generator}
		496	To conclude our path towards higher abstractions we can utilize tools that can
		497	generate the behavior descriptions from \emph{textual use cases} written in a domain-constrained English.
		498	These generated artifacts can be then compared (e.g. via vertical compliance checking) with the formal
		499	specification. Although the comparison might not provide clean-cut results, it can still be
		500	helpful to confront the more-or-less informal user expectations on the system with the exact formal description.
		501
		502	\subsection{Summary}
		503	\label{missing}
		504	So far, we have proposed a compact combination of formal methods which start at the level of C programming
		505	language, offer the possibility to check for the presence of various common antipatterns, to check for generic
		506	algorithm-related properties, consistency constrains, interface enforcements and conclude with a framework
		507	to make these properties hold even in the case of a large operating system composed from many
		508	components of compliant behavior.
		509
		510	We are also able to fill in some of the missing pieces by other software engineering approaches
		511	such as regression and unit testing and instrumentation.
		512
		513	\medskip
		514
		515	We have spoken only about the functional properties. In general, we cannot apply the same formalisms
		516	and methods on extra-functional properties (e.g. timing properties, performance properties, etc.).
		517	And although it probably does make a good sense to reason about component composition for the extra-functional
		518	properties, the exact relation might be different compared to the functional properties.
		519
		520	The extra-functional properties need to be tackled by our future work.
		521
		522	\section{Evaluation}
		523	\label{evaluation}
		524	This section copies the structure of the previous Section \ref{analysis} and adds HelenOS-specific
		525	evaluation of the the proposed formalisms and tools. As this is still largely a work-in-progress,
		526	in many cases just the initial observations can be made.
		527
		528	\subsection{Verifying C Language Compiler and Continuous Integration Tool}
		529	The primary C compiler used by HelenOS is \emph{GNU GCC 4.4.3} (all platforms)~\cite{gcc} and \emph{Clang 2.6.0}
		530	(IA-32)~\cite{clang}. We have taken some effort to support also \emph{ICC} and \emph{Sun Studio} C compilers,
		531	but the compatibility with these compilers in not guaranteed.
		532
		533	The whole code base is compiled with the \texttt{-Wall} and \texttt{-Wextra} compilation options. These options turn on
		534	most of the verification checks of the compilers. The compilers trip on common bug antipatterns such
		535	as implicit typecasting of pointer types, comparison of signed and unsigned integer values (danger
		536	of unchecked overflows), the usage of uninitialized variables, the presence of unused local variables,
		537	NULL-pointer dereferencing (determined by conservative local control flow analysis), functions
		538	with non-void return typed that do not return any value and so on. We treat all compilation warnings
		539	as fatal errors, thus the code base must pass without any warnings.
		540
		541	We also turn on several more specific and strict checks. These checks helped to discover several
		542	latent bugs in the source code:
		543
		544	\begin{itemize}
		545	\item \texttt{-Wfloat-equal} Check for exact equality comparison between floating point values. The
		546	usage of equal comparator on floats is usually misguided due to the inherent computational errors
		547	of floats.
		548	\item \texttt{-Wcast-align} Check for code which casts pointers to a type with a stricter alignment
		549	requirement. On many RISC-based platforms this can cause run-time unaligned access exceptions.
		550	\item \texttt{-Wconversion} Check for code where the implicit type conversion (e.g. from float to integer,
		551	between signed and unsigned integers or between integers with different number of bits) can
		552	cause the actual value to change.
		553	\end{itemize}
		554
		555	To enhance the semantic information in the source code, we use GCC-specific language extensions to annotate
		556	some particular kernel and core library routines:
		557
		558	\begin{itemize}
		559	\item \texttt{\_\_attribute\_\_((noreturn))} Functions marked in this way never finish from the point of view
		560	of the current sequential execution flow. The most common case are the routines which restore previously saved
		561	execution context.
		562	\item \texttt{\_\_attribute\_\_((returns\_twice))} Functions marked in this way may return multiple times from
		563	the point of view of the current sequential execution flow. This is the case of routines which save the current
		564	execution context (first the function returns as usual, but the function can eventually ``return again''
		565	when the context is being restored).
		566	\end{itemize}
		567
		568	The use of these extensions has pointed out to several hard-to-debug bugs on the IA-64 platform.
		569
		570	\medskip
		571
		572	The automated continuous integration building system is currently work-in-progress. Thus, we do not
		573	test all possible configurations of HelenOS with each changeset yet. Currently only
		574	a representative set of 14 configurations (at least one for each supported platform) is tested by hand
		575	by the developers before committing any non-trivial changeset.
		576
		577	From occasional tests of other configurations by hand and the frequency of compilation, linkage and
		578	even run-time problems we conclude that the automated testing of all feasible configurations will
		579	be very beneficial.
		580
		581	\subsection{Regression and Unit Tests}
		582	As already stated in the previous section, the continuous integration building system has not been finished
		583	yet. Therefore regression and unit tests are executed occasionally by hand, which is time consuming
		584	and prone to human omissions. An automated approach is definitively going to be very helpful.
		585
		586	\subsection{Instrumentation}
		587	We are in the process of implementing our own code instrumentation framework which is motivated mainly
		588	by the need to support MMU-less architectures in the future. But this framework might be also very helpful
		589	in detecting memory and generic resource leaks. We have not tried \emph{Valgrind}~\cite{valgrind} or any similar existing tool
		590	because of the estimated complexity to adopt it for the usage in HelenOS.
		591
		592	HelenOS was also scanned by \emph{Coverity}~\cite{coverity} in 2006 when no errors were detected. However, since that
		593	time the code base has not been analyzed by Coverity.
		594
		595	\subsection{Static Analyzer}
		596	The integration of various static analyzers into the HelenOS continuous integration process is underway.
		597	For the initial evaluation we have used \emph{Stanse}~\cite{stanse} and \emph{Clang Analyzer}~\cite{clanganalyzer}.
		598	Both of them showed to be moderately helpful to point out instances of unreachable dead code, use of language
		599	constructs which have ambiguous semantics in C and one case of possible NULL-pointer dereference.
		600
		601	The open framework of Clang seems to be very promising for implementing domain-specific checks (and at
		602	the same time it is also a very promising compiler framework). Our mid-term goal is to incorporate some of the features
		603	of Stanse and VCC (see Section \ref{staticverifier2}) into Clang Analyzer.
		604
		605	\subsection{Static Verifier}
		606	\label{staticverifier2}
		607	We have started to extend the source code of HelenOS kernel with annotations understood
		608	by \emph{Frama-C}~\cite{framac} and \emph{VCC}~\cite{vcc}. Initially we have targeted simple kernel data structures
		609	(doubly-linked circular lists) and basic locking operations. Currently we are evaluating the initial experiences
		610	and we are trying to identify the most suitable methodology, but we expect quite promising results.
		611
		612	As the VCC is based on the Microsoft C++ Compiler, which does not support many GCC extensions, we have been
		613	faced with the requirement to preprocess the source code to be syntactically accepted by VCC. This turned out
		614	to be feasible.
		615
		616	\subsection{Model Checker}
		617	We are in the process of creating models of kernel wait queues (basic HelenOS kernel synchronization
		618	primitive) and futexes (basic user space thread synchronization primitive) using \emph{Promela} and
		619	verify several formal properties (deadlock freedom, fairness) in \emph{Spin}~\cite{spin}. As both the Promela language
		620	and the Spin model checker are mature and commonly used tools for such purposes, we expect no major problems
		621	with this approach. Because both synchronization primitives are relatively complex, utilizing a model checker
		622	should provide a much more trustworthy proof of the required properties than ``paper and pencil''.
		623
		624	\subsection{Behavior Checker}
		625	We have created an architecture description in ADL language similar to SOFA ADL~\cite{adl} for the majority of the
		626	HelenOS components and created the Behavior Protocol specification of these components. The architecture
		627	is a snapshot of the dynamic architecture just after a successful bootstrap.
		628
		629	Both the architecture and behavior description is readily available as part of the source code repository
		630	of HelenOS, including tools which can preprocess the Behavior Protocols according to the architecture description
		631	and create an output suitable for \emph{bp2promela} checker~\cite{bp}.
		632
		633	As the resulting complexity of the description is larger than any of the previously published case studies
		634	on Behavior Protocols (compare to~\cite{cocome}), our current work-in-progress is to optimize and fine-tune the bp2promela
		635	checker to process the input.
		636
		637	\medskip
		638
		639	We have not started to generate code from the architecture description so far because of time constrains. However,
		640	we believe that this is a very promising way to go.
		641
		642	\subsection{Behavior Description Generator}
		643	We have not tackled the issue of behavior description generation yet, although tools such as \emph{Procasor}~\cite{procasor} are readily
		644	available. We do not consider it our priority at this time.
		645
		646	\section{Conclusion}
		647	\label{conclusion}
		648	In this paper we propose a complex combination of various formal verification methods and tools
		649	to achieve the verification of an entire general-purpose operating system. The proposed approach generally follows
		650	a bottom-to-top path, starting with low-level checks using state-of-the-art verifying C language compilers,
		651	following by static analyzers and static verifiers. In specific contexts regression and unit tests,
		652	code instrumentation and model checkers for the sake of verification of key algorithms
		653	are utilized.
		654
		655	Thanks to the properties of state-of-the-art microkernel multiserver operating
		656	system design (e.g. software component encapsulation and composition), we demonstrate
		657	that it should be feasible to successfully verify larger and more complex operating systems than
		658	in the case of monolithic designs. We use formal component architecture and behavior
		659	description for the closure. The final goal -- a formally verified operating system -- is the
		660	emerging property of the combination of the various methods.
		661
		662	\medskip
		663
		664	The contribution of this paper is the shift of focus from attempts to use a single ``silver-bullet''
		665	method for formal verification of an operating system to a combination of multiple methods supported
		666	by a suitable architecture of the operating system.
		667
		668	We also argue that the approach should be suitable for the mainstream
		669	general-purpose operating systems in the near future, because they are gradually
		670	incorporating more microkernel-based features and fine-grained software components.
		671
		672	Although the evaluation of the proposed approach on HelenOS is still work-in-progress, the
		673	preliminary results and estimates are promising.
		674
		675	\medskip
		676
		677	\noindent\textbf{Acknowledgments.} The author would like to express his gratitude to all contributors of the HelenOS
		678	project. Without their vision and dedication the work on this paper would be almost impossible
		679
		680	This work was partially supported by the Ministry of Education of the Czech Republic
		681	(grant MSM\-0021620838).
		682
		683	\begin{thebibliography}{99}
		684	\bibitem{billion}Bessey A., Block K., Chelf B., Chou A., Fulton B., Hallem S., Henri-Gros C., Kamsky A., McPeak S., Engler D.: \emph{A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World}, Communications of the ACM, Vol. 53 No. 2, pp 66-75, 2010
		685	\bibitem{bp}Kofron J.: \emph{Checking Software Component Behavior Using Behavior Protocols and Spin}, in proceedings of Applied Computing 2007, Seoul, Korea, pp. 1513-1517, 2007
		686	\bibitem{gcc}GCC, the GNU Compiler Collection, \texttt{http://gcc.gnu.org/}
		687	\bibitem{clang}Clang: a C language family frontend for LLVM, \texttt{http://clang.llvm.org/}
		688	\bibitem{clanganalyzer}Clang Static Analyzer, \texttt{http://clang-analyzer.llvm.org/}
		689	\bibitem{cocome}Bulej L., Bures T., Coupaye T., Decky M., Jezek P., Parizek P., Plasil F., Poch T., Rivierre N., Sery O., Tuma P.: \emph{CoCoME in Fractal}, chapter in The Common Component Modeling Example: Comparing Software Component Models, Springer-Verlag, LNCS 5153, 2008
		690	\bibitem{coverity}Coverity, \texttt{http://scan.coverity.com/}
		691	\bibitem{procasor}Mencl V.: \emph{Deriving Behavior Specifications from Textual Use Cases}, in Proceedings of Workshop on Intelligent Technologies for Software Engineering (WITSE04, Sep 21, 2004, part of ASE 2004), Linz, Austria, pp. 331 - 341, Oesterreichische Computer Gesellschaft, 2004
		692	\bibitem{framac}Frama-C, \texttt{http://frama-c.cea.fr/}
		693	\bibitem{c}Lawlis P. K: \emph{Guidelines for Choosing a Computer Language: Support for the Visionary Organization}, Ada Information Clearinghouse, \texttt{http://archive.adaic.com/docs/reports/lawlis/k.htm}, 1998
		694	\bibitem{helenos}HelenOS Project, \texttt{http://www.helenos.org/}
		695	\bibitem{adl}Oplustil T.: \emph{Inheritance in Architecture Description Languages}, reviewed section of Proceedings of the Week of Doctoral Students 2003 conference (WDS 2003), Charles University, Prague, Czech Republic, 2003, pp. 124 - 131, 2003
		696	\bibitem{marketshare}Operating System Market Share, \texttt{http://marketshare.hitslink.com/report\-.aspx?qprid=8\&qptimeframe=M\&qpsp=132}, retrieved on 2010-02-28
		697	\bibitem{seL4}Klein G., Elphinstone K., Heiser G., Andronick J., Cock D., Derrin P., Elkaduwe D., Engelhardt K., Kolanski R., Norrish M., Sewell T., Tuch H., Winwood S.: \emph{seL4: Formal verification of an OS kernel}, Proceedings of the 22nd ACM Symposium on Operating Systems Principles, Big Sky, MT, USA, 2009
		698	\bibitem{spin}Spin, \texttt{http://spinroot.com/}
		699	\bibitem{stanse}Stanse: Static Analysis Framework for C code, \texttt{http://stanse.fi.muni.cz/}
		700	\bibitem{valgrind}Valgrind, \texttt{http://valgrind.org/}
		701	\bibitem{vcc}VCC, \texttt{http://vcc.codeplex.com/}
		702	\end{thebibliography}
		703	\end{document}

Subversion Repositories HelenOS-doc

(root)/papers/isarcs10/isarcs10.tex – Rev 179