Subversion Repositories HelenOS-doc

Rev

Rev 183 | Blame | Compare with Previous | Last modification | View Log | Download | RSS feed

  1. \documentclass{llncs}
  2. \usepackage{graphicx}
  3.  
  4. \title{A Road to a Formally Verified General-Purpose Operating System}
  5. \author{Martin D\v{e}ck\'{y}}
  6. \institute{Department of Distributed and Dependable Systems\\
  7.           Faculty of Mathematics and Physics, Charles University\\
  8.           Malostransk\'{e} n\'{a}m\v{e}st\'{i} 25, Prague 1, 118~00, Czech Republic\\
  9.           \email{martin.decky@d3s.mff.cuni.cz}}
  10.  
  11. \begin{document}
  12.     \maketitle
  13.    
  14.     \begin{abstract}
  15.         Methods of formal description and verification represent a viable way for achieving
  16.         fundamentally bug-free software. However, in reality only a small subset of the existing operating
  17.         systems were ever formally verified, despite the fact that an operating system is a critical part
  18.         of almost any other software system. This paper points out several key design choices which
  19.         should make the formal verification of an operating system easier and presents a work-in-progress
  20.         and initial experiences with formal verification of HelenOS, a state-of-the-art microkernel-based
  21.         operating system, which, however, was not designed specifically with formal verification in mind,
  22.         as this is mostly prohibitive due to time and budget constrains.
  23.        
  24.         The contribution of this paper is the shift of focus from attempts to use a single ``silver-bullet''
  25.         formal verification method which would be able to verify everything to a combination of multiple
  26.         formalisms and techniques to successfully cover various aspects of the operating system.
  27.         A reliable and dependable operating system is the emerging property of the combination,
  28.         thanks to the suitable architecture of the operating system.
  29.     \end{abstract}
  30.    
  31.     \section{Introduction}
  32.         \label{introduction}
  33.         Operating systems (OSes for short) have a somewhat special position among all software.
  34.         OSes are usually designed to run on bare hardware. This means that they do not require
  35.         any special assumptions on the environment except the assumptions on the properties and
  36.         behavior of hardware. In many cases it is perfectly valid to consider the hardware
  37.         as \emph{idealized hardware} (zero mathematical probability of failure, perfect compliance
  38.         with the specifications, etc.). This means that it is solely the OS that defines the
  39.         environment for other software.
  40.        
  41.         OSes represent the lowest software layer and provide services to essentially all other
  42.         software. Considering the principle of recursion, the properties of an OS form the
  43.         assumptions for the upper layers of software. Thus the dependability of end-user and
  44.         enterprise software systems is always limited by the dependability of the OS.
  45.        
  46.         Finally, OSes are non-trivial software on their own. The way they are generally designed
  47.         and programmed (spanning both the kernel and user mode, manipulating execution contexts
  48.         and concurrency, handling critical hardware-related operations) represent significant
  49.         and interesting challenges for software analysis.
  50.        
  51.         \medskip
  52.        
  53.         These are probably the most important reasons that led to several research initiatives
  54.         in the recent years which target the creation of a formally verified OSes from scratch
  55.         (e.g. \cite{seL4}). Methods of formal description and verification provide fundamentally
  56.         better guarantees of desirable properties than non-exhaustive engineering methods such
  57.         as testing.
  58.        
  59.         However, 98~\%\footnote{98~\% of client computers connected to the Internet as of January
  60.         2010~\cite{marketshare}.} of the market share of general-purpose OSes is taken
  61.         by Windows, Mac~OS~X and Linux. These systems were clearly not designed with formal
  62.         verification in mind from the very beginning. The situation on the embedded, real-time
  63.         and special-purpose OSes market is probably different, but it is unlikely that the
  64.         segmentation of the desktop and server OSes market is going to change very rapidly
  65.         in the near future.
  66.        
  67.         The architecture of these major desktop and server OSes is monolithic, which makes
  68.         any attempts to do formal verification on them extremely challenging due to the large
  69.         state space. Fortunately we can observe that aspects of several novel approaches from
  70.         the OS research from the late 1980s and early 1990s (microkernel design, user space
  71.         file system and device drivers, etc.) are slowly emerging in these originally purely
  72.         monolithic implementations.
  73.        
  74.         \medskip
  75.        
  76.         In this paper we show how specific design choices can markedly improve the feasibility
  77.         of verification of an OS, even if the particular OS was not designed
  78.         specifically with formal verification in mind. These design choices can be gradually
  79.         introduced (and in fact some of them have already been introduced) to mainstream
  80.         general-purpose OSes.
  81.        
  82.         Our approach is not based on using a single ``silver-bullet'' formalism, methodology or
  83.         tool, but on combining various engineering, semi-formal and formal approaches.
  84.         While the lesser formal approaches give lesser guarantees, they can complement
  85.         the formal approaches on their boundaries and increase the coverage of the set of
  86.         all hypothetical interesting properties of the system.
  87.        
  88.         We also demonstrate work-in-progress case study of an general-purpose research OS
  89.         that was not created specifically with formal verification in mind from the very
  90.         beginning, but that was designed according to state-of-the-art OS principles.
  91.        
  92.         \medskip
  93.        
  94.         \noindent\textbf{Structure of the Paper.} In Section \ref{design} we introduce
  95.         the design choices and our case study in more detail. In Section \ref{analysis} we
  96.         discuss our approach of the combination of methods and tools. In Section \ref{evaluation}
  97.         we present a preliminary evaluation of our efforts and propose the imminent next steps
  98.         to take. Finally, in Section \ref{conclusion} we present the conclusion of the paper.
  99.    
  100.     \section{Operating Systems Design}
  101.         \label{design}
  102.         Two very common schemes of OS design are \emph{monolithic design} and \emph{microkernel design}.
  103.         Without going into much detail of any specific implementation, we can define the monolithic design as
  104.         a preference to put numerous aspects of the core OS functionality into the kernel, while microkernel
  105.         design is a preference to keep the kernel small, with just a minimal set of features.
  106.        
  107.         The features which are missing from the kernel in the microkernel design are implemented
  108.         in user space, usually by means of libraries, servers (e.g. processes/tasks) and/or software components.
  109.        
  110.         \subsection{HelenOS}
  111.             \label{helenos}
  112.             \emph{HelenOS} is a general-purpose research OS which is being developed at Charles
  113.             University in Prague. The source code is available under the BSD open source license and can be
  114.             freely downloaded from the project web site~\cite{helenos}. The authors of the code base are
  115.             both from the academia and from the open source community (several contributors are employed
  116.             as Solaris kernel developers and many are freelance IT professionals).
  117.            
  118.             HelenOS uses a preemptive priority-feedback scheduler, it supports SMP hardware and it is
  119.             designed to be highly portable. Currently it runs on 7 distinct hardware architectures, including the
  120.             most common IA-32, x86-64 (AMD64), IA-64, SPARC~v9 and PowerPC. It also runs on ARMv7 and MIPS,
  121.             but currently only in simulators and not on physical hardware.
  122.            
  123.             Although HelenOS is still far from being an everyday replacement for Linux or Windows due to the lack
  124.             of end-user applications (whose development is extremely time-consuming, but unfortunately of
  125.             no scientific value), the essential foundations such as file system support and TCP/IP networking
  126.             are already in place.
  127.            
  128.             HelenOS does not currently target embedded devices (although the ARMv7 port can be very easily
  129.             modified to run on various embedded boards) and does not implement real-time features.
  130.             Many development projects such as task snapshoting and migration, support for MMU-less
  131.             platforms and performance monitoring are currently underway.
  132.            
  133.             \medskip
  134.            
  135.             HelenOS can be briefly described as microkernel multiserver OS. However, the actual design
  136.             guiding principles of the HelenOS are more elaborate:
  137.            
  138.             \begin{description}
  139.                 \item[Microkernel principle] Every functionality of the OS that does not
  140.                       have to be necessary implemented in the kernel should be implemented in user space. This
  141.                       implies that subsystems such as the file system, device drivers (except those which are
  142.                       essential for the basic kernel functionality), naming and trading services, networking,
  143.                       human interface and similar features should be implemented in user space.
  144.                 \item[Full-fledged principle] Features which need to be placed in kernel should
  145.                       be implemented by full-fledged algorithms and data structures. In contrast
  146.                       to several other microkernel OSes, where the authors have deliberately chosen
  147.                       the most simplistic approach (static memory allocation, na\"{\i}ve algorithms, simple data
  148.                       structures), HelenOS microkernel tries to use the most advanced and suitable means available.
  149.                       It contains features such as AVL and B+ trees, hashing tables, SLAB memory allocator, multiple
  150.                       in-kernel synchronization primitives, fine-grained locking and so on.
  151.                 \item[Multiserver principle] Subsystems in user space should be decomposed with the smallest
  152.                       reasonable granularity. Each unit of decomposition should be encapsulated in a separate task.
  153.                       The tasks represent software components with isolated address spaces. From the design point of
  154.                       view the kernel can be seen as a separate software component, too.
  155.                 \item[Split of mechanism and policy] The kernel should only provide low-level me\-chanisms,
  156.                       while the high-level policies which are built upon these mechanisms should be defined in
  157.                       user space. This also implies that the policies should be easily replaceable while keeping
  158.                       the low-level mechanisms intact.
  159.                 \item[Encapsulation principle] The communication between the tasks/components should be
  160.                       implemented only via a set of well-defined interfaces. In the user-to-user case the preferred
  161.                       communication mechanism is HelenOS IPC, which provides reasonable mix of abstraction and
  162.                       performance (RPC-like primitives combined with implicit memory sharing for large data
  163.                       transfers). In case of synchronous user-to-kernel communication the usual syscalls are used.
  164.                       HelenOS IPC is used again for asynchronous kernel-to-user communication.
  165.                 \item[Portability principle] The design and implementation should always maintain a high
  166.                       level of platform neutrality and portability. Platform-specific code in the kernel, core
  167.                       libraries and tasks implementing device drivers should be clearly separated from the
  168.                       generic code (either by component decomposition or at least by internal compile-time APIs).
  169.             \end{description}
  170.            
  171.             In Section \ref{analysis} we argue that several of these design principles significantly improve
  172.             the feasibility of formal verification of the entire system. On the other hand, other design principles
  173.             induce new interesting challenges for formal description and verification.
  174.            
  175.             The run-time architecture of HelenOS is inherently dynamic. The bindings between the components are
  176.             not created at compile-time, but during bootstrap and can be modified to a large degree also during
  177.             normal operation mode of the system (via human interaction and external events).
  178.            
  179.             The design of the ubiquitous HelenOS IPC mechanism and the associated threading model present
  180.             the possibility to significantly reduce the size of the state space which needs to be explored
  181.             by formal verification tools, but at the same time it is quite hard to express these
  182.             constrains in many formalisms. The IPC is inherently asynchronous with constant message buffers
  183.             in the kernel and dynamic buffers in user space. It uses the notions of uni-directional bindings,
  184.             mandatory pairing of requests and replies, binding establishment and abolishment handshakes,
  185.             memory sharing and fast message forwarding.
  186.            
  187.             For easier management of the asynchronous messages and the possibility to process multiple
  188.             messages from different peers without the usual kernel threading overhead, the core user space
  189.             library manages the execution flow by so-called \emph{fibrils}. A fibril is a user-space-managed
  190.             thread with cooperative scheduling. A different fibril is scheduled every time the current fibril
  191.             is about to be blocked while sending out IPC requests (because the kernel buffers of the addressee
  192.             are full) or while waiting on an IPC reply. This allows different execution flows within the
  193.             same thread to process multiple requests and replies. To safeguard proper sequencing of IPC
  194.             messages and provide synchronization, special fibril-aware synchronization primitives
  195.             (mutexes, condition variables, etc.) are available.
  196.            
  197.             Because of the cooperative nature of fibrils, they might cause severe performance under-utilization
  198.             in SMP configurations and system-wide bottlenecks. As multicore processors are more and more
  199.             common nowadays, that would be a substantial design flaw. Therefore the fibrils can be also freely
  200.             (and to some degree even automatically) combined with the usual kernel threads, which provide
  201.             preemptive scheduling and true parallelism on SMP machines. Needless to say, this combination is
  202.             also a grand challenge for the formal reasoning.
  203.            
  204.             \medskip
  205.            
  206.             Incidentally, the \emph{full-fledged principle} causes that the size of the HelenOS microkernel is
  207.             considerably larger compared to other ``scrupulous'' microkernel implementations. The average
  208.             footprint of the kernel on IA-32 ranges from 569~KiB when all logging messages, asserts, symbol
  209.             resolution and the debugging kernel console are compiled in, down to 198~KiB for a non-debugging
  210.             production build. But we do not believe that the raw size of the microkernel is a relevant quality
  211.             criterion per se, without taking the actual feature set into account.
  212.            
  213.             \medskip
  214.            
  215.             To sum up, the choice of HelenOS as our case study is based on the fact that it was not designed
  216.             in advance with formal verification in mind (some of the design principles might be beneficial,
  217.             but others might be disadvantageous), but the design of HelenOS is also non-trivial and not obsolete.
  218.            
  219.         \subsection{The C Programming Language}
  220.             A large majority of OSes is coded in the C programming language (HelenOS is no exception
  221.             to this). The choice of C in the case of kernel is usually well-motivated, since the C language was designed
  222.             specifically for implementing system software~\cite{c}: It is reasonably low-level in the sense that it allows
  223.             to access the memory and other hardware resources with similar effectiveness as from assembler;
  224.             It also requires almost no run-time support and it exports many features of the von Neumann hardware
  225.             architecture to the programmer in a very straightforward, but still relatively portable way.
  226.            
  227.             However, what is the biggest advantage of C in terms of run-time performance is also the biggest weakness
  228.             for formal reasoning. The permissive memory access model of C, the lack of any reference safety
  229.             enforcement, the weak type system and generally little semantic information in the code -- all these
  230.             properties do not allow to make many general assumptions about the code.
  231.            
  232.             Programming languages which target controlled environments such as Java and C\(\sharp\) are
  233.             generally easier for formal reasoning because they provide a well-known set of primitives
  234.             and language constructs for object ownership, threading and synchronization. Many non-imperative
  235.             programming languages can be even considered to be a form of ``executable specification'' and thus
  236.             very suitable for formal reasoning. In C, almost everything is left to the programmer who
  237.             is free to set the rules.
  238.            
  239.             \medskip
  240.            
  241.             The reasons for frequent use of C in the user space of many established OSes (and HelenOS) is
  242.             probably much more questionable. In the case of HelenOS, except for the core libraries and tasks
  243.             (such as device drivers), C might be easily replaced by any high-level and perhaps even
  244.             non-imperative programming language. The reasons for using C in this context are mostly historical.
  245.            
  246.             However, as we have stated in Section \ref{introduction}, the way general-purpose OSes
  247.             are implemented changes only slowly and therefore any propositions which require radical modification
  248.             of the existing code base before committing to the formal verification are not realistic.
  249.        
  250.     \section{Analysis}
  251.         \label{analysis}
  252.        
  253.         \begin{figure}[t]
  254.             \begin{center}
  255.                 \resizebox*{125mm}{!}{\includegraphics{diag}}
  256.                 \caption{Overview of the formalisms and tools proposed.}
  257.                 \label{fig:diag}
  258.             \end{center}
  259.         \end{figure}
  260.        
  261.         In this section, we analyze the properties we would like to check in a general-purpose
  262.         OS. Each set of properties usually requires a specific verification method, tool or toolchain.
  263.        
  264.         Our approach will be mostly bottom-up, or, in other words, from the lower levels of abstraction
  265.         to the higher levels of abstraction. If the verification fails on a lower level, it usually
  266.         does not make much sense to continue with the higher levels of abstraction until the issues
  267.         are tackled. A structured overview of the formalisms, methods and tools can be seen on
  268.         Figure \ref{fig:diag}.
  269.        
  270.         \medskip
  271.        
  272.         Some of the proposed methods cannot be called ``formal methods'' in the rigorous understanding
  273.         of the term. However, even methods which are based on semi-formal reasoning and non-exhaustive
  274.         testing provide some limited guarantees in their specific context. A valued property
  275.         of the formal methods is to preserve these limited guarantees even on the higher levels
  276.         of abstraction, thus allowing the semi-formal methods to complement the big picture
  277.         where the formal methods do not provide any feasible verification so far. This increases
  278.         the coverage of the set of all hypothetical interesting properties of the system (although
  279.         it is probably impossible to formally define this entire set).
  280.        
  281.         \medskip
  282.        
  283.         Please note that the titles of the following sections do not follow any particular established
  284.         taxonomy. We have simply chosen the names to be intuitively descriptive.
  285.        
  286.         \subsection{C Language Compiler and Continuous Integration Tool}
  287.             \label{clang}
  288.             The initial levels of abstraction do not go far from the C source code and common engineering
  289.             approaches. First, we would certainly like to know whether our code base is compliant with the
  290.             programming language specification and passes only the basic semantic checks (proper number
  291.             and types of arguments passed to functions, etc.). It is perhaps not very surprising that
  292.             these decisions can be made by any plain C compiler. However, with the current implementation
  293.             of HelenOS even this is not quite trivial.
  294.            
  295.             Besides the requirement to support 7 hardware platforms, the system's com\-pile-time configuration
  296.             can be also affected by approximately 65 configuration options, most of which are booleans,
  297.             the rest are enumerated types.
  298.            
  299.             These configuration options are bound by logical propositions in conjunctive or disjunctive
  300.             normal forms and while some options are freely configurable, the value of others gets inferred
  301.             by the build system of HelenOS. The overall number of distinct configurations in which
  302.             HelenOS can be compiled is at least one order of magnitude larger than the plain number
  303.             of supported hardware platforms.
  304.            
  305.             Various configuration options affect conditional compilation and linking. The programmers
  306.             are used to make sure that the source code compiles and links fine with respect to the
  307.             most common and obvious configurations, but the unforeseen interaction of the less common
  308.             configuration options might cause linking or even compilation errors.
  309.            
  310.             \medskip
  311.            
  312.             A straightforward solution is to generate all distinct configurations, starting from the
  313.             open variables and inferring the others. This can be part of the continuous integration
  314.             process which would try to compile and link the sources in all distinct configurations.
  315.            
  316.             If we want to be really pedantic, we should also make sure that we run all higher
  317.             level verification methods on all configurations generated by this step. That would certainly
  318.             require to multiply the time required by the verification methods at least by the number
  319.             of the distinct configurations. Constraining the set of configurations to just the most
  320.             representative ones is perhaps a reasonable compromise to make the verification realistic.
  321.        
  322.         \subsection{Regression and Unit Tests}
  323.             Running regression and unit tests which are part of HelenOS code base in the continuous
  324.             integration process is fairly easy. The only complication lies in the technicalities:
  325.             We need to setup an automated network of physical machines and simulators which can run the
  326.             appropriate compilation outputs for the specific platforms. We need to be able to reboot
  327.             them remotely and distribute the boot images to them. And last but not least, we need to be
  328.             able to gather the results from them.
  329.            
  330.             Testing is always non-exhaustive, thus the guarantees provided by tests are strictly limited
  331.             to the use cases and contexts which are being explicitly tested. However, it is arguably
  332.             easier to express many common use cases in the primary programming language than in some
  333.             different formalism. As we follow the bottom-up approach, filtering out the most obvious
  334.             bugs by testing can save us a lot of valuable time which would be otherwise waisted by
  335.             a futile verification by more formal (and more time-consuming) methods.
  336.        
  337.         \subsection{Instrumentation}
  338.             Instrumentation tools for detecting memory leaks, performance bottlenecks and soft-deadlocks
  339.             are also not usually considered to be formal verification tools (since it is hard to define
  340.             exact formal properties which are being verified by the non-exhaustive nature of these tools).
  341.             They are also rarely utilized on regular basis as part of the continuous integration process.
  342.             But again, it might be helpful to just mention them in the big picture.
  343.            
  344.             If some regression or unit tests fail, they sometimes do not give sufficient information to
  345.             tell immediately what is the root cause of the issue. In that case running the faulting tests
  346.             on manually or automatically instrumented executable code might provide more data and point
  347.             more directly to the actual bug.
  348.        
  349.         \subsection{Verifying C Language Compiler}
  350.             C language compilers are traditionally also not considered to be formal verification tools.
  351.             Many people just say that C compilers are good at generating executable code, but do not
  352.             care much about the semantics of the source code (on the other hand, formal verification
  353.             tools usually do not generate any executable code at all). However, with recent development
  354.             in the compiler domain, the old paradigms are shifting.
  355.            
  356.             As the optimization passes and general maturity of the compilers improve over time,
  357.             the compilers try to extract and use more and more semantic information from the source code.
  358.             The C language is quite poor on explicit semantic information, but the verifying compilers
  359.             try to rely on vendor-specific language extensions and on the fact that some semantic information
  360.             can be added to the source code without changing the resulting executable code.
  361.            
  362.             The checks done by the verifying compilers cannot result in fatal errors in the usual cases (they
  363.             are just warnings). Firstly, the compilers still need to successfully compile a well-formed C source
  364.             code compliant to some older standard (e.g. C89) even when it is not up with the current quality
  365.             expectations. Old legacy source code should still pass the compilation as it did decades ago.
  366.            
  367.             Secondly, the checks run by the verifying compilers are usually not based on abstract interpretation.
  368.             They are mostly realized as abstract syntax tree transformations much in the line with the supporting
  369.             routines of the compilation process (data and control flow graph analysis, dead code elimination,
  370.             register allocation, etc.) and the evaluation function is basically the matching of antipatterns
  371.             of common programming bugs.
  372.            
  373.             The checks are usually conservative. The verifying compilers identify code constructs which are suspicious,
  374.             which might arise out of programmer's bad intuition and so on, but even these code snippets cannot be
  375.             tagged as definitive bugs (since the programmer can be simply in a position where he/she really wants to
  376.             do something very strange, but nevertheless legitimate). It is upon the programmer
  377.             to examine the root cause of the compiler warning, tell whether it is really a bug or just a false
  378.             positive and fix the issue by either amending some additional semantic information (e.g. adding an
  379.             explicit typecast or a vendor-specific language extension) or rewriting the code.
  380.            
  381.             Although this level of abstraction is coarse-grained and conservative, it can be called semi-formal,
  382.             since the properties which are being verified can be actually defined quite exactly and they
  383.             are reasonably general. They do not deal with single traces of methods, runs and use
  384.             cases like tests, but they deal with all possible contexts in which the code can run.
  385.        
  386.         \subsection{Static Analyzer}
  387.             Static analyzers try to go deeper than verifying compilers. Besides detecting common antipatterns of
  388.             bugs, they also use techniques such as abstract interpretation to check for more complex properties.
  389.            
  390.             Most commercial static analyzers come with a predefined set of properties which cannot be easily changed.
  391.             They are coupled with the commonly used semantics of the environment and generate domain-specific models
  392.             of the software based not only on the syntax of the source code, but also based on the assumptions derived
  393.             from the memory access model, allocation and deallocation rules, tracking of references and tracking of
  394.             concurrency locks.
  395.            
  396.             The biggest advantage of static analyzers is that they can be easily included in the development or
  397.             continuous integration process as an additional automated step, very similar to the verifying compilers.
  398.             No manual definition of code-specific properties is needed and false positives can be relatively easily
  399.             eliminated by amending some explicit additional information to the source code within the boundaries
  400.             of the programming language.
  401.            
  402.             The authors of static analyzers claim large quantities of bugs detected or prevented~\cite{billion},
  403.             but static analyzers are still relatively limited by the kind of bugs they are designed to detect.
  404.             They are usually good at pointing out common issues with security implications (specific types of
  405.             buffer and stack overruns, usage of well-known functions in an unsafe way, clear cases of forgotten
  406.             deallocation of resources and release of locks, etc.). Unfortunately, many static analyzers
  407.             only analyze a single-threaded control flow and are thus unable to detect concurrency issues
  408.             such as deadlocks.
  409.        
  410.         \subsection{Static Verifier}
  411.             There is one key difference between a static analyzer and a static verifier: Static verifiers
  412.             allow the user to specify one's own properties, in terms of preconditions, postconditions and
  413.             invariants in the code. Many static verifiers also target true multithreaded usage patterns
  414.             and have the capability to check proper locking order, hand-over-hand locking and even liveliness.
  415.            
  416.             In the context of an OS kernel and core libraries two kinds of properties are common:
  417.            
  418.             \begin{description}
  419.                 \item[Consistency constrains] These properties define the correct way how data is supposed
  420.                       to be manipulated by some related set of subroutines. Checking for these
  421.                       properties ensures that data structures and internal states will not get corrupt due
  422.                       to bugs in the functions and methods which are designed to manipulate them.
  423.                 \item[Interface enforcements] These properties define the correct semantics by which
  424.                       a set of subroutines should be used by the rest of the code. Checking for these properties
  425.                       ensures that some API is always used by the rest of the code in a specified way
  426.                       and all reported exceptions are handled by the client code.
  427.             \end{description}
  428.        
  429.         \subsection{Model Checker}
  430.             \label{modelcheck}
  431.             On the first sight it does not seem to be reasonable to consider general model checkers as
  432.             relevant independent tools for formal verification of an existing OS. While many different
  433.             tools use model checkers as their backends, verifying a complete model of the entire
  434.             system created by hand seems to be infeasible both in the sense of time required for the model
  435.             creation and resources required by the checker to finish the exhaustive traversal of the model's
  436.             state space.
  437.            
  438.             Nevertheless, model checkers on their own can still serve a good job verifying abstract
  439.             properties of key algorithms without dealing with the technical details of the implementation.
  440.             Various properties of synchronization algorithms, data structures and communication protocols
  441.             can be verified in the most generic conditions by model checkers, answering the
  442.             question whether they are designed properly in theory.
  443.            
  444.             If the implementation of these algorithms and protocols do not behave correctly, we can be sure
  445.             that the root cause is in the non-compliance between the design and implementation and not a
  446.             fundamental flaw of the design itself.
  447.        
  448.         \subsection{Architecture and Behavior Checker}
  449.             All previously mentioned verification methods were targeting internal properties of the OS
  450.             components. If we are moving to a higher-level abstraction in order to specify correct
  451.             interaction of the encapsulated components in terms of interface compatibility and communication,
  452.             we can utilize \emph{Behavior Protocols}~\cite{bp} or some other formalism describing correct
  453.             interaction between software components.
  454.            
  455.             To gain the knowledge about the architecture of the whole OS in terms of software
  456.             component composition and bindings, we can use \emph{Architecture Description Language}~\cite{adl}
  457.             as the specification of the architecture of the system. This language has the possibility to capture
  458.             interface types (with method signatures), primitive components (in terms of provided and required
  459.             interfaces), composite components (an architectural compositions of primitive components) and the
  460.             bindings between the respective interfaces of the components.
  461.            
  462.             It is extremely important to define the right role of the behavior and architecture description.
  463.             A flawed approach would be to reverse-engineer this description from the source code (either manually
  464.             or via some sophisticated tool) and then verify the compliance between the description and
  465.             the implementation. However, different directions can give more interesting results:
  466.            
  467.             \begin{description}
  468.                 \item[Description as specification] Behavior and architecture description created independently
  469.                       on the source code serves the role of specification. This has the following primary
  470.                       goals of formal verification:
  471.                       \begin{description}
  472.                         \item[Horizontal compliance] Also called \emph{compatibility}. The goal is to check
  473.                              whether the specifications of components that are bound together are semantically
  474.                              compatible. All required interfaces need to be bound to provided interfaces and
  475.                              the communication between the components cannot lead to \emph{no activity} (a deadlock),
  476.                              \emph{bad activity} (a livelock) or other communication and synchronization errors.
  477.                         \item[Vertical compliance] Also called \emph{substituability}. The goal is to check whe\-ther
  478.                              it is possible to replace a set of primitive components that are nested inside a composite
  479.                              component by the composite component itself. In other words, this compliance can answer the
  480.                              question whether the architecture description of the system is sound with respect to the hierarchical
  481.                              composition of the components.
  482.                         \item[Specification and implementation compliance] Using various means
  483.                              of generating artificial environments for an isolated component a checker is able to
  484.                              partially answer the question whether the implementation of the component is an instantiation
  485.                              of the component specification.
  486.                       \end{description}
  487.                 \item[Description as abstraction] Generating the behavior and architecture description from the
  488.                       source code by means of abstract interpretation can serve the purpose of verifying various
  489.                       properties of the implementation such as invariants, preconditions and postconditions.
  490.                       This is similar to static verification, but on the level of component interfaces.
  491.             \end{description}
  492.            
  493.             Unfortunately, most of the behavior and architecture formalisms are static, which is not quite suitable
  494.             for the dynamic nature of most OSes. This limitation can be circumvented by considering a relevant
  495.             snapshot of the dynamic run-time architecture. This snapshot fixed in time is equivalent to
  496.             a statically defined architecture.
  497.            
  498.             \medskip
  499.            
  500.             The key features of software systems with respect to behavior and architecture checkers are the granularity
  501.             of the individual primitive components, the level of isolation and complexity of the communication mechanism
  502.             between them. Large monolithic OSes created in semantic-poor C present a severe challenge because the
  503.             isolation of the individual components is vague and the communication between them is basically unlimited
  504.             (function calls, shared resources, etc.).
  505.            
  506.             OSes with explicit component architecture and fine-grained components (such as microkernel multiserver
  507.             systems) make the feasibility of the verification much easier, since the degrees of freedom (and thus
  508.             the state space) is limited.
  509.            
  510.             Horizontal and vertical compliance checking can be done exhaustively. This is a fundamental property
  511.             which allows the reasoning about the dependability of the entire component-based OS.
  512.             Assuming that the lower-level verification methods (described in Sections \ref{clang} to \ref{modelcheck})
  513.             prove some specific properties of the primitive components, we can be sure that the composition of
  514.             the primitive components into composite components and ultimately into the whole OS
  515.             does not break these properties.
  516.            
  517.             The feasibility of many lower-level verification methods from Sections \ref{clang} to \ref{modelcheck}
  518.             depends largely on the size and complexity of the code under verification. If the entire OS
  519.             is decomposed into primitive components with a fine granularity, it is more likely that the
  520.             individual primitive components can be verified against a large number of properties. Thanks to the
  521.             recursive component composition we can then be sure that these properties also hold for the entire system.
  522.            
  523.             \medskip
  524.            
  525.             The compliance between the behavior specification and the actual behavior of the implementation is, unfortunately,
  526.             the missing link in the chain. This compliance cannot be easily verified in an exhaustive manner. If there is
  527.             a discrepancy between the specified and the actual behavior of the components, we cannot conclude anything about
  528.             the properties holding in the entire system.
  529.            
  530.             However, there is one way how to improve the situation: \emph{code generation}. If we generate implementation
  531.             from the specification, the compliance between them is axiomatic. If we are able to generate enough
  532.             code from the specification to run into the hand-written ``business code'' where we check for
  533.             the lower-level properties, the conclusions about the component composition are going to hold.
  534.        
  535.         \subsection{Behavior Description Generator}
  536.             To conclude our path towards higher abstractions we can utilize tools that can
  537.             generate the behavior descriptions from \emph{textual use cases} written in a domain-constrained English.
  538.             These generated artifacts can be then compared (e.g. via vertical compliance checking) with the formal
  539.             specification. Although the comparison might not provide clean-cut results, it can still be
  540.             helpful to confront the more-or-less informal user expectations on the system with the exact formal description.
  541.        
  542.         \subsection{Summary}
  543.             \label{missing}
  544.             So far, we have proposed a compact combination of engineering, semi-formal and formal methods which
  545.             start at the level of C programming language, offer the possibility to check for the presence of various
  546.             common antipatterns, check for generic algorithm-related properties, consistency constrains, interface
  547.             enforcements and conclude with a framework to make these properties hold even in the case of a large
  548.             OS composed from many components of compliant behavior.
  549.            
  550.             We do not claim that there are no missing pieces in the big picture or that the semi-formal verifications
  551.             might provide more guarantees in this setup. However, state-of-the-art OS design guidelines can push
  552.             further the boundaries of practical feasibility of the presented methods. The limited guarantees
  553.             of the low-level methods hold even in the composition and the high-level formal methods do not have
  554.             to deal with unlimited degrees of freedom of the primitive component implementation.
  555.            
  556.             \medskip
  557.            
  558.             We have spoken only about the functional properties. In general, we cannot apply the same formalisms
  559.             and methods on extra-functional properties (e.g. timing properties, performance properties, etc.).
  560.             And although it probably does make a good sense to reason about component composition for the extra-functi\-onal
  561.             properties, the exact relation might be different compared to the functional properties.
  562.            
  563.             The extra-functional properties need to be tackled by our future work.
  564.        
  565.     \section{Evaluation}
  566.         \label{evaluation}
  567.         This section copies the structure of the previous Section \ref{analysis} and adds HelenOS-specific
  568.         evaluation of the the proposed formalisms and tools. As this is still largely a work-in-progress,
  569.         in many cases just the initial observations can be made.
  570.        
  571.         The choice of the specific methods, tools and formalisms in this initial phase is mostly motivated
  572.         by their perceived commonality and author's claims about fitness for the given purpose. An important
  573.         part of further evaluation would certainly be to compare multiple particular approaches, tools
  574.         and formalisms to find the optimal combination.
  575.        
  576.         \subsection{Verifying C Language Compiler and Continuous Integration Tool}
  577.             The primary C compiler used by HelenOS is \emph{GNU GCC 4.4.3} (all platforms)~\cite{gcc} and \emph{Clang 2.6.0}
  578.             (IA-32)~\cite{clang}. We have taken some effort to support also \emph{ICC} and \emph{Sun Studio} C compilers,
  579.             but the compatibility with these compilers in not guaranteed.
  580.            
  581.             The whole code base is compiled with the \texttt{-Wall} and \texttt{-Wextra} compilation options. These options turn on
  582.             most of the verification checks of the compilers. The compilers trip on common bug antipatterns such
  583.             as implicit typecasting of pointer types, comparison of signed and unsigned integer values (danger
  584.             of unchecked overflows), the usage of uninitialized variables, the presence of unused local variables,
  585.             NULL-pointer dereferencing (determined by conservative local control flow analysis), functions
  586.             with non-void return typed that do not return any value and so on. We treat all compilation warnings
  587.             as fatal errors (\texttt{-Werror}), thus the code base must pass without any warnings.
  588.            
  589.             We also turn on several more specific and strict checks. These checks helped to discover several
  590.             latent bugs in the source code:
  591.            
  592.             \begin{description}
  593.                 \item[\texttt{-Wfloat-equal}] Check for exact equality comparison between floating point values. The
  594.                       usage of equal comparator on floats is usually misguided due to the inherent computational errors
  595.                       of floats.
  596.                 \item[\texttt{-Wcast-align}] Check for code which casts pointers to a type with a stricter alignment
  597.                       requirement. On many RISC-based platforms this can cause run-time unaligned access exceptions.
  598.                 \item[\texttt{-Wconversion}] Check for code where the implicit type conversion (e.g. from float to integer,
  599.                       between signed and unsigned integers or between integers with different number of bits) can
  600.                       cause the actual value to change.
  601.             \end{description}
  602.            
  603.             To enhance the semantic information in the source code, we use GCC-specific language extensions to annotate
  604.             some particular kernel and core library routines:
  605.            
  606.             \begin{description}
  607.                 \item[\texttt{\_\_attribute\_\_((noreturn))}] Functions marked in this way never finish from the point of view
  608.                       of the current sequential execution flow. The most common case are the routines which restore previously saved
  609.                       execution context.
  610.                 \item[\texttt{\_\_attribute\_\_((returns\_twice))}] Functions marked in this way may return multiple times from
  611.                       the point of view of the current sequential execution flow. This is the case of routines which save the current
  612.                       execution context (first the function returns as usual, but the function can eventually ``return again''
  613.                       when the context is being restored).
  614.             \end{description}
  615.            
  616.             The use of these extensions has pointed out to several hard-to-debug bugs on the IA-64 platform.
  617.            
  618.             \medskip
  619.            
  620.             The automated continuous integration building system is currently work-in-progress. Thus, we do not
  621.             test all possible configurations of HelenOS with each changeset yet. Currently only
  622.             a representative set of 14 configurations (at least one for each supported platform) is tested by hand
  623.             by the developers before committing any non-trivial changeset.
  624.            
  625.             From occasional tests of other configurations by hand and the frequency of compilation, linkage and
  626.             even run-time problems we conclude that the automated testing of all feasible configurations will
  627.             be very beneficial.
  628.            
  629.         \subsection{Regression and Unit Tests}
  630.             As already stated in the previous section, the continuous integration building system has not been finished
  631.             yet. Therefore regression and unit tests are executed occasionally by hand, which is time consuming
  632.             and prone to human omissions. An automated approach is definitively going to be very helpful.
  633.        
  634.         \subsection{Instrumentation}
  635.             We are in the process of implementing our own code instrumentation framework which is motivated mainly
  636.             by the need to support MMU-less architectures in the future. But this framework might be also very helpful
  637.             in detecting memory and generic resource leaks. We have not tried \emph{Valgrind}~\cite{valgrind} or any similar
  638.             existing tool because of the estimated complexity to adopt it for the usage in HelenOS.
  639.        
  640.         \subsection{Static Analyzer}
  641.             The integration of various static analyzers into the HelenOS continuous integration process is underway.
  642.             For the initial evaluation we have used \emph{Stanse}~\cite{stanse} and \emph{Clang Analyzer}~\cite{clanganalyzer}.
  643.             Both of them showed to be moderately helpful to point out instances of unreachable dead code, use of language
  644.             constructs which have ambiguous semantics in C and one case of possible NULL-pointer dereference.
  645.            
  646.             The open framework of Clang seems to be very promising for implementing domain-specific checks (and at
  647.             the same time it is also a very promising compiler framework). Our mid-term goal is to incorporate some of the features
  648.             of Stanse and VCC (see Section \ref{staticverifier2}) into Clang Analyzer.
  649.            
  650.             HelenOS was also scanned by \emph{Coverity}~\cite{coverity} in 2006 when no errors were detected. However, since
  651.             that time the code base has not been analyzed by Coverity.
  652.        
  653.         \subsection{Static Verifier}
  654.             \label{staticverifier2}
  655.             We have started to extend the source code of HelenOS kernel with annotations understood
  656.             by \emph{Frama-C}~\cite{framac} and \emph{VCC}~\cite{vcc}. Initially we have targeted simple kernel data structures
  657.             (doubly-linked circular lists) and basic locking operations. Currently we are evaluating the initial experiences
  658.             and we are trying to identify the most suitable methodology, but we expect quite promising results.
  659.            
  660.             As the VCC is based on the Microsoft C++ Compiler, which does not support many GCC extensions, we have been
  661.             faced with the requirement to preprocess the source code to be syntactically accepted by VCC. This turned out
  662.             to be feasible.
  663.        
  664.         \subsection{Model Checker}
  665.             We are in the process of creating models of kernel wait queues (basic HelenOS kernel synchronization
  666.             primitive) and futexes (basic user space thread synchronization primitive) using \emph{Promela} and
  667.             verify several formal properties (deadlock freedom, fairness) in \emph{Spin}~\cite{spin}. As both the Promela language
  668.             and the Spin model checker are mature and commonly used tools for such purposes, we expect no major problems
  669.             with this approach. Because both synchronization primitives are relatively complex, utilizing a model checker
  670.             should provide a much more trustworthy proof of the required properties than ``paper and pencil''.
  671.            
  672.             The initial choice of Spin is motivated by its suitability to model threads, their interaction and verify
  673.             properties related to race conditions and deadlocks (which is the common sources of OS-related bugs). Other
  674.             modeling formalisms might be more suitable for different goals.
  675.        
  676.         \subsection{Architecture and Behavior Checker}
  677.             We have created an architecture description in ADL language derived from \emph{SOFA ADL}~\cite{adl} for the
  678.             majority of the HelenOS components and created the Behavior Protocol specification of these components.
  679.             Both descriptions were created independently, not by reverse-engineering the existing source code.
  680.             The architecture is a snapshot of the dynamic architecture just after a successful bootstrap of HelenOS.
  681.            
  682.             Both the architecture and behavior description is readily available as part of the source code repository
  683.             of HelenOS, including tools which can preprocess the Behavior Protocols according to the architecture description
  684.             and create an output suitable for \emph{bp2promela} checker~\cite{bp}.
  685.            
  686.             As the resulting complexity of the description is larger than any of the previously published case studies
  687.             on Behavior Protocols (compare to~\cite{cocome}), our current work-in-progress is to optimize and fine-tune
  688.             the bp2promela checker to process the input.
  689.            
  690.             \medskip
  691.            
  692.             We have not started to generate code from the architecture description so far because of time constrains.
  693.             However, we believe that this is a very promising way to go and provide reasonable guarantees about
  694.             the compliance between the specification and the implementation.
  695.        
  696.         \subsection{Behavior Description Generator}
  697.             We have not tackled the issue of behavior description generation yet, although tools such as
  698.             \emph{Procasor}~\cite{procasor} are readily available. We do not consider it our priority at this time.
  699.    
  700.     \section{Conclusion}
  701.         \label{conclusion}
  702.         In this paper we propose a complex combination of various verification methods and tools
  703.         to achieve the verification of an entire general-purpose operating system. The proposed
  704.         approach generally follows a bottom-up route, starting with low-level checks using state-of-the-art
  705.         verifying C language compilers, following by static analyzers and static verifiers.
  706.         In specific contexts regression and unit tests, code instrumentation and model checkers
  707.         for the sake of verification of key algorithms are utilized.
  708.        
  709.         Thanks to the properties of state-of-the-art microkernel multiserver operating
  710.         system design (e.g. software component encapsulation and composition, fine-grained isolated
  711.         components), we demonstrate that it should be feasible to successfully verify larger and more
  712.         complex operating systems than in the case of monolithic designs. We use formal component
  713.         architecture and behavior description for the closure. The final goal -- a formally verified
  714.         operating system -- is the emerging property of the combination of the various methods.
  715.        
  716.         \medskip
  717.        
  718.         The contribution of this paper is the shift of focus from attempts to use a single
  719.         ``silver-bullet'' method for formal verification of an operating system to a combination
  720.         of multiple methods supported by a suitable architecture of the operating system.
  721.         The main benefit is a much larger coverage of the set of all hypothetical properties.
  722.        
  723.         We also argue that the approach should be suitable for the mainstream
  724.         general-purpose operating systems in the near future, because they are gradually
  725.         incorporating more microkernel-based features and fine-grained software components.
  726.        
  727.         Although the evaluation of the proposed approach on HelenOS is still work-in-progress, the
  728.         preliminary results and estimates are promising.
  729.        
  730.         \medskip
  731.        
  732.         \noindent\textbf{Acknowledgments.} The author would like to express his gratitude to all contributors of
  733.         the HelenOS project. Without their vision and dedication the work on this paper would be almost impossible
  734.        
  735.         This work was partially supported by the Ministry of Education of the Czech Republic
  736.         (grant MSM\-0021620838).
  737.    
  738.     \begin{thebibliography}{99}
  739.         \bibitem{billion}Bessey A., Block K., Chelf B., Chou A., Fulton B., Hallem S., Henri-Gros C., Kamsky A., McPeak S., Engler D.: \emph{A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World}, Communications of the ACM, Vol. 53 No. 2, pp 66-75, 2010
  740.         \bibitem{bp}Kofron J.: \emph{Checking Software Component Behavior Using Behavior Protocols and Spin}, in proceedings of Applied Computing 2007, Seoul, Korea, pp. 1513-1517, 2007
  741.         \bibitem{gcc}GCC, the GNU Compiler Collection, \texttt{http://gcc.gnu.org/}
  742.         \bibitem{clang}Clang: a C language family frontend for LLVM, \texttt{http://clang.llvm.org/}
  743.         \bibitem{clanganalyzer}Clang Static Analyzer, \texttt{http://clang-analyzer.llvm.org/}
  744.         \bibitem{cocome}Bulej L., Bures T., Coupaye T., Decky M., Jezek P., Parizek P., Plasil F., Poch T., Rivierre N., Sery O., Tuma P.: \emph{CoCoME in Fractal}, chapter in The Common Component Modeling Example: Comparing Software Component Models, Springer-Verlag, LNCS 5153, 2008
  745.         \bibitem{coverity}Coverity, \texttt{http://scan.coverity.com/}
  746.         \bibitem{procasor}Mencl V.: \emph{Deriving Behavior Specifications from Textual Use Cases}, in Proceedings of Workshop on Intelligent Technologies for Software Engineering (WITSE04, Sep 21, 2004, part of ASE 2004), Linz, Austria, pp. 331 - 341, Oesterreichische Computer Gesellschaft, 2004
  747.         \bibitem{framac}Frama-C, \texttt{http://frama-c.cea.fr/}
  748.         \bibitem{c}Lawlis P. K: \emph{Guidelines for Choosing a Computer Language: Support for the Visionary Organization}, Ada Information Clearinghouse, \texttt{http://archive.adaic.com/docs/reports/lawlis/k.htm}, 1998
  749.         \bibitem{helenos}HelenOS Project, \texttt{http://www.helenos.org/}
  750.         \bibitem{adl}Oplustil T.: \emph{Inheritance in Architecture Description Languages}, reviewed section of Proceedings of the Week of Doctoral Students 2003 conference (WDS 2003), Charles University, Prague, Czech Republic, 2003, pp. 124 - 131, 2003
  751.         \bibitem{marketshare}Operating System Market Share, \texttt{http://marketshare.hitslink.com/report\-.aspx?qprid=8\&qptimeframe=M\&qpsp=132}, retrieved on 2010-02-28
  752.         \bibitem{seL4}Klein G., Elphinstone K., Heiser G., Andronick J., Cock D., Derrin P., Elkaduwe D., Engelhardt K., Kolanski R., Norrish M., Sewell T., Tuch H., Winwood S.: \emph{seL4: Formal verification of an OS kernel}, Proceedings of the 22nd ACM Symposium on Operating Systems Principles, Big Sky, MT, USA, 2009
  753.         \bibitem{spin}Spin, \texttt{http://spinroot.com/}
  754.         \bibitem{stanse}Stanse: Static Analysis Framework for C code, \texttt{http://stanse.fi.muni.cz/}
  755.         \bibitem{valgrind}Valgrind, \texttt{http://valgrind.org/}
  756.         \bibitem{vcc}VCC, \texttt{http://vcc.codeplex.com/}
  757.     \end{thebibliography}
  758. \end{document}
  759.