Subversion Repositories HelenOS-doc

Compare Revisions

Rev 180 → Rev 181

/papers/isarcs10/isarcs10.tex
286,18 → 286,20
\subsection{C Language Compiler and Continuous Integration Tool}
\label{clang}
The initial levels of abstraction do not go far from the C source code and common engineering
approaches. First, we would certainly like to know whether our code base is compliant with the
programming language specification and passes only the basic semantic checks (proper number
and types of arguments passed to functions, etc.). It is perhaps not very surprising that
these decisions can be made by any plain C compiler. However, with the current implementation
of HelenOS even this is not quite trivial.
Besides the requirement to support 7 hardware platforms, the system's compile-time configuration
can also be affected by approximately 65 configuration options, most of which are booleans;
the rest are enumerated types.
These configuration options are bound by logical propositions in conjunctive or disjunctive
normal forms and while some options are freely configurable, the value of others is inferred
by the build system of HelenOS. The overall number of distinct configurations in which
HelenOS can be compiled is at least one order of magnitude larger than the plain number
of supported hardware platforms.
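As a minimal illustration (using hypothetical option names, not the actual HelenOS build rules),
a constraint between a freely configurable option and an inferred one can be thought of as a
compile-time check:
\begin{verbatim}
/* Hypothetical illustration of a configuration constraint; the actual
   HelenOS build system evaluates such propositions before the compiler
   is ever invoked. */
#if defined(CONFIG_SMP) && !defined(CONFIG_ATOMIC_OPS)
#error "SMP support requires atomic operations on the target platform"
#endif
\end{verbatim}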
312,26 → 314,33
open variables and inferring the others. This can be part of the continuous integration
process which would try to compile and link the sources in all distinct configurations.
If we want to be really pedantic, we should also make sure that we run all higher-level
verification methods on all configurations generated by this step. That would certainly
multiply the time required by the verification methods at least by the number
of the distinct configurations. Constraining the set of configurations to just the most
representative ones is perhaps a reasonable compromise to make the verification realistic.
\subsection{Regression and Unit Tests}
Although some people would argue whether testing is a formal verification method, we still include
it in the big picture. Running regression and unit tests which are part of the HelenOS code base
in the continuous integration process is fairly easy. The only complication lies in the technicalities:
We need to set up an automated network of physical machines and simulators which can run the
appropriate compilation outputs for the specific platforms. We need to be able to reboot
them remotely and distribute the boot images to them. And last but not least, we need to be
able to gather the results from them.
Testing is always non-exhaustive; thus the guarantees provided by tests are strictly limited
to the use cases and contexts which are being explicitly tested. However, it is arguably
easier to express many common use cases in the primary programming language than in some
different formalism. As we follow the bottom-up approach, filtering out the most obvious
bugs by testing can save us a lot of valuable time which would otherwise be wasted on
futile verification by more formal (and more time-consuming) methods.
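As a generic illustration (not the actual HelenOS test harness, which has its own conventions
for registering and reporting tests), a use case expressed directly in C can be as simple as:
\begin{verbatim}
#include <assert.h>
#include <string.h>

/* A self-contained unit test expressed directly in C; the tested use
   case is a bounded string copy with explicit termination. */
static void test_bounded_copy(void)
{
	char buf[8];

	strncpy(buf, "abc", sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	assert(strcmp(buf, "abc") == 0);
}

int main(void)
{
	test_bounded_copy();
	return 0;
}
\end{verbatim}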
\subsection{Instrumentation}
Instrumentation tools for detecting memory leaks, performance bottlenecks and soft-deadlocks
are also not usually considered to be formal verification tools (since, given the non-exhaustive
nature of these tools, it is hard to define the exact formal properties which are being verified).
They are also rarely utilized on a regular basis as part of the continuous integration process.
But again, it might be helpful to just mention them in the big picture.
If some regression or unit tests fail, they sometimes do not give sufficient information to
tell immediately what the root cause of the issue is. In that case, running the faulting tests
339,10 → 348,11
more directly to the actual bug.
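As a sketch of the kind of instrumentation we have in mind (hypothetical wrapper names, not the
interface of the actual framework), tracking resource leaks can start with simply counting the
outstanding allocations:
\begin{verbatim}
#include <stdlib.h>

/* Sketch of allocation instrumentation: the counter of outstanding
   allocations should drop back to zero on a clean shutdown; a real
   framework would also record call sites to pinpoint the leak. */
static unsigned long live_allocations;

void *traced_malloc(size_t size)
{
	void *ptr = malloc(size);

	if (ptr != NULL)
		live_allocations++;

	return ptr;
}

void traced_free(void *ptr)
{
	if (ptr != NULL)
		live_allocations--;

	free(ptr);
}
\end{verbatim}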
\subsection{Verifying C Language Compiler}
C language compilers are traditionally also not considered to be formal verification tools.
Many people just say that C compilers are good at generating executable code, but do not
care much about the semantics of the source code (on the other hand, formal verification
tools usually do not generate any executable code at all). However, with the recent developments
in the compiler domain, the old paradigms are shifting.
As the optimization passes and general maturity of the compilers improve over time,
the compilers try to extract and use more and more semantic information from the source code.
355,11 → 365,11
code compliant with some older standard (e.g. C89) even when it is not up to the current quality
expectations. Old legacy source code should still pass the compilation as it did decades ago.
Secondly, the checks run by the verifying compilers are usually not based on abstract interpretation.
They are mostly realized as abstract syntax tree transformations, much in line with the supporting
routines of the compilation process (data and control flow graph analysis, dead code elimination,
register allocation, etc.), and the evaluation function is basically the matching of antipatterns
of common programming bugs.
The checks are usually conservative. The verifying compilers identify code constructs which are
suspicious, which might arise out of the programmer's bad intuition and so on, but even these code
snippets cannot be
368,6 → 378,11
to examine the root cause of the compiler warning, tell whether it is really a bug or just a false
positive and fix the issue by either amending some additional semantic information (e.g. adding an
explicit typecast or a vendor-specific language extension) or rewriting the code.
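For illustration (a generic example, not a specific HelenOS code snippet), a truncating conversion
reported by the compiler can be either a real bug or an intended operation; in the latter case an
explicit typecast documents the intent and silences the warning:
\begin{verbatim}
#include <stdint.h>

/* Without the cast, -Wconversion reports a possibly value-changing
   conversion from uint64_t to uint32_t; the explicit cast documents
   that taking the low 32 bits is intentional. */
uint32_t low_word(uint64_t value)
{
	return (uint32_t) value;
}
\end{verbatim}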
Although this level of abstraction is coarse-grained and conservative, it can be called semi-formal,
since the properties which are being verified can actually be defined quite exactly and they
are reasonably general. They do not deal with single traces of methods, runs and use
cases like tests, but with all possible contexts in which the code can run.
\subsection{Static Analyzer}
Static analyzers try to go deeper than verifying compilers. Besides detecting common antipatterns of
381,13 → 396,14
The biggest advantage of static analyzers is that they can be easily included in the development or
continuous integration process as an additional automated step, very similar to the verifying compilers.
No manual definition of code-specific properties is needed and false positives can be relatively easily
eliminated by amending some explicit additional information to the source code within the boundaries
of the programming language.
The authors of static analyzers claim large quantities of bugs detected or prevented~\cite{billion},
but static analyzers are still relatively limited by the kind of bugs they are designed to detect.
They are usually good at pointing out common issues with security implications (specific types of
buffer and stack overruns, usage of well-known functions in an unsafe way, clear cases of forgotten
deallocation of resources and release of locks, etc.). Unfortunately, many static analyzers
only analyze a single-threaded control flow and are thus unable to detect concurrency issues
such as deadlocks.
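A typical representative of this class of defects (an illustrative example, not taken from the
HelenOS code base) is a forgotten deallocation on an early-return path:
\begin{verbatim}
#include <stdlib.h>
#include <string.h>

/* Illustrative defect of the kind reported by static analyzers:
   the early return leaks the buffer allocated above it. */
char *duplicate_checked(const char *src, size_t limit)
{
	size_t size = strlen(src) + 1;
	char *copy = malloc(size);

	if (copy == NULL)
		return NULL;

	if (size > limit)
		return NULL;  /* leak: copy is never freed */

	memcpy(copy, src, size);
	return copy;
}
\end{verbatim}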
401,16 → 417,16
In the context of an operating system kernel and core libraries two kinds of properties are
common:
\begin{description}
\item[Consistency constraints] These properties define the correct way in which data is supposed
to be manipulated by some related set of subroutines. Checking for these
properties ensures that data structures and internal states will not get corrupted due
to bugs in the functions and methods which are designed to manipulate them.
\item[Interface enforcements] These properties define the correct semantics by which
a set of subroutines should be used by the rest of the code. Checking for these properties
ensures that some API is always used by the rest of the code in a specified way,
all possible error states are detected and reported and all reported exceptions are handled
by the client code (see the illustration below).
\end{description}
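As an illustration of an interface enforcement property (using a generic POSIX-like file API rather
than a specific HelenOS interface), consider the rule that the result of every open must be checked
and every successfully opened descriptor must eventually be closed:
\begin{verbatim}
#include <fcntl.h>
#include <unistd.h>

/* Client code respecting a simple interface enforcement property:
   the return value of open() is checked and every successfully
   opened descriptor is closed on all paths. */
int probe_device(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;  /* error state detected and reported */

	char byte;
	int rc = (read(fd, &byte, 1) == 1) ? 0 : -1;

	close(fd);  /* resource released on every path */
	return rc;
}
\end{verbatim}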
\subsection{Model Checker}
\label{modelcheck}
431,56 → 447,80
that the root cause is in the non-compliance between the design and implementation and not a
fundamental flaw of the design itself.
\subsection{Architecture and Behavior Checker}
All previously mentioned verification methods target internal properties of the OS
components. If we move to a higher-level abstraction in order to specify the correct
interaction of the encapsulated components in terms of interface compatibility and communication,
we can utilize \emph{Behavior Protocols}~\cite{bp} or some other formalism describing the correct
interaction between software components.
To gain knowledge about the architecture of the whole OS in terms of software
component composition and bindings, we can use an \emph{Architecture Description Language}~\cite{adl}
as the specification of the architecture of the system. This language can capture
interface types (with method signatures), primitive components (in terms of provided and required
interfaces), composite components (architectural compositions of primitive components) and the
bindings between the respective interfaces of the components.
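To give an intuition of these concepts in terms of the implementation language (a hypothetical
C rendering, not the actual ADL syntax or the HelenOS sources), an interface type with method
signatures and a primitive component with its provided and required interfaces can be pictured
as follows:
\begin{verbatim}
#include <stddef.h>

/* Hypothetical C counterpart of the ADL concepts: an interface type
   with method signatures and a primitive component declaring one
   provided and one required interface. Bindings connect the required
   interface of one component to the provided interface of another. */
typedef struct {
	int (*read)(void *inst, void *buf, size_t size);
	int (*write)(void *inst, const void *buf, size_t size);
} block_iface_t;

typedef struct {
	void *inst;                    /* component instance data */
	block_iface_t provided_block;  /* provided interface */
	const block_iface_t *required_backend;  /* required interface
	                                           (bound elsewhere) */
} block_component_t;
\end{verbatim}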
It is extremely important to define the right role of the behavior and architecture description.
A flawed approach would be to reverse-engineer this description from the source code (either manually
or via some sophisticated tool) and then verify the compliance between the description and
the implementation. However, different directions can give more interesting results:
\begin{description}
\item[Description as specification] Behavior and architecture description created independently
of the source code serves the role of specification. This serves the following primary
goals of formal verification:
\begin{description}
\item[Horizontal compliance] Also called \emph{compatibility}. The goal is to check
whether the specifications of components that are bound together are semantically
compatible. All required interfaces need to be bound to provided interfaces and
the communication between the components cannot lead to \emph{no activity} (a deadlock),
\emph{bad activity} (a livelock) or other communication and synchronization errors.
\item[Vertical compliance] Also called \emph{substitutability}. The goal is to check whether
it is possible to replace a set of primitive components that are nested inside a composite
component by the composite component itself. In other words, this compliance can answer the
question whether the architecture description of the system is sound with respect to the hierarchical
composition of the components.
\item[Compliance between the specification and the implementation] Using various means
for generating artificial environments for an isolated component, the checker is able to
partially answer the question whether the implementation of the component is an instantiation
of the component specification.
\end{description}
\item[Description as abstraction] Generating the behavior and architecture description from the
source code by means of abstract interpretation can serve the purpose of verifying various
properties of the implementation such as invariants, preconditions and postconditions.
This is similar to static verification, but on the level of component interfaces.
\end{description}
Unfortunately, most of the behavior and architecture formalisms are static, which is not quite suitable
for the dynamic nature of most OSes. This limitation can be circumvented by considering a relevant
snapshot of the dynamic run-time architecture. This snapshot fixed in time is equivalent to
a statically defined architecture.
The key features of software systems with respect to behavior and architecture checkers are the granularity
of the individual primitive components, the level of isolation and the complexity of the communication
mechanism between them. Large monolithic OSes created in semantic-poor C present a severe challenge because
the isolation of the individual components is vague and the communication between them is basically unlimited
(function calls, shared resources, etc.).
OSes with explicit component architecture and fine-grained components (such as microkernel multiserver
systems) make the verification much more feasible, since the degrees of freedom (and thus
the state space) are limited.
Horizontal and vertical compliance checking can be done exhaustively. This is a fundamental property
which allows reasoning about the dependability of the entire component-based OS.
Assuming that the lower-level verification methods (described in Sections \ref{clang} to \ref{modelcheck})
prove some specific properties of the primitive components, we can be sure that the composition of
the primitive components into composite components and ultimately into the whole OS
does not break these properties.
The feasibility of many lower-level verification methods from Sections \ref{clang} to \ref{modelcheck}
depends largely on the size and complexity of the code under verification. If the entire OS
is decomposed into primitive components with a fine granularity, it is more likely that the
individual primitive components can be verified against a large number of properties. Thanks to the
recursive component composition we can then be sure that these properties also hold for the entire system.
In monolithic operating systems without a clear decomposition we cannot do this inference and we
need to verify the lower-level properties over a much larger code base (e.g. the whole monolithic kernel).
\medskip
503,14 → 543,17
\subsection{Summary}
\label{missing}
So far, we have proposed a compact combination of engineering, semi-formal and formal methods which
start at the level of the C programming language, offer the possibility to check for the presence of various
common antipatterns, check for generic algorithm-related properties, consistency constraints and interface
enforcements, and conclude with a framework to make these properties hold even in the case of a large
OS composed of many components of compliant behavior.
We are also able to fill in some of the missing pieces with other software engineering approaches
such as regression and unit testing and instrumentation.
We do not claim that there are no missing pieces in the big picture or that the semi-formal verifications
might provide more guarantees in this setup. However, state-of-the-art OS design guidelines can push
the boundaries of practical feasibility of the presented methods further. The limited guarantees
of the low-level methods hold even under composition and the high-level formal methods do not have
to deal with unlimited degrees of freedom of the primitive component implementation.
\medskip
527,6 → 570,11
evaluation of the proposed formalisms and tools. As this is still largely a work-in-progress,
in many cases just the initial observations can be made.
The choice of the specific methods, tools and formalisms in this initial phase is mostly motivated
by their perceived commonality and by their authors' claims about fitness for the given purpose. An important
part of further evaluation would certainly be to compare multiple particular approaches, tools
and formalisms to find the optimal combination.
\subsection{Verifying C Language Compiler and Continuous Integration Tool}
The primary C compilers used by HelenOS are \emph{GNU GCC 4.4.3} (all platforms)~\cite{gcc} and \emph{Clang 2.6.0}
(IA-32)~\cite{clang}. We have also taken some effort to support the \emph{ICC} and \emph{Sun Studio} C compilers,
543,29 → 591,29
We also turn on several more specific and strict checks. These checks helped to discover several
latent bugs in the source code:
\begin{description}
\item[\texttt{-Wfloat-equal}] Check for exact equality comparison between floating point values. The
usage of the equality comparator on floats is usually misguided due to the inherent computational errors
of floats (see the example below).
\item[\texttt{-Wcast-align}] Check for code which casts pointers to a type with a stricter alignment
requirement. On many RISC-based platforms this can cause run-time unaligned access exceptions.
\item[\texttt{-Wconversion}] Check for code where the implicit type conversion (e.g. from float to integer,
between signed and unsigned integers or between integers with a different number of bits) can
cause the actual value to change.
\end{description}
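The following fragment (illustrative only, not taken from the HelenOS sources) shows the antipattern
detected by \texttt{-Wfloat-equal} and a tolerance-based comparison that avoids it:
\begin{verbatim}
#include <math.h>

/* Exact floating point comparison is flagged by -Wfloat-equal;
   comparing against a small tolerance avoids the antipattern. */
int converged(double current, double previous)
{
	/* return (current == previous);  -- reported by -Wfloat-equal */
	return fabs(current - previous) < 1e-9;
}
\end{verbatim}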
To enhance the semantic information in the source code, we use GCC-specific language extensions to annotate
some particular kernel and core library routines:
\begin{description}
\item[\texttt{\_\_attribute\_\_((noreturn))}] Functions marked in this way never finish from the point of view
of the current sequential execution flow. The most common case are the routines which restore a previously saved
execution context.
\item[\texttt{\_\_attribute\_\_((returns\_twice))}] Functions marked in this way may return multiple times from
the point of view of the current sequential execution flow. This is the case of routines which save the current
execution context (first the function returns as usual, but the function can eventually ``return again''
when the context is being restored); see the simplified prototypes below.
\end{description}
The use of these extensions has pointed out several hard-to-debug bugs on the IA-64 platform.
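For illustration, the annotations are used roughly as follows (simplified prototypes; the actual
HelenOS context switching routines are implemented in platform-specific assembly):
\begin{verbatim}
/* Opaque saved execution context (placeholder type for illustration). */
typedef struct context context_t;

/* Never returns to the caller: it resumes a previously saved context. */
void context_restore(context_t *ctx) __attribute__((noreturn));

/* Returns once when the context is saved and possibly again ("returns
   twice") when the saved context is later restored. */
int context_save(context_t *ctx) __attribute__((returns_twice));
\end{verbatim}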
588,11 → 636,11
\subsection{Instrumentation}
We are in the process of implementing our own code instrumentation framework which is motivated mainly
by the need to support MMU-less architectures in the future. But this framework might also be very helpful
in detecting memory and generic resource leaks. We have not tried \emph{Valgrind}~\cite{valgrind} or any similar
existing tool because of the estimated complexity of adapting it for use in HelenOS.
HelenOS was also scanned by \emph{Coverity}~\cite{coverity} in 2006 when no errors were detected. However, since
that time the code base has not been analyzed by Coverity.
\subsection{Static Analyzer}
The integration of various static analyzers into the HelenOS continuous integration process is underway.
622,11 → 670,16
and the Spin model checker are mature and commonly used tools for such purposes, we expect no major problems
with this approach. Because both synchronization primitives are relatively complex, utilizing a model checker
should provide a much more trustworthy proof of the required properties than ``paper and pencil''.
The initial choice of Spin is motivated by its suitability for modeling threads and their interaction,
and for verifying properties related to race conditions and deadlocks (which are the common sources
of OS-related bugs). Other modeling formalisms might be more suitable for different goals.
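For reference, the following is a simplified sketch of the kind of test-and-set spinlock whose
Promela model is checked for mutual exclusion and absence of deadlock (the actual HelenOS primitives
additionally deal with preemption control and architecture-specific atomic operations):
\begin{verbatim}
/* Simplified test-and-set spinlock using GCC atomic builtins; the
   Promela model of such a primitive is checked for mutual exclusion
   and absence of deadlock. */
typedef struct {
	volatile unsigned int locked;
} spinlock_t;

void spinlock_lock(spinlock_t *lock)
{
	while (__sync_lock_test_and_set(&lock->locked, 1))
		;  /* busy-wait until the holder releases the lock */
}

void spinlock_unlock(spinlock_t *lock)
{
	__sync_lock_release(&lock->locked);
}
\end{verbatim}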
\subsection{Architecture and Behavior Checker}
We have created an architecture description in an ADL derived from \emph{SOFA ADL}~\cite{adl} for the
majority of the HelenOS components and created the Behavior Protocol specification of these components.
Both descriptions were created independently, not by reverse-engineering the existing source code.
The architecture is a snapshot of the dynamic architecture just after a successful bootstrap of HelenOS.
Both the architecture and the behavior descriptions are readily available as part of the source code repository
of HelenOS, including tools which can preprocess the Behavior Protocols according to the architecture description
633,39 → 686,41
and create an output suitable for the \emph{bp2promela} checker~\cite{bp}.
As the resulting complexity of the description is larger than in any of the previously published case studies
on Behavior Protocols (compare to~\cite{cocome}), our current work-in-progress is to optimize and fine-tune
the bp2promela checker to process the input.
\medskip
We have not started to generate code from the architecture description so far because of time constraints.
However, we believe that this is a very promising way to go and to provide reasonable guarantees about
the compliance between the specification and the implementation.
\subsection{Behavior Description Generator}
We have not tackled the issue of behavior description generation yet, although tools such as
\emph{Procasor}~\cite{procasor} are readily available. We do not consider it our priority at this time.
\section{Conclusion}
\label{conclusion}
In this paper we propose a complex combination of various verification methods and tools
to achieve the verification of an entire general-purpose operating system. The proposed
approach generally follows a bottom-up route, starting with low-level checks using state-of-the-art
verifying C language compilers, followed by static analyzers and static verifiers.
In specific contexts, regression and unit tests, code instrumentation and model checkers
are utilized for the verification of key algorithms.
Thanks to the properties of state-of-the-art microkernel multiserver operating
system design (e.g. software component encapsulation and composition, fine-grained isolated
components), we demonstrate that it should be feasible to successfully verify larger and more
complex operating systems than in the case of monolithic designs. We use formal component
architecture and behavior description for the closure. The final goal -- a formally verified
operating system -- is the emergent property of the combination of the various methods.
\medskip
The contribution of this paper is the shift of focus from attempts to use a single
``silver-bullet'' method for formal verification of an operating system to a combination
of multiple methods supported by a suitable architecture of the operating system.
The main benefit is a much larger coverage of the set of all hypothetical properties.
We also argue that the approach should be suitable for the mainstream
general-purpose operating systems in the near future, because they are gradually
676,8 → 731,8
\medskip
\noindent\textbf{Acknowledgments.} The author would like to express his gratitude to all contributors of
the HelenOS project. Without their vision and dedication the work on this paper would be almost impossible.
This work was partially supported by the Ministry of Education of the Czech Republic
(grant MSM\-0021620838).