|
\subsection{C Language Compiler and Continuous Integration Tool} |
\label{clang} |
The initial levels of abstraction do not go far from the C source code and common engineering
approaches. First, we would certainly like to know whether our code base is compliant with the
programming language specification and passes only the basic semantic checks (proper number
and types of arguments passed to functions, etc.). It is perhaps not very surprising that
these decisions can be made by any plain C compiler. However, with the current implementation
of HelenOS even this is not quite trivial.
|
Besides the requirement to support 7 hardware platforms, the system's compile-time configuration |
can also be affected by approximately 65 configuration options, most of which are booleans and
the rest enumerated types.
|
These configuration options are bound by logical propositions in conjunctive or disjunctive |
normal forms and while some options are freely configurable, the value of others gets inferred |
by the build system of HelenOS. The overall number of distinct configurations in which |
HelenOS can be compiled is at least one order of magnitude larger than the plain number |
of supported hardware platforms. |
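To illustrate, the constraints between configuration options can be thought of as boolean propositions over the option values. The following sketch uses made-up option names, not the actual HelenOS configuration variables, and shows one constraint in implication form together with one derived option:

```c
#include <stdbool.h>

/* Hypothetical configuration snapshot: the option names are
 * illustrative, not the real HelenOS configuration variables. */
typedef struct {
    bool smp;            /* symmetric multiprocessing */
    bool ht;             /* hyperthreading support */
} config_t;

/* A constraint in implication form: hyperthreading requires SMP.
 * Written as a clause of the corresponding normal form. */
static bool constraint_ht_requires_smp(const config_t *c)
{
    return !c->ht || c->smp;   /* equivalent to HT => SMP */
}

/* An inferred option: spinlock debugging is only meaningful with SMP,
 * so its value is derived by the build system rather than chosen freely. */
static bool infer_debug_spinlock(const config_t *c, bool requested)
{
    return c->smp && requested;
}
```

Evaluating such propositions over all free options is what makes the space of valid configurations grow well beyond the plain number of platforms.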
|
open variables and inferring the others. This can be part of the continuous integration |
process which would try to compile and link the sources in all distinct configurations. |
|
If we want to be really pedantic, we should also make sure that we run all higher |
level verification methods on all configurations generated by this step. That would certainly |
require multiplying the time needed by the verification methods at least by the number
of the distinct configurations. Constraining the set of configurations to just the most |
representative ones is perhaps a reasonable compromise to make the verification realistic. |
|
\subsection{Regression and Unit Tests} |
Although some people would argue whether testing is a formal verification method, we still include
it in the big picture. Running regression and unit tests which are part of the HelenOS code base
in the continuous integration process is fairly easy. The only complication lies in the technicalities:
We need to set up an automated network of physical machines and simulators which can run the
appropriate compilation outputs for the specific platforms. We need to be able to reboot |
them remotely and distribute the boot images to them. And last but not least, we need to be |
able to gather the results from them. |
|
Testing is always non-exhaustive, thus the guarantees provided by tests are strictly limited |
to the use cases and contexts which are being explicitly tested. However, it is arguably |
easier to express many common use cases in the primary programming language than in some |
different formalism. As we follow the bottom-up approach, filtering out the most obvious
bugs by testing can save us a lot of valuable time which would otherwise be wasted on
futile verification by more formal (and more time-consuming) methods.
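As an illustration of the previous point, a use case can be expressed directly in C as a test function living next to the code it exercises. The sketch below uses hypothetical names, not the actual HelenOS testing interface:

```c
#include <string.h>

/* Code under test: a bounded string copy that always terminates
 * the destination buffer (an illustrative helper, not HelenOS code). */
static void str_ncpy(char *dst, size_t size, const char *src)
{
    if (size == 0)
        return;
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}

/* The regression test, expressed in the primary language:
 * returns NULL on success, a human-readable message on failure. */
static const char *test_str_ncpy(void)
{
    char buf[4];

    str_ncpy(buf, sizeof(buf), "HelenOS");
    if (strcmp(buf, "Hel") != 0)
        return "truncated copy is wrong";

    str_ncpy(buf, sizeof(buf), "ok");
    if (strcmp(buf, "ok") != 0)
        return "short copy is wrong";

    return NULL;  /* success */
}
```

Such test functions can be compiled into the boot image and triggered by the continuous integration harness on each target machine.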
|
\subsection{Instrumentation} |
Instrumentation tools for detecting memory leaks, performance bottlenecks and soft-deadlocks |
are also not usually considered to be formal verification tools (since it is hard to define
the exact formal properties being verified, given the non-exhaustive nature of these tools).
They are also rarely utilized on a regular basis as part of the continuous integration process.
But again, it might be helpful to mention them in the big picture.
|
If some regression or unit tests fail, they sometimes do not give sufficient information to
tell immediately what the root cause of the issue is. In that case running the failing tests
more directly to the actual bug. |
|
\subsection{Verifying C Language Compiler} |
C language compilers are traditionally also not considered to be formal verification tools. |
Many people just say that C compilers are good at generating executable code, but do not |
care much about the semantics of the source code (on the other hand, formal verification |
tools usually do not generate any executable code at all). However, with recent developments
in the compiler domain, the old paradigms are shifting.
|
As the optimization passes and general maturity of the compilers improve over time, |
the compilers try to extract and use more and more semantic information from the source code. |
code compliant to some older standard (e.g. C89) even when it does not meet current quality
expectations. Old legacy source code should still pass the compilation as it did decades ago.
|
Secondly, the checks run by the verifying compilers are usually not based on abstract interpretation.
They are mostly realized as abstract syntax tree transformations much in line with the supporting
routines of the compilation process (data and control flow graph analysis, dead code elimination,
register allocation, etc.) and the evaluation function is basically the matching of antipatterns
of common programming bugs.
|
The checks are usually conservative. The verifying compilers identify code constructs which are suspicious,
which might arise out of a programmer's bad intuition and so on, but even these code snippets cannot be
positive and fix the issue by either supplying some additional semantic information (e.g. adding an
explicit typecast or a vendor-specific language extension) or rewriting the code.
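A minimal sketch of such a false positive and its resolution via an explicit typecast (the function names are illustrative):

```c
#include <stddef.h>

/* A conservative check may flag this comparison: `len` is signed and
 * `size` is unsigned, so `len` is silently converted and a negative
 * length would compare as a huge value (typically reported by
 * -Wsign-compare or similar diagnostics). */
static int fits_suspicious(int len, size_t size)
{
    return len < size;
}

/* The developer confirms the intent and supplies the missing semantic
 * information: negative lengths are rejected explicitly, and the
 * remaining conversion is made deliberate with a cast. */
static int fits(int len, size_t size)
{
    if (len < 0)
        return 0;
    return (size_t) len < size;
}
```

After the rewrite the compiler no longer has to guess whether the signed/unsigned mix was intentional.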
|
Although this level of abstraction is coarse-grained and conservative, it can be called semi-formal, |
since the properties which are being verified can actually be defined quite exactly and they
are reasonably general. They do not deal with single traces of methods, runs and use
cases like tests, but with all possible contexts in which the code can run.
|
\subsection{Static Analyzer} |
Static analyzers try to go deeper than verifying compilers. Besides detecting common antipatterns of |
bugs, they also use techniques such as abstract interpretation to check for more complex properties. |
|
The biggest advantage of static analyzers is that they can be easily included in the development or |
continuous integration process as an additional automated step, very similar to the verifying compilers. |
No manual definition of code-specific properties is needed and false positives can be relatively easily
eliminated by adding some explicit additional information to the source code within the boundaries
of the programming language.
|
The authors of static analyzers claim large quantities of bugs detected or prevented~\cite{billion}, |
but static analyzers are still relatively limited by the kind of bugs they are designed to detect. |
They are usually good at pointing out common issues with security implications (specific types of |
buffer and stack overruns, usage of well-known functions in an unsafe way, clear cases of forgotten |
deallocation of resources and release of locks, etc.). Unfortunately, many static analyzers |
only analyze a single-threaded control flow and are thus unable to detect concurrency issues |
such as deadlocks. |
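For example, a lock-order inversion is invisible to a purely single-threaded control flow analysis, because each execution path is locally well-formed. A sketch using POSIX threads as a stand-in (not the actual HelenOS synchronization primitives):

```c
#include <pthread.h>

/* Two locks and a shared counter. Each function below balances its
 * lock/unlock pairs, so single-threaded analysis sees no problem. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter = 0;

/* Thread 1 would take A then B ... */
static void *worker_ab(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    shared_counter++;
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

/* ... while thread 2 takes B then A: if each thread holds its first
 * lock, neither can proceed. The bug exists only in the interleaving,
 * which is exactly what single-threaded control flow analysis misses.
 * Enforcing a global lock order (always A before B) removes it. */
static void *worker_ba_deadlocky(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&lock_b);
    pthread_mutex_lock(&lock_a);
    shared_counter++;
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return NULL;
}
```

Detecting this class of bugs requires reasoning about thread interleavings, which is the domain of model checkers rather than most static analyzers.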
In the context of an operating system kernel and core libraries two kinds of properties are |
common: |
|
\begin{description}
\item[Consistency constraints] These properties define the correct way in which data is supposed
	to be manipulated by some related set of subroutines. Checking for these
	properties ensures that data structures and internal states will not get corrupted due
	to bugs in the functions and methods which are designed to manipulate them.
\item[Interface enforcements] These properties define the correct semantics by which
	a set of subroutines should be used by the rest of the code. Checking for these properties
	ensures that some API is always used by the rest of the code in a specified way
	and all reported exceptions are handled by the client code.
\end{description}
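Both kinds of properties can be illustrated on a small hypothetical API (the stack below is purely illustrative, not a HelenOS data structure):

```c
#include <assert.h>
#include <stddef.h>

#define CAP 8

/* A fixed-size stack. The consistency constraint is "top <= CAP"
 * after every operation; the interface enforcement is "init before
 * push/pop, and the client must handle the reported error states". */
typedef struct {
    int items[CAP];
    size_t top;
    int initialized;
} istack_t;

static void istack_init(istack_t *s)
{
    s->top = 0;
    s->initialized = 1;
}

static int istack_push(istack_t *s, int value)
{
    assert(s->initialized);   /* interface enforcement */
    if (s->top == CAP)
        return -1;            /* error state reported to the client */
    s->items[s->top++] = value;
    assert(s->top <= CAP);    /* consistency constraint */
    return 0;
}

static int istack_pop(istack_t *s, int *value)
{
    assert(s->initialized);   /* interface enforcement */
    if (s->top == 0)
        return -1;            /* error state reported to the client */
    *value = s->items[--s->top];
    return 0;
}
```

A static verifier would check that the constraint holds on every path through the manipulating functions, and that every caller both initializes the stack first and handles the `-1` error returns.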
|
\subsection{Model Checker} |
\label{modelcheck} |
that the root cause is in the non-compliance between the design and implementation and not a |
fundamental flaw of the design itself. |
|
\subsection{Architecture and Behavior Checker} |
All previously mentioned verification methods were targeting internal properties of the OS |
components. If we are moving to a higher-level abstraction in order to specify correct |
interaction of the encapsulated components in terms of interface compatibility and communication, |
we can utilize \emph{Behavior Protocols}~\cite{bp} or some other formalism describing correct |
interaction between software components. |
|
To gain the knowledge about the architecture of the whole OS in terms of software
component composition and bindings, we can use an \emph{Architecture Description Language}~\cite{adl}
as the specification of the architecture of the system. This language makes it possible to capture
interface types (with method signatures), primitive components (in terms of provided and required
interfaces), composite components (architectural compositions of primitive components) and the
bindings between the respective interfaces of the components.
|
It is extremely important to define the right role of the behavior and architecture description.
A flawed approach would be to reverse-engineer this description from the source code (either manually
or via some sophisticated tool) and then verify the compliance between the description and
the implementation. However, different directions can give more interesting results:
|
\begin{description}
\item[Description as specification] Behavior and architecture description created independently
	of the source code serves the role of specification. This has the following primary
	goals of formal verification:
	\begin{description}
	\item[Horizontal compliance] Also called \emph{compatibility}. The goal is to check
		whether the specifications of components that are bound together are semantically
		compatible. All required interfaces need to be bound to provided interfaces and
		the communication between the components cannot lead to \emph{no activity} (a deadlock),
		\emph{bad activity} (a livelock) or other communication and synchronization errors.
	\item[Vertical compliance] Also called \emph{substitutability}. The goal is to check whether
		it is possible to replace a set of primitive components that are nested inside a composite
		component by the composite component itself. In other words, this compliance can answer the
		question whether the architecture description of the system is sound with respect to the
		hierarchical composition of the components.
	\item[Compliance between the specification and the implementation] Using various means
		for generating artificial environments for an isolated component the checker is able to
		partially answer the question whether the implementation of the component is an instantiation
		of the component specification.
	\end{description}
\item[Description as abstraction] Generating the behavior and architecture description from the
	source code by means of abstract interpretation can serve the purpose of verifying various
	properties of the implementation such as invariants, preconditions and postconditions.
	This is similar to static verification, but on the level of component interfaces.
\end{description}
|
Unfortunately, most of the behavior and architecture formalisms are static, which is not quite suitable
for the dynamic nature of most OSes. This limitation can be circumvented by considering a relevant
snapshot of the dynamic run-time architecture. This snapshot fixed in time is equivalent to
a statically defined architecture.
|
\medskip |
|
The key features of software systems with respect to behavior and architecture checkers are the granularity
of the individual primitive components and the level of isolation and complexity of the communication mechanism
between them. Large monolithic OSes created in semantically poor C present a severe challenge because the
isolation of the individual components is vague and the communication between them is basically unlimited
(function calls, shared resources, etc.).
|
OSes with explicit component architecture and fine-grained components (such as microkernel multiserver
systems) make the verification much more feasible, since the degrees of freedom (and thus
the state space) are limited.
|
Horizontal and vertical compliance checking can be done exhaustively. This is a fundamental property
which allows reasoning about the dependability of the entire component-based OS.
Assuming that the lower-level verification methods (described in Sections \ref{clang} to \ref{modelcheck}) |
prove some specific properties of the primitive components, we can be sure that the composition of |
the primitive components into composite components and ultimately into the whole OS |
does not break these properties. |
|
The feasibility of many lower-level verification methods from Sections \ref{clang} to \ref{modelcheck} |
depends largely on the size and complexity of the code under verification. If the entire OS |
is decomposed into primitive components with a fine granularity, it is more likely that the |
individual primitive components can be verified against a large number of properties. Thanks to the |
recursive component composition we can then be sure that these properties also hold for the entire system. |
In monolithic operating systems without a clear decomposition we cannot do this inference and we |
need to verify the lower-level properties over a much larger code base (e.g. the whole monolithic kernel). |
|
\medskip |
|
|
\subsection{Summary} |
\label{missing} |
So far, we have proposed a compact combination of engineering, semi-formal and formal methods which
start at the level of the C programming language, offer the possibility to check for the presence of various
common antipatterns, check for generic algorithm-related properties, consistency constraints, interface
enforcements and conclude with a framework to make these properties hold even in the case of a large
OS composed from many components of compliant behavior.
|
We do not claim that there are no missing pieces in the big picture or that the semi-formal
verification methods could not provide stronger guarantees in this setup. However, state-of-the-art
OS design guidelines can push the boundaries of practical feasibility of the presented methods
further. The limited guarantees of the low-level methods hold even in the composition and the
high-level formal methods do not have to deal with unlimited degrees of freedom of the primitive
component implementation.
|
\medskip |
|
evaluation of the proposed formalisms and tools. As this is still largely a work-in-progress,
in many cases just the initial observations can be made. |
|
The choice of the specific methods, tools and formalisms in this initial phase is mostly motivated
by their perceived commonality and their authors' claims about fitness for the given purpose. An important
part of further evaluation would certainly be to compare multiple particular approaches, tools
and formalisms to find the optimal combination.
|
\subsection{Verifying C Language Compiler and Continuous Integration Tool} |
The primary C compilers used by HelenOS are \emph{GNU GCC 4.4.3} (all platforms)~\cite{gcc} and \emph{Clang 2.6.0}
(IA-32)~\cite{clang}. We have taken some effort to also support the \emph{ICC} and \emph{Sun Studio} C compilers,
We also turn on several more specific and strict checks. These checks helped to discover several |
latent bugs in the source code: |
|
\begin{description}
\item[\texttt{-Wfloat-equal}] Check for exact equality comparison between floating point values. The
	usage of the equality comparator on floats is usually misguided due to the inherent rounding errors
	of floats.
\item[\texttt{-Wcast-align}] Check for code which casts pointers to a type with a stricter alignment
	requirement. On many RISC-based platforms this can cause run-time unaligned access exceptions.
\item[\texttt{-Wconversion}] Check for code where an implicit type conversion (e.g. from float to integer,
	between signed and unsigned integers or between integers with different numbers of bits) can
	cause the actual value to change.
\end{description}
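The following sketch shows, in comments, the kind of constructs these flags report, together with warning-free alternatives (the helper functions are illustrative, not HelenOS code):

```c
#include <stdint.h>

/* Constructs flagged by the strict checks:
 *
 *   double a = 0.1 * 3;
 *   if (a == 0.3) { ... }            -Wfloat-equal: the comparison fails
 *
 *   char buf[8];
 *   uint32_t *p = (uint32_t *) buf;  -Wcast-align: stricter alignment
 *
 *   uint8_t small = big_value;       -Wconversion: value may change
 */

/* Compare floats against an explicit tolerance instead of with ==. */
static int dbl_near(double a, double b, double eps)
{
    double diff = a - b;
    if (diff < 0)
        diff = -diff;
    return diff < eps;
}

/* Make a narrowing conversion explicit and saturate on overflow. */
static uint8_t to_u8(uint32_t value)
{
    return (value > UINT8_MAX) ? (uint8_t) UINT8_MAX : (uint8_t) value;
}
```

Rewrites of this kind both silence the diagnostics and document the intended semantics in the code itself.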
|
To enhance the semantic information in the source code, we use GCC-specific language extensions to annotate |
some particular kernel and core library routines: |
|
\begin{description}
\item[\texttt{\_\_attribute\_\_((noreturn))}] Functions marked in this way never finish from the point of view
	of the current sequential execution flow. The most common case are the routines which restore a previously saved
	execution context.
\item[\texttt{\_\_attribute\_\_((returns\_twice))}] Functions marked in this way may return multiple times from
	the point of view of the current sequential execution flow. This is the case of routines which save the current
	execution context (first the function returns as usual, but it can eventually ``return again''
	when the context is being restored).
\end{description}
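A minimal sketch of both annotations, using the standard setjmp/longjmp pair as a stand-in for the HelenOS context-switching routines (the names here are illustrative):

```c
#include <setjmp.h>

static jmp_buf ctx;

/* Never returns to its caller: control continues at the matching
 * setjmp, so the annotation tells the compiler not to expect the
 * sequential flow to continue past the call. */
__attribute__((noreturn)) static void context_restore(void)
{
    longjmp(ctx, 1);
}

/* Declaration-only sketch: a hypothetical context-saving routine would
 * carry the same attribute that the C library puts on setjmp itself. */
extern int my_context_save(void *ctx) __attribute__((returns_twice));

/* setjmp "returns twice" here: once directly (with 0) and once more
 * when context_restore() fires (with 1). */
static int run(void)
{
    volatile int returns = 0;  /* volatile: modified between the returns */

    if (setjmp(ctx) == 0) {
        returns++;           /* first return of setjmp() */
        context_restore();   /* control reappears at setjmp() */
    }
    returns++;               /* second return of setjmp() */
    return returns;
}
```

Without the annotations the compiler may miscompile such code, e.g. by caching values in registers across the non-local jump, which is exactly the class of hard-to-debug bugs mentioned below.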
|
The use of these extensions has pointed out several hard-to-debug bugs on the IA-64 platform.
|
\subsection{Instrumentation} |
We are in the process of implementing our own code instrumentation framework which is motivated mainly |
by the need to support MMU-less architectures in the future. But this framework might be also very helpful |
in detecting memory and generic resource leaks. We have not tried \emph{Valgrind}~\cite{valgrind} or any similar |
existing tool because of the estimated complexity to adopt it for the usage in HelenOS. |
|
HelenOS was also scanned by \emph{Coverity}~\cite{coverity} in 2006 when no errors were detected. However, since |
that time the code base has not been analyzed by Coverity. |
|
\subsection{Static Analyzer} |
The integration of various static analyzers into the HelenOS continuous integration process is underway. |
with this approach. Because both synchronization primitives are relatively complex, utilizing a model checker |
should provide a much more trustworthy proof of the required properties than ``paper and pencil''. |
|
The initial choice of Spin is motivated by its suitability to model threads and their interaction and to verify
properties related to race conditions and deadlocks (which are common sources of OS-related bugs). Other
modeling formalisms might be more suitable for different goals.
|
\subsection{Architecture and Behavior Checker} |
We have created an architecture description in ADL language derived from \emph{SOFA ADL}~\cite{adl} for the |
majority of the HelenOS components and created the Behavior Protocol specification of these components. |
Both descriptions were created independently, not by reverse-engineering the existing source code. |
The architecture is a snapshot of the dynamic architecture just after a successful bootstrap of HelenOS. |
|
Both the architecture and behavior descriptions are readily available as part of the source code repository
of HelenOS, including tools which can preprocess the Behavior Protocols according to the architecture description |
and create an output suitable for \emph{bp2promela} checker~\cite{bp}. |
|
As the resulting complexity of the description is larger than any of the previously published case studies |
on Behavior Protocols (compare to~\cite{cocome}), our current work-in-progress is to optimize and fine-tune |
the bp2promela checker to process the input. |
|
\medskip |
|
We have not started to generate code from the architecture description so far because of time constraints.
However, we believe that this is a very promising way to go and provide reasonable guarantees about
the compliance between the specification and the implementation.
|
\subsection{Behavior Description Generator} |
We have not tackled the issue of behavior description generation yet, although tools such as |
\emph{Procasor}~\cite{procasor} are readily available. We do not consider it our priority at this time. |
|
\section{Conclusion} |
\label{conclusion} |
In this paper we propose a complex combination of various verification methods and tools
to achieve the verification of an entire general-purpose operating system. The proposed
approach generally follows a bottom-up route, starting with low-level checks using state-of-the-art
verifying C language compilers, followed by static analyzers and static verifiers.
In specific contexts regression and unit tests, code instrumentation and model checkers
are utilized for the sake of verification of key algorithms.
|
Thanks to the properties of state-of-the-art microkernel multiserver operating |
system design (e.g. software component encapsulation and composition, fine-grained isolated |
components), we demonstrate that it should be feasible to successfully verify larger and more |
complex operating systems than in the case of monolithic designs. We use formal component |
architecture and behavior description for the closure. The final goal -- a formally verified |
operating system -- is the emerging property of the combination of the various methods. |
|
\medskip |
|
The contribution of this paper is the shift of focus from attempts to use a single |
``silver-bullet'' method for formal verification of an operating system to a combination |
of multiple methods supported by a suitable architecture of the operating system. |
The main benefit is a much larger coverage of the set of all hypothetical properties. |
|
We also argue that the approach should be suitable for the mainstream |
general-purpose operating systems in the near future, because they are gradually |
|
\medskip |
|
\noindent\textbf{Acknowledgments.} The author would like to express his gratitude to all contributors of
the HelenOS project. Without their vision and dedication the work on this paper would be almost impossible.
|
This work was partially supported by the Ministry of Education of the Czech Republic |
(grant MSM\-0021620838). |