|
\subsection{C Language Compiler and Continuous Integration Tool} |
\label{clang} |
The initial levels of abstraction do not go far from the C source code and common engineering
approaches. First, we would certainly like to know whether our code base is compliant with the
programming language specification and passes only the basic semantic checks (proper number
and types of arguments passed to functions, etc.). It is perhaps not very surprising that
these decisions can be made by any plain C compiler. However, with the current implementation
of HelenOS even this is not quite trivial.
|
Besides the requirement to support 7 hardware platforms, the system's compile-time configuration |
can also be affected by approximately 65 configuration options, most of which are booleans and
the rest enumerated types.
|
These configuration options are bound by logical propositions in conjunctive or disjunctive |
normal forms and while some options are freely configurable, the value of others gets inferred |
by the build system of HelenOS. The overall number of distinct configurations in which |
HelenOS can be compiled is at least one order of magnitude larger than the plain number |
of supported hardware platforms. |
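To illustrate, the constraints between configuration options can be thought of as boolean propositions over the option values. The following sketch uses made-up option names, not the actual HelenOS configuration variables, and shows one constraint in implication form together with one derived option:

```c
#include <stdbool.h>

/* Hypothetical configuration snapshot: the option names are
 * illustrative, not the real HelenOS configuration variables. */
typedef struct {
    bool smp;            /* symmetric multiprocessing */
    bool ht;             /* hyperthreading support */
} config_t;

/* A constraint in implication form: hyperthreading requires SMP.
 * Written as a clause of the corresponding normal form. */
static bool constraint_ht_requires_smp(const config_t *c)
{
    return !c->ht || c->smp;   /* equivalent to HT => SMP */
}

/* An inferred option: spinlock debugging is only meaningful with SMP,
 * so its value is derived by the build system rather than chosen freely. */
static bool infer_debug_spinlock(const config_t *c, bool requested)
{
    return c->smp && requested;
}
```

Evaluating such propositions over all free options is what makes the space of valid configurations grow well beyond the plain number of platforms.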
|
open variables and inferring the others. This can be part of the continuous integration |
process which would try to compile and link the sources in all distinct configurations. |
|
If we want to be really pedantic, we should also make sure that we run all higher |
level verification methods on all configurations generated by this step. That would certainly |
require multiplying the time needed by the verification methods at least by the number
of the distinct configurations. Constraining the set of configurations to just the most |
representative ones is perhaps a reasonable compromise to make the verification realistic. |
|
\subsection{Regression and Unit Tests} |
Although some people would argue whether testing is a formal verification method, we still include
it in the big picture. Running regression and unit tests which are part of the HelenOS code base
in the continuous integration process is fairly easy. The only complication lies in the technicalities:
We need to set up an automated network of physical machines and simulators which can run the
appropriate compilation outputs for the specific platforms. We need to be able to reboot |
them remotely and distribute the boot images to them. And last but not least, we need to be |
able to gather the results from them. |
|
Testing is always non-exhaustive, thus the guarantees provided by tests are strictly limited |
to the use cases and contexts which are being explicitly tested. However, it is arguably |
easier to express many common use cases in the primary programming language than in some |
different formalism. As we follow the bottom-up approach, filtering out the most obvious
bugs by testing can save us a lot of valuable time which would otherwise be wasted on
futile verification by more formal (and more time-consuming) methods.
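As an illustration of the previous point, a use case can be expressed directly in C as a test function living next to the code it exercises. The sketch below uses hypothetical names, not the actual HelenOS testing interface:

```c
#include <string.h>

/* Code under test: a bounded string copy that always terminates
 * the destination buffer (an illustrative helper, not HelenOS code). */
static void str_ncpy(char *dst, size_t size, const char *src)
{
    if (size == 0)
        return;
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}

/* The regression test, expressed in the primary language:
 * returns NULL on success, a human-readable message on failure. */
static const char *test_str_ncpy(void)
{
    char buf[4];

    str_ncpy(buf, sizeof(buf), "HelenOS");
    if (strcmp(buf, "Hel") != 0)
        return "truncated copy is wrong";

    str_ncpy(buf, sizeof(buf), "ok");
    if (strcmp(buf, "ok") != 0)
        return "short copy is wrong";

    return NULL;  /* success */
}
```

Such test functions can be compiled into the boot image and triggered by the continuous integration harness on each target machine.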
|
\subsection{Instrumentation} |
Instrumentation tools for detecting memory leaks, performance bottlenecks and soft-deadlocks |
are also not usually considered to be formal verification tools (since it is hard to define
the exact formal properties being verified, given the non-exhaustive nature of these tools).
They are also rarely utilized on a regular basis as part of the continuous integration process.
But again, it might be helpful to mention them in the big picture.
|
If some regression or unit tests fail, they sometimes do not give sufficient information to
tell immediately what the root cause of the issue is. In that case running the failing tests
more directly to the actual bug. |
|
\subsection{Verifying C Language Compiler} |
C language compilers are traditionally also not considered to be formal verification tools. |
Many people just say that C compilers are good at generating executable code, but do not |
care much about the semantics of the source code (on the other hand, formal verification |
tools usually do not generate any executable code at all). However, with recent developments
in the compiler domain, the old paradigms are shifting.
|
As the optimization passes and general maturity of the compilers improve over time, |
the compilers try to extract and use more and more semantic information from the source code. |
code compliant to some older standard (e.g. C89) even when it does not meet current quality
expectations. Old legacy source code should still pass the compilation as it did decades ago.
|
Secondly, the checks run by the verifying compilers are usually not based on abstract interpretation.
They are mostly realized as abstract syntax tree transformations much in line with the supporting
routines of the compilation process (data and control flow graph analysis, dead code elimination,
register allocation, etc.) and the evaluation function is basically the matching of antipatterns
of common programming bugs.
|
The checks are usually conservative. The verifying compilers identify code constructs which are suspicious,
which might arise out of a programmer's bad intuition and so on, but even these code snippets cannot be
positive and fix the issue by either supplying some additional semantic information (e.g. adding an
explicit typecast or a vendor-specific language extension) or rewriting the code.
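A minimal sketch of such a false positive and its resolution via an explicit typecast (the function names are illustrative):

```c
#include <stddef.h>

/* A conservative check may flag this comparison: `len` is signed and
 * `size` is unsigned, so `len` is silently converted and a negative
 * length would compare as a huge value (typically reported by
 * -Wsign-compare or similar diagnostics). */
static int fits_suspicious(int len, size_t size)
{
    return len < size;
}

/* The developer confirms the intent and supplies the missing semantic
 * information: negative lengths are rejected explicitly, and the
 * remaining conversion is made deliberate with a cast. */
static int fits(int len, size_t size)
{
    if (len < 0)
        return 0;
    return (size_t) len < size;
}
```

After the rewrite the compiler no longer has to guess whether the signed/unsigned mix was intentional.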
|
Although this level of abstraction is coarse-grained and conservative, it can be called semi-formal, |
since the properties which are being verified can actually be defined quite exactly and they
are reasonably general. They do not deal with single traces of methods, runs and use
cases like tests, but with all possible contexts in which the code can run.
|
\subsection{Static Analyzer} |
Static analyzers try to go deeper than verifying compilers. Besides detecting common antipatterns of |
bugs, they also use techniques such as abstract interpretation to check for more complex properties. |
|
The biggest advantage of static analyzers is that they can be easily included in the development or |
continuous integration process as an additional automated step, very similar to the verifying compilers. |
No manual definition of code-specific properties is needed and false positives can be relatively easily
eliminated by adding some explicit additional information to the source code within the boundaries
of the programming language.
|
The authors of static analyzers claim large quantities of bugs detected or prevented~\cite{billion}, |
but static analyzers are still relatively limited by the kind of bugs they are designed to detect. |
They are usually good at pointing out common issues with security implications (specific types of |
buffer and stack overruns, usage of well-known functions in an unsafe way, clear cases of forgotten |
deallocation of resources and release of locks, etc.). Unfortunately, many static analyzers |
only analyze a single-threaded control flow and are thus unable to detect concurrency issues |
such as deadlocks. |
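For example, a lock-order inversion is invisible to a purely single-threaded control flow analysis, because each execution path is locally well-formed. A sketch using POSIX threads as a stand-in (not the actual HelenOS synchronization primitives):

```c
#include <pthread.h>

/* Two locks and a shared counter. Each function below balances its
 * lock/unlock pairs, so single-threaded analysis sees no problem. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter = 0;

/* Thread 1 would take A then B ... */
static void *worker_ab(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    shared_counter++;
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

/* ... while thread 2 takes B then A: if each thread holds its first
 * lock, neither can proceed. The bug exists only in the interleaving,
 * which is exactly what single-threaded control flow analysis misses.
 * Enforcing a global lock order (always A before B) removes it. */
static void *worker_ba_deadlocky(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&lock_b);
    pthread_mutex_lock(&lock_a);
    shared_counter++;
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return NULL;
}
```

Detecting this class of bugs requires reasoning about thread interleavings, which is the domain of model checkers rather than most static analyzers.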
In the context of an operating system kernel and core libraries two kinds of properties are |
common: |
|
\begin{description}
\item[Consistency constraints] These properties define the correct way in which data is supposed
	to be manipulated by some related set of subroutines. Checking for these
	properties ensures that data structures and internal states will not get corrupted due
	to bugs in the functions and methods which are designed to manipulate them.
\item[Interface enforcements] These properties define the correct semantics by which
	a set of subroutines should be used by the rest of the code. Checking for these properties
	ensures that some API is always used by the rest of the code in a specified way
	and all reported exceptions are handled by the client code.
\end{description}
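Both kinds of properties can be illustrated on a small hypothetical API (the stack below is purely illustrative, not a HelenOS data structure):

```c
#include <assert.h>
#include <stddef.h>

#define CAP 8

/* A fixed-size stack. The consistency constraint is "top <= CAP"
 * after every operation; the interface enforcement is "init before
 * push/pop, and the client must handle the reported error states". */
typedef struct {
    int items[CAP];
    size_t top;
    int initialized;
} istack_t;

static void istack_init(istack_t *s)
{
    s->top = 0;
    s->initialized = 1;
}

static int istack_push(istack_t *s, int value)
{
    assert(s->initialized);   /* interface enforcement */
    if (s->top == CAP)
        return -1;            /* error state reported to the client */
    s->items[s->top++] = value;
    assert(s->top <= CAP);    /* consistency constraint */
    return 0;
}

static int istack_pop(istack_t *s, int *value)
{
    assert(s->initialized);   /* interface enforcement */
    if (s->top == 0)
        return -1;            /* error state reported to the client */
    *value = s->items[--s->top];
    return 0;
}
```

A static verifier would check that the constraint holds on every path through the manipulating functions, and that every caller both initializes the stack first and handles the `-1` error returns.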
|
\subsection{Model Checker} |
\label{modelcheck} |
that the root cause is in the non-compliance between the design and implementation and not a |
fundamental flaw of the design itself. |
|
\subsection{Architecture and Behavior Checker} |
All previously mentioned verification methods were targeting internal properties of the OS |
components. If we are moving to a higher-level abstraction in order to specify correct |
interaction of the encapsulated components in terms of interface compatibility and communication, |
we can utilize \emph{Behavior Protocols}~\cite{bp} or some other formalism describing correct |
interaction between software components. |
|
To gain the knowledge about the architecture of the whole OS in terms of software
component composition and bindings, we can use an \emph{Architecture Description Language}~\cite{adl}
as the specification of the architecture of the system. This language makes it possible to capture
interface types (with method signatures), primitive components (in terms of provided and required
interfaces), composite components (architectural compositions of primitive components) and the
bindings between the respective interfaces of the components.
|
It is extremely important to define the right role of the behavior and architecture description.
A flawed approach would be to reverse-engineer this description from the source code (either manually
or via some sophisticated tool) and then verify the compliance between the description and
the implementation. However, different directions can give more interesting results:
|
\begin{description}
\item[Description as specification] Behavior and architecture description created independently
	of the source code serves the role of specification. This has the following primary
	goals of formal verification:
	\begin{description}
	\item[Horizontal compliance] Also called \emph{compatibility}. The goal is to check
		whether the specifications of components that are bound together are semantically
		compatible. All required interfaces need to be bound to provided interfaces and
		the communication between the components cannot lead to \emph{no activity} (a deadlock),
		\emph{bad activity} (a livelock) or other communication and synchronization errors.
	\item[Vertical compliance] Also called \emph{substitutability}. The goal is to check whether
		it is possible to replace a set of primitive components that are nested inside a composite
		component by the composite component itself. In other words, this compliance can answer the
		question whether the architecture description of the system is sound with respect to the
		hierarchical composition of the components.
	\item[Compliance between the specification and the implementation] Using various means
		for generating artificial environments for an isolated component the checker is able to
		partially answer the question whether the implementation of the component is an instantiation
		of the component specification.
	\end{description}
\item[Description as abstraction] Generating the behavior and architecture description from the
	source code by means of abstract interpretation can serve the purpose of verifying various
	properties of the implementation such as invariants, preconditions and postconditions.
	This is similar to static verification, but on the level of component interfaces.
\end{description}
|
Unfortunately, most of the behavior and architecture formalisms are static, which is not quite suitable
for the dynamic nature of most OSes. This limitation can be circumvented by considering a relevant
snapshot of the dynamic run-time architecture. This snapshot fixed in time is equivalent to
a statically defined architecture.
|
\medskip |
|
The key features of software systems with respect to behavior and architecture checkers are the granularity
of the individual primitive components and the level of isolation and complexity of the communication mechanism
between them. Large monolithic OSes created in semantically poor C present a severe challenge because the
isolation of the individual components is vague and the communication between them is basically unlimited
(function calls, shared resources, etc.).
|
OSes with explicit component architecture and fine-grained components (such as microkernel multiserver
systems) make the verification much more feasible, since the degrees of freedom (and thus
the state space) are limited.
|
Horizontal and vertical compliance checking can be done exhaustively. This is a fundamental property
which allows reasoning about the dependability of the entire component-based OS.
Assuming that the lower-level verification methods (described in Sections \ref{clang} to \ref{modelcheck}) |
prove some specific properties of the primitive components, we can be sure that the composition of |
the primitive components into composite components and ultimately into the whole OS |
does not break these properties. |
|
The feasibility of many lower-level verification methods from Sections \ref{clang} to \ref{modelcheck} |
depends largely on the size and complexity of the code under verification. If the entire OS |
is decomposed into primitive components with a fine granularity, it is more likely that the |
individual primitive components can be verified against a large number of properties. Thanks to the |
recursive component composition we can then be sure that these properties also hold for the entire system. |
In monolithic operating systems without a clear decomposition we cannot do this inference and we |
need to verify the lower-level properties over a much larger code base (e.g. the whole monolithic kernel). |
|
\medskip |
|
|
\subsection{Summary} |
\label{missing} |
So far, we have proposed a compact combination of engineering, semi-formal and formal methods which
start at the level of the C programming language, offer the possibility to check for the presence of various
common antipatterns, check for generic algorithm-related properties, consistency constraints, interface
enforcements and conclude with a framework to make these properties hold even in the case of a large
OS composed from many components of compliant behavior.
|
We do not claim that there are no missing pieces in the big picture or that the semi-formal
verification methods could not provide stronger guarantees in this setup. However, state-of-the-art
OS design guidelines can push the boundaries of practical feasibility of the presented methods
further. The limited guarantees of the low-level methods hold even in the composition and the
high-level formal methods do not have to deal with unlimited degrees of freedom of the primitive
component implementation.
|
\medskip |
|
evaluation of the proposed formalisms and tools. As this is still largely a work-in-progress,
in many cases just the initial observations can be made. |
|
The choice of the specific methods, tools and formalisms in this initial phase is mostly motivated
by their perceived commonality and their authors' claims about fitness for the given purpose. An important
part of further evaluation would certainly be to compare multiple particular approaches, tools
and formalisms to find the optimal combination.
|
\subsection{Verifying C Language Compiler and Continuous Integration Tool} |
The primary C compilers used by HelenOS are \emph{GNU GCC 4.4.3} (all platforms)~\cite{gcc} and \emph{Clang 2.6.0}
(IA-32)~\cite{clang}. We have taken some effort to also support the \emph{ICC} and \emph{Sun Studio} C compilers,
We also turn on several more specific and strict checks. These checks helped to discover several |
latent bugs in the source code: |
|
\begin{description}
\item[\texttt{-Wfloat-equal}] Check for exact equality comparison between floating point values. The
	usage of the equality comparator on floats is usually misguided due to the inherent rounding errors
	of floats.
\item[\texttt{-Wcast-align}] Check for code which casts pointers to a type with a stricter alignment
	requirement. On many RISC-based platforms this can cause run-time unaligned access exceptions.
\item[\texttt{-Wconversion}] Check for code where an implicit type conversion (e.g. from float to integer,
	between signed and unsigned integers or between integers with different numbers of bits) can
	cause the actual value to change.
\end{description}
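The following sketch shows, in comments, the kind of constructs these flags report, together with warning-free alternatives (the helper functions are illustrative, not HelenOS code):

```c
#include <stdint.h>

/* Constructs flagged by the strict checks:
 *
 *   double a = 0.1 * 3;
 *   if (a == 0.3) { ... }            -Wfloat-equal: the comparison fails
 *
 *   char buf[8];
 *   uint32_t *p = (uint32_t *) buf;  -Wcast-align: stricter alignment
 *
 *   uint8_t small = big_value;       -Wconversion: value may change
 */

/* Compare floats against an explicit tolerance instead of with ==. */
static int dbl_near(double a, double b, double eps)
{
    double diff = a - b;
    if (diff < 0)
        diff = -diff;
    return diff < eps;
}

/* Make a narrowing conversion explicit and saturate on overflow. */
static uint8_t to_u8(uint32_t value)
{
    return (value > UINT8_MAX) ? (uint8_t) UINT8_MAX : (uint8_t) value;
}
```

Rewrites of this kind both silence the diagnostics and document the intended semantics in the code itself.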
|
To enhance the semantic information in the source code, we use GCC-specific language extensions to annotate |
some particular kernel and core library routines: |
|
\begin{description}
\item[\texttt{\_\_attribute\_\_((noreturn))}] Functions marked in this way never finish from the point of view
	of the current sequential execution flow. The most common case are the routines which restore a previously saved
	execution context.
\item[\texttt{\_\_attribute\_\_((returns\_twice))}] Functions marked in this way may return multiple times from
	the point of view of the current sequential execution flow. This is the case of routines which save the current
	execution context (first the function returns as usual, but it can eventually ``return again''
	when the context is being restored).
\end{description}
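A minimal sketch of both annotations, using the standard setjmp/longjmp pair as a stand-in for the HelenOS context-switching routines (the names here are illustrative):

```c
#include <setjmp.h>

static jmp_buf ctx;

/* Never returns to its caller: control continues at the matching
 * setjmp, so the annotation tells the compiler not to expect the
 * sequential flow to continue past the call. */
__attribute__((noreturn)) static void context_restore(void)
{
    longjmp(ctx, 1);
}

/* Declaration-only sketch: a hypothetical context-saving routine would
 * carry the same attribute that the C library puts on setjmp itself. */
extern int my_context_save(void *ctx) __attribute__((returns_twice));

/* setjmp "returns twice" here: once directly (with 0) and once more
 * when context_restore() fires (with 1). */
static int run(void)
{
    volatile int returns = 0;  /* volatile: modified between the returns */

    if (setjmp(ctx) == 0) {
        returns++;           /* first return of setjmp() */
        context_restore();   /* control reappears at setjmp() */
    }
    returns++;               /* second return of setjmp() */
    return returns;
}
```

Without the annotations the compiler may miscompile such code, e.g. by caching values in registers across the non-local jump, which is exactly the class of hard-to-debug bugs mentioned below.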
|
The use of these extensions has pointed out several hard-to-debug bugs on the IA-64 platform.
|
\subsection{Instrumentation} |
We are in the process of implementing our own code instrumentation framework which is motivated mainly |
by the need to support MMU-less architectures in the future. But this framework might be also very helpful |
in detecting memory and generic resource leaks. We have not tried \emph{Valgrind}~\cite{valgrind} or any similar |
existing tool because of the estimated complexity to adopt it for the usage in HelenOS. |
|
HelenOS was also scanned by \emph{Coverity}~\cite{coverity} in 2006 when no errors were detected. However, since |
that time the code base has not been analyzed by Coverity. |
|
\subsection{Static Analyzer} |
The integration of various static analyzers into the HelenOS continuous integration process is underway. |
with this approach. Because both synchronization primitives are relatively complex, utilizing a model checker |
should provide a much more trustworthy proof of the required properties than ``paper and pencil''. |
|
The initial choice of Spin is motivated by its suitability to model threads and their interaction and to verify
properties related to race conditions and deadlocks (which are common sources of OS-related bugs). Other
modeling formalisms might be more suitable for different goals.
|
\subsection{Architecture and Behavior Checker} |
We have created an architecture description in ADL language derived from \emph{SOFA ADL}~\cite{adl} for the |
majority of the HelenOS components and created the Behavior Protocol specification of these components. |
Both descriptions were created independently, not by reverse-engineering the existing source code. |
The architecture is a snapshot of the dynamic architecture just after a successful bootstrap of HelenOS. |
|
Both the architecture and behavior descriptions are readily available as part of the source code repository
of HelenOS, including tools which can preprocess the Behavior Protocols according to the architecture description |
and create an output suitable for \emph{bp2promela} checker~\cite{bp}. |
|
As the resulting complexity of the description is larger than any of the previously published case studies |
on Behavior Protocols (compare to~\cite{cocome}), our current work-in-progress is to optimize and fine-tune |
the bp2promela checker to process the input. |
|
\medskip |
|
We have not started to generate code from the architecture description so far because of time constraints.
However, we believe that this is a very promising way to go and provide reasonable guarantees about
the compliance between the specification and the implementation.
|
\subsection{Behavior Description Generator} |
We have not tackled the issue of behavior description generation yet, although tools such as |
\emph{Procasor}~\cite{procasor} are readily available. We do not consider it our priority at this time. |
|
\section{Conclusion} |
\label{conclusion} |
In this paper we propose a complex combination of various verification methods and tools
to achieve the verification of an entire general-purpose operating system. The proposed
approach generally follows a bottom-up route, starting with low-level checks using state-of-the-art
verifying C language compilers, followed by static analyzers and static verifiers.
In specific contexts regression and unit tests, code instrumentation and model checkers
are utilized for the sake of verification of key algorithms.
|
Thanks to the properties of state-of-the-art microkernel multiserver operating |
system design (e.g. software component encapsulation and composition, fine-grained isolated |
components), we demonstrate that it should be feasible to successfully verify larger and more |
complex operating systems than in the case of monolithic designs. We use formal component |
architecture and behavior description for the closure. The final goal -- a formally verified |
operating system -- is the emerging property of the combination of the various methods. |
|
\medskip |
|
The contribution of this paper is the shift of focus from attempts to use a single |
``silver-bullet'' method for formal verification of an operating system to a combination |
of multiple methods supported by a suitable architecture of the operating system. |
The main benefit is a much larger coverage of the set of all hypothetical properties. |
|
We also argue that the approach should be suitable for the mainstream |
general-purpose operating systems in the near future, because they are gradually |
|
\medskip |
|
\noindent\textbf{Acknowledgments.} The author would like to express his gratitude to all contributors of
the HelenOS project. Without their vision and dedication the work on this paper would be almost impossible.
|
This work was partially supported by the Ministry of Education of the Czech Republic |
(grant MSM\-0021620838). |