[cfe-commits] r119288 - /cfe/trunk/lib/Checker/README.txt

Mon Nov 15 15:42:37 PST 2010

Thanks for starting on this!

On Nov 15, 2010, at 3:19 PM, Zhanyong Wan wrote:

> Author: wan
> Date: Mon Nov 15 17:19:25 2010
> New Revision: 119288
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=119288&view=rev
> Log:
> Adds a README for the Clang static analyzer to document its design and work flow.  The contents are taken from email notes by kremenek and xuzhongxing.
> 
> Added:
>    cfe/trunk/lib/Checker/README.txt
> 
> Added: cfe/trunk/lib/Checker/README.txt
> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Checker/README.txt?rev=119288&view=auto
> ==============================================================================
> --- cfe/trunk/lib/Checker/README.txt (added)
> +++ cfe/trunk/lib/Checker/README.txt Mon Nov 15 17:19:25 2010
> @@ -0,0 +1,117 @@
> +//===----------------------------------------------------------------------===//
> +// Clang Static Analyzer
> +//===----------------------------------------------------------------------===//
> +
> += Library Structure =
> +
> +The analyzer library has two layers: a (low-level) static analysis
> +engine (GRExprEngine.cpp and friends), and some static checkers
> +(*Checker.cpp).  The latter are built on top of the former via the
> +Checker and CheckerVisitor interfaces (Checker.h and
> +CheckerVisitor.h).  The Checker interface is designed to be minimal
> +and simple for checker writers, and attempts to isolate them from much
> +of the gore of the internal analysis engine.
> +
> += How It Works =
> +
> +The analyzer is inspired by several foundational research papers ([1],
> +[2]).  (FIXME: kremenek to add more links)
> +
> +In a nutshell, the analyzer is basically a source code simulator that
> +traces out possible paths of execution.  The state of the program
> +(values of variables and expressions) is encapsulated by the state
> +(GRState).  A location in the program is called a program point
> +(ProgramPoint), and the combination of state and program point is a
> +node in an exploded graph (ExplodedGraph).  The term "exploded" comes
> +from exploding the control-flow edges in the control-flow graph (CFG).
> +
> +Conceptually the analyzer does a reachability analysis through the
> +ExplodedGraph.  We start at a root node, which has the entry program
> +point and initial state, and then simulate transitions by analyzing
> +individual expressions.  The analysis of an expression can cause the
> +state to change, resulting in a new node in the ExplodedGraph with an
> +updated program point and an updated state.  A bug is found by hitting
> +a node that satisfies some "bug condition" (basically a violation of a
> +checking invariant).
> +
> +The analyzer traces out multiple paths by reasoning about branches and
> +then bifurcating the state: on the true branch the conditions of the
> +branch are assumed to be true and on the false branch the conditions
> +of the branch are assumed to be false.  Such "assumptions" create
> +constraints on the values of the program, and those constraints are
> +recorded in the GRState object (and are manipulated by the
> +ConstraintManager).  If assuming the conditions of a branch would
> +cause the constraints to be unsatisfiable, the branch is considered
> +infeasible and that path is not taken.  This is how we get
> +path-sensitivity.  We reduce exponential blow-up by caching nodes.  If
> +a new node with the same state and program point as an existing node
> +would get generated, the path "caches out" and we simply reuse the
> +existing node.  Thus the ExplodedGraph is not a DAG; it can contain
> +cycles as paths loop back onto each other and cache out.
> +
> +GRState and ExplodedNodes are basically immutable once created.  Once
> +one creates a GRState, you need to create a new one to get a new
> +GRState.  This immutability is key since the ExplodedGraph represents
> +the behavior of the analyzed program from the entry point.  To
> +represent these efficiently, we use functional data structures (e.g.,
> +ImmutableMaps) which share data between instances.
> +
> +Finally, individual Checkers work by also manipulating the analysis
> +state.  The analyzer engine talks to them via a visitor interface.
> +For example, the PreVisitCallExpr() method is called by GRExprEngine
> +to tell the Checker that we are about to analyze a CallExpr, and the
> +checker is asked to check for any preconditions that might not be
> +satisfied.  The checker can do nothing, or it can generate a new
> +GRState and ExplodedNode which contains updated checker state.  If it
> +finds a bug, it can tell the BugReporter object about the bug,
> +providing it an ExplodedNode which is the last node in the path that
> +triggered the problem.
> +
> += Working on the Analyzer =
> +
> +If you are interested in bringing up support for C++ expressions, the
> +best place to look is the visitation logic in GRExprEngine, which
> +handles the simulation of individual expressions.  There are plenty of
> +examples there of how other expressions are handled.
> +
> +If you are interested in writing checkers, look at the Checker and
> +CheckerVisitor interfaces (Checker.h and CheckerVisitor.h).  Also look
> +at the files named *Checker.cpp for examples on how you can implement
> +these interfaces.
> +
> += Debugging the Analyzer =
> +
> +There are some useful command-line options for debugging.  For example:
> +
> +$ clang -cc1 -help | grep analyze
> + -analyze-function <value>
> + -analyzer-display-progress
> + -analyzer-viz-egraph-graphviz
> + ...
> +
> +The first allows you to specify only analyzing a specific function.
> +The second prints to the console what function is being analyzed.  The
> +third generates a graphviz dot file of the ExplodedGraph.  This is
> +extremely useful when debugging the analyzer and viewing the
> +simulation results.
> +
> +Of course, viewing the CFG (Control-Flow Graph) is also useful:
> +
> +$ clang -cc1 -help | grep cfg
> + -cfg-add-implicit-dtors Add C++ implicit destructors to CFGs for all analyses
> + -cfg-add-initializers   Add C++ initializers to CFGs for all analyses
> + -cfg-dump               Display Control-Flow Graphs
> + -cfg-view               View Control-Flow Graphs using GraphViz
> + -unoptimized-cfg        Generate unoptimized CFGs for all analyses
> +
> +-cfg-dump dumps a textual representation of the CFG to the console,
> +and -cfg-view creates a GraphViz representation.
> +
> += References =
> +
> +[1] Precise interprocedural dataflow analysis via graph reachability,
> +    T Reps, S Horwitz, and M Sagiv, POPL '95,
> +    http://portal.acm.org/citation.cfm?id=199462
> +
> +[2] A memory model for static analysis of C programs, Z Xu, T
> +    Kremenek, and J Zhang, http://lcs.ios.ac.cn/~xzx/memmodel.pdf
> 
> 
> _______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits