[cfe-dev] [analyzer] Proof-of-concept for Matcher-like checker API (graph matchers)
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Mon May 21 11:53:25 PDT 2018
On 5/16/18 4:37 PM, Alexey Sidorin wrote:
> Hello everyone,
>
> I'd like to share some results of my investigation targeted on
> improvement of Clang Static Analyzer checker API. You can find some
> previous conversation on this topic here:
> http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html.
> In my investigation, I tried to solve a particular problem of building
> a checker without generating new nodes.
>
> --------- Introduction and design goals ---------
>
> In brief, I tried to use matchers-like API to make CSA checkers look
> like this:
>
> StatementMatcher NotChdir =
> callExpr(unless(callee(functionDecl(hasName("::chdir")))));
> Finder.addMatcher(
> hasSequence(
> postStmt(hasStatement(
> callExpr(callee(functionDecl(hasName("::chroot")))))),
> unless(stmtPoint(hasStatement(callExpr(
> callee(functionDecl(hasName("::chdir"))),
> hasArgument(0,
> hasValue(stringRegion(refersString("/")))))))),
> explodedNode(anyOf(postStmt(hasStatement(NotChdir)),
> callEnter(hasCallExpr(NotChdir))))
> .bind("bug_node")),
> &Callback);
> Finder.match(G);
>
> and I have managed to make some simple working examples.
>
> The entire diff can be found here:
> https://github.com/a-sid/clang/commit/9a0b1d1d9b3cf41b551a663f041f54d5427aa72f
> The code itself: https://github.com/a-sid/clang/tree/a4
>
> There are several reasons why I have tried this approach.
>
> 1. AST Matchers are already extensively used API for AST checking.
> They are available both in Clang-Tidy and CSA. And I wanted to use
> existing functionality as much as possible. So, I decided to extend an
> existing API to make its usage seamless across different kinds of
> checks: path-sensitive, AST-based and CFG-based.
>
> 2. AST matchers effectively help clients to avoid a lot of checking of
> dyn_cast results. This feature not only makes them more convenient but
> also more safe because, in my experience, forgetting a nullptr/None
> check is a pretty common mistake for checker writers.
>
> 3. Testing of AST matchers don't require writing C++ code - it can be
> done interactively with clang-query tool. And I believe that we need
> similar functionality for CSA as well.
>
> I didn't want to replace existing checker API. Instead, I tried to
> make new possibilities usable independently or together with existing.
>
> --------- Design and implementation ---------
>
> The implementation consisted of a number of steps.
>
> 1. Allow matching nodes of path-sensitive graph like usual AST nodes.
> To make this possible, three actions were needed:
> 1.1. ASTTypeTraits and DynTypedNode were extended to support
> path-sensitive nodes: ExplodedNode, ProgramState, SVal, SymExpr, etc.
> The implementation with graph node support is moved into a separate
> class (ASTGraphTypeTraits) in ento namespace to resolve cyclic
> dependencies (they are still exist, unfortunately, but are
> header-only, so we can build the PoC).
> 1.2. Some additions to AST matchers were made to add support for new
> kinds of nodes.
> 1.3. To make MatchFinder able to query specific options not supported
> by pure AST, it was augmented with "Contexts". A matcher that needs to
> query the path-sensitive engine asks the Finder for the required
> Context which provides specific helper functions.
>
> As the result of this step, we are able to write matchers like
> expr(hasArgument(0, hasValue(stringRegion(refersString("/"))))).
>
> 2. Create an engine for graph exploration and matching.
> Unlike normal AST matchers, hasSequence is a path-sensitive matcher.
> It is launched against ExplodedGraph. These matchers are handled by
> GraphMatchFinder object. While searching a graph, it collects matches.
> Each match contains a pointer to the corresponding matcher and State
> ID of this match. The way how matches are translated from one state ID
> to another is determined by matcher operators.
>
> For example, SequenceVariadicOperator, which is the base of
> hasSequence() matcher, has "positive" and "negative" sub-matches. Each
> positive matcher has its corresponding State ID. In order to advance
> to the next State ID, a node being matched should match all "negative"
> matchers before the next "positive" matchers and the next "positive"
> matcher itself. Failure in matching "negative" matcher discards the
> match.
>
> The role of GraphMatchFinder is similar to MatchFinder: it is only
> responsible for graph exploration and keeping bound nodes and matchers.
>
> 3. Manage bound nodes.
> While matching a single graph node, BoundNodes from AST MatchFinder
> are used. AST matchers for path-sensitive nodes support bindings
> out-of-the-box. However, the situation with graph matching is a bit
> different. For graph matching, we have another system of bound nodes.
> Each graph node has a related map of bounds aka GDMTy (yes, the name
> is not a coincidence). GDMTy is a mapping from match ID to
> BoundNodesMap which, in part, is effectively a map from std::string to
> DynTypedNodes. This look pretty much like how GDM is organized in
> ExplodedGraph, just without one level of indirection (it can be added,
> though).
>
> MatchFinder contexts should allow support of their own bindings.
> This is how equalsBoundNode() matcher works for path-sensitive nodes:
> it just queries all available contexts for the binding.
>
> Finally, I have managed to make two checkers work in this way:
> ChrootChecker (which is present in the introduction) and
> TestAfterDivZeroChecker. Both them can be found in ChrootCheckerV2.cpp
> and TestAfterDivZeroCheckerV2.cpp correspondingly. They act like
> normal checkers: produce warnings and use visitors. The main
> difference is that they cannot generate sinks and use checkEndAnalysis
> callback. The code of these checkers can be found here:
>
> https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/ChrootCheckerV2.cpp
>
> https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/TestAfterDivZeroCheckerV2.cpp
>
>
>
> -------- Some features --------
>
> 1.The design of bindings has an interesting consequence: it enables
> the dynamic introspection of GDM which was pretty hard before. (Hello
> alpha renaming?)
> 2. Nothing prevents matchers to be used with existing checker API for
> simplifying conditional checking inside callbacks. The matchers are
> not planned as the replacement for the current API, but look like a
> nice complementary part.
> 3. Implicit conversion of Matcher<ProgramPoint> to
> Matcher<ExplodedNode> and Matcher<SymExpr || MemRegion> to
> Matcher<SVal> for writing shorter code.
>
> -------- Not implemented/not checked yet --------
> I tried to keep the PoC as minimal as possible. As the result, some
> features are not implemented yet, but I believe they can be added.
> 1. Usage of matchers inside checker callbacks
> This is not exactly unimplemented, but is still untested.
> 2. Dynamic matchers (clang-query interface)
> In addition to work for supporting dynamic nodes (I don't know how
> many efforts it can take), we need to enable matching against AST
> nodes, not graph. But it doesn't look as a problem, we can just make
> matchers polymorphic.
> 3. Binding to successfully matched paths is not implemented yet - only
> bindings to separate nodes. I wonder if we need path bindings at all.
> 4. Some corner cases are still FIXMEs: single-node sequences, for
> example.
> 5. GDM is not based on immutable data structures like it is done in
> CSA - it is just an STL map. Its performance can be poor (full copy on
> every new node), but I don't think that changing it to use immutable
> structures is hard.
> 6. Matching on-the-fly
> GraphMatchFinder should support on-the-fly matching during
> ExplodedGraph building. For this support, we should just call its
> advance() method each time a new node is created. However, this
> possibility wasn't checked yet.
> 7. Matching CFG and CallGraph isn't implemented because it was
> considered to be far out of simple PoC.
> 8. Only sequential matching is available now, and I didn't try to
> implement other operators yet. So, implementing a checker similar to
> PthreadLock can be tricky or even impossible for now.
>
> -------- Known and potential issues --------
> From matchers' side:
> 1. Some tests don't pass because they rely on the checker ability to
> generate sink nodes. Our matchers cannot do it by design so tests
> don't pass.
> 2. Representing checker bindings as a map can be less effective than
> storing data in structures. I didn't measure the overhead, so I cannot
> give any numbers.
> 3. Matchers are called on every node independently of its type. This
> is not what CheckerManager does. I wonder if this detail can affect
> performance as well.
> 4. Problems with cyclic dependencies. clangStaticAnalyzer has a
> dependency on clangASTMatchers, therefore, clangASTMatchers cannot
> depend on clangStaticAnalyzer. However, if we want ASTMatchers to
> support static analyzer data structures, there should be a backward
> dependency. Now this dependency is header-only.
> 5. Code duplication. This is mostly fine for a sandbox but needs to be
> reduced.
>
> From analyzer's side:
> 1. Many events are not reflected in the final ExplodedGraph. For
> example, checkers can receive PointerEscape event
Hmm, what if we "invert" this event, i.e. make it opt-out rather than
opt-in, i.e., pointers are escaped by the matcher engine automatically,
but if the checker wants a certain event to not escape the pointer, it
adds extra matching code, such as "the sequence may include 0 or more
passes of the symbol to a certain set of functions in a certain manner".
> , but the event itself will not be recorded in the ExplodedGraph -
> only changes made by checkers will. That's also true for Stmt nodes: I
> noticed same issues with PostCondition. This makes matching a bit
> harder. Should we add some information into ExplodedGraph?
> 2. (Minor) Some static analyzer data structures don't support
> traditional LLVM casting methods (dyn_cast, isa) because they lack
> classof() method. I have fixed this internally and will put a patch on
> review soon.
>
> -------- Conclusion --------
> Finally, there is a graph matching engine supporting basic
> functionality and two checkers using it. I apologize for not starting
> the discussion earlier, but I just wasn't sure that the approach will
> work. Is anybody interested to have this stuff in upstream (maybe,
> changed or improved)? If yes, I'll be happy to contribute this work
> patch-by-patch and continue its improvement. If no - well, I had
> enough fun playing with it.
>
More information about the cfe-dev
mailing list