[cfe-dev] [analyzer] Proof-of-concept for Matcher-like checker API (graph matchers)

Sat May 11 12:58:45 PDT 2019

Necro-updating the status since some demand for this or similar CSA 
feature was discovered on EuroLLVM BoF. Unfortunately, I had few time to 
develop path-sensitive matchers last year so the change list is not very 
large.

1. On-the-fly matching while building ExplodedGraph. Both offline and 
online matching is available. The way how a checker handles graphs 
depends on how it was registered in CheckerManager: there is a separate 
method registerPathMatcher() that adds matcher to the set of online 
matchers.

2. Path-sensitive matchers gained the ability to generate new nodes and 
sinks in the callbacks. There are two limitations, however.
a) AST Matchers don't allow calling callbacks until matching is 
complete, so I haven't implemented this ability for path-sensitive 
matchers too.
b) Only on-the-fly matchers are allowed to add nodes (because the 
ability to add nodes to a complete graph seems useless).

3. The primary advancement for this PoC is that path-sensitive matchers 
now can work as dynamic matchers so they can be used with clang-query 
tool. Here is an example clang-query script:

let NotChdir 
callExpr(unless(callee(functionDecl(hasName("::chdir"))))).bind("bad_action")

match
functionDecl(
   hasPath(hasSequence(
postStmt(hasStatement(callExpr(callee(functionDecl(hasName("::chroot")))).bind("chroot"))),
             unless(
stmtPoint(hasStatement(callExpr(callee(functionDecl(hasName("::chdir"))),
                                                 hasArgument(0, 
hasValue(stringRegion(refersString("/")))))))),
             explodedNode(anyOf(postStmt(hasStatement(NotChdir)),
callEnter(hasCallExpr(NotChdir)))).bind("bug_node"))))

whose output on chroot.c is:

$ bin/clang-query llvm/tools/clang/test/Analysis/chroot.c -f=debug-cmd.query
Match #1:
llvm/tools/clang/test/Analysis/chroot.c:12:3: note: "bad_action" binds here
   foo(); // expected-warning {{No call of chdir("/") immediately after 
chroot}}
   ^~~~~
llvm/tools/clang/test/Analysis/chroot.c:11:3: note: "chroot" binds here
chroot("/usr/local"); // root changed.
   ^~~~~~~~~~~~~~~~~~~~
llvm/tools/clang/test/Analysis/chroot.c:10:1: note: "root" binds here
void f1(void) {
^~~~~~~~~~~~~~~
Match #2:
llvm/tools/clang/test/Analysis/chroot.c:24:3: note: "bad_action" binds here
   foo(); // expected-warning {{No call of chdir("/") immediately after 
chroot}}
   ^~~~~
llvm/tools/clang/test/Analysis/chroot.c:22:3: note: "chroot" binds here
chroot("/usr/local"); // root changed.
   ^~~~~~~~~~~~~~~~~~~~
llvm/tools/clang/test/Analysis/chroot.c:21:1: note: "root" binds here
void f3(void) {
^~~~~~~~~~~~~~~
2 matches.

This can allow us to prototype checkers without re-compiling them after 
every change.

The code can be found on my Github:
clang: https://github.com/a-sid/clang/tree/a4-dynamic-path-matchers
clang-tools-extra: 
https://github.com/a-sid/clang-tools-extra/tree/a4-support

If there is still a need this feature, I'll start the discussion on its 
design because it requires some intrusive changes into AST Matchers code.

17.05.2018 2:37, Alexey Sidorin via cfe-dev пишет:
> Hello everyone,
>
> I'd like to share some results of my investigation targeted on 
> improvement of Clang Static Analyzer checker API. You can find some 
> previous conversation on this topic here: 
> http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html. 
> In my investigation, I tried to solve a particular problem of building 
> a checker without generating new nodes.
>
> --------- Introduction and design goals ---------
>
> In brief, I tried to use matchers-like API to make CSA checkers look 
> like this:
>
> StatementMatcher NotChdir =
>     callExpr(unless(callee(functionDecl(hasName("::chdir")))));
> Finder.addMatcher(
>     hasSequence(
>         postStmt(hasStatement(
>             callExpr(callee(functionDecl(hasName("::chroot")))))),
>         unless(stmtPoint(hasStatement(callExpr(
>             callee(functionDecl(hasName("::chdir"))),
>             hasArgument(0, 
> hasValue(stringRegion(refersString("/")))))))),
>         explodedNode(anyOf(postStmt(hasStatement(NotChdir)),
>                             callEnter(hasCallExpr(NotChdir))))
>             .bind("bug_node")),
>     &Callback);
> Finder.match(G);
>
> and I have managed to make some simple working examples.
>
> The entire diff can be found here: 
> https://github.com/a-sid/clang/commit/9a0b1d1d9b3cf41b551a663f041f54d5427aa72f
> The code itself: https://github.com/a-sid/clang/tree/a4
>
> There are several reasons why I have tried this approach.
>
> 1. AST Matchers are already extensively used API for AST checking. 
> They are available both in Clang-Tidy and CSA. And I wanted to use 
> existing functionality as much as possible. So, I decided to extend an 
> existing API to make its usage seamless across different kinds of 
> checks: path-sensitive, AST-based and CFG-based.
>
> 2. AST matchers effectively help clients to avoid a lot of checking of 
> dyn_cast results. This feature not only makes them more convenient but 
> also more safe because, in my experience, forgetting a nullptr/None 
> check is a pretty common mistake for checker writers.
>
> 3. Testing of AST matchers don't require writing C++ code - it can be 
> done interactively with clang-query tool. And I believe that we need 
> similar functionality for CSA as well.
>
> I didn't want to replace existing checker API. Instead, I tried to 
> make new possibilities usable independently or together with existing.
>
> --------- Design and implementation ---------
>
> The implementation consisted of a number of steps.
>
> 1. Allow matching nodes of path-sensitive graph like usual AST nodes. 
> To make this possible, three actions were needed:
>  1.1. ASTTypeTraits and DynTypedNode were extended to support 
> path-sensitive nodes: ExplodedNode, ProgramState, SVal, SymExpr, etc. 
> The implementation with graph node support is moved into a separate 
> class (ASTGraphTypeTraits) in ento namespace to resolve cyclic 
> dependencies (they are still exist, unfortunately, but are 
> header-only, so we can build the PoC).
>  1.2. Some additions to AST matchers were made to add support for new 
> kinds of nodes.
>  1.3. To make MatchFinder able to query specific options not supported 
> by pure AST, it was augmented with "Contexts". A matcher that needs to 
> query the path-sensitive engine asks the Finder for the required 
> Context which provides specific helper functions.
>
> As the result of this step, we are able to write matchers like 
> expr(hasArgument(0, hasValue(stringRegion(refersString("/"))))).
>
> 2. Create an engine for graph exploration and matching.
>   Unlike normal AST matchers, hasSequence is a path-sensitive matcher. 
> It is launched against ExplodedGraph. These matchers are handled by 
> GraphMatchFinder object. While searching a graph, it collects matches. 
> Each match contains a pointer to the corresponding matcher and State 
> ID of this match. The way how matches are translated from one state ID 
> to another is determined by matcher operators.
>
>   For example, SequenceVariadicOperator, which is the base of 
> hasSequence() matcher, has "positive" and "negative" sub-matches. Each 
> positive matcher has its corresponding State ID. In order to advance 
> to the next State ID, a node being matched should match all "negative" 
> matchers before the next "positive" matchers and the next "positive" 
> matcher itself. Failure in matching "negative" matcher discards the 
> match.
>
>   The role of GraphMatchFinder is similar to MatchFinder: it is only 
> responsible for graph exploration and keeping bound nodes and matchers.
>
> 3. Manage bound nodes.
>   While matching a single graph node, BoundNodes from AST MatchFinder 
> are used. AST matchers for path-sensitive nodes support bindings 
> out-of-the-box. However, the situation with graph matching is a bit 
> different. For graph matching, we have another system of bound nodes. 
> Each graph node has a related map of bounds aka GDMTy (yes, the name 
> is not a coincidence). GDMTy is a mapping from match ID to 
> BoundNodesMap which, in part, is effectively a map from std::string to 
> DynTypedNodes. This look pretty much like how GDM is organized in 
> ExplodedGraph, just without one level of indirection (it can be added, 
> though).
>
>   MatchFinder contexts should allow support of their own bindings. 
> This is how equalsBoundNode() matcher works for path-sensitive nodes: 
> it just queries all available contexts for the binding.
>
> Finally, I have managed to make two checkers work in this way: 
> ChrootChecker (which is present in the introduction) and 
> TestAfterDivZeroChecker. Both them can be found in ChrootCheckerV2.cpp 
> and TestAfterDivZeroCheckerV2.cpp correspondingly. They act like 
> normal checkers: produce warnings and use visitors. The main 
> difference is that they cannot generate sinks and use checkEndAnalysis 
> callback. The code of these checkers can be found here:
>
> https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/ChrootCheckerV2.cpp 
>
> https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/TestAfterDivZeroCheckerV2.cpp 
>
>
>
> -------- Some features --------
>
> 1.The design of bindings has an interesting consequence: it enables 
> the dynamic introspection of GDM which was pretty hard before. (Hello 
> alpha renaming?)
> 2. Nothing prevents matchers to be used with existing checker API for 
> simplifying conditional checking inside callbacks. The matchers are 
> not planned as the replacement for the current API, but look like a 
> nice complementary part.
> 3. Implicit conversion of Matcher<ProgramPoint> to 
> Matcher<ExplodedNode> and Matcher<SymExpr || MemRegion> to 
> Matcher<SVal> for writing shorter code.
>
> -------- Not implemented/not checked yet --------
> I tried to keep the PoC as minimal as possible. As the result, some 
> features are not implemented yet, but I believe they can be added.
> 1. Usage of matchers inside checker callbacks
>   This is not exactly unimplemented, but is still untested.
> 2. Dynamic matchers (clang-query interface)
>   In addition to work for supporting dynamic nodes (I don't know how 
> many efforts it can take), we need to enable matching against AST 
> nodes, not graph. But it doesn't look as a problem, we can just make 
> matchers polymorphic.
> 3. Binding to successfully matched paths is not implemented yet - only 
> bindings to separate nodes. I wonder if we need path bindings at all.
> 4. Some corner cases are still FIXMEs: single-node sequences, for 
> example.
> 5. GDM is not based on immutable data structures like it is done in 
> CSA - it is just an STL map. Its performance can be poor (full copy on 
> every new node), but I don't think that changing it to use immutable 
> structures is hard.
> 6. Matching on-the-fly
>   GraphMatchFinder should support on-the-fly matching during 
> ExplodedGraph building. For this support, we should just call its 
> advance() method each time a new node is created. However, this 
> possibility wasn't checked yet.
> 7. Matching CFG and CallGraph isn't implemented because it was 
> considered to be far out of simple PoC.
> 8. Only sequential matching is available now, and I didn't try to 
> implement other operators yet. So, implementing a checker similar to 
> PthreadLock can be tricky or even impossible for now.
>
> -------- Known and potential issues --------
> From matchers' side:
> 1. Some tests don't pass because they rely on the checker ability to 
> generate sink nodes. Our matchers cannot do it by design so tests 
> don't pass.
> 2. Representing checker bindings as a map can be less effective than 
> storing data in structures. I didn't measure the overhead, so I cannot 
> give any numbers.
> 3. Matchers are called on every node independently of its type. This 
> is not what CheckerManager does. I wonder if this detail can affect 
> performance as well.
> 4. Problems with cyclic dependencies. clangStaticAnalyzer has a 
> dependency on clangASTMatchers, therefore, clangASTMatchers cannot 
> depend on clangStaticAnalyzer. However, if we want ASTMatchers to 
> support static analyzer data structures, there should be a backward 
> dependency. Now this dependency is header-only.
> 5. Code duplication. This is mostly fine for a sandbox but needs to be 
> reduced.
>
> From analyzer's side:
> 1. Many events are not reflected in the final ExplodedGraph. For 
> example, checkers can receive PointerEscape event, but the event 
> itself will not be recorded in the ExplodedGraph - only changes made 
> by checkers will. That's also true for Stmt nodes: I noticed same 
> issues with PostCondition. This makes matching a bit harder. Should we 
> add some information into ExplodedGraph?
> 2. (Minor) Some static analyzer data structures don't support 
> traditional LLVM casting methods (dyn_cast, isa) because they lack 
> classof() method. I have fixed this internally and will put a patch on 
> review soon.
>
> -------- Conclusion --------
> Finally, there is a graph matching engine supporting basic 
> functionality and two checkers using it. I apologize for not starting 
> the discussion earlier, but I just wasn't sure that the approach will 
> work. Is anybody interested to have this stuff in upstream (maybe, 
> changed or improved)? If yes, I'll be happy to contribute this work 
> patch-by-patch and continue its improvement. If no - well, I had 
> enough fun playing with it.
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190511/159bbbec/attachment.html>