[cfe-dev] [analyzer] Proof-of-concept for Matcher-like checker API (graph matchers)

Kristóf Umann via cfe-dev cfe-dev at lists.llvm.org
Sat May 11 13:02:37 PDT 2019


+ a couple folks I remember expressing interest about this subject :)

On Sat, 11 May 2019, 21:59 Alexey Sidorin via cfe-dev, <
cfe-dev at lists.llvm.org> wrote:

> Necro-updating the status since some demand for this or similar CSA
> feature was discovered on EuroLLVM BoF. Unfortunately, I had few time to
> develop path-sensitive matchers last year so the change list is not very
> large.
>
> 1. On-the-fly matching while building ExplodedGraph. Both offline and
> online matching is available. The way how a checker handles graphs depends
> on how it was registered in CheckerManager: there is a separate method
> registerPathMatcher() that adds matcher to the set of online matchers.
>
> 2. Path-sensitive matchers gained the ability to generate new nodes and
> sinks in the callbacks. There are two limitations, however.
> a) AST Matchers don't allow calling callbacks until matching is complete,
> so I haven't implemented this ability for path-sensitive matchers too.
> b) Only on-the-fly matchers are allowed to add nodes (because the ability
> to add nodes to a complete graph seems useless).
>
> 3. The primary advancement for this PoC is that path-sensitive matchers
> now can work as dynamic matchers so they can be used with clang-query tool.
> Here is an example clang-query script:
>
> let NotChdir
> callExpr(unless(callee(functionDecl(hasName("::chdir"))))).bind("bad_action")
>
> match
> functionDecl(
>   hasPath(hasSequence(
>
> postStmt(hasStatement(callExpr(callee(functionDecl(hasName("::chroot")))).bind("chroot"))),
>             unless(
>
> stmtPoint(hasStatement(callExpr(callee(functionDecl(hasName("::chdir"))),
>                                                 hasArgument(0,
> hasValue(stringRegion(refersString("/")))))))),
>             explodedNode(anyOf(postStmt(hasStatement(NotChdir)),
>
> callEnter(hasCallExpr(NotChdir)))).bind("bug_node"))))
>
> whose output on chroot.c is:
>
> $ bin/clang-query llvm/tools/clang/test/Analysis/chroot.c -f
> =debug-cmd.query
>
> Match #1:
>
> llvm/tools/clang/test/Analysis/chroot.c:12:3: note: "bad_action" binds
> here
>   foo(); // expected-warning {{No call of chdir("/") immediately after
> chroot}}
>   ^~~~~
> llvm/tools/clang/test/Analysis/chroot.c:11:3: note: "chroot" binds here
>   chroot("/usr/local"); // root changed.
>   ^~~~~~~~~~~~~~~~~~~~
> llvm/tools/clang/test/Analysis/chroot.c:10:1: note: "root" binds here
> void f1(void) {
> ^~~~~~~~~~~~~~~
>
> Match #2:
>
> llvm/tools/clang/test/Analysis/chroot.c:24:3: note: "bad_action" binds
> here
>   foo(); // expected-warning {{No call of chdir("/") immediately after
> chroot}}
>   ^~~~~
> llvm/tools/clang/test/Analysis/chroot.c:22:3: note: "chroot" binds here
>   chroot("/usr/local"); // root changed.
>   ^~~~~~~~~~~~~~~~~~~~
> llvm/tools/clang/test/Analysis/chroot.c:21:1: note: "root" binds here
> void f3(void) {
> ^~~~~~~~~~~~~~~
> 2 matches.
>
> This can allow us to prototype checkers without re-compiling them after
> every change.
>
> The code can be found on my Github:
> clang: https://github.com/a-sid/clang/tree/a4-dynamic-path-matchers
> clang-tools-extra:
> https://github.com/a-sid/clang-tools-extra/tree/a4-support
>
> If there is still a need this feature, I'll start the discussion on its
> design because it requires some intrusive changes into AST Matchers code.
>
>
>
> 17.05.2018 2:37, Alexey Sidorin via cfe-dev пишет:
>
> Hello everyone,
>
> I'd like to share some results of my investigation targeted on improvement
> of Clang Static Analyzer checker API. You can find some previous
> conversation on this topic here:
> http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html.
> In my investigation, I tried to solve a particular problem of building a
> checker without generating new nodes.
>
> --------- Introduction and design goals ---------
>
> In brief, I tried to use matchers-like API to make CSA checkers look like
> this:
>
> StatementMatcher NotChdir =
>     callExpr(unless(callee(functionDecl(hasName("::chdir")))));
> Finder.addMatcher(
>     hasSequence(
>         postStmt(hasStatement(
>             callExpr(callee(functionDecl(hasName("::chroot")))))),
>         unless(stmtPoint(hasStatement(callExpr(
>             callee(functionDecl(hasName("::chdir"))),
>             hasArgument(0, hasValue(stringRegion(refersString("/")))))))),
>         explodedNode(anyOf(postStmt(hasStatement(NotChdir)),
>                             callEnter(hasCallExpr(NotChdir))))
>             .bind("bug_node")),
>     &Callback);
> Finder.match(G);
>
> and I have managed to make some simple working examples.
>
> The entire diff can be found here:
> https://github.com/a-sid/clang/commit/9a0b1d1d9b3cf41b551a663f041f54d5427aa72f
> The code itself: https://github.com/a-sid/clang/tree/a4
>
> There are several reasons why I have tried this approach.
>
> 1. AST Matchers are already extensively used API for AST checking. They
> are available both in Clang-Tidy and CSA. And I wanted to use existing
> functionality as much as possible. So, I decided to extend an existing API
> to make its usage seamless across different kinds of checks:
> path-sensitive, AST-based and CFG-based.
>
> 2. AST matchers effectively help clients to avoid a lot of checking of
> dyn_cast results. This feature not only makes them more convenient but also
> more safe because, in my experience, forgetting a nullptr/None check is a
> pretty common mistake for checker writers.
>
> 3. Testing of AST matchers don't require writing C++ code - it can be done
> interactively with clang-query tool. And I believe that we need similar
> functionality for CSA as well.
>
> I didn't want to replace existing checker API. Instead, I tried to make
> new possibilities usable independently or together with existing.
>
> --------- Design and implementation ---------
>
> The implementation consisted of a number of steps.
>
> 1. Allow matching nodes of path-sensitive graph like usual AST nodes. To
> make this possible, three actions were needed:
>  1.1. ASTTypeTraits and DynTypedNode were extended to support
> path-sensitive nodes: ExplodedNode, ProgramState, SVal, SymExpr, etc. The
> implementation with graph node support is moved into a separate class
> (ASTGraphTypeTraits) in ento namespace to resolve cyclic dependencies (they
> are still exist, unfortunately, but are header-only, so we can build the
> PoC).
>  1.2. Some additions to AST matchers were made to add support for new
> kinds of nodes.
>  1.3. To make MatchFinder able to query specific options not supported by
> pure AST, it was augmented with "Contexts". A matcher that needs to query
> the path-sensitive engine asks the Finder for the required Context which
> provides specific helper functions.
>
> As the result of this step, we are able to write matchers like
> expr(hasArgument(0, hasValue(stringRegion(refersString("/"))))).
>
> 2. Create an engine for graph exploration and matching.
>   Unlike normal AST matchers, hasSequence is a path-sensitive matcher. It
> is launched against ExplodedGraph. These matchers are handled by
> GraphMatchFinder object. While searching a graph, it collects matches. Each
> match contains a pointer to the corresponding matcher and State ID of this
> match. The way how matches are translated from one state ID to another is
> determined by matcher operators.
>
>   For example, SequenceVariadicOperator, which is the base of
> hasSequence() matcher, has "positive" and "negative" sub-matches. Each
> positive matcher has its corresponding State ID. In order to advance to the
> next State ID, a node being matched should match all "negative" matchers
> before the next "positive" matchers and the next "positive" matcher itself.
> Failure in matching "negative" matcher discards the match.
>
>   The role of GraphMatchFinder is similar to MatchFinder: it is only
> responsible for graph exploration and keeping bound nodes and matchers.
>
> 3. Manage bound nodes.
>   While matching a single graph node, BoundNodes from AST MatchFinder are
> used. AST matchers for path-sensitive nodes support bindings
> out-of-the-box. However, the situation with graph matching is a bit
> different. For graph matching, we have another system of bound nodes. Each
> graph node has a related map of bounds aka GDMTy (yes, the name is not a
> coincidence). GDMTy is a mapping from match ID to BoundNodesMap which, in
> part, is effectively a map from std::string to DynTypedNodes. This look
> pretty much like how GDM is organized in ExplodedGraph, just without one
> level of indirection (it can be added, though).
>
>   MatchFinder contexts should allow support of their own bindings. This is
> how equalsBoundNode() matcher works for path-sensitive nodes: it just
> queries all available contexts for the binding.
>
> Finally, I have managed to make two checkers work in this way:
> ChrootChecker (which is present in the introduction) and
> TestAfterDivZeroChecker. Both them can be found in ChrootCheckerV2.cpp and
> TestAfterDivZeroCheckerV2.cpp correspondingly. They act like normal
> checkers: produce warnings and use visitors. The main difference is that
> they cannot generate sinks and use checkEndAnalysis callback. The code of
> these checkers can be found here:
>
>
> https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/ChrootCheckerV2.cpp
>
> https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/TestAfterDivZeroCheckerV2.cpp
>
>
> -------- Some features --------
>
> 1.The design of bindings has an interesting consequence: it enables the
> dynamic introspection of GDM which was pretty hard before. (Hello alpha
> renaming?)
> 2. Nothing prevents matchers to be used with existing checker API for
> simplifying conditional checking inside callbacks. The matchers are not
> planned as the replacement for the current API, but look like a nice
> complementary part.
> 3. Implicit conversion of Matcher<ProgramPoint> to Matcher<ExplodedNode>
> and Matcher<SymExpr || MemRegion> to Matcher<SVal> for writing shorter
> code.
>
> -------- Not implemented/not checked yet --------
> I tried to keep the PoC as minimal as possible. As the result, some
> features are not implemented yet, but I believe they can be added.
> 1. Usage of matchers inside checker callbacks
>   This is not exactly unimplemented, but is still untested.
> 2. Dynamic matchers (clang-query interface)
>   In addition to work for supporting dynamic nodes (I don't know how many
> efforts it can take), we need to enable matching against AST nodes, not
> graph. But it doesn't look as a problem, we can just make matchers
> polymorphic.
> 3. Binding to successfully matched paths is not implemented yet - only
> bindings to separate nodes. I wonder if we need path bindings at all.
> 4. Some corner cases are still FIXMEs: single-node sequences, for example.
> 5. GDM is not based on immutable data structures like it is done in CSA -
> it is just an STL map. Its performance can be poor (full copy on every new
> node), but I don't think that changing it to use immutable structures is
> hard.
> 6. Matching on-the-fly
>   GraphMatchFinder should support on-the-fly matching during ExplodedGraph
> building. For this support, we should just call its advance() method each
> time a new node is created. However, this possibility wasn't checked yet.
> 7. Matching CFG and CallGraph isn't implemented because it was considered
> to be far out of simple PoC.
> 8. Only sequential matching is available now, and I didn't try to
> implement other operators yet. So, implementing a checker similar to
> PthreadLock can be tricky or even impossible for now.
>
> -------- Known and potential issues --------
> From matchers' side:
> 1. Some tests don't pass because they rely on the checker ability to
> generate sink nodes. Our matchers cannot do it by design so tests don't
> pass.
> 2. Representing checker bindings as a map can be less effective than
> storing data in structures. I didn't measure the overhead, so I cannot give
> any numbers.
> 3. Matchers are called on every node independently of its type. This is
> not what CheckerManager does. I wonder if this detail can affect
> performance as well.
> 4. Problems with cyclic dependencies. clangStaticAnalyzer has a dependency
> on clangASTMatchers, therefore, clangASTMatchers cannot depend on
> clangStaticAnalyzer. However, if we want ASTMatchers to support static
> analyzer data structures, there should be a backward dependency. Now this
> dependency is header-only.
> 5. Code duplication. This is mostly fine for a sandbox but needs to be
> reduced.
>
> From analyzer's side:
> 1. Many events are not reflected in the final ExplodedGraph. For example,
> checkers can receive PointerEscape event, but the event itself will not be
> recorded in the ExplodedGraph - only changes made by checkers will. That's
> also true for Stmt nodes: I noticed same issues with PostCondition. This
> makes matching a bit harder. Should we add some information into
> ExplodedGraph?
> 2. (Minor) Some static analyzer data structures don't support traditional
> LLVM casting methods (dyn_cast, isa) because they lack classof() method. I
> have fixed this internally and will put a patch on review soon.
>
> -------- Conclusion --------
> Finally, there is a graph matching engine supporting basic functionality
> and two checkers using it. I apologize for not starting the discussion
> earlier, but I just wasn't sure that the approach will work. Is anybody
> interested to have this stuff in upstream (maybe, changed or improved)? If
> yes, I'll be happy to contribute this work patch-by-patch and continue its
> improvement. If no - well, I had enough fun playing with it.
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190511/753ec817/attachment.html>


More information about the cfe-dev mailing list