<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">Necro-updating the status since some

      demand for this or similar CSA feature was discovered on EuroLLVM

      BoF. Unfortunately, I had few time to develop path-sensitive

      matchers last year so the change list is not very large.</div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">1. On-the-fly matching while building

      ExplodedGraph. Both offline and online matching is available. The

      way how a checker handles graphs depends on how it was registered

      in CheckerManager: there is a separate method

      registerPathMatcher() that adds matcher to the set of online

      matchers.<br>

    </div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">2. Path-sensitive matchers gained the

      ability to generate new nodes and sinks in the callbacks. There

      are two limitations, however.</div>

    <div class="moz-cite-prefix">a) AST Matchers don't allow calling

      callbacks until matching is complete, so I haven't implemented

      this ability for path-sensitive matchers too.</div>

    <div class="moz-cite-prefix">b) Only on-the-fly matchers are allowed

      to add nodes (because the ability to add nodes to a complete graph

      seems useless).<br>

    </div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">3. The primary advancement for this PoC

      is that path-sensitive matchers now can work as dynamic matchers

      so they can be used with clang-query tool. Here is an example

      clang-query script:</div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">let NotChdir

callExpr(unless(callee(functionDecl(hasName("::chdir"))))).bind("bad_action")<br>

      <br>

      match<br>

      functionDecl(<br>

        hasPath(hasSequence(<br>

postStmt(hasStatement(callExpr(callee(functionDecl(hasName("::chroot")))).bind("chroot"))),<br>

                  unless(<br>

stmtPoint(hasStatement(callExpr(callee(functionDecl(hasName("::chdir"))),<br>

                                                      hasArgument(0,

      hasValue(stringRegion(refersString("/")))))))),<br>

                  explodedNode(anyOf(postStmt(hasStatement(NotChdir)),<br>

      callEnter(hasCallExpr(NotChdir)))).bind("bug_node"))))<br>

      <br>

      whose output on chroot.c is:</div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">

      <div class="de1">$ bin<span class="sy0">/</span>clang-query <span

          class="sy0"></span>llvm<span class="sy0">/</span>tools<span

          class="sy0">/</span>clang<span class="sy0">/</span>test<span

          class="sy0">/</span>Analysis<span class="sy0">/</span>chroot.c

        <span class="re5">-f</span>=debug-cmd.query</div>

      <div class="de1"> </div>

      <div class="de1">Match <span class="co0">#1:</span></div>

      <div class="de1"> </div>

      <div class="de2">llvm<span class="sy0">/</span>tools<span

          class="sy0">/</span>clang<span class="sy0">/</span>test<span

          class="sy0">/</span>Analysis<span class="sy0">/</span>chroot.c:<span

          class="nu0">12</span>:<span class="nu0">3</span>: note: <span

          class="st0">"bad_action"</span> binds here</div>

      <div class="de1">  foo<span class="br0">(</span><span class="br0">)</span>;

        <span class="sy0">//</span> expected-warning <span class="br0">{</span><span

          class="br0">{</span>No call of chdir<span class="br0">(</span><span

          class="st0">"/"</span><span class="br0">)</span> immediately

        after <span class="kw2">chroot</span><span class="br0">}</span><span

          class="br0">}</span></div>

      <div class="de1">  ^~~~~</div>

      <div class="de1">llvm<span class="sy0">/</span>tools<span

          class="sy0">/</span>clang<span class="sy0">/</span>test<span

          class="sy0">/</span>Analysis<span class="sy0">/</span>chroot.c:<span

          class="nu0">11</span>:<span class="nu0">3</span>: note: <span

          class="st0">"chroot"</span> binds here</div>

      <div class="de1">  <span class="kw2">chroot</span><span

          class="br0">(</span><span class="st0">"/usr/local"</span><span

          class="br0">)</span>; <span class="sy0">//</span> root

        changed.</div>

      <div class="de2">  ^~~~~~~~~~~~~~~~~~~~</div>

      <div class="de1">llvm<span class="sy0">/</span>tools<span

          class="sy0">/</span>clang<span class="sy0">/</span>test<span

          class="sy0">/</span>Analysis<span class="sy0">/</span>chroot.c:<span

          class="nu0">10</span>:<span class="nu0">1</span>: note: <span

          class="st0">"root"</span> binds here</div>

      <div class="de1">void f1<span class="br0">(</span>void<span

          class="br0">)</span> <span class="br0">{</span></div>

      <div class="de1">^~~~~~~~~~~~~~~</div>

      <div class="de1"> </div>

      <div class="de2">Match <span class="co0">#2:</span></div>

      <div class="de1"> </div>

      <div class="de1"><span class="sy0"></span>llvm<span class="sy0">/</span>tools<span

          class="sy0">/</span>clang<span class="sy0">/</span>test<span

          class="sy0">/</span>Analysis<span class="sy0">/</span>chroot.c:<span

          class="nu0">24</span>:<span class="nu0">3</span>: note: <span

          class="st0">"bad_action"</span> binds here</div>

      <div class="de1">  foo<span class="br0">(</span><span class="br0">)</span>;

        <span class="sy0">//</span> expected-warning <span class="br0">{</span><span

          class="br0">{</span>No call of chdir<span class="br0">(</span><span

          class="st0">"/"</span><span class="br0">)</span> immediately

        after <span class="kw2">chroot</span><span class="br0">}</span><span

          class="br0">}</span></div>

      <div class="de1">  ^~~~~</div>

      <div class="de2">llvm<span class="sy0">/</span>tools<span

          class="sy0">/</span>clang<span class="sy0">/</span>test<span

          class="sy0">/</span>Analysis<span class="sy0">/</span>chroot.c:<span

          class="nu0">22</span>:<span class="nu0">3</span>: note: <span

          class="st0">"chroot"</span> binds here</div>

      <div class="de1">  <span class="kw2">chroot</span><span

          class="br0">(</span><span class="st0">"/usr/local"</span><span

          class="br0">)</span>; <span class="sy0">//</span> root

        changed.</div>

      <div class="de1">  ^~~~~~~~~~~~~~~~~~~~</div>

      <div class="de1">llvm<span class="sy0">/</span>tools<span

          class="sy0">/</span>clang<span class="sy0">/</span>test<span

          class="sy0">/</span>Analysis<span class="sy0">/</span>chroot.c:<span

          class="nu0">21</span>:<span class="nu0">1</span>: note: <span

          class="st0">"root"</span> binds here</div>

      <div class="de1">void f3<span class="br0">(</span>void<span

          class="br0">)</span> <span class="br0">{</span></div>

      <div class="de2">^~~~~~~~~~~~~~~</div>

      <div class="de1"><span class="nu0">2</span> matches.</div>

      <div class="de1"><br>

      </div>

      This can allow us to prototype checkers without re-compiling them

      after every change. <br>

    </div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">The code can be found on my Github:</div>

    <div class="moz-cite-prefix">clang: <a

        class="moz-txt-link-freetext"

        href="https://github.com/a-sid/clang/tree/a4-dynamic-path-matchers">https://github.com/a-sid/clang/tree/a4-dynamic-path-matchers</a><br>

    </div>

    <div class="moz-cite-prefix">clang-tools-extra: <a

        class="moz-txt-link-freetext"

        href="https://github.com/a-sid/clang-tools-extra/tree/a4-support">https://github.com/a-sid/clang-tools-extra/tree/a4-support</a></div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">If there is still a need this feature,

      I'll start the discussion on its design because it requires some

      intrusive changes into AST Matchers code.</div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">17.05.2018 2:37, Alexey Sidorin via

      cfe-dev пишет:<br>

    </div>

    <blockquote type="cite"

      cite="mid:64673e0b-3e74-15fd-ae2f-0fb176476eaf@ya.ru">Hello

      everyone, <br>

      <br>

      I'd like to share some results of my investigation targeted on

      improvement of Clang Static Analyzer checker API. You can find

      some previous conversation on this topic here:

      <a class="moz-txt-link-freetext"

href="http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html">http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html</a>.

      In my investigation, I tried to solve a particular problem of

      building a checker without generating new nodes. <br>

      <br>

      --------- Introduction and design goals --------- <br>

      <br>

      In brief, I tried to use matchers-like API to make CSA checkers

      look like this: <br>

      <br>

      StatementMatcher NotChdir = <br>

          callExpr(unless(callee(functionDecl(hasName("::chdir"))))); <br>

      Finder.addMatcher( <br>

          hasSequence( <br>

              postStmt(hasStatement( <br>

                  callExpr(callee(functionDecl(hasName("::chroot")))))),

      <br>

              unless(stmtPoint(hasStatement(callExpr( <br>

                  callee(functionDecl(hasName("::chdir"))), <br>

                  hasArgument(0,

      hasValue(stringRegion(refersString("/")))))))), <br>

              explodedNode(anyOf(postStmt(hasStatement(NotChdir)), <br>

                                  callEnter(hasCallExpr(NotChdir)))) <br>

                  .bind("bug_node")), <br>

          &Callback); <br>

      Finder.match(G); <br>

      <br>

      and I have managed to make some simple working examples. <br>

      <br>

      The entire diff can be found here:

      <a class="moz-txt-link-freetext"

href="https://github.com/a-sid/clang/commit/9a0b1d1d9b3cf41b551a663f041f54d5427aa72f">https://github.com/a-sid/clang/commit/9a0b1d1d9b3cf41b551a663f041f54d5427aa72f</a><br>

      The code itself: <a class="moz-txt-link-freetext"

        href="https://github.com/a-sid/clang/tree/a4">https://github.com/a-sid/clang/tree/a4</a>

      <br>

      <br>

      There are several reasons why I have tried this approach. <br>

      <br>

      1. AST Matchers are already extensively used API for AST checking.

      They are available both in Clang-Tidy and CSA. And I wanted to use

      existing functionality as much as possible. So, I decided to

      extend an existing API to make its usage seamless across different

      kinds of checks: path-sensitive, AST-based and CFG-based. <br>

      <br>

      2. AST matchers effectively help clients to avoid a lot of

      checking of dyn_cast results. This feature not only makes them

      more convenient but also more safe because, in my experience,

      forgetting a nullptr/None check is a pretty common mistake for

      checker writers. <br>

      <br>

      3. Testing of AST matchers don't require writing C++ code - it can

      be done interactively with clang-query tool. And I believe that we

      need similar functionality for CSA as well. <br>

      <br>

      I didn't want to replace existing checker API. Instead, I tried to

      make new possibilities usable independently or together with

      existing. <br>

      <br>

      --------- Design and implementation --------- <br>

      <br>

      The implementation consisted of a number of steps. <br>

      <br>

      1. Allow matching nodes of path-sensitive graph like usual AST

      nodes. To make this possible, three actions were needed: <br>

       1.1. ASTTypeTraits and DynTypedNode were extended to support

      path-sensitive nodes: ExplodedNode, ProgramState, SVal, SymExpr,

      etc. The implementation with graph node support is moved into a

      separate class (ASTGraphTypeTraits) in ento namespace to resolve

      cyclic dependencies (they are still exist, unfortunately, but are

      header-only, so we can build the PoC). <br>

       1.2. Some additions to AST matchers were made to add support for

      new kinds of nodes. <br>

       1.3. To make MatchFinder able to query specific options not

      supported by pure AST, it was augmented with "Contexts". A matcher

      that needs to query the path-sensitive engine asks the Finder for

      the required Context which provides specific helper functions. <br>

      <br>

      As the result of this step, we are able to write matchers like

      expr(hasArgument(0, hasValue(stringRegion(refersString("/"))))). <br>

      <br>

      2. Create an engine for graph exploration and matching. <br>

        Unlike normal AST matchers, hasSequence is a path-sensitive

      matcher. It is launched against ExplodedGraph. These matchers are

      handled by GraphMatchFinder object. While searching a graph, it

      collects matches. Each match contains a pointer to the

      corresponding matcher and State ID of this match. The way how

      matches are translated from one state ID to another is determined

      by matcher operators. <br>

      <br>

        For example, SequenceVariadicOperator, which is the base of

      hasSequence() matcher, has "positive" and "negative" sub-matches.

      Each positive matcher has its corresponding State ID. In order to

      advance to the next State ID, a node being matched should match

      all "negative" matchers before the next "positive" matchers and

      the next "positive" matcher itself. Failure in matching "negative"

      matcher discards the match. <br>

      <br>

        The role of GraphMatchFinder is similar to MatchFinder: it is

      only responsible for graph exploration and keeping bound nodes and

      matchers. <br>

      <br>

      3. Manage bound nodes. <br>

        While matching a single graph node, BoundNodes from AST

      MatchFinder are used. AST matchers for path-sensitive nodes

      support bindings out-of-the-box. However, the situation with graph

      matching is a bit different. For graph matching, we have another

      system of bound nodes. Each graph node has a related map of bounds

      aka GDMTy (yes, the name is not a coincidence). GDMTy is a mapping

      from match ID to BoundNodesMap which, in part, is effectively a

      map from std::string to DynTypedNodes. This look pretty much like

      how GDM is organized in ExplodedGraph, just without one level of

      indirection (it can be added, though). <br>

      <br>

        MatchFinder contexts should allow support of their own bindings.

      This is how equalsBoundNode() matcher works for path-sensitive

      nodes: it just queries all available contexts for the binding. <br>

      <br>

      Finally, I have managed to make two checkers work in this way:

      ChrootChecker (which is present in the introduction) and

      TestAfterDivZeroChecker. Both them can be found in

      ChrootCheckerV2.cpp and TestAfterDivZeroCheckerV2.cpp

      correspondingly. They act like normal checkers: produce warnings

      and use visitors. The main difference is that they cannot generate

      sinks and use checkEndAnalysis callback. The code of these

      checkers can be found here: <br>

      <br>

      <a class="moz-txt-link-freetext"

href="https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/ChrootCheckerV2.cpp">https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/ChrootCheckerV2.cpp</a>

      <br>

      <a class="moz-txt-link-freetext"

href="https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/TestAfterDivZeroCheckerV2.cpp">https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/TestAfterDivZeroCheckerV2.cpp</a>

      <br>

      <br>

      <br>

      -------- Some features -------- <br>

      <br>

      1.The design of bindings has an interesting consequence: it

      enables the dynamic introspection of GDM which was pretty hard

      before. (Hello alpha renaming?) <br>

      2. Nothing prevents matchers to be used with existing checker API

      for simplifying conditional checking inside callbacks. The

      matchers are not planned as the replacement for the current API,

      but look like a nice complementary part. <br>

      3. Implicit conversion of Matcher<ProgramPoint> to

      Matcher<ExplodedNode> and Matcher<SymExpr ||

      MemRegion> to Matcher<SVal> for writing shorter code. <br>

      <br>

      -------- Not implemented/not checked yet -------- <br>

      I tried to keep the PoC as minimal as possible. As the result,

      some features are not implemented yet, but I believe they can be

      added. <br>

      1. Usage of matchers inside checker callbacks <br>

        This is not exactly unimplemented, but is still untested. <br>

      2. Dynamic matchers (clang-query interface) <br>

        In addition to work for supporting dynamic nodes (I don't know

      how many efforts it can take), we need to enable matching against

      AST nodes, not graph. But it doesn't look as a problem, we can

      just make matchers polymorphic. <br>

      3. Binding to successfully matched paths is not implemented yet -

      only bindings to separate nodes. I wonder if we need path bindings

      at all. <br>

      4. Some corner cases are still FIXMEs: single-node sequences, for

      example. <br>

      5. GDM is not based on immutable data structures like it is done

      in CSA - it is just an STL map. Its performance can be poor (full

      copy on every new node), but I don't think that changing it to use

      immutable structures is hard. <br>

      6. Matching on-the-fly <br>

        GraphMatchFinder should support on-the-fly matching during

      ExplodedGraph building. For this support, we should just call its

      advance() method each time a new node is created. However, this

      possibility wasn't checked yet. <br>

      7. Matching CFG and CallGraph isn't implemented because it was

      considered to be far out of simple PoC. <br>

      8. Only sequential matching is available now, and I didn't try to

      implement other operators yet. So, implementing a checker similar

      to PthreadLock can be tricky or even impossible for now. <br>

      <br>

      -------- Known and potential issues -------- <br>

      From matchers' side: <br>

      1. Some tests don't pass because they rely on the checker ability

      to generate sink nodes. Our matchers cannot do it by design so

      tests don't pass. <br>

      2. Representing checker bindings as a map can be less effective

      than storing data in structures. I didn't measure the overhead, so

      I cannot give any numbers. <br>

      3. Matchers are called on every node independently of its type.

      This is not what CheckerManager does. I wonder if this detail can

      affect performance as well. <br>

      4. Problems with cyclic dependencies. clangStaticAnalyzer has a

      dependency on clangASTMatchers, therefore, clangASTMatchers cannot

      depend on clangStaticAnalyzer. However, if we want ASTMatchers to

      support static analyzer data structures, there should be a

      backward dependency. Now this dependency is header-only. <br>

      5. Code duplication. This is mostly fine for a sandbox but needs

      to be reduced. <br>

      <br>

      From analyzer's side: <br>

      1. Many events are not reflected in the final ExplodedGraph. For

      example, checkers can receive PointerEscape event, but the event

      itself will not be recorded in the ExplodedGraph - only changes

      made by checkers will. That's also true for Stmt nodes: I noticed

      same issues with PostCondition. This makes matching a bit harder.

      Should we add some information into ExplodedGraph? <br>

      2. (Minor) Some static analyzer data structures don't support

      traditional LLVM casting methods (dyn_cast, isa) because they lack

      classof() method. I have fixed this internally and will put a

      patch on review soon. <br>

      <br>

      -------- Conclusion -------- <br>

      Finally, there is a graph matching engine supporting basic

      functionality and two checkers using it. I apologize for not

      starting the discussion earlier, but I just wasn't sure that the

      approach will work. Is anybody interested to have this stuff in

      upstream (maybe, changed or improved)? If yes, I'll be happy to

      contribute this work patch-by-patch and continue its improvement.

      If no - well, I had enough fun playing with it. <br>

      <br>

      _______________________________________________ <br>

      cfe-dev mailing list <br>

      <a class="moz-txt-link-abbreviated"

        href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>

      <br>

      <a class="moz-txt-link-freetext"

        href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a>

      <br>

    </blockquote>

    <p><br>

    </p>

  </body>

</html>