<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">Hi Gabor,<br>

      <br>

      Thank you for the reply! You can find my answers inline.<br>

      <br>

      17.05.2018 15:28, Gábor Horváth пишет:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">

      <div dir="ltr">

        <div>Hi Alexey,</div>

        <div><br>

        </div>

        <div>In general, I like the idea of having a more declarative

          way to define new</div>

        <div>checks.<br>

        </div>

        <div><br>

        </div>

        <div>

          <div class="gmail_quote">

            <div dir="ltr">On Thu, 17 May 2018 at 01:37, Alexey Sidorin

              <<a href="mailto:alexey.v.sidorin@ya.ru"

                moz-do-not-send="true">alexey.v.sidorin@ya.ru</a>>

              wrote:<br>

            </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello

              everyone,<br>

              <br>

              I'd like to share some results of my investigation

              targeted on <br>

              improvement of Clang Static Analyzer checker API. You can

              find some <br>

              previous conversation on this topic here: <br>

              <a

href="http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html"

                rel="noreferrer" target="_blank" moz-do-not-send="true">http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html</a>.

              <br>

              In my investigation, I tried to solve a particular problem

              of building a <br>

              checker without generating new nodes.<br>

            </blockquote>

            <div><br>

            </div>

            <div>Why do you focus on such checks that does not have

              traits, does not generate new nodes. <br>

            </div>

            <div>Do you find this to be the majority of the checks you

              need to implement? <br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    Allowing checkers to generate new nodes and sinks leads to coverage

    issues: the coverage changes if we disable or enable checkers. The

    ability to modify state (Environment and Store) also makes us think

    about how checkers will interact with each other.<br>

    Regarding traits - path matchers are not completely trait-less. They

    just use other GDM and can only store DynTypedNodes (extended by

    path-sensitive nodes). The bindings that matchers have added while

    matching separate graph nodes are propagated across ExplodedGraph.

    Actually, path bindings act very similar to checker traits, and they

    offer a subset of GDM abilities.<br>

    <br>

    <blockquote type="cite"

cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">

      <div dir="ltr">

        <div>

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <br>

              --------- Introduction and design goals ---------<br>

              <br>

              In brief, I tried to use matchers-like API to make CSA

              checkers look <br>

              like this:<br>

              <br>

              StatementMatcher NotChdir =<br>

              callExpr(unless(callee(functionDecl(hasName("::chdir")))));<br>

              Finder.addMatcher(<br>

                   hasSequence(<br>

                       postStmt(hasStatement(<br>

              callExpr(callee(functionDecl(hasName("::chroot")))))),<br>

                       unless(stmtPoint(hasStatement(callExpr(<br>

                           callee(functionDecl(hasName("::chdir"))),<br>

                           hasArgument(0,

              hasValue(stringRegion(refersString("/")))))))),<br>

              explodedNode(anyOf(postStmt(hasStatement(NotChdir)),<br>

              callEnter(hasCallExpr(NotChdir))))<br>

                           .bind("bug_node")),<br>

                   &Callback);<br>

              Finder.match(G);<br>

            </blockquote>

            <div><br>

            </div>

            <div>I think, a common (design) pitfall of writing checks is

              to try to match against</div>

            <div>the AST when a symbolic execution callback should be

              used instead.</div>

            <div>I am a bit afraid having an API like this would make

              this pitfall more common.</div>

            <div>Maybe a separation between the path sensitive, flow

              sensitive and</div>

            <div>AST based checks is good for the checker writers and

              new users and</div>

            <div>I am not sure that the same abstraction fits all of

              these cases.</div>

            <div><br>

            </div>

            <div>In case of path sensitive checks I wonder if a state

              machine like abstraction would</div>

            <div>fit the use cases better (and also cover checks that

              are using traits)</div>

            <div>where the guarding conditions of the state transitions

              can include AST matcher like checks. <br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    In my experience, nearly 2/3 path-sensitive checkers we wrote follow

    state machine abstraction. But it is possible to implement other

    matcher operator that will use another matching strategy

    (grammar-like, for example), and declare a path matcher based on it.<br>

    <br>

    <blockquote type="cite"

cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">

      <div dir="ltr">

        <div>

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <br>

              and I have managed to make some simple working examples.<br>

              <br>

              The entire diff can be found here: <br>

              <a

href="https://github.com/a-sid/clang/commit/9a0b1d1d9b3cf41b551a663f041f54d5427aa72f"

                rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/a-sid/clang/commit/9a0b1d1d9b3cf41b551a663f041f54d5427aa72f</a><br>

              The code itself: <a

                href="https://github.com/a-sid/clang/tree/a4"

                rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/a-sid/clang/tree/a4</a><br>

              <br>

              There are several reasons why I have tried this approach.<br>

              <br>

              1. AST Matchers are already extensively used API for AST

              checking. They <br>

              are available both in Clang-Tidy and CSA. And I wanted to

              use existing <br>

              functionality as much as possible. So, I decided to extend

              an existing <br>

              API to make its usage seamless across different kinds of

              checks: <br>

              path-sensitive, AST-based and CFG-based.<br>

            </blockquote>

            <div><br>

            </div>

            <div>I am not sure if this is a good idea. I think the

              checker author supposed to</div>

            <div>be aware if the check is AST based, flow sensitive, or

              path sensitive. <br>

            </div>

            <div>And this should also be easy to tell by reading the

              code. I am also not</div>

            <div>sure whether there is an abstraction that fits all.</div>

          </div>

        </div>

      </div>

    </blockquote>

    Yes, there is never ideal solution. But we can just use descriptive

    matcher names to notify users that they are going to use

    path-sensitive analysis. Even in this small PoC, path matchers are

    placed into a separate header and into a separate namespace (and are

    even part of clangStaticAnalyzer, not clangASTMatchers), so one

    should really understand what he does to make path matchers work.<br>

    Also, we can often meet some hand-written AST matching inside

    checker callbacks. We can just make it more consise.<br>

    <br>

    <blockquote type="cite"

cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">

      <div dir="ltr">

        <div>

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>I think the reason why this idea works well for checks

              that only inspect the</div>

            <div>exploded graph, because in this case we are still doing

              pattern matching on</div>

            <div>graphs in a similar manner as doing pattern matching on

              the AST.</div>

            <div><br>

            </div>

            <div>But does this generalize later on to stateful checks? <br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    I'm not sure I understood you correctly. If you mean that stateful

    checkers are checkers that need to modify state, then I think we

    need to put state modification out of checkers if possible to get

    reproducible coverage independently on checkers enabled or disabled.<br>

    <br>

    <blockquote type="cite"

cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">

      <div dir="ltr">

        <div>

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <br>

              2. AST matchers effectively help clients to avoid a lot of

              checking of <br>

              dyn_cast results. This feature not only makes them more

              convenient but <br>

              also more safe because, in my experience, forgetting a

              nullptr/None <br>

              check is a pretty common mistake for checker writers.<br>

            </blockquote>

            <div><br>

            </div>

            <div>

              <div>I think if something cannot be null we should return

                references, otherwise</div>

              <div>the client need to check for the pointer beeing null

                (unless there are some</div>

              <div>dynamic invariant that would make the check

                redundant). If an API does</div>

              <div>not match this philosophy, maybe we should fix the

                API first. </div>

              <div><br>

              </div>

              <div>Regardless of fixing the API, I agree that it would

                be great to have higher</div>

              <div>level APIs for checker authors.</div>

            </div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <br>

              3. Testing of AST matchers don't require writing C++ code

              - it can be <br>

              done interactively with clang-query tool. And I believe

              that we need <br>

              similar functionality for CSA as well.<br>

            </blockquote>

            <div><br>

            </div>

            Having a query tool for exploded graphs could be very

            helpful, I agree.</div>

          <div class="gmail_quote">These graphs tend to be very large

            and sometimes it is not trivial to</div>

          <div class="gmail_quote">find the relevant parts during

            debugging.<br>

          </div>

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <br>

              I didn't want to replace existing checker API. Instead, I

              tried to make <br>

              new possibilities usable independently or together with

              existing.<br>

              <br>

              --------- Design and implementation ---------<br>

              <br>

              The implementation consisted of a number of steps.<br>

              <br>

              1. Allow matching nodes of path-sensitive graph like usual

              AST nodes. To <br>

              make this possible, three actions were needed:<br>

                1.1. ASTTypeTraits and DynTypedNode were extended to

              support <br>

              path-sensitive nodes: ExplodedNode, ProgramState, SVal,

              SymExpr, etc. <br>

            </blockquote>

            <div><br>

            </div>

            <div>I wonder, if in general it is a good idea to pattern

              match in checks on SymExpr.</div>

            <div>A more general approach here would be to use the

              constraint solver for queries</div>

            <div>in a similar manner how ProgramState::assume works.<br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    It is not only possible, but is already implemented - partially, of

    course. You can take a look at canBeZero() matcher in

    TestAfterDivZeroV2.<br>

    <br>

    <blockquote type="cite"

cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">

      <div dir="ltr">

        <div>

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              The implementation with graph node support is moved into a

              separate <br>

              class (ASTGraphTypeTraits) in ento namespace to resolve

              cyclic <br>

              dependencies (they are still exist, unfortunately, but are

              header-only, <br>

              so we can build the PoC).<br>

                1.2. Some additions to AST matchers were made to add

              support for new <br>

              kinds of nodes.<br>

                1.3. To make MatchFinder able to query specific options

              not supported <br>

              by pure AST, it was augmented with "Contexts". A matcher

              that needs to <br>

              query the path-sensitive engine asks the Finder for the

              required Context <br>

              which provides specific helper functions.<br>

              <br>

              As the result of this step, we are able to write matchers

              like <br>

              expr(hasArgument(0,

              hasValue(stringRegion(refersString("/"))))).<br>

            </blockquote>

            <div><br>

            </div>

            <div>I think this is the part of the proposal that I like

              the most, it would be</div>

            <div>a very concise way to write down guarding conditions.<br>

            </div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <br>

              2. Create an engine for graph exploration and matching.<br>

                 Unlike normal AST matchers, hasSequence is a

              path-sensitive matcher. <br>

              It is launched against ExplodedGraph. These matchers are

              handled by <br>

              GraphMatchFinder object. While searching a graph, it

              collects matches. <br>

              Each match contains a pointer to the corresponding matcher

              and State ID <br>

              of this match. The way how matches are translated from one

              state ID to <br>

              another is determined by matcher operators.<br>

            </blockquote>

            <div><br>

            </div>

            <div>Is this matching done after the exploded graph already

              built? If so,</div>

            <div>these checks will be unable to generate sink nodes. I

              think having sink</div>

            <div>nodes sometimes desirable even if the check itself does

              not have a trait.</div>

          </div>

        </div>

      </div>

    </blockquote>

    In the implemented checkers - yes, the matching is done on the

    ExplodedGraph already built. But nothing prevents GraphMatchFinder

    to be used as a part of CheckerManager or ExprEngine. It has a

    method called advance() that performs a single transition to the new

    node not processed before, so it can be fed with single nodes that

    engine creates during graph building. The matching on-the-fly is

    possible, just was not implemented.<br>

    <br>

    <blockquote type="cite"

cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">

      <div dir="ltr">

        <div>

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <br>

                 For example, SequenceVariadicOperator, which is the

              base of <br>

              hasSequence() matcher, has "positive" and "negative"

              sub-matches. Each <br>

              positive matcher has its corresponding State ID. In order

              to advance to <br>

              the next State ID, a node being matched should match all

              "negative" <br>

              matchers before the next "positive" matchers and the next

              "positive" <br>

              matcher itself. Failure in matching "negative" matcher

              discards the match.<br>

              <br>

                 The role of GraphMatchFinder is similar to MatchFinder:

              it is only <br>

              responsible for graph exploration and keeping bound nodes

              and matchers.<br>

              <br>

              3. Manage bound nodes.<br>

                 While matching a single graph node, BoundNodes from AST

              MatchFinder <br>

              are used. AST matchers for path-sensitive nodes support

              bindings <br>

              out-of-the-box. However, the situation with graph matching

              is a bit <br>

              different. For graph matching, we have another system of

              bound nodes. <br>

              Each graph node has a related map of bounds aka GDMTy

              (yes, the name is <br>

              not a coincidence). GDMTy is a mapping from match ID to

              BoundNodesMap <br>

              which, in part, is effectively a map from std::string to

              DynTypedNodes. <br>

              This look pretty much like how GDM is organized in

              ExplodedGraph, just <br>

              without one level of indirection (it can be added,

              though).<br>

              <br>

                 MatchFinder contexts should allow support of their own

              bindings. This <br>

              is how equalsBoundNode() matcher works for path-sensitive

              nodes: it just <br>

              queries all available contexts for the binding.<br>

              <br>

              Finally, I have managed to make two checkers work in this

              way: <br>

              ChrootChecker (which is present in the introduction) and <br>

              TestAfterDivZeroChecker. Both them can be found in

              ChrootCheckerV2.cpp <br>

              and TestAfterDivZeroCheckerV2.cpp correspondingly. They

              act like normal <br>

              checkers: produce warnings and use visitors. The main

              difference is that <br>

              they cannot generate sinks and use checkEndAnalysis

              callback. The code <br>

              of these checkers can be found here:<br>

              <br>

              <a

href="https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/ChrootCheckerV2.cpp"

                rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/ChrootCheckerV2.cpp</a><br>

              <a

href="https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/TestAfterDivZeroCheckerV2.cpp"

                rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/TestAfterDivZeroCheckerV2.cpp</a><br>

              <br>

              <br>

              -------- Some features --------<br>

              <br>

              1.The design of bindings has an interesting consequence:

              it enables the <br>

              dynamic introspection of GDM which was pretty hard before.

              (Hello alpha <br>

              renaming?)<br>

              2. Nothing prevents matchers to be used with existing

              checker API for <br>

              simplifying conditional checking inside callbacks. The

              matchers are not <br>

              planned as the replacement for the current API, but look

              like a nice <br>

              complementary part.<br>

              3. Implicit conversion of Matcher<ProgramPoint> to

              Matcher<ExplodedNode> <br>

              and Matcher<SymExpr || MemRegion> to

              Matcher<SVal> for writing shorter code.<br>

              <br>

              -------- Not implemented/not checked yet --------<br>

              I tried to keep the PoC as minimal as possible. As the

              result, some <br>

              features are not implemented yet, but I believe they can

              be added.<br>

              1. Usage of matchers inside checker callbacks<br>

                 This is not exactly unimplemented, but is still

              untested.<br>

              2. Dynamic matchers (clang-query interface)<br>

                 In addition to work for supporting dynamic nodes (I

              don't know how <br>

              many efforts it can take), we need to enable matching

              against AST nodes, <br>

              not graph. But it doesn't look as a problem, we can just

              make matchers <br>

              polymorphic.<br>

              3. Binding to successfully matched paths is not

              implemented yet - only <br>

              bindings to separate nodes. I wonder if we need path

              bindings at all.<br>

              4. Some corner cases are still FIXMEs: single-node

              sequences, for example.<br>

              5. GDM is not based on immutable data structures like it

              is done in CSA <br>

              - it is just an STL map. Its performance can be poor (full

              copy on every <br>

              new node), but I don't think that changing it to use

              immutable <br>

              structures is hard.<br>

              6. Matching on-the-fly<br>

                 GraphMatchFinder should support on-the-fly matching

              during <br>

              ExplodedGraph building. For this support, we should just

              call its <br>

              advance() method each time a new node is created. However,

              this <br>

              possibility wasn't checked yet.<br>

              7. Matching CFG and CallGraph isn't implemented because it

              was <br>

              considered to be far out of simple PoC.<br>

              8. Only sequential matching is available now, and I didn't

              try to <br>

              implement other operators yet. So, implementing a checker

              similar to <br>

              PthreadLock can be tricky or even impossible for now.<br>

              <br>

              -------- Known and potential issues --------<br>

               From matchers' side:<br>

              1. Some tests don't pass because they rely on the checker

              ability to <br>

              generate sink nodes. Our matchers cannot do it by design

              so tests don't <br>

              pass.<br>

              2. Representing checker bindings as a map can be less

              effective than <br>

              storing data in structures. I didn't measure the overhead,

              so I cannot <br>

              give any numbers.<br>

              3. Matchers are called on every node independently of its

              type. This is <br>

              not what CheckerManager does. I wonder if this detail can

              affect <br>

              performance as well.<br>

              4. Problems with cyclic dependencies. clangStaticAnalyzer

              has a <br>

              dependency on clangASTMatchers, therefore,

              clangASTMatchers cannot <br>

              depend on clangStaticAnalyzer. However, if we want

              ASTMatchers to <br>

              support static analyzer data structures, there should be a

              backward <br>

              dependency. Now this dependency is header-only.<br>

              5. Code duplication. This is mostly fine for a sandbox but

              needs to be <br>

              reduced.<br>

              <br>

               From analyzer's side:<br>

              1. Many events are not reflected in the final

              ExplodedGraph. For <br>

              example, checkers can receive PointerEscape event, but the

              event itself <br>

              will not be recorded in the ExplodedGraph - only changes

              made by <br>

              checkers will. That's also true for Stmt nodes: I noticed

              same issues <br>

              with PostCondition. This makes matching a bit harder.

              Should we add some <br>

              information into ExplodedGraph?<br>

              2. (Minor) Some static analyzer data structures don't

              support <br>

              traditional LLVM casting methods (dyn_cast, isa) because

              they lack <br>

              classof() method. I have fixed this internally and will

              put a patch on <br>

              review soon.<br>

              <br>

              -------- Conclusion --------<br>

              Finally, there is a graph matching engine supporting basic

              functionality <br>

              and two checkers using it. I apologize for not starting

              the discussion <br>

              earlier, but I just wasn't sure that the approach will

              work. Is anybody <br>

              interested to have this stuff in upstream (maybe, changed

              or improved)? <br>

              If yes, I'll be happy to contribute this work

              patch-by-patch and <br>

              continue its improvement. If no - well, I had enough fun

              playing with it.<br>

            </blockquote>

            <div><br>

            </div>

            <div>Thanks for sharing this results. Regardless of being

              upstreamed as is or</div>

            <div>in a separate form or not at all, this is an

              interesting experiment for the</div>

            <div>community. <br>

            </div>

            <div> </div>

          </div>

        </div>

      </div>

    </blockquote>

    <p><br>

    </p>

  </body>

</html>