<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">18.05.2018 04:47, Artem Dergachev

      пишет:<br>

    </div>

    <blockquote type="cite"

      cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com">

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

      On 5/17/18 5:28 AM, Gábor Horváth wrote:<br>

      <blockquote type="cite"

cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">

        <div dir="ltr">

          <div>Hi Alexey,</div>

          <div><br>

          </div>

          <div>In general, I like the idea of having a more declarative

            way to define new</div>

          <div>checks.<br>

          </div>

          <div><br>

          </div>

          <div>

            <div class="gmail_quote">

              <div dir="ltr">On Thu, 17 May 2018 at 01:37, Alexey

                Sidorin <<a href="mailto:alexey.v.sidorin@ya.ru"

                  moz-do-not-send="true">alexey.v.sidorin@ya.ru</a>>

                wrote:<br>

              </div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello

                everyone,<br>

                <br>

                I'd like to share some results of my investigation

                targeted on <br>

                improvement of Clang Static Analyzer checker API. You

                can find some <br>

                previous conversation on this topic here: <br>

                <a

href="http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html"

                  rel="noreferrer" target="_blank"

                  moz-do-not-send="true">http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html</a>.

                <br>

                In my investigation, I tried to solve a particular

                problem of building a <br>

                checker without generating new nodes.<br>

              </blockquote>

              <div><br>

              </div>

              <div>Why do you focus on such checks that does not have

                traits, does not generate new nodes. <br>

              </div>

              <div>Do you find this to be the majority of the checks you

                need to implement? <br>

              </div>

            </div>

          </div>

        </div>

      </blockquote>

      <br>

      Ok, so as far as i understand, these new checkers don't support

      traits because they don't *need* them. Instead, they plan to use

      "backreferences" in matchers, i.e. double file descriptor close

      can be described as:<br>

      <br>

        hasSequence(<br>

          call(hasName("fclose"), hasArg(0, sym().bind("fd"))),<br>

          call(hasName("fclose"), hasArg(0, sym(equalsBoundNode("fd"))))<br>

        );<br>

      <br>

      Right?<br>

    </blockquote>

    Yes, exactly.<br>

    <br>

    <blockquote type="cite"

      cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com"> <br>

      So we can't eliminate the path on which the pointer is null after

      we dereference it, like we normally do, we can't introduce state

      splits, we can't produce fatal error nodes, but we can still use

      finite state machines.<br>

    </blockquote>

    As I told before, I don't like state splitting and generating sinks

    in checker. However, it is not _exactly_ true that we cannot do it

    with matchers - because there are not only matchers, but are also

    _callbacks_. If we perform a match on-the-fly (I didn't implement

    this, unfortunately, but it doesn't look hard), we can do almost

    anything in the callback after a match was found.<br>

    <br>

    <blockquote type="cite"

      cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com"> <br>

      The real question is, how do we implement RetainCountChecker that

      needs to count the potentially infinite number of retains and

      releases before emitting the warning, and is therefore not a

      finite state machine?^^<br>

    </blockquote>

    I'm not sure that I understood you correctly, but if we want to use

    something different from sequential model, we can implement our

    custom path matchers - like it is done for AST matchers. It took

    some time to implement a working (but still poorly designed)

    example:<br>

<a class="moz-txt-link-freetext" href="https://github.com/a-sid/clang/commit/7b94a5e648eed32b9fdc7cb7797b9450b0f3d1ed#diff-0aec754c14839c59eb296a889a7940dbR180">https://github.com/a-sid/clang/commit/7b94a5e648eed32b9fdc7cb7797b9450b0f3d1ed#diff-0aec754c14839c59eb296a889a7940dbR180</a>.

    <br>

    <br>

    The idea was to generalize some most common search patterns into

    matchers. Counting can be implemented pretty easily because counter

    states are still integers. However, if our state cannot be

    represented as an integer, there can be some troubles.<br>

    What I have also realized is that some way(s) of using different

    matchers together is needed. Reference counting is pretty useless on

    its own, it needs some actions to match if we are searching for a

    bug.<br>

    <br>

    <br>

    <blockquote type="cite"

      cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com">

      <blockquote type="cite"

cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">

        <div dir="ltr">

          <div>

            <div class="gmail_quote">

              <div> </div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>

                --------- Introduction and design goals ---------<br>

                <br>

                In brief, I tried to use matchers-like API to make CSA

                checkers look <br>

                like this:<br>

                <br>

                StatementMatcher NotChdir =<br>

                callExpr(unless(callee(functionDecl(hasName("::chdir")))));<br>

                Finder.addMatcher(<br>

                     hasSequence(<br>

                         postStmt(hasStatement(<br>

                callExpr(callee(functionDecl(hasName("::chroot")))))),<br>

                         unless(stmtPoint(hasStatement(callExpr(<br>

                             callee(functionDecl(hasName("::chdir"))),<br>

                             hasArgument(0,

                hasValue(stringRegion(refersString("/")))))))),<br>

                explodedNode(anyOf(postStmt(hasStatement(NotChdir)),<br>

                callEnter(hasCallExpr(NotChdir))))<br>

                             .bind("bug_node")),<br>

                     &Callback);<br>

                Finder.match(G);<br>

              </blockquote>

              <div><br>

              </div>

            </div>

          </div>

        </div>

      </blockquote>

      <br>

      Aha, so am i understanding correctly that unless() here is not

      your normal unless()? The normal unless() clause would have meant<br>

      <br>

          "somewhere between chroot() and the offending call there's a

      node that's not chdir()"<br>

      <br>

      but what we're trying to say is<br>

      <br>

          "nowhere between chroot() and the offending call there's a

      node that's chdir()"<br>

      <br>

      ?<br>

    </blockquote>

    Yes. I should probably create a normal matcher and don't reuse

    unless() for this to make it less confusive.<br>

    <br>

    <blockquote type="cite"

      cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com">

      <blockquote type="cite"

cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">

        <div dir="ltr">

          <div>

            <div class="gmail_quote">

              <div>I think, a common (design) pitfall of writing checks

                is to try to match against</div>

              <div>the AST when a symbolic execution callback should be

                used instead.</div>

              <div>I am a bit afraid having an API like this would make

                this pitfall more common.</div>

              <div>Maybe a separation between the path sensitive, flow

                sensitive and</div>

              <div>AST based checks is good for the checker writers and

                new users and</div>

              <div>I am not sure that the same abstraction fits all of

                these cases.</div>

              <div><br>

              </div>

              <div>In case of path sensitive checks I wonder if a state

                machine like abstraction would</div>

              <div>fit the use cases better (and also cover checks that

                are using traits)</div>

              <div>where the guarding conditions of the state

                transitions can include AST matcher like checks. <br>

              </div>

              <div> </div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>

                and I have managed to make some simple working examples.<br>

                <br>

                The entire diff can be found here: <br>

                <a

href="https://github.com/a-sid/clang/commit/9a0b1d1d9b3cf41b551a663f041f54d5427aa72f"

                  rel="noreferrer" target="_blank"

                  moz-do-not-send="true">https://github.com/a-sid/clang/commit/9a0b1d1d9b3cf41b551a663f041f54d5427aa72f</a><br>

                The code itself: <a

                  href="https://github.com/a-sid/clang/tree/a4"

                  rel="noreferrer" target="_blank"

                  moz-do-not-send="true">https://github.com/a-sid/clang/tree/a4</a><br>

                <br>

                There are several reasons why I have tried this

                approach.<br>

                <br>

                1. AST Matchers are already extensively used API for AST

                checking. They <br>

                are available both in Clang-Tidy and CSA. And I wanted

                to use existing <br>

                functionality as much as possible. So, I decided to

                extend an existing <br>

                API to make its usage seamless across different kinds of

                checks: <br>

                path-sensitive, AST-based and CFG-based.<br>

              </blockquote>

            </div>

          </div>

        </div>

      </blockquote>

      <br>

      When we say CFG-based checks, we usually mean "all paths" checks

      that find bugs indicated by a certain invariant holding on all

      paths through a certain program point(s). Such as

      TestAfterDivideZero or DeadStore.<br>

      <br>

      The problem with CFG-based "all paths" checks that i don't really

      know how to solve and that's probably not going to be solved by

      giving them a good API is error reporting. Imagine a conversation

      with a user:<br>

      <br>

        A n a l y z e r :  Hey this assignment is not used.<br>

      <br>

        U s e r :  But hey, it's used here *points to the code*.<br>

      <br>

        A n a l y z e r :  But this code is dead.<br>

      <br>

        U s e r :  Why so? Suppose 'x' is equal to true, we jump

      straight into this code.<br>

      <br>

        A n a l y z e r :  That's because the initializer for 'x' is in

      fact always evaluated to false.<br>

      <br>

        U s e r :  WHY t___t<br>

      <br>

      Etc. The first piece of information is produced by a DeadStore

      checker. The second piece of information is produced by some sort

      of DeadCode checker (currently alpha). The third piece of

      information is produced by some sort of SameResLogicalExpr checker

      (currently in the list of potential checkers). All three checkers

      are all-paths checkers. And unless all three checker's diagnostics

      are presented to the user at the same time, the user will be

      unable to understand the bug report.<br>

      <br>

      I believe that we should really solve this problem somehow (or

      explain to me why it isn't a problem) before we try to introduce a

      massive increase in the number of CFG-based checkers we have

      (currently 1).<br>

    </blockquote>

    Looks like I have used wrong words to describe what I wanted to do

    with CFG. Initially, I wanted to support CFG search as well but

    quicky realized nobody just needs it; however, some rudiments are

    still remaining in the code. Instead, I meant exposing CFG analyses

    in a user-friendly way, like it is done in TestAfterDivZeroV2. I

    think it makes much more sense because CFG analyses like

    reachability or postdomination can be used for FP suppression in the

    checkers and in the analyzer engine.<br>

    <br>

    <blockquote type="cite"

      cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com"> <br>

      <blockquote type="cite"

cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">

        <div dir="ltr">

          <div>

            <div class="gmail_quote">

              <div>I am not sure if this is a good idea. I think the

                checker author supposed to</div>

              <div>be aware if the check is AST based, flow sensitive,

                or path sensitive. <br>

              </div>

              <div>And this should also be easy to tell by reading the

                code. I am also not</div>

              <div>sure whether there is an abstraction that fits all.</div>

              <div><br>

              </div>

              <div>I think the reason why this idea works well for

                checks that only inspect the</div>

              <div>exploded graph, because in this case we are still

                doing pattern matching on</div>

              <div>graphs in a similar manner as doing pattern matching

                on the AST.</div>

              <div><br>

              </div>

              <div>But does this generalize later on to stateful checks?

                <br>

              </div>

              <div> </div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>

                2. AST matchers effectively help clients to avoid a lot

                of checking of <br>

                dyn_cast results. This feature not only makes them more

                convenient but <br>

                also more safe because, in my experience, forgetting a

                nullptr/None <br>

                check is a pretty common mistake for checker writers.<br>

              </blockquote>

              <div><br>

              </div>

              <div>

                <div>I think if something cannot be null we should

                  return references, otherwise</div>

                <div>the client need to check for the pointer beeing

                  null (unless there are some</div>

                <div>dynamic invariant that would make the check

                  redundant). If an API does</div>

                <div>not match this philosophy, maybe we should fix the

                  API first. </div>

                <div><br>

                </div>

                <div>Regardless of fixing the API, I agree that it would

                  be great to have higher</div>

                <div>level APIs for checker authors.</div>

              </div>

              <div> </div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>

                3. Testing of AST matchers don't require writing C++

                code - it can be <br>

                done interactively with clang-query tool. And I believe

                that we need <br>

                similar functionality for CSA as well.<br>

              </blockquote>

              <div><br>

              </div>

              Having a query tool for exploded graphs could be very

              helpful, I agree.</div>

            <div class="gmail_quote">These graphs tend to be very large

              and sometimes it is not trivial to</div>

            <div class="gmail_quote">find the relevant parts during

              debugging.<br>

            </div>

            <div class="gmail_quote">

              <div> </div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>

                I didn't want to replace existing checker API. Instead,

                I tried to make <br>

                new possibilities usable independently or together with

                existing.<br>

                <br>

                --------- Design and implementation ---------<br>

                <br>

                The implementation consisted of a number of steps.<br>

                <br>

                1. Allow matching nodes of path-sensitive graph like

                usual AST nodes. To <br>

                make this possible, three actions were needed:<br>

                  1.1. ASTTypeTraits and DynTypedNode were extended to

                support <br>

                path-sensitive nodes: ExplodedNode, ProgramState, SVal,

                SymExpr, etc. <br>

              </blockquote>

              <div><br>

              </div>

              <div>I wonder, if in general it is a good idea to pattern

                match in checks on SymExpr.</div>

              <div>A more general approach here would be to use the

                constraint solver for queries</div>

              <div>in a similar manner how ProgramState::assume works.<br>

              </div>

              <div> </div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"> The

                implementation with graph node support is moved into a

                separate <br>

                class (ASTGraphTypeTraits) in ento namespace to resolve

                cyclic <br>

                dependencies (they are still exist, unfortunately, but

                are header-only, <br>

                so we can build the PoC).<br>

                  1.2. Some additions to AST matchers were made to add

                support for new <br>

                kinds of nodes.<br>

                  1.3. To make MatchFinder able to query specific

                options not supported <br>

                by pure AST, it was augmented with "Contexts". A matcher

                that needs to <br>

                query the path-sensitive engine asks the Finder for the

                required Context <br>

                which provides specific helper functions.<br>

                <br>

                As the result of this step, we are able to write

                matchers like <br>

                expr(hasArgument(0,

                hasValue(stringRegion(refersString("/"))))).<br>

              </blockquote>

            </div>

          </div>

        </div>

      </blockquote>

      <br>

      Yeah, we'll also eventually need operators over symbols, eg.:<br>

      <br>

          arraySubscriptExpr(hasSubscript(expr(hasSVal(sym(isLessThan(<br>

      sym(equalsBoundNode("array_size")))).bind("array_subscript")))))<br>

    </blockquote>

    There is nothing impossible here - such expressions should be easy

    to write with augmented AST matchers used in this PoC.<br>

    <blockquote type="cite"

      cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com"> <br>

      Still, SVal matchers are something i always wanted. I even made

      SVal visitors. The latter turned out to be pretty useless though

      :/<br>

      <br>

      <blockquote type="cite"

cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">

        <div dir="ltr">

          <div>

            <div class="gmail_quote">

              <div>I think this is the part of the proposal that I like

                the most, it would be</div>

              <div>a very concise way to write down guarding conditions.<br>

              </div>

              <div> </div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>

                2. Create an engine for graph exploration and matching.<br>

                   Unlike normal AST matchers, hasSequence is a

                path-sensitive matcher. <br>

                It is launched against ExplodedGraph. These matchers are

                handled by <br>

                GraphMatchFinder object. While searching a graph, it

                collects matches. <br>

                Each match contains a pointer to the corresponding

                matcher and State ID <br>

                of this match. The way how matches are translated from

                one state ID to <br>

                another is determined by matcher operators.<br>

              </blockquote>

              <div><br>

              </div>

              <div>Is this matching done after the exploded graph

                already built? If so,</div>

              <div>these checks will be unable to generate sink nodes. I

                think having sink</div>

              <div>nodes sometimes desirable even if the check itself

                does not have a trait.</div>

            </div>

          </div>

        </div>

      </blockquote>

      <br>

      I'm really curious about how the performance scales with the

      number of checkers. Is it going to be significantly more expensive

      to run 100 DSL-based checkers on a fully built graph than to run

      100 normal checkers as the graph is being built?<br>

    </blockquote>

    No benchmarks => no answer, sorry. My wild guess won't work here

    :(<br>

    <blockquote type="cite"

      cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com"> <br>

      This boils down to: by looking at the sub-matcher, will you be

      able to filter out checkers that don't need to act upon the node

      as quickly as you do that when calling checker callbacks?<br>

    </blockquote>

    <br>

    I think it's a hard goal to achieve.<br>

    1. To separate matcher callbacks, we need for matchers to expose

    meta-information. Some single matchers also can participate in

    multiple callbacks (anyOf()).<br>

    2. To make matchers callable with a matcher which is a part of it,

    we need to re-design the path matchers. They should have a mapping

    between sub-matchers and actions required if a sub-matcher matches.<br>

    <br>

    <blockquote type="cite"

      cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com"> <br>

      And this immediately leads to: how difficult would it be to simply

      auto-generate an old-style checker (with state traits and

      callbacks) from the new-style matcher? Maybe that's something we

      should focus on instead of trying to match a fully constructed

      graph? Maybe it'd also be more powerful, i.e. allow certain

      imperative side effects whenever partial match is encountered?<br>

    </blockquote>

    We can use matchers inside checker callbacks if we need extended

    functionality. The auto-generation for matchers with side effects is

    non-trivial and needs supporting partial match actions. However,

    since we have BindableMatchers, we can also have ActingMatchers

    (unsupported dynamically) that have `.call()` method.<br>

    <br>

    <blockquote type="cite"

      cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com">

      <blockquote type="cite"

cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">

        <div dir="ltr">

          <div>

            <div class="gmail_quote">

              <div> </div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>

                   For example, SequenceVariadicOperator, which is the

                base of <br>

                hasSequence() matcher, has "positive" and "negative"

                sub-matches. Each <br>

                positive matcher has its corresponding State ID. In

                order to advance to <br>

                the next State ID, a node being matched should match all

                "negative" <br>

                matchers before the next "positive" matchers and the

                next "positive" <br>

                matcher itself. Failure in matching "negative" matcher

                discards the match.<br>

                <br>

                   The role of GraphMatchFinder is similar to

                MatchFinder: it is only <br>

                responsible for graph exploration and keeping bound

                nodes and matchers.<br>

                <br>

                3. Manage bound nodes.<br>

                   While matching a single graph node, BoundNodes from

                AST MatchFinder <br>

                are used. AST matchers for path-sensitive nodes support

                bindings <br>

                out-of-the-box. However, the situation with graph

                matching is a bit <br>

                different. For graph matching, we have another system of

                bound nodes. <br>

                Each graph node has a related map of bounds aka GDMTy

                (yes, the name is <br>

                not a coincidence). GDMTy is a mapping from match ID to

                BoundNodesMap <br>

                which, in part, is effectively a map from std::string to

                DynTypedNodes. <br>

                This look pretty much like how GDM is organized in

                ExplodedGraph, just <br>

                without one level of indirection (it can be added,

                though).<br>

                <br>

                   MatchFinder contexts should allow support of their

                own bindings. This <br>

                is how equalsBoundNode() matcher works for

                path-sensitive nodes: it just <br>

                queries all available contexts for the binding.<br>

                <br>

                Finally, I have managed to make two checkers work in

                this way: <br>

                ChrootChecker (which is present in the introduction) and

                <br>

                TestAfterDivZeroChecker. Both them can be found in

                ChrootCheckerV2.cpp <br>

                and TestAfterDivZeroCheckerV2.cpp correspondingly. They

                act like normal <br>

                checkers: produce warnings and use visitors.</blockquote>

            </div>

          </div>

        </div>

      </blockquote>

      <br>

      I wonder if our false positive suppression visitors (or maybe even

      regular checker bugvisitors) can be rewritten with these matchers.

      trackNullOrUndefValue() and its system of visitors doesn't look

      like a matcher at all to me, unfortunately - it can potentially

      track an infinite amount of events. But every individual visitor

      within that system probably is.<br>

    </blockquote>

    Hm, I didn't even think about it. Do you think it can give us some

    advantages for BugVisitors?<br>

    <br>

    <blockquote type="cite"

      cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com"> <br>

      <blockquote type="cite"

cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">

        <div dir="ltr">

          <div>

            <div class="gmail_quote">

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">The

                main difference is that <br>

                they cannot generate sinks and use checkEndAnalysis

                callback. The code <br>

                of these checkers can be found here:<br>

                <br>

                <a

href="https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/ChrootCheckerV2.cpp"

                  rel="noreferrer" target="_blank"

                  moz-do-not-send="true">https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/ChrootCheckerV2.cpp</a><br>

                <a

href="https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/TestAfterDivZeroCheckerV2.cpp"

                  rel="noreferrer" target="_blank"

                  moz-do-not-send="true">https://github.com/a-sid/clang/blob/a4/lib/StaticAnalyzer/Checkers/TestAfterDivZeroCheckerV2.cpp</a><br>

                <br>

                <br>

                -------- Some features --------<br>

                <br>

                1.The design of bindings has an interesting consequence:

                it enables the <br>

                dynamic introspection of GDM which was pretty hard

                before. (Hello alpha <br>

                renaming?)<br>

                2. Nothing prevents matchers to be used with existing

                checker API for <br>

                simplifying conditional checking inside callbacks. The

                matchers are not <br>

                planned as the replacement for the current API, but look

                like a nice <br>

                complementary part.<br>

                3. Implicit conversion of Matcher<ProgramPoint> to

                Matcher<ExplodedNode> <br>

                and Matcher<SymExpr || MemRegion> to

                Matcher<SVal> for writing shorter code.<br>

                <br>

                -------- Not implemented/not checked yet --------<br>

                I tried to keep the PoC as minimal as possible. As the

                result, some <br>

                features are not implemented yet, but I believe they can

                be added.<br>

                1. Usage of matchers inside checker callbacks<br>

                   This is not exactly unimplemented, but is still

                untested.<br>

                2. Dynamic matchers (clang-query interface)<br>

                   In addition to work for supporting dynamic nodes (I

                don't know how <br>

                many efforts it can take), we need to enable matching

                against AST nodes, <br>

                not graph. But it doesn't look as a problem, we can just

                make matchers <br>

                polymorphic.<br>

                3. Binding to successfully matched paths is not

                implemented yet - only <br>

                bindings to separate nodes. I wonder if we need path

                bindings at all.<br>

                4. Some corner cases are still FIXMEs: single-node

                sequences, for example.<br>

                5. GDM is not based on immutable data structures like it

                is done in CSA <br>

                - it is just an STL map. Its performance can be poor

                (full copy on every <br>

                new node), but I don't think that changing it to use

                immutable <br>

                structures is hard.<br>

                6. Matching on-the-fly<br>

                   GraphMatchFinder should support on-the-fly matching

                during <br>

                ExplodedGraph building. For this support, we should just

                call its <br>

                advance() method each time a new node is created.

                However, this <br>

                possibility wasn't checked yet.<br>

                7. Matching CFG and CallGraph isn't implemented because

                it was <br>

                considered to be far out of simple PoC.<br>

                8. Only sequential matching is available now, and I

                didn't try to <br>

                implement other operators yet. So, implementing a

                checker similar to <br>

                PthreadLock can be tricky or even impossible for now.<br>

                <br>

                -------- Known and potential issues --------<br>

                 From matchers' side:<br>

                1. Some tests don't pass because they rely on the

                checker ability to <br>

                generate sink nodes. Our matchers cannot do it by design

                so tests don't <br>

                pass.<br>

                2. Representing checker bindings as a map can be less

                effective than <br>

                storing data in structures. I didn't measure the

                overhead, so I cannot <br>

                give any numbers.<br>

                3. Matchers are called on every node independently of

                its type. This is <br>

                not what CheckerManager does. I wonder if this detail

                can affect <br>

                performance as well.<br>

                4. Problems with cyclic dependencies.

                clangStaticAnalyzer has a <br>

                dependency on clangASTMatchers, therefore,

                clangASTMatchers cannot <br>

                depend on clangStaticAnalyzer. However, if we want

                ASTMatchers to <br>

                support static analyzer data structures, there should be

                a backward <br>

                dependency. Now this dependency is header-only.<br>

                5. Code duplication. This is mostly fine for a sandbox

                but needs to be <br>

                reduced.<br>

                <br>

                 From analyzer's side:<br>

                1. Many events are not reflected in the final

                ExplodedGraph. For <br>

                example, checkers can receive PointerEscape event, but

                the event itself <br>

                will not be recorded in the ExplodedGraph - only changes

                made by <br>

                checkers will.</blockquote>

            </div>

          </div>

        </div>

      </blockquote>

      <br>

      Weeeell maaaaaybe. This would essentially mean passing a

      CheckerContext into checkPointerEscape and checkRegionChanges.

      With a predecessor node having a new sort of ProgramPoint such as

      "PointerEscapePoint". And we can put the list of escaped stuff

      directly into the program point instead of passing it separately.<br>

      <br>

      Which means allowing state splits in these callbacks. Which of

      course can be disallowed later (to keep the code that calls the

      callbacks simple), and graph matchers won't need them any time

      soon anyway.<br>

      <br>

      But it'd still be a lot of refactoring because currently these

      callbacks are called from code that doesn't even know anything

      about nodes, eg. from within State->invalidateRegions().<br>

    </blockquote>

    Internally, I created a dummy node in CheckerManager methods and

    then launched checker callbacks with it. But this doesn't look like

    a right way.<br>

    <br>

    <blockquote type="cite"

      cite="mid:1f0beb2b-0e33-bd7d-f80d-3a23f8c4ed88@gmail.com"> <br>

      <blockquote type="cite"

cite="mid:CAPRL4a0WncdqW394JFDZ7SdA0+bF0jgFrH7-wnfsN03mFADH9g@mail.gmail.com">

        <div dir="ltr">

          <div>

            <div class="gmail_quote">

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">That's

                also true for Stmt nodes: I noticed same issues <br>

                with PostCondition. This makes matching a bit harder.

                Should we add some <br>

                information into ExplodedGraph?<br>

                2. (Minor) Some static analyzer data structures don't

                support <br>

                traditional LLVM casting methods (dyn_cast, isa) because

                they lack <br>

                classof() method. I have fixed this internally and will

                put a patch on <br>

                review soon.<br>

                <br>

                -------- Conclusion --------<br>

                Finally, there is a graph matching engine supporting

                basic functionality <br>

                and two checkers using it. I apologize for not starting

                the discussion <br>

                earlier, but I just wasn't sure that the approach will

                work. Is anybody <br>

                interested to have this stuff in upstream (maybe,

                changed or improved)? <br>

                If yes, I'll be happy to contribute this work

                patch-by-patch and <br>

                continue its improvement. If no - well, I had enough fun

                playing with it.<br>

              </blockquote>

              <div><br>

              </div>

              <div>Thanks for sharing this results. Regardless of being

                upstreamed as is or</div>

              <div>in a separate form or not at all, this is an

                interesting experiment for the</div>

              <div>community. <br>

              </div>

              <div> </div>

            </div>

          </div>

        </div>

      </blockquote>

      <br>

    </blockquote>

    <p><br>

    </p>

  </body>

</html>