<div dir="ltr"><div>Hi Artem, Stephen,</div><div><br></div><div>Success \o/ <br></div><div><br></div><div>I can detect multiple returns now and should be able to figure out the other things I want to do also.</div><div><br></div><div>One or two comments in-line.</div><div><br></div><div>Thank you both for your help,</div><div><br></div><div>Billy.<br></div><div><br></div><div><br></div><div><br></div><div>Stephen wrote:</div>> On 10/09/2019 22:47, Billy O'Mahony via cfe-dev wrote:<br>> > Hi Artem,<br>> > <br>> > (or anyone else :D ).<br>> > <br>> > I managed to create a dummy clang-tidy check and run it.<br>> > <br>> > For the tests I want to write I've decided the easiest first step is to <br>> > just check for more than one return statement in my functions (this will <br>> > only be applying to specific functions all with a common prefix e.g <br>> > "api_foo".<br>> > <br>> > So I managed to write a matcher which seems to work ok.<br>> >      Finder->addMatcher(returnStmt().bind("x"), this);<br>> > <br>> > And get it to print a warning for each rtn statement in MyCheck::check:<br>> >      const auto *MatchedDecl = Result.Nodes.getNodeAs<ReturnStmt>("x");<br>> >      diag(MatchedDecl->getReturnLoc(), "found rtn stmt");<br>> > <br>> > So next I want to be able to find the name of the function that contains <br>> > the returnStatement but I can't figure out how to do this.<br>> <br>> .<br>>   Finder->addMatcher(<br>>      returnStmt(<br>>          forFunction(functionDecl().bind("func"))<br>>          ).bind("returnStmt"),<br><div>>      this);</div><div></div>> <br>> <br>> Your `check` function will then be called once for each return statement <br><div>> and you can extract both bound nodes.</div><div>
<div><br></div><div>This is doing the trick very nicely.</div>

</div><div><br></div><div>> </div>> > <br>> > Or maybe I should start with a FunctionDecl matcher that matches fns <br>> > named (api_...) and then somehow traverse that function's AST and count <br>> > the returnStmt's. ?<br>> <br>> If you wish to do counting, then traversing from the Function might <br><div>> indeed make more sense.</div><div><br></div><div>Ok. If I can't figure it out I'll post again. But using the double-bind and recording fn names in a <br></div><div>std::set is working for me right now.<br></div><br>> <br>> > I've found a few examples of clang-tidy checks on-line I think this one <br>> > <a href="https://bbannier.github.io/blog/2015/05/02/Writing-a-basic-clang-static-analysis-check.html">https://bbannier.github.io/blog/2015/05/02/Writing-a-basic-clang-static-analysis-check.html</a> <br>> > is the best. But recommendations for better resources also appreciated.<br>> <br>> <br>> I linked to some resources I wrote here which you might find useful:<br>> <br>>  <br><div>> <a href="https://steveire.wordpress.com/2018/11/11/future-developments-in-clang-query/">https://steveire.wordpress.com/2018/11/11/future-developments-in-clang-query/</a></div>> <br><div>> You can also see some of the other posts on my blog.</div><div><br></div><div>Thanks, I'll have a look.<br></div>> <br>> Thanks,<br>> <br>> Stephen.<br>> </div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 10 Sep 2019 at 23:46, Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com">noqnoqneo@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF">
    Yup, the most principled way of doing this with ASTMatchers is to
    start with the function decl and then recurse inside it:<br>
    <br>
        functionDecl(matchesName(...),
    forEachDescendant(returnStmt(...)))<br>
    <br>
    You can always do this in an inside out, but it most likely has
    performance implications (i never really understood how ASTMatcher
    performance works as i've never had any real performance problems
    with them):<br>
    <br>
        returnStmt(..., hasAncestor(functionDecl(matchesName(...))))<br>
    <br>
    You should totally use the official reference as your manual because
    it gives an up-to-date overview of all the stuff that we have, with
    a lot of examples:
    <a href="https://clang.llvm.org/docs/LibASTMatchersReference.html" target="_blank">https://clang.llvm.org/docs/LibASTMatchersReference.html</a><br>
    <br>
    <br>
    <br>
    <div>On 9/10/19 2:47 PM, Billy O'Mahony
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div>Hi Artem,</div>
        <div><br>
        </div>
        <div>(or anyone else :D ).</div>
        <div><br>
        </div>
        <div>I managed to create a dummy clang-tidy check and run it.</div>
        <div><br>
        </div>
        <div>
          <div>For the tests I want to write I've decided the easiest
            first step is to just check for more than one return
            statement in my functions (this will only be applying to
            specific functions all with a common prefix e.g "api_foo".</div>
        </div>
        <div><br>
        </div>
        <div>So I managed to write a matcher which seems to work ok.</div>
        <div>    Finder->addMatcher(returnStmt().bind("x"), this);</div>
        <div><br>
        </div>
        <div>And get it to print a warning for each rtn statement in
          MyCheck::check:</div>
        <div>    const auto *MatchedDecl =
          Result.Nodes.getNodeAs<ReturnStmt>("x");<br>
              diag(MatchedDecl->getReturnLoc(), "found rtn stmt");</div>
        <div>
          <div><br>
          </div>
          <div>So next I want to be able to find the name of the
            function that contains the returnStatement but I can't
            figure out how to do this.</div>
          <div><br>
          </div>
          <div>Or maybe I should start with a FunctionDecl matcher that
            matches fns named (api_...) and then somehow traverse that
            function's AST and count the returnStmt's. ? <br>
          </div>
          <div><br>
          </div>
          <div>I've found a few examples of clang-tidy checks on-line I
            think this one <a href="https://bbannier.github.io/blog/2015/05/02/Writing-a-basic-clang-static-analysis-check.html" target="_blank">https://bbannier.github.io/blog/2015/05/02/Writing-a-basic-clang-static-analysis-check.html</a>
            is the best. But recommendations for better resources also
            appreciated.<br>
          </div>
          <div><br>
          </div>
          <div>Thanks for any help,</div>
          <div>Billy.<br>
          </div>
          <div><br>
          </div>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Sat, 6 Jul 2019 at 13:41,
          Billy O'Mahony <<a href="mailto:billy.omahony@gmail.com" target="_blank">billy.omahony@gmail.com</a>>
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div dir="ltr">
            <div>Hi Artem,</div>
            <div><br>
            </div>
            <div>super. Good to know I'm on the right path. And thanks
              for the extra info - plenty of clang goodness to be
              looking into there.</div>
            <div><br>
            </div>
            <div>Cheers,</div>
            <div>Billy.<br>
            </div>
            <br>
            <div class="gmail_quote">
              <div dir="ltr" class="gmail_attr">On Sat, 6 Jul 2019 at
                04:05, Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com" target="_blank">noqnoqneo@gmail.com</a>>
                wrote:<br>
              </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                <div bgcolor="#FFFFFF"> > It sounds like clang-tidy
                  will be the sweet spot between usefulness and effort
                  to implement.<br>
                  <br>
                  Yup, i approve. You can experiment with clang-query
                  real quick and it should give you an idea. If you find
                  out that some of the useful ASTMatchers are missing,
                  feel free to add them - either locally or upstreamed,
                  as it's super easy.<br>
                  <br>
                  <br>
                  > By CFG you mean the Clang Static Analyzer? <br>
                  <br>
                  Clang CFG is <a href="https://clang.llvm.org/doxygen/classclang_1_1CFG.html" target="_blank">https://clang.llvm.org/doxygen/classclang_1_1CFG.html</a>
                  .<br>
                  <br>
                  It's a "source-level" control flow graph that consists
                  of pointers to AST statements, ordered in the control
                  flow order (i.e., in order of execution). A directed
                  path in the CFG from statement A to statement B would
                  tell you that "B can potentially be executed after A".
                  This allows you to use your favorite graph algorithms
                  to reason about the program's behavior.<br>
                  <br>
                  The Static Analyzer works over Clang CFG, but it's
                  much more than that: the Static Analyzer also
                  understands the semantics of every statement and can
                  model their effects over the "symbolic" state of the
                  program. For example, in the following code:<br>
                  <br>
                  void foo(bool x) {<br>
                    if (x) {<br>
                    }<br>
                    if (x) {<br>
                    }<br>
                  }<br>
                  <br>
                  the CFG would be a double-diamond: the control
                  branches off at the first statement, then joins back
                  at the end, then branches off again, then joins back.
                  But the Static Analyzer would understand that the
                  if-conditions are the same, therefore these are simply
                  two unrelated execution paths, and the execution path
                  in which the first branch goes to true and the second
                  branch goes to false is in fact impossible. And if 'x'
                  is overwritten in between, the Static Analyzer would
                  also know that.<br>
                  <br>
                  Apart from the Static Analyzer, the CFG is used for
                  some clang warnings, such as -Wuninitialized (see
                  AnalysisBasedWarnings.cpp). There are some ready-made
                  analyses for common data flow problems implemented
                  over Clang CFG (such as live variables analysis,
                  dominators analysis, etc.).<br>
                  <br>
                  Clang CFG is not used during the actual compilation /
                  code generation. When Chris Lattner talks about MLIR
                  and mentions that Clang should have had a "CIL", he
                  means that the Clang CFG (or something similar) should
                  have been used for compilation *as well as* for
                  analysis. This would, in particular, allow semantic
                  optimizations that require information that's already
                  lost in LLVM IR.<br>
                  <br>
                  Because the CFG is not used for compilation, it's not
                  necessarily correct. There are a few known bugs in it,
                  mostly around complicated C++ and also around GNU
                  extensions such as the *binary* operator ?: or
                  statement-expressions. But for plain C it should be
                  nearly perfect.<br>
                  <br>
                  <br>
                  > Does 'copy  around' include passing to my private
                  fns such as tweak()?<br>
                  <br>
                  That's entirely up to you. What i was trying to say is
                  that it if you allow copying 'dev' into another
                  variable, it will become much harder to implement your
                  analysis, so you'd much rather forbid the user to do
                  this. You might also allow the user to do this and
                  suffer from imprecision in your analysis. At the same
                  time, because your analysis is not inter-procedural,
                  passing 'dev' into a function should be fine. The
                  function can still return it back "laundered" so the
                  user would be able to assign it into a variable behind
                  your back. But these are the usual trade-offs of
                  static analysis, don't be too scared by them - after
                  all it all boils down to the halting problem :)<br>
                  <br>
                  <div>On
                    7/4/19 3:40 AM, Billy O'Mahony wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div>Hi Artem,</div>
                      <div><br>
                      </div>
                      <div>thanks for your well thought-out and useful
                        reply. It sounds like clang-tidy will be the
                        sweet spot between usefulness and effort to
                        implement.</div>
                      <div><br>
                      </div>
                      <div>I have a few other responses down below.</div>
                      <div><br>
                      </div>
                      <div>Regards,</div>
                      <div>Billy.<br>
                      </div>
                      <br>
                      <div class="gmail_quote">
                        <div dir="ltr" class="gmail_attr">On Wed, 3 Jul
                          2019 at 23:23, Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com" target="_blank">noqnoqneo@gmail.com</a>>
                          wrote:<br>
                        </div>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div bgcolor="#FFFFFF"> Hi,<br>
                            <br>
                            It depends on how strict do you want the
                            checking be and on the details of the rule.
                            If you're designing a new API from scratch
                            and stuck with gcc forever, i wouldn't mind
                            using the gcc __attribute__((cleanup())) for
                            your purpose.<br>
                            <br>
                          </div>
                        </blockquote>
                        <div>I didn't know about that gcc attrib. I need
                          to read the gcc manual attrib section! I
                          should've added that while we will be
                          developing on gcc the code should be
                          compilable on other toolchains/OSs also, so we
                          are avoiding any gcc extensions (e.g. gcc has
                          extensions for thread local storage but we are
                          not using those for the same reason).  <br>
                        </div>
                        <div>  <br>
                        </div>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div bgcolor="#FFFFFF"> The rule you described
                            should be reasonably easy to implement with
                            the Static Analyzer. The good side of it is
                            that you get a lot of semantic modeling for
                            free. For instance, if the developer copies
                            `dev` into a local variable and then uses
                            that local variable outside of
                            api_enter..api_exit, the tool will be able
                            to handle transparently, as it deals with
                            values rather than with variables.</div>
                        </blockquote>
                        <div>That is really cool.</div>
                        <div> <br>
                        </div>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div bgcolor="#FFFFFF"> Also it will probably
                            be the easiest tool for your problem. The
                            downside would be that it's not guaranteed
                            to find all bugs; it'll inevitably give up
                            on complicated code with high cyclomatic
                            complexity :) So if you want strict/paranoid
                            enforcement of rules, the Static Analyzer is
                            not the right tool. But if you want to
                            simply find some bugs for free, it's the
                            right tool.<br>
                            <br>
                            It sounds as if your problem is not
                            inter-procedural. Let me double-check this:
                            would you have another api_enter..api_exit
                            pair in the body of your tweak() function?
                            Or is just one api_enter..api_exit enough?
                            Or is it a bug to call api_enter twice
                            without an api_exit in between?<br>
                          </div>
                        </blockquote>
                        <div>Yes in this case tweak() could be another
                          public fn of the api that would also have an
                          enter/exit pair. But we are using recursive
                          mutexes (a thread can acquire the same mutex N
                          times (ie call api_enter)  so long as it also
                          releases N times) so that will be okay. <br>
                        </div>
                        <div><br>
                        </div>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div bgcolor="#FFFFFF"> <br>
                            If you have to write api_enter..api_exit in
                            *every* function that deals with devices,
                            then the problem is not inter-procedural,
                            which makes it much easier. In particular,
                            you should be able to come up with a purely
                            syntactic analysis ("every function that
                            accesses a device_t must start with
                            api_enter() and must end in exactly one spot
                            with api_exit()").</div>
                        </blockquote>
                        <div>We will only insist on enter/exit in public
                          API functions. Which are the only ones a
                          client application can call. Internal private
                          functions we won't have locking (as they can
                          only be called ultimately from a public
                          function so the device will be locked.) We are
                          going to allow private fns to call back into
                          the api via the public i/face. But we will
                          have the public functions in specific files
                          and have a specific naming prefix.<br>
                        </div>
                        <div>  <br>
                        </div>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div> Such analysis should be easily do-able
                            in clang-tidy as long as you're satisfied
                            with this level of aggressiveness. In
                            particular, you'll have to be willing to
                            sacrifice code like this:<br>
                            <br>
                            void foo(device_t *dev) {<br>
                              if (flip_a_coin()) {<br>
                                api_enter(dev);<br>
                                ...<br>
                                api_exit(dev);<br>
                              }<br>
                            }<br>
                            <br>
                            But it may be perfectly fine if you
                            seriously want to enforce a strict structure
                            on all your functions that deal with
                            devices.</div>
                        </blockquote>
                        <div><br>
                        </div>
                        <div>So is it the case that clang-tidy kind of
                          passes info to the checker-extension in a
                          syntactic code-parsing order. Whereas the
                          static analyzer passes information to the
                          checker in a simulated run-time order?</div>
                        <div><br>
                        </div>
                        <div>E.g in your foo() above my proposed checker
                          gets fed 1) theres a function called foo, 2)
                          theres an if with a call to flip_a_coin 3) in
                          the true case there is a call to enter then
                          exit 4) in the else there is nothing 5) there
                          is a return (at which point my checker would
                          need to be pretty smart and hold a lot of
                          state to figure out something was wrong) . And
                          to compare for the static analyzer it's more
                          like 1) there is fn foo 2.1) there is a code
                          path through foo with enter/exit 2.2) there is
                          a code path with just return (at which point
                          my reasonably simple checker would raise an
                          error).<br>
                        </div>
                        <div>   <br>
                        </div>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div> <br>
                            I think the truly-truly right tool for your
                            problem would be to come up with a custom
                            analysis over Clang CFG.</div>
                        </blockquote>
                        <div>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            <div bgcolor="#FFFFFF">.<br>
                            </div>
                          </blockquote>
                          <div>By CFG you mean the Clang Static
                            Analyzer? </div>
                            <br>
                        </div>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div> It would be harder to implement, but it
                            would allow you to express things like
                            "every execution path within a function that
                            accesses `dev` must have a api_enter before
                            it and an api_exit after it; you are not
                            allowed to copy `dev` around". This would
                            strictly enforce the rule.</div>
                        </blockquote>
                        <div>Yes that would be great. But I think just
                          using clang-tidy from what you are saying
                          would get us a long way. And there are heaps
                          of simpler checks we would like to implement
                          also.<br>
                        </div>
                        <div> <br>
                        </div>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div> At the same time it'll allow you to lift
                            the requirement of exactly one return point
                            - you would still be able to ensure that all
                            accesses are covered. If you need to allow
                            to copy `dev` around, it should still be
                            doable, but it will be significantly more
                            difficult to implement<br>
                          </div>
                        </blockquote>
                        <div>Does 'copy  around' include passing to my
                          private fns such as tweak()?. We don't need to
                          copy dev anywhere within the public fns but we
                          do need it to pass it to private fns.<br>
                        </div>
                        <div> <br>
                        </div>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div bgcolor="#FFFFFF"> <br>
                            <div>On
                              7/2/19 12:13 PM, Billy O'Mahony via
                              cfe-dev wrote:<br>
                            </div>
                            <blockquote type="cite">
                              <div dir="ltr">
                                <div>Hello,</div>
                                <div><br>
                                </div>
                                <div>I'd like to write a rule for either
                                  clang-tidy or static analyzer to help
                                  catch some potential errors in a
                                  project I'm working on.</div>
                                <div><br>
                                </div>
                                <div>My questions are: <br>
                                </div>
                                <div>    a) is only one or the other
                                  will be able to do what I want to do?<br>
                                </div>
                                <div>    b) if both are feasible which
                                  would have the simpler implementation?</div>
                                <div><br>
                                </div>
                                <div>The project involves writing an API
                                  that will run in a multi-threaded
                                  application and is responsible for
                                  serializing all access to a device
                                  structure. Therefore the first thing
                                  in every function in the API must be
                                  to call api_enter (which will among
                                  other things acquire a mutex on the
                                  device) and the last thing before
                                  returning must be to call api_exit.
                                  Also I want to enforce single exit
                                  point from every API function - or
                                  certainly if there are any return
                                  points that bypass the api_exit call.<br>
                                </div>
                                <div><br>
                                </div>
                                <div>So here is an example function with
                                  errors I want to catch highlighted.</div>
                                <div><br>
                                </div>
                                <div>int api_foo(device_t *dev) {</div>
                                <div>    int ret_val = 0;<br>
                                </div>
                                <div><br>
                                </div>
                                <div>    bar();  // fn calls & decls
                                  before api_enter is ok- just don't
                                  access dev.</div>
                                <div>    dev->bla = 1; // NO! device
                                  access before api_enter() called</div>
                                <div>    api_enter(dev);   // error if
                                  this call is not present exactly once</div>
                                <div><br>
                                </div>
                                <div>    if (dev->bla)</div>
                                <div>        return; // NO! didn't call
                                  api_exit before rtn. Also two return
                                  points</div>
                                <div><br>
                                </div>
                                <div>    if (dev->ma) {</div>
                                <div>        ret_val = 1;<br>
                                </div>
                                <div>        goto cleanup;</div>
                                <div>    }</div>
                                <div>    tweak(dev);<br>
                                </div>
                                <div><br>
                                </div>
                                <div>cleanup:</div>
                                <div>    api_exit(dev); // error if this
                                  is not present exactly once<br>
                                </div>
                                <div>    dev->bla = 1; //NO! device
                                  access after api_exit()<br>
                                </div>
                                <div>    return ret_val;<br>
                                </div>
                                <div>}</div>
                                <div><br>
                                </div>
                                <div>I don't think it matters but the
                                  project is C compiled with gcc.<br>
                                </div>
                                <div><br>
                                </div>
                                <div>Also if both are feasible any other
                                  pointers, tips or good resources would
                                  be appreciated. E.g  is there a
                                  totally different methodology I'm not
                                  considering - e.g. would using
                                  something like pycparser be a lot
                                  easier - though I'd prefer to keep it
                                  in clang as we plan to use tidy &
                                  static analyzer in any case for
                                  standard QA.<br>
                                </div>
                                <div><br>
                                </div>
                                <div>Thanks for reading,</div>
                                <div>Billy.<br>
                                </div>
                              </div>
                              <br>
                              <fieldset></fieldset>
                              <pre>_______________________________________________
cfe-dev mailing list
<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a>
</pre>
                            </blockquote>
                            <br>
                          </div>
                        </blockquote>
                      </div>
                    </div>
                  </blockquote>
                  <br>
                </div>
              </blockquote>
            </div>
          </div>
        </blockquote>
      </div>
    </blockquote>
    <br>
  </div>

</blockquote></div>