[cfe-dev] clang-tidy or static analyzer or ...

Tue Sep 10 15:46:47 PDT 2019

Yup, the most principled way of doing this with ASTMatchers is to start 
with the function decl and then recurse inside it:

     functionDecl(matchesName(...), forEachDescendant(returnStmt(...)))

You can always do this in an inside out, but it most likely has 
performance implications (i never really understood how ASTMatcher 
performance works as i've never had any real performance problems with 
them):

     returnStmt(..., hasAncestor(functionDecl(matchesName(...))))

You should totally use the official reference as your manual because it 
gives an up-to-date overview of all the stuff that we have, with a lot 
of examples: https://clang.llvm.org/docs/LibASTMatchersReference.html

On 9/10/19 2:47 PM, Billy O'Mahony wrote:
> Hi Artem,
>
> (or anyone else :D ).
>
> I managed to create a dummy clang-tidy check and run it.
>
> For the tests I want to write I've decided the easiest first step is 
> to just check for more than one return statement in my functions (this 
> will only be applying to specific functions all with a common prefix 
> e.g "api_foo".
>
> So I managed to write a matcher which seems to work ok.
>     Finder->addMatcher(returnStmt().bind("x"), this);
>
> And get it to print a warning for each rtn statement in MyCheck::check:
>     const auto *MatchedDecl = Result.Nodes.getNodeAs<ReturnStmt>("x");
>     diag(MatchedDecl->getReturnLoc(), "found rtn stmt");
>
> So next I want to be able to find the name of the function that 
> contains the returnStatement but I can't figure out how to do this.
>
> Or maybe I should start with a FunctionDecl matcher that matches fns 
> named (api_...) and then somehow traverse that function's AST and 
> count the returnStmt's. ?
>
> I've found a few examples of clang-tidy checks on-line I think this 
> one 
> https://bbannier.github.io/blog/2015/05/02/Writing-a-basic-clang-static-analysis-check.html 
> is the best. But recommendations for better resources also appreciated.
>
> Thanks for any help,
> Billy.
>
>
> On Sat, 6 Jul 2019 at 13:41, Billy O'Mahony <billy.omahony at gmail.com 
> <mailto:billy.omahony at gmail.com>> wrote:
>
>     Hi Artem,
>
>     super. Good to know I'm on the right path. And thanks for the
>     extra info - plenty of clang goodness to be looking into there.
>
>     Cheers,
>     Billy.
>
>     On Sat, 6 Jul 2019 at 04:05, Artem Dergachev <noqnoqneo at gmail.com
>     <mailto:noqnoqneo at gmail.com>> wrote:
>
>         > It sounds like clang-tidy will be the sweet spot between
>         usefulness and effort to implement.
>
>         Yup, i approve. You can experiment with clang-query real quick
>         and it should give you an idea. If you find out that some of
>         the useful ASTMatchers are missing, feel free to add them -
>         either locally or upstreamed, as it's super easy.
>
>
>         > By CFG you mean the Clang Static Analyzer?
>
>         Clang CFG is
>         https://clang.llvm.org/doxygen/classclang_1_1CFG.html .
>
>         It's a "source-level" control flow graph that consists of
>         pointers to AST statements, ordered in the control flow order
>         (i.e., in order of execution). A directed path in the CFG from
>         statement A to statement B would tell you that "B can
>         potentially be executed after A". This allows you to use your
>         favorite graph algorithms to reason about the program's behavior.
>
>         The Static Analyzer works over Clang CFG, but it's much more
>         than that: the Static Analyzer also understands the semantics
>         of every statement and can model their effects over the
>         "symbolic" state of the program. For example, in the following
>         code:
>
>         void foo(bool x) {
>           if (x) {
>           }
>           if (x) {
>           }
>         }
>
>         the CFG would be a double-diamond: the control branches off at
>         the first statement, then joins back at the end, then branches
>         off again, then joins back. But the Static Analyzer would
>         understand that the if-conditions are the same, therefore
>         these are simply two unrelated execution paths, and the
>         execution path in which the first branch goes to true and the
>         second branch goes to false is in fact impossible. And if 'x'
>         is overwritten in between, the Static Analyzer would also know
>         that.
>
>         Apart from the Static Analyzer, the CFG is used for some clang
>         warnings, such as -Wuninitialized (see
>         AnalysisBasedWarnings.cpp). There are some ready-made analyses
>         for common data flow problems implemented over Clang CFG (such
>         as live variables analysis, dominators analysis, etc.).
>
>         Clang CFG is not used during the actual compilation / code
>         generation. When Chris Lattner talks about MLIR and mentions
>         that Clang should have had a "CIL", he means that the Clang
>         CFG (or something similar) should have been used for
>         compilation *as well as* for analysis. This would, in
>         particular, allow semantic optimizations that require
>         information that's already lost in LLVM IR.
>
>         Because the CFG is not used for compilation, it's not
>         necessarily correct. There are a few known bugs in it, mostly
>         around complicated C++ and also around GNU extensions such as
>         the *binary* operator ?: or statement-expressions. But for
>         plain C it should be nearly perfect.
>
>
>         > Does 'copy  around' include passing to my private fns such
>         as tweak()?
>
>         That's entirely up to you. What i was trying to say is that it
>         if you allow copying 'dev' into another variable, it will
>         become much harder to implement your analysis, so you'd much
>         rather forbid the user to do this. You might also allow the
>         user to do this and suffer from imprecision in your analysis.
>         At the same time, because your analysis is not
>         inter-procedural, passing 'dev' into a function should be
>         fine. The function can still return it back "laundered" so the
>         user would be able to assign it into a variable behind your
>         back. But these are the usual trade-offs of static analysis,
>         don't be too scared by them - after all it all boils down to
>         the halting problem :)
>
>         On 7/4/19 3:40 AM, Billy O'Mahony wrote:
>>         Hi Artem,
>>
>>         thanks for your well thought-out and useful reply. It sounds
>>         like clang-tidy will be the sweet spot between usefulness and
>>         effort to implement.
>>
>>         I have a few other responses down below.
>>
>>         Regards,
>>         Billy.
>>
>>         On Wed, 3 Jul 2019 at 23:23, Artem Dergachev
>>         <noqnoqneo at gmail.com <mailto:noqnoqneo at gmail.com>> wrote:
>>
>>             Hi,
>>
>>             It depends on how strict do you want the checking be and
>>             on the details of the rule. If you're designing a new API
>>             from scratch and stuck with gcc forever, i wouldn't mind
>>             using the gcc __attribute__((cleanup())) for your purpose.
>>
>>         I didn't know about that gcc attrib. I need to read the gcc
>>         manual attrib section! I should've added that while we will
>>         be developing on gcc the code should be compilable on other
>>         toolchains/OSs also, so we are avoiding any gcc extensions
>>         (e.g. gcc has extensions for thread local storage but we are
>>         not using those for the same reason).
>>
>>             The rule you described should be reasonably easy to
>>             implement with the Static Analyzer. The good side of it
>>             is that you get a lot of semantic modeling for free. For
>>             instance, if the developer copies `dev` into a local
>>             variable and then uses that local variable outside of
>>             api_enter..api_exit, the tool will be able to handle
>>             transparently, as it deals with values rather than with
>>             variables.
>>
>>         That is really cool.
>>
>>             Also it will probably be the easiest tool for your
>>             problem. The downside would be that it's not guaranteed
>>             to find all bugs; it'll inevitably give up on complicated
>>             code with high cyclomatic complexity :) So if you want
>>             strict/paranoid enforcement of rules, the Static Analyzer
>>             is not the right tool. But if you want to simply find
>>             some bugs for free, it's the right tool.
>>
>>             It sounds as if your problem is not inter-procedural. Let
>>             me double-check this: would you have another
>>             api_enter..api_exit pair in the body of your tweak()
>>             function? Or is just one api_enter..api_exit enough? Or
>>             is it a bug to call api_enter twice without an api_exit
>>             in between?
>>
>>         Yes in this case tweak() could be another public fn of the
>>         api that would also have an enter/exit pair. But we are using
>>         recursive mutexes (a thread can acquire the same mutex N
>>         times (ie call api_enter)  so long as it also releases N
>>         times) so that will be okay.
>>
>>
>>             If you have to write api_enter..api_exit in *every*
>>             function that deals with devices, then the problem is not
>>             inter-procedural, which makes it much easier. In
>>             particular, you should be able to come up with a purely
>>             syntactic analysis ("every function that accesses a
>>             device_t must start with api_enter() and must end in
>>             exactly one spot with api_exit()").
>>
>>         We will only insist on enter/exit in public API functions.
>>         Which are the only ones a client application can call.
>>         Internal private functions we won't have locking (as they can
>>         only be called ultimately from a public function so the
>>         device will be locked.) We are going to allow private fns to
>>         call back into the api via the public i/face. But we will
>>         have the public functions in specific files and have a
>>         specific naming prefix.
>>
>>             Such analysis should be easily do-able in clang-tidy as
>>             long as you're satisfied with this level of
>>             aggressiveness. In particular, you'll have to be willing
>>             to sacrifice code like this:
>>
>>             void foo(device_t *dev) {
>>               if (flip_a_coin()) {
>>                 api_enter(dev);
>>                 ...
>>                 api_exit(dev);
>>               }
>>             }
>>
>>             But it may be perfectly fine if you seriously want to
>>             enforce a strict structure on all your functions that
>>             deal with devices.
>>
>>
>>         So is it the case that clang-tidy kind of passes info to the
>>         checker-extension in a syntactic code-parsing order. Whereas
>>         the static analyzer passes information to the checker in a
>>         simulated run-time order?
>>
>>         E.g in your foo() above my proposed checker gets fed 1)
>>         theres a function called foo, 2) theres an if with a call to
>>         flip_a_coin 3) in the true case there is a call to enter then
>>         exit 4) in the else there is nothing 5) there is a return (at
>>         which point my checker would need to be pretty smart and hold
>>         a lot of state to figure out something was wrong) . And to
>>         compare for the static analyzer it's more like 1) there is fn
>>         foo 2.1) there is a code path through foo with enter/exit
>>         2.2) there is a code path with just return (at which point my
>>         reasonably simple checker would raise an error).
>>
>>
>>             I think the truly-truly right tool for your problem would
>>             be to come up with a custom analysis over Clang CFG.
>>
>>             .
>>
>>         By CFG you mean the Clang Static Analyzer?
>>
>>             It would be harder to implement, but it would allow you
>>             to express things like "every execution path within a
>>             function that accesses `dev` must have a api_enter before
>>             it and an api_exit after it; you are not allowed to copy
>>             `dev` around". This would strictly enforce the rule.
>>
>>         Yes that would be great. But I think just using clang-tidy
>>         from what you are saying would get us a long way. And there
>>         are heaps of simpler checks we would like to implement also.
>>
>>             At the same time it'll allow you to lift the requirement
>>             of exactly one return point - you would still be able to
>>             ensure that all accesses are covered. If you need to
>>             allow to copy `dev` around, it should still be doable,
>>             but it will be significantly more difficult to implement
>>
>>         Does 'copy  around' include passing to my private fns such as
>>         tweak()?. We don't need to copy dev anywhere within the
>>         public fns but we do need it to pass it to private fns.
>>
>>
>>             On 7/2/19 12:13 PM, Billy O'Mahony via cfe-dev wrote:
>>>             Hello,
>>>
>>>             I'd like to write a rule for either clang-tidy or static
>>>             analyzer to help catch some potential errors in a
>>>             project I'm working on.
>>>
>>>             My questions are:
>>>                 a) is only one or the other will be able to do what
>>>             I want to do?
>>>                 b) if both are feasible which would have the simpler
>>>             implementation?
>>>
>>>             The project involves writing an API that will run in a
>>>             multi-threaded application and is responsible for
>>>             serializing all access to a device structure. Therefore
>>>             the first thing in every function in the API must be to
>>>             call api_enter (which will among other things acquire a
>>>             mutex on the device) and the last thing before returning
>>>             must be to call api_exit. Also I want to enforce single
>>>             exit point from every API function - or certainly if
>>>             there are any return points that bypass the api_exit call.
>>>
>>>             So here is an example function with errors I want to
>>>             catch highlighted.
>>>
>>>             int api_foo(device_t *dev) {
>>>                 int ret_val = 0;
>>>
>>>                 bar();  // fn calls & decls before api_enter is ok-
>>>             just don't access dev.
>>>                 dev->bla = 1; // NO! device access before
>>>             api_enter() called
>>>                 api_enter(dev);   // error if this call is not
>>>             present exactly once
>>>
>>>                 if (dev->bla)
>>>                     return; // NO! didn't call api_exit before rtn.
>>>             Also two return points
>>>
>>>                 if (dev->ma) {
>>>                     ret_val = 1;
>>>                     goto cleanup;
>>>                 }
>>>                 tweak(dev);
>>>
>>>             cleanup:
>>>                 api_exit(dev); // error if this is not present
>>>             exactly once
>>>                 dev->bla = 1; //NO! device access after api_exit()
>>>                 return ret_val;
>>>             }
>>>
>>>             I don't think it matters but the project is C compiled
>>>             with gcc.
>>>
>>>             Also if both are feasible any other pointers, tips or
>>>             good resources would be appreciated. E.g  is there a
>>>             totally different methodology I'm not considering - e.g.
>>>             would using something like pycparser be a lot easier -
>>>             though I'd prefer to keep it in clang as we plan to use
>>>             tidy & static analyzer in any case for standard QA.
>>>
>>>             Thanks for reading,
>>>             Billy.
>>>
>>>             _______________________________________________
>>>             cfe-dev mailing list
>>>             cfe-dev at lists.llvm.org  <mailto:cfe-dev at lists.llvm.org>
>>>             https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190910/4403b3ee/attachment.html>