[cfe-dev] clang-tidy or static analyzer or ...

Billy O'Mahony via cfe-dev cfe-dev at lists.llvm.org
Thu Sep 19 06:41:17 PDT 2019


Hi Artem, Stephen,

Success \o/

I can detect multiple returns now and should be able to figure out the
other things I want to do also.

One or two comments in-line.

Thank you both for your help,

Billy.



Stephen wrote:
> On 10/09/2019 22:47, Billy O'Mahony via cfe-dev wrote:
> > Hi Artem,
> >
> > (or anyone else :D ).
> >
> > I managed to create a dummy clang-tidy check and run it.
> >
> > For the tests I want to write I've decided the easiest first step is to
> > just check for more than one return statement in my functions (this
will
> > only be applying to specific functions all with a common prefix e.g
> > "api_foo".
> >
> > So I managed to write a matcher which seems to work ok.
> >      Finder->addMatcher(returnStmt().bind("x"), this);
> >
> > And get it to print a warning for each rtn statement in MyCheck::check:
> >      const auto *MatchedDecl = Result.Nodes.getNodeAs<ReturnStmt>("x");
> >      diag(MatchedDecl->getReturnLoc(), "found rtn stmt");
> >
> > So next I want to be able to find the name of the function that
contains
> > the returnStatement but I can't figure out how to do this.
>
> .
>   Finder->addMatcher(
>      returnStmt(
>          forFunction(functionDecl().bind("func"))
>          ).bind("returnStmt"),
>      this);
>
>
> Your `check` function will then be called once for each return statement
> and you can extract both bound nodes.

This is doing the trick very nicely.

>
> >
> > Or maybe I should start with a FunctionDecl matcher that matches fns
> > named (api_...) and then somehow traverse that function's AST and count
> > the returnStmt's. ?
>
> If you wish to do counting, then traversing from the Function might
> indeed make more sense.

Ok. If I can't figure it out I'll post again. But using the double-bind and
recording fn names in a
std::set is working for me right now.

>
> > I've found a few examples of clang-tidy checks on-line I think this one
> >
https://bbannier.github.io/blog/2015/05/02/Writing-a-basic-clang-static-analysis-check.html
> > is the best. But recommendations for better resources also appreciated.
>
>
> I linked to some resources I wrote here which you might find useful:
>
>
>
https://steveire.wordpress.com/2018/11/11/future-developments-in-clang-query/
>
> You can also see some of the other posts on my blog.

Thanks, I'll have a look.
>
> Thanks,
>
> Stephen.
>

On Tue, 10 Sep 2019 at 23:46, Artem Dergachev <noqnoqneo at gmail.com> wrote:

> Yup, the most principled way of doing this with ASTMatchers is to start
> with the function decl and then recurse inside it:
>
>     functionDecl(matchesName(...), forEachDescendant(returnStmt(...)))
>
> You can always do this in an inside out, but it most likely has
> performance implications (i never really understood how ASTMatcher
> performance works as i've never had any real performance problems with
> them):
>
>     returnStmt(..., hasAncestor(functionDecl(matchesName(...))))
>
> You should totally use the official reference as your manual because it
> gives an up-to-date overview of all the stuff that we have, with a lot of
> examples: https://clang.llvm.org/docs/LibASTMatchersReference.html
>
>
>
> On 9/10/19 2:47 PM, Billy O'Mahony wrote:
>
> Hi Artem,
>
> (or anyone else :D ).
>
> I managed to create a dummy clang-tidy check and run it.
>
> For the tests I want to write I've decided the easiest first step is to
> just check for more than one return statement in my functions (this will
> only be applying to specific functions all with a common prefix e.g
> "api_foo".
>
> So I managed to write a matcher which seems to work ok.
>     Finder->addMatcher(returnStmt().bind("x"), this);
>
> And get it to print a warning for each rtn statement in MyCheck::check:
>     const auto *MatchedDecl = Result.Nodes.getNodeAs<ReturnStmt>("x");
>     diag(MatchedDecl->getReturnLoc(), "found rtn stmt");
>
> So next I want to be able to find the name of the function that contains
> the returnStatement but I can't figure out how to do this.
>
> Or maybe I should start with a FunctionDecl matcher that matches fns named
> (api_...) and then somehow traverse that function's AST and count the
> returnStmt's. ?
>
> I've found a few examples of clang-tidy checks on-line I think this one
> https://bbannier.github.io/blog/2015/05/02/Writing-a-basic-clang-static-analysis-check.html
> is the best. But recommendations for better resources also appreciated.
>
> Thanks for any help,
> Billy.
>
>
> On Sat, 6 Jul 2019 at 13:41, Billy O'Mahony <billy.omahony at gmail.com>
> wrote:
>
>> Hi Artem,
>>
>> super. Good to know I'm on the right path. And thanks for the extra info
>> - plenty of clang goodness to be looking into there.
>>
>> Cheers,
>> Billy.
>>
>> On Sat, 6 Jul 2019 at 04:05, Artem Dergachev <noqnoqneo at gmail.com> wrote:
>>
>>> > It sounds like clang-tidy will be the sweet spot between usefulness
>>> and effort to implement.
>>>
>>> Yup, i approve. You can experiment with clang-query real quick and it
>>> should give you an idea. If you find out that some of the useful
>>> ASTMatchers are missing, feel free to add them - either locally or
>>> upstreamed, as it's super easy.
>>>
>>>
>>> > By CFG you mean the Clang Static Analyzer?
>>>
>>> Clang CFG is https://clang.llvm.org/doxygen/classclang_1_1CFG.html .
>>>
>>> It's a "source-level" control flow graph that consists of pointers to
>>> AST statements, ordered in the control flow order (i.e., in order of
>>> execution). A directed path in the CFG from statement A to statement B
>>> would tell you that "B can potentially be executed after A". This allows
>>> you to use your favorite graph algorithms to reason about the program's
>>> behavior.
>>>
>>> The Static Analyzer works over Clang CFG, but it's much more than that:
>>> the Static Analyzer also understands the semantics of every statement and
>>> can model their effects over the "symbolic" state of the program. For
>>> example, in the following code:
>>>
>>> void foo(bool x) {
>>>   if (x) {
>>>   }
>>>   if (x) {
>>>   }
>>> }
>>>
>>> the CFG would be a double-diamond: the control branches off at the first
>>> statement, then joins back at the end, then branches off again, then joins
>>> back. But the Static Analyzer would understand that the if-conditions are
>>> the same, therefore these are simply two unrelated execution paths, and the
>>> execution path in which the first branch goes to true and the second branch
>>> goes to false is in fact impossible. And if 'x' is overwritten in between,
>>> the Static Analyzer would also know that.
>>>
>>> Apart from the Static Analyzer, the CFG is used for some clang warnings,
>>> such as -Wuninitialized (see AnalysisBasedWarnings.cpp). There are some
>>> ready-made analyses for common data flow problems implemented over Clang
>>> CFG (such as live variables analysis, dominators analysis, etc.).
>>>
>>> Clang CFG is not used during the actual compilation / code generation.
>>> When Chris Lattner talks about MLIR and mentions that Clang should have had
>>> a "CIL", he means that the Clang CFG (or something similar) should have
>>> been used for compilation *as well as* for analysis. This would, in
>>> particular, allow semantic optimizations that require information that's
>>> already lost in LLVM IR.
>>>
>>> Because the CFG is not used for compilation, it's not necessarily
>>> correct. There are a few known bugs in it, mostly around complicated C++
>>> and also around GNU extensions such as the *binary* operator ?: or
>>> statement-expressions. But for plain C it should be nearly perfect.
>>>
>>>
>>> > Does 'copy  around' include passing to my private fns such as tweak()?
>>>
>>> That's entirely up to you. What i was trying to say is that it if you
>>> allow copying 'dev' into another variable, it will become much harder to
>>> implement your analysis, so you'd much rather forbid the user to do this.
>>> You might also allow the user to do this and suffer from imprecision in
>>> your analysis. At the same time, because your analysis is not
>>> inter-procedural, passing 'dev' into a function should be fine. The
>>> function can still return it back "laundered" so the user would be able to
>>> assign it into a variable behind your back. But these are the usual
>>> trade-offs of static analysis, don't be too scared by them - after all it
>>> all boils down to the halting problem :)
>>>
>>> On 7/4/19 3:40 AM, Billy O'Mahony wrote:
>>>
>>> Hi Artem,
>>>
>>> thanks for your well thought-out and useful reply. It sounds like
>>> clang-tidy will be the sweet spot between usefulness and effort to
>>> implement.
>>>
>>> I have a few other responses down below.
>>>
>>> Regards,
>>> Billy.
>>>
>>> On Wed, 3 Jul 2019 at 23:23, Artem Dergachev <noqnoqneo at gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> It depends on how strict do you want the checking be and on the details
>>>> of the rule. If you're designing a new API from scratch and stuck with gcc
>>>> forever, i wouldn't mind using the gcc __attribute__((cleanup())) for your
>>>> purpose.
>>>>
>>>> I didn't know about that gcc attrib. I need to read the gcc manual
>>> attrib section! I should've added that while we will be developing on gcc
>>> the code should be compilable on other toolchains/OSs also, so we are
>>> avoiding any gcc extensions (e.g. gcc has extensions for thread local
>>> storage but we are not using those for the same reason).
>>>
>>>
>>>> The rule you described should be reasonably easy to implement with the
>>>> Static Analyzer. The good side of it is that you get a lot of semantic
>>>> modeling for free. For instance, if the developer copies `dev` into a local
>>>> variable and then uses that local variable outside of api_enter..api_exit,
>>>> the tool will be able to handle transparently, as it deals with values
>>>> rather than with variables.
>>>>
>>> That is really cool.
>>>
>>> Also it will probably be the easiest tool for your problem. The downside
>>>> would be that it's not guaranteed to find all bugs; it'll inevitably give
>>>> up on complicated code with high cyclomatic complexity :) So if you want
>>>> strict/paranoid enforcement of rules, the Static Analyzer is not the right
>>>> tool. But if you want to simply find some bugs for free, it's the right
>>>> tool.
>>>>
>>>> It sounds as if your problem is not inter-procedural. Let me
>>>> double-check this: would you have another api_enter..api_exit pair in the
>>>> body of your tweak() function? Or is just one api_enter..api_exit enough?
>>>> Or is it a bug to call api_enter twice without an api_exit in between?
>>>>
>>> Yes in this case tweak() could be another public fn of the api that
>>> would also have an enter/exit pair. But we are using recursive mutexes (a
>>> thread can acquire the same mutex N times (ie call api_enter)  so long as
>>> it also releases N times) so that will be okay.
>>>
>>>
>>>> If you have to write api_enter..api_exit in *every* function that deals
>>>> with devices, then the problem is not inter-procedural, which makes it much
>>>> easier. In particular, you should be able to come up with a purely
>>>> syntactic analysis ("every function that accesses a device_t must start
>>>> with api_enter() and must end in exactly one spot with api_exit()").
>>>>
>>> We will only insist on enter/exit in public API functions. Which are the
>>> only ones a client application can call. Internal private functions we
>>> won't have locking (as they can only be called ultimately from a public
>>> function so the device will be locked.) We are going to allow private fns
>>> to call back into the api via the public i/face. But we will have the
>>> public functions in specific files and have a specific naming prefix.
>>>
>>>
>>>> Such analysis should be easily do-able in clang-tidy as long as you're
>>>> satisfied with this level of aggressiveness. In particular, you'll have to
>>>> be willing to sacrifice code like this:
>>>>
>>>> void foo(device_t *dev) {
>>>>   if (flip_a_coin()) {
>>>>     api_enter(dev);
>>>>     ...
>>>>     api_exit(dev);
>>>>   }
>>>> }
>>>>
>>>> But it may be perfectly fine if you seriously want to enforce a strict
>>>> structure on all your functions that deal with devices.
>>>>
>>>
>>> So is it the case that clang-tidy kind of passes info to the
>>> checker-extension in a syntactic code-parsing order. Whereas the static
>>> analyzer passes information to the checker in a simulated run-time order?
>>>
>>> E.g in your foo() above my proposed checker gets fed 1) theres a
>>> function called foo, 2) theres an if with a call to flip_a_coin 3) in the
>>> true case there is a call to enter then exit 4) in the else there is
>>> nothing 5) there is a return (at which point my checker would need to be
>>> pretty smart and hold a lot of state to figure out something was wrong) .
>>> And to compare for the static analyzer it's more like 1) there is fn foo
>>> 2.1) there is a code path through foo with enter/exit 2.2) there is a code
>>> path with just return (at which point my reasonably simple checker would
>>> raise an error).
>>>
>>>
>>>>
>>>> I think the truly-truly right tool for your problem would be to come up
>>>> with a custom analysis over Clang CFG.
>>>>
>>> .
>>>>
>>> By CFG you mean the Clang Static Analyzer?
>>>
>>>
>>>> It would be harder to implement, but it would allow you to express
>>>> things like "every execution path within a function that accesses `dev`
>>>> must have a api_enter before it and an api_exit after it; you are not
>>>> allowed to copy `dev` around". This would strictly enforce the rule.
>>>>
>>> Yes that would be great. But I think just using clang-tidy from what you
>>> are saying would get us a long way. And there are heaps of simpler checks
>>> we would like to implement also.
>>>
>>> At the same time it'll allow you to lift the requirement of exactly one
>>>> return point - you would still be able to ensure that all accesses are
>>>> covered. If you need to allow to copy `dev` around, it should still be
>>>> doable, but it will be significantly more difficult to implement
>>>>
>>> Does 'copy  around' include passing to my private fns such as tweak()?.
>>> We don't need to copy dev anywhere within the public fns but we do need it
>>> to pass it to private fns.
>>>
>>>
>>>> On 7/2/19 12:13 PM, Billy O'Mahony via cfe-dev wrote:
>>>>
>>>> Hello,
>>>>
>>>> I'd like to write a rule for either clang-tidy or static analyzer to
>>>> help catch some potential errors in a project I'm working on.
>>>>
>>>> My questions are:
>>>>     a) is only one or the other will be able to do what I want to do?
>>>>     b) if both are feasible which would have the simpler implementation?
>>>>
>>>> The project involves writing an API that will run in a multi-threaded
>>>> application and is responsible for serializing all access to a device
>>>> structure. Therefore the first thing in every function in the API must be
>>>> to call api_enter (which will among other things acquire a mutex on the
>>>> device) and the last thing before returning must be to call api_exit. Also
>>>> I want to enforce single exit point from every API function - or certainly
>>>> if there are any return points that bypass the api_exit call.
>>>>
>>>> So here is an example function with errors I want to catch highlighted.
>>>>
>>>> int api_foo(device_t *dev) {
>>>>     int ret_val = 0;
>>>>
>>>>     bar();  // fn calls & decls before api_enter is ok- just don't
>>>> access dev.
>>>>     dev->bla = 1; // NO! device access before api_enter() called
>>>>     api_enter(dev);   // error if this call is not present exactly once
>>>>
>>>>     if (dev->bla)
>>>>         return; // NO! didn't call api_exit before rtn. Also two return
>>>> points
>>>>
>>>>     if (dev->ma) {
>>>>         ret_val = 1;
>>>>         goto cleanup;
>>>>     }
>>>>     tweak(dev);
>>>>
>>>> cleanup:
>>>>     api_exit(dev); // error if this is not present exactly once
>>>>     dev->bla = 1; //NO! device access after api_exit()
>>>>     return ret_val;
>>>> }
>>>>
>>>> I don't think it matters but the project is C compiled with gcc.
>>>>
>>>> Also if both are feasible any other pointers, tips or good resources
>>>> would be appreciated. E.g  is there a totally different methodology I'm not
>>>> considering - e.g. would using something like pycparser be a lot easier -
>>>> though I'd prefer to keep it in clang as we plan to use tidy & static
>>>> analyzer in any case for standard QA.
>>>>
>>>> Thanks for reading,
>>>> Billy.
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing listcfe-dev at lists.llvm.orghttps://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>
>>>>
>>>>
>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190919/a0b3b2cf/attachment.html>


More information about the cfe-dev mailing list