[cfe-dev] clang-tidy or static analyzer or ...

Sat Jul 6 05:41:30 PDT 2019

Hi Artem,

super. Good to know I'm on the right path. And thanks for the extra info -
plenty of clang goodness to be looking into there.

Cheers,
Billy.

On Sat, 6 Jul 2019 at 04:05, Artem Dergachev <noqnoqneo at gmail.com> wrote:

> > It sounds like clang-tidy will be the sweet spot between usefulness and
> effort to implement.
>
> Yup, i approve. You can experiment with clang-query real quick and it
> should give you an idea. If you find out that some of the useful
> ASTMatchers are missing, feel free to add them - either locally or
> upstreamed, as it's super easy.
>
>
> > By CFG you mean the Clang Static Analyzer?
>
> Clang CFG is https://clang.llvm.org/doxygen/classclang_1_1CFG.html .
>
> It's a "source-level" control flow graph that consists of pointers to AST
> statements, ordered in the control flow order (i.e., in order of
> execution). A directed path in the CFG from statement A to statement B
> would tell you that "B can potentially be executed after A". This allows
> you to use your favorite graph algorithms to reason about the program's
> behavior.
>
> The Static Analyzer works over Clang CFG, but it's much more than that:
> the Static Analyzer also understands the semantics of every statement and
> can model their effects over the "symbolic" state of the program. For
> example, in the following code:
>
> void foo(bool x) {
>   if (x) {
>   }
>   if (x) {
>   }
> }
>
> the CFG would be a double-diamond: the control branches off at the first
> statement, then joins back at the end, then branches off again, then joins
> back. But the Static Analyzer would understand that the if-conditions are
> the same, therefore these are simply two unrelated execution paths, and the
> execution path in which the first branch goes to true and the second branch
> goes to false is in fact impossible. And if 'x' is overwritten in between,
> the Static Analyzer would also know that.
>
> Apart from the Static Analyzer, the CFG is used for some clang warnings,
> such as -Wuninitialized (see AnalysisBasedWarnings.cpp). There are some
> ready-made analyses for common data flow problems implemented over Clang
> CFG (such as live variables analysis, dominators analysis, etc.).
>
> Clang CFG is not used during the actual compilation / code generation.
> When Chris Lattner talks about MLIR and mentions that Clang should have had
> a "CIL", he means that the Clang CFG (or something similar) should have
> been used for compilation *as well as* for analysis. This would, in
> particular, allow semantic optimizations that require information that's
> already lost in LLVM IR.
>
> Because the CFG is not used for compilation, it's not necessarily correct.
> There are a few known bugs in it, mostly around complicated C++ and also
> around GNU extensions such as the *binary* operator ?: or
> statement-expressions. But for plain C it should be nearly perfect.
>
>
> > Does 'copy  around' include passing to my private fns such as tweak()?
>
> That's entirely up to you. What i was trying to say is that it if you
> allow copying 'dev' into another variable, it will become much harder to
> implement your analysis, so you'd much rather forbid the user to do this.
> You might also allow the user to do this and suffer from imprecision in
> your analysis. At the same time, because your analysis is not
> inter-procedural, passing 'dev' into a function should be fine. The
> function can still return it back "laundered" so the user would be able to
> assign it into a variable behind your back. But these are the usual
> trade-offs of static analysis, don't be too scared by them - after all it
> all boils down to the halting problem :)
>
> On 7/4/19 3:40 AM, Billy O'Mahony wrote:
>
> Hi Artem,
>
> thanks for your well thought-out and useful reply. It sounds like
> clang-tidy will be the sweet spot between usefulness and effort to
> implement.
>
> I have a few other responses down below.
>
> Regards,
> Billy.
>
> On Wed, 3 Jul 2019 at 23:23, Artem Dergachev <noqnoqneo at gmail.com> wrote:
>
>> Hi,
>>
>> It depends on how strict do you want the checking be and on the details
>> of the rule. If you're designing a new API from scratch and stuck with gcc
>> forever, i wouldn't mind using the gcc __attribute__((cleanup())) for your
>> purpose.
>>
>> I didn't know about that gcc attrib. I need to read the gcc manual attrib
> section! I should've added that while we will be developing on gcc the code
> should be compilable on other toolchains/OSs also, so we are avoiding any
> gcc extensions (e.g. gcc has extensions for thread local storage but we are
> not using those for the same reason).
>
>
>> The rule you described should be reasonably easy to implement with the
>> Static Analyzer. The good side of it is that you get a lot of semantic
>> modeling for free. For instance, if the developer copies `dev` into a local
>> variable and then uses that local variable outside of api_enter..api_exit,
>> the tool will be able to handle transparently, as it deals with values
>> rather than with variables.
>>
> That is really cool.
>
> Also it will probably be the easiest tool for your problem. The downside
>> would be that it's not guaranteed to find all bugs; it'll inevitably give
>> up on complicated code with high cyclomatic complexity :) So if you want
>> strict/paranoid enforcement of rules, the Static Analyzer is not the right
>> tool. But if you want to simply find some bugs for free, it's the right
>> tool.
>>
>> It sounds as if your problem is not inter-procedural. Let me double-check
>> this: would you have another api_enter..api_exit pair in the body of your
>> tweak() function? Or is just one api_enter..api_exit enough? Or is it a bug
>> to call api_enter twice without an api_exit in between?
>>
> Yes in this case tweak() could be another public fn of the api that would
> also have an enter/exit pair. But we are using recursive mutexes (a thread
> can acquire the same mutex N times (ie call api_enter)  so long as it also
> releases N times) so that will be okay.
>
>
>> If you have to write api_enter..api_exit in *every* function that deals
>> with devices, then the problem is not inter-procedural, which makes it much
>> easier. In particular, you should be able to come up with a purely
>> syntactic analysis ("every function that accesses a device_t must start
>> with api_enter() and must end in exactly one spot with api_exit()").
>>
> We will only insist on enter/exit in public API functions. Which are the
> only ones a client application can call. Internal private functions we
> won't have locking (as they can only be called ultimately from a public
> function so the device will be locked.) We are going to allow private fns
> to call back into the api via the public i/face. But we will have the
> public functions in specific files and have a specific naming prefix.
>
>
>> Such analysis should be easily do-able in clang-tidy as long as you're
>> satisfied with this level of aggressiveness. In particular, you'll have to
>> be willing to sacrifice code like this:
>>
>> void foo(device_t *dev) {
>>   if (flip_a_coin()) {
>>     api_enter(dev);
>>     ...
>>     api_exit(dev);
>>   }
>> }
>>
>> But it may be perfectly fine if you seriously want to enforce a strict
>> structure on all your functions that deal with devices.
>>
>
> So is it the case that clang-tidy kind of passes info to the
> checker-extension in a syntactic code-parsing order. Whereas the static
> analyzer passes information to the checker in a simulated run-time order?
>
> E.g in your foo() above my proposed checker gets fed 1) theres a function
> called foo, 2) theres an if with a call to flip_a_coin 3) in the true case
> there is a call to enter then exit 4) in the else there is nothing 5) there
> is a return (at which point my checker would need to be pretty smart and
> hold a lot of state to figure out something was wrong) . And to compare for
> the static analyzer it's more like 1) there is fn foo 2.1) there is a code
> path through foo with enter/exit 2.2) there is a code path with just return
> (at which point my reasonably simple checker would raise an error).
>
>
>>
>> I think the truly-truly right tool for your problem would be to come up
>> with a custom analysis over Clang CFG.
>>
> .
>>
> By CFG you mean the Clang Static Analyzer?
>
>
>> It would be harder to implement, but it would allow you to express things
>> like "every execution path within a function that accesses `dev` must have
>> a api_enter before it and an api_exit after it; you are not allowed to copy
>> `dev` around". This would strictly enforce the rule.
>>
> Yes that would be great. But I think just using clang-tidy from what you
> are saying would get us a long way. And there are heaps of simpler checks
> we would like to implement also.
>
> At the same time it'll allow you to lift the requirement of exactly one
>> return point - you would still be able to ensure that all accesses are
>> covered. If you need to allow to copy `dev` around, it should still be
>> doable, but it will be significantly more difficult to implement
>>
> Does 'copy  around' include passing to my private fns such as tweak()?. We
> don't need to copy dev anywhere within the public fns but we do need it to
> pass it to private fns.
>
>
>> On 7/2/19 12:13 PM, Billy O'Mahony via cfe-dev wrote:
>>
>> Hello,
>>
>> I'd like to write a rule for either clang-tidy or static analyzer to help
>> catch some potential errors in a project I'm working on.
>>
>> My questions are:
>>     a) is only one or the other will be able to do what I want to do?
>>     b) if both are feasible which would have the simpler implementation?
>>
>> The project involves writing an API that will run in a multi-threaded
>> application and is responsible for serializing all access to a device
>> structure. Therefore the first thing in every function in the API must be
>> to call api_enter (which will among other things acquire a mutex on the
>> device) and the last thing before returning must be to call api_exit. Also
>> I want to enforce single exit point from every API function - or certainly
>> if there are any return points that bypass the api_exit call.
>>
>> So here is an example function with errors I want to catch highlighted.
>>
>> int api_foo(device_t *dev) {
>>     int ret_val = 0;
>>
>>     bar();  // fn calls & decls before api_enter is ok- just don't access
>> dev.
>>     dev->bla = 1; // NO! device access before api_enter() called
>>     api_enter(dev);   // error if this call is not present exactly once
>>
>>     if (dev->bla)
>>         return; // NO! didn't call api_exit before rtn. Also two return
>> points
>>
>>     if (dev->ma) {
>>         ret_val = 1;
>>         goto cleanup;
>>     }
>>     tweak(dev);
>>
>> cleanup:
>>     api_exit(dev); // error if this is not present exactly once
>>     dev->bla = 1; //NO! device access after api_exit()
>>     return ret_val;
>> }
>>
>> I don't think it matters but the project is C compiled with gcc.
>>
>> Also if both are feasible any other pointers, tips or good resources
>> would be appreciated. E.g  is there a totally different methodology I'm not
>> considering - e.g. would using something like pycparser be a lot easier -
>> though I'd prefer to keep it in clang as we plan to use tidy & static
>> analyzer in any case for standard QA.
>>
>> Thanks for reading,
>> Billy.
>>
>> _______________________________________________
>> cfe-dev mailing listcfe-dev at lists.llvm.orghttps://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190706/9c2257d2/attachment.html>