[cfe-dev] clang-tidy or static analyzer or ...

Thu Jul 4 03:40:26 PDT 2019

Hi Artem,

thanks for your well thought-out and useful reply. It sounds like
clang-tidy will be the sweet spot between usefulness and effort to
implement.

I have a few other responses down below.

Regards,
Billy.

On Wed, 3 Jul 2019 at 23:23, Artem Dergachev <noqnoqneo at gmail.com> wrote:

> Hi,
>
> It depends on how strict do you want the checking be and on the details of
> the rule. If you're designing a new API from scratch and stuck with gcc
> forever, i wouldn't mind using the gcc __attribute__((cleanup())) for your
> purpose.
>
> I didn't know about that gcc attrib. I need to read the gcc manual attrib
section! I should've added that while we will be developing on gcc the code
should be compilable on other toolchains/OSs also, so we are avoiding any
gcc extensions (e.g. gcc has extensions for thread local storage but we are
not using those for the same reason).

> The rule you described should be reasonably easy to implement with the
> Static Analyzer. The good side of it is that you get a lot of semantic
> modeling for free. For instance, if the developer copies `dev` into a local
> variable and then uses that local variable outside of api_enter..api_exit,
> the tool will be able to handle transparently, as it deals with values
> rather than with variables.
>
That is really cool.

Also it will probably be the easiest tool for your problem. The downside
> would be that it's not guaranteed to find all bugs; it'll inevitably give
> up on complicated code with high cyclomatic complexity :) So if you want
> strict/paranoid enforcement of rules, the Static Analyzer is not the right
> tool. But if you want to simply find some bugs for free, it's the right
> tool.
>
> It sounds as if your problem is not inter-procedural. Let me double-check
> this: would you have another api_enter..api_exit pair in the body of your
> tweak() function? Or is just one api_enter..api_exit enough? Or is it a bug
> to call api_enter twice without an api_exit in between?
>
Yes in this case tweak() could be another public fn of the api that would
also have an enter/exit pair. But we are using recursive mutexes (a thread
can acquire the same mutex N times (ie call api_enter)  so long as it also
releases N times) so that will be okay.

> If you have to write api_enter..api_exit in *every* function that deals
> with devices, then the problem is not inter-procedural, which makes it much
> easier. In particular, you should be able to come up with a purely
> syntactic analysis ("every function that accesses a device_t must start
> with api_enter() and must end in exactly one spot with api_exit()").
>
We will only insist on enter/exit in public API functions. Which are the
only ones a client application can call. Internal private functions we
won't have locking (as they can only be called ultimately from a public
function so the device will be locked.) We are going to allow private fns
to call back into the api via the public i/face. But we will have the
public functions in specific files and have a specific naming prefix.

> Such analysis should be easily do-able in clang-tidy as long as you're
> satisfied with this level of aggressiveness. In particular, you'll have to
> be willing to sacrifice code like this:
>
> void foo(device_t *dev) {
>   if (flip_a_coin()) {
>     api_enter(dev);
>     ...
>     api_exit(dev);
>   }
> }
>
> But it may be perfectly fine if you seriously want to enforce a strict
> structure on all your functions that deal with devices.
>

So is it the case that clang-tidy kind of passes info to the
checker-extension in a syntactic code-parsing order. Whereas the static
analyzer passes information to the checker in a simulated run-time order?

E.g in your foo() above my proposed checker gets fed 1) theres a function
called foo, 2) theres an if with a call to flip_a_coin 3) in the true case
there is a call to enter then exit 4) in the else there is nothing 5) there
is a return (at which point my checker would need to be pretty smart and
hold a lot of state to figure out something was wrong) . And to compare for
the static analyzer it's more like 1) there is fn foo 2.1) there is a code
path through foo with enter/exit 2.2) there is a code path with just return
(at which point my reasonably simple checker would raise an error).

>
> I think the truly-truly right tool for your problem would be to come up
> with a custom analysis over Clang CFG.
>
.
>
By CFG you mean the Clang Static Analyzer?

> It would be harder to implement, but it would allow you to express things
> like "every execution path within a function that accesses `dev` must have
> a api_enter before it and an api_exit after it; you are not allowed to copy
> `dev` around". This would strictly enforce the rule.
>
Yes that would be great. But I think just using clang-tidy from what you
are saying would get us a long way. And there are heaps of simpler checks
we would like to implement also.

At the same time it'll allow you to lift the requirement of exactly one
> return point - you would still be able to ensure that all accesses are
> covered. If you need to allow to copy `dev` around, it should still be
> doable, but it will be significantly more difficult to implement
>
Does 'copy  around' include passing to my private fns such as tweak()?. We
don't need to copy dev anywhere within the public fns but we do need it to
pass it to private fns.

> On 7/2/19 12:13 PM, Billy O'Mahony via cfe-dev wrote:
>
> Hello,
>
> I'd like to write a rule for either clang-tidy or static analyzer to help
> catch some potential errors in a project I'm working on.
>
> My questions are:
>     a) is only one or the other will be able to do what I want to do?
>     b) if both are feasible which would have the simpler implementation?
>
> The project involves writing an API that will run in a multi-threaded
> application and is responsible for serializing all access to a device
> structure. Therefore the first thing in every function in the API must be
> to call api_enter (which will among other things acquire a mutex on the
> device) and the last thing before returning must be to call api_exit. Also
> I want to enforce single exit point from every API function - or certainly
> if there are any return points that bypass the api_exit call.
>
> So here is an example function with errors I want to catch highlighted.
>
> int api_foo(device_t *dev) {
>     int ret_val = 0;
>
>     bar();  // fn calls & decls before api_enter is ok- just don't access
> dev.
>     dev->bla = 1; // NO! device access before api_enter() called
>     api_enter(dev);   // error if this call is not present exactly once
>
>     if (dev->bla)
>         return; // NO! didn't call api_exit before rtn. Also two return
> points
>
>     if (dev->ma) {
>         ret_val = 1;
>         goto cleanup;
>     }
>     tweak(dev);
>
> cleanup:
>     api_exit(dev); // error if this is not present exactly once
>     dev->bla = 1; //NO! device access after api_exit()
>     return ret_val;
> }
>
> I don't think it matters but the project is C compiled with gcc.
>
> Also if both are feasible any other pointers, tips or good resources would
> be appreciated. E.g  is there a totally different methodology I'm not
> considering - e.g. would using something like pycparser be a lot easier -
> though I'd prefer to keep it in clang as we plan to use tidy & static
> analyzer in any case for standard QA.
>
> Thanks for reading,
> Billy.
>
> _______________________________________________
> cfe-dev mailing listcfe-dev at lists.llvm.orghttps://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190704/3fe5a66a/attachment.html>