<div dir="ltr"><div>Hi Artem, Stephen,</div><div><br></div><div>Success \o/ <br></div><div><br></div><div>I can detect multiple returns now and should be able to figure out the other things I want to do also.</div><div><br></div><div>One or two comments in-line.</div><div><br></div><div>Thank you both for your help,</div><div><br></div><div>Billy.<br></div><div><br></div><div><br></div><div><br></div><div>Stephen wrote:</div>> On 10/09/2019 22:47, Billy O'Mahony via cfe-dev wrote:<br>> > Hi Artem,<br>> > <br>> > (or anyone else :D ).<br>> > <br>> > I managed to create a dummy clang-tidy check and run it.<br>> > <br>> > For the tests I want to write I've decided the easiest first step is to <br>> > just check for more than one return statement in my functions (this will <br>> > only be applying to specific functions all with a common prefix e.g <br>> > "api_foo".<br>> > <br>> > So I managed to write a matcher which seems to work ok.<br>> > Finder->addMatcher(returnStmt().bind("x"), this);<br>> > <br>> > And get it to print a warning for each rtn statement in MyCheck::check:<br>> > const auto *MatchedDecl = Result.Nodes.getNodeAs<ReturnStmt>("x");<br>> > diag(MatchedDecl->getReturnLoc(), "found rtn stmt");<br>> > <br>> > So next I want to be able to find the name of the function that contains <br>> > the returnStatement but I can't figure out how to do this.<br>> <br>> .<br>> Finder->addMatcher(<br>> returnStmt(<br>> forFunction(functionDecl().bind("func"))<br>> ).bind("returnStmt"),<br><div>> this);</div><div></div>> <br>> <br>> Your `check` function will then be called once for each return statement <br><div>> and you can extract both bound nodes.</div><div>
<div><br></div><div>This is doing the trick very nicely.</div>
</div><div><br></div><div>> </div>> > <br>> > Or maybe I should start with a FunctionDecl matcher that matches fns <br>> > named (api_...) and then somehow traverse that function's AST and count <br>> > the returnStmt's. ?<br>> <br>> If you wish to do counting, then traversing from the Function might <br><div>> indeed make more sense.</div><div><br></div><div>Ok. If I can't figure it out I'll post again. But using the double-bind and recording fn names in a <br></div><div>std::set is working for me right now.<br></div><br>> <br>> > I've found a few examples of clang-tidy checks on-line I think this one <br>> > <a href="https://bbannier.github.io/blog/2015/05/02/Writing-a-basic-clang-static-analysis-check.html">https://bbannier.github.io/blog/2015/05/02/Writing-a-basic-clang-static-analysis-check.html</a> <br>> > is the best. But recommendations for better resources also appreciated.<br>> <br>> <br>> I linked to some resources I wrote here which you might find useful:<br>> <br>> <br><div>> <a href="https://steveire.wordpress.com/2018/11/11/future-developments-in-clang-query/">https://steveire.wordpress.com/2018/11/11/future-developments-in-clang-query/</a></div>> <br><div>> You can also see some of the other posts on my blog.</div><div><br></div><div>Thanks, I'll have a look.<br></div>> <br>> Thanks,<br>> <br>> Stephen.<br>> </div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 10 Sep 2019 at 23:46, Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com">noqnoqneo@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
Yup, the most principled way of doing this with ASTMatchers is to
start with the function decl and then recurse inside it:<br>
<br>
functionDecl(matchesName(...),
forEachDescendant(returnStmt(...)))<br>
<br>
You can always do this in an inside out, but it most likely has
performance implications (i never really understood how ASTMatcher
performance works as i've never had any real performance problems
with them):<br>
<br>
returnStmt(..., hasAncestor(functionDecl(matchesName(...))))<br>
<br>
You should totally use the official reference as your manual because
it gives an up-to-date overview of all the stuff that we have, with
a lot of examples:
<a href="https://clang.llvm.org/docs/LibASTMatchersReference.html" target="_blank">https://clang.llvm.org/docs/LibASTMatchersReference.html</a><br>
<br>
<br>
<br>
<div>On 9/10/19 2:47 PM, Billy O'Mahony
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Hi Artem,</div>
<div><br>
</div>
<div>(or anyone else :D ).</div>
<div><br>
</div>
<div>I managed to create a dummy clang-tidy check and run it.</div>
<div><br>
</div>
<div>
<div>For the tests I want to write I've decided the easiest
first step is to just check for more than one return
statement in my functions (this will only be applying to
specific functions all with a common prefix e.g "api_foo".</div>
</div>
<div><br>
</div>
<div>So I managed to write a matcher which seems to work ok.</div>
<div> Finder->addMatcher(returnStmt().bind("x"), this);</div>
<div><br>
</div>
<div>And get it to print a warning for each rtn statement in
MyCheck::check:</div>
<div> const auto *MatchedDecl =
Result.Nodes.getNodeAs<ReturnStmt>("x");<br>
diag(MatchedDecl->getReturnLoc(), "found rtn stmt");</div>
<div>
<div><br>
</div>
<div>So next I want to be able to find the name of the
function that contains the returnStatement but I can't
figure out how to do this.</div>
<div><br>
</div>
<div>Or maybe I should start with a FunctionDecl matcher that
matches fns named (api_...) and then somehow traverse that
function's AST and count the returnStmt's. ? <br>
</div>
<div><br>
</div>
<div>I've found a few examples of clang-tidy checks on-line I
think this one <a href="https://bbannier.github.io/blog/2015/05/02/Writing-a-basic-clang-static-analysis-check.html" target="_blank">https://bbannier.github.io/blog/2015/05/02/Writing-a-basic-clang-static-analysis-check.html</a>
is the best. But recommendations for better resources also
appreciated.<br>
</div>
<div><br>
</div>
<div>Thanks for any help,</div>
<div>Billy.<br>
</div>
<div><br>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sat, 6 Jul 2019 at 13:41,
Billy O'Mahony <<a href="mailto:billy.omahony@gmail.com" target="_blank">billy.omahony@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>Hi Artem,</div>
<div><br>
</div>
<div>super. Good to know I'm on the right path. And thanks
for the extra info - plenty of clang goodness to be
looking into there.</div>
<div><br>
</div>
<div>Cheers,</div>
<div>Billy.<br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sat, 6 Jul 2019 at
04:05, Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com" target="_blank">noqnoqneo@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> > It sounds like clang-tidy
will be the sweet spot between usefulness and effort
to implement.<br>
<br>
Yup, i approve. You can experiment with clang-query
real quick and it should give you an idea. If you find
out that some of the useful ASTMatchers are missing,
feel free to add them - either locally or upstreamed,
as it's super easy.<br>
<br>
<br>
> By CFG you mean the Clang Static Analyzer? <br>
<br>
Clang CFG is <a href="https://clang.llvm.org/doxygen/classclang_1_1CFG.html" target="_blank">https://clang.llvm.org/doxygen/classclang_1_1CFG.html</a>
.<br>
<br>
It's a "source-level" control flow graph that consists
of pointers to AST statements, ordered in the control
flow order (i.e., in order of execution). A directed
path in the CFG from statement A to statement B would
tell you that "B can potentially be executed after A".
This allows you to use your favorite graph algorithms
to reason about the program's behavior.<br>
<br>
The Static Analyzer works over Clang CFG, but it's
much more than that: the Static Analyzer also
understands the semantics of every statement and can
model their effects over the "symbolic" state of the
program. For example, in the following code:<br>
<br>
void foo(bool x) {<br>
if (x) {<br>
}<br>
if (x) {<br>
}<br>
}<br>
<br>
the CFG would be a double-diamond: the control
branches off at the first statement, then joins back
at the end, then branches off again, then joins back.
But the Static Analyzer would understand that the
if-conditions are the same, therefore these are simply
two unrelated execution paths, and the execution path
in which the first branch goes to true and the second
branch goes to false is in fact impossible. And if 'x'
is overwritten in between, the Static Analyzer would
also know that.<br>
<br>
Apart from the Static Analyzer, the CFG is used for
some clang warnings, such as -Wuninitialized (see
AnalysisBasedWarnings.cpp). There are some ready-made
analyses for common data flow problems implemented
over Clang CFG (such as live variables analysis,
dominators analysis, etc.).<br>
<br>
Clang CFG is not used during the actual compilation /
code generation. When Chris Lattner talks about MLIR
and mentions that Clang should have had a "CIL", he
means that the Clang CFG (or something similar) should
have been used for compilation *as well as* for
analysis. This would, in particular, allow semantic
optimizations that require information that's already
lost in LLVM IR.<br>
<br>
Because the CFG is not used for compilation, it's not
necessarily correct. There are a few known bugs in it,
mostly around complicated C++ and also around GNU
extensions such as the *binary* operator ?: or
statement-expressions. But for plain C it should be
nearly perfect.<br>
<br>
<br>
> Does 'copy around' include passing to my private
fns such as tweak()?<br>
<br>
That's entirely up to you. What i was trying to say is
that it if you allow copying 'dev' into another
variable, it will become much harder to implement your
analysis, so you'd much rather forbid the user to do
this. You might also allow the user to do this and
suffer from imprecision in your analysis. At the same
time, because your analysis is not inter-procedural,
passing 'dev' into a function should be fine. The
function can still return it back "laundered" so the
user would be able to assign it into a variable behind
your back. But these are the usual trade-offs of
static analysis, don't be too scared by them - after
all it all boils down to the halting problem :)<br>
<br>
<div>On
7/4/19 3:40 AM, Billy O'Mahony wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Hi Artem,</div>
<div><br>
</div>
<div>thanks for your well thought-out and useful
reply. It sounds like clang-tidy will be the
sweet spot between usefulness and effort to
implement.</div>
<div><br>
</div>
<div>I have a few other responses down below.</div>
<div><br>
</div>
<div>Regards,</div>
<div>Billy.<br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, 3 Jul
2019 at 23:23, Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com" target="_blank">noqnoqneo@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> Hi,<br>
<br>
It depends on how strict do you want the
checking be and on the details of the rule.
If you're designing a new API from scratch
and stuck with gcc forever, i wouldn't mind
using the gcc __attribute__((cleanup())) for
your purpose.<br>
<br>
</div>
</blockquote>
<div>I didn't know about that gcc attrib. I need
to read the gcc manual attrib section! I
should've added that while we will be
developing on gcc the code should be
compilable on other toolchains/OSs also, so we
are avoiding any gcc extensions (e.g. gcc has
extensions for thread local storage but we are
not using those for the same reason). <br>
</div>
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> The rule you described
should be reasonably easy to implement with
the Static Analyzer. The good side of it is
that you get a lot of semantic modeling for
free. For instance, if the developer copies
`dev` into a local variable and then uses
that local variable outside of
api_enter..api_exit, the tool will be able
to handle transparently, as it deals with
values rather than with variables.</div>
</blockquote>
<div>That is really cool.</div>
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> Also it will probably
be the easiest tool for your problem. The
downside would be that it's not guaranteed
to find all bugs; it'll inevitably give up
on complicated code with high cyclomatic
complexity :) So if you want strict/paranoid
enforcement of rules, the Static Analyzer is
not the right tool. But if you want to
simply find some bugs for free, it's the
right tool.<br>
<br>
It sounds as if your problem is not
inter-procedural. Let me double-check this:
would you have another api_enter..api_exit
pair in the body of your tweak() function?
Or is just one api_enter..api_exit enough?
Or is it a bug to call api_enter twice
without an api_exit in between?<br>
</div>
</blockquote>
<div>Yes in this case tweak() could be another
public fn of the api that would also have an
enter/exit pair. But we are using recursive
mutexes (a thread can acquire the same mutex N
times (ie call api_enter) so long as it also
releases N times) so that will be okay. <br>
</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> <br>
If you have to write api_enter..api_exit in
*every* function that deals with devices,
then the problem is not inter-procedural,
which makes it much easier. In particular,
you should be able to come up with a purely
syntactic analysis ("every function that
accesses a device_t must start with
api_enter() and must end in exactly one spot
with api_exit()").</div>
</blockquote>
<div>We will only insist on enter/exit in public
API functions. Which are the only ones a
client application can call. Internal private
functions we won't have locking (as they can
only be called ultimately from a public
function so the device will be locked.) We are
going to allow private fns to call back into
the api via the public i/face. But we will
have the public functions in specific files
and have a specific naming prefix.<br>
</div>
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div> Such analysis should be easily do-able
in clang-tidy as long as you're satisfied
with this level of aggressiveness. In
particular, you'll have to be willing to
sacrifice code like this:<br>
<br>
void foo(device_t *dev) {<br>
if (flip_a_coin()) {<br>
api_enter(dev);<br>
...<br>
api_exit(dev);<br>
}<br>
}<br>
<br>
But it may be perfectly fine if you
seriously want to enforce a strict structure
on all your functions that deal with
devices.</div>
</blockquote>
<div><br>
</div>
<div>So is it the case that clang-tidy kind of
passes info to the checker-extension in a
syntactic code-parsing order. Whereas the
static analyzer passes information to the
checker in a simulated run-time order?</div>
<div><br>
</div>
<div>E.g in your foo() above my proposed checker
gets fed 1) theres a function called foo, 2)
theres an if with a call to flip_a_coin 3) in
the true case there is a call to enter then
exit 4) in the else there is nothing 5) there
is a return (at which point my checker would
need to be pretty smart and hold a lot of
state to figure out something was wrong) . And
to compare for the static analyzer it's more
like 1) there is fn foo 2.1) there is a code
path through foo with enter/exit 2.2) there is
a code path with just return (at which point
my reasonably simple checker would raise an
error).<br>
</div>
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div> <br>
I think the truly-truly right tool for your
problem would be to come up with a custom
analysis over Clang CFG.</div>
</blockquote>
<div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">.<br>
</div>
</blockquote>
<div>By CFG you mean the Clang Static
Analyzer? </div>
<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div> It would be harder to implement, but it
would allow you to express things like
"every execution path within a function that
accesses `dev` must have a api_enter before
it and an api_exit after it; you are not
allowed to copy `dev` around". This would
strictly enforce the rule.</div>
</blockquote>
<div>Yes that would be great. But I think just
using clang-tidy from what you are saying
would get us a long way. And there are heaps
of simpler checks we would like to implement
also.<br>
</div>
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div> At the same time it'll allow you to lift
the requirement of exactly one return point
- you would still be able to ensure that all
accesses are covered. If you need to allow
to copy `dev` around, it should still be
doable, but it will be significantly more
difficult to implement<br>
</div>
</blockquote>
<div>Does 'copy around' include passing to my
private fns such as tweak()?. We don't need to
copy dev anywhere within the public fns but we
do need it to pass it to private fns.<br>
</div>
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> <br>
<div>On
7/2/19 12:13 PM, Billy O'Mahony via
cfe-dev wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Hello,</div>
<div><br>
</div>
<div>I'd like to write a rule for either
clang-tidy or static analyzer to help
catch some potential errors in a
project I'm working on.</div>
<div><br>
</div>
<div>My questions are: <br>
</div>
<div> a) is only one or the other
will be able to do what I want to do?<br>
</div>
<div> b) if both are feasible which
would have the simpler implementation?</div>
<div><br>
</div>
<div>The project involves writing an API
that will run in a multi-threaded
application and is responsible for
serializing all access to a device
structure. Therefore the first thing
in every function in the API must be
to call api_enter (which will among
other things acquire a mutex on the
device) and the last thing before
returning must be to call api_exit.
Also I want to enforce single exit
point from every API function - or
certainly if there are any return
points that bypass the api_exit call.<br>
</div>
<div><br>
</div>
<div>So here is an example function with
errors I want to catch highlighted.</div>
<div><br>
</div>
<div>int api_foo(device_t *dev) {</div>
<div> int ret_val = 0;<br>
</div>
<div><br>
</div>
<div> bar(); // fn calls & decls
before api_enter is ok- just don't
access dev.</div>
<div> dev->bla = 1; // NO! device
access before api_enter() called</div>
<div> api_enter(dev); // error if
this call is not present exactly once</div>
<div><br>
</div>
<div> if (dev->bla)</div>
<div> return; // NO! didn't call
api_exit before rtn. Also two return
points</div>
<div><br>
</div>
<div> if (dev->ma) {</div>
<div> ret_val = 1;<br>
</div>
<div> goto cleanup;</div>
<div> }</div>
<div> tweak(dev);<br>
</div>
<div><br>
</div>
<div>cleanup:</div>
<div> api_exit(dev); // error if this
is not present exactly once<br>
</div>
<div> dev->bla = 1; //NO! device
access after api_exit()<br>
</div>
<div> return ret_val;<br>
</div>
<div>}</div>
<div><br>
</div>
<div>I don't think it matters but the
project is C compiled with gcc.<br>
</div>
<div><br>
</div>
<div>Also if both are feasible any other
pointers, tips or good resources would
be appreciated. E.g is there a
totally different methodology I'm not
considering - e.g. would using
something like pycparser be a lot
easier - though I'd prefer to keep it
in clang as we plan to use tidy &
static analyzer in any case for
standard QA.<br>
</div>
<div><br>
</div>
<div>Thanks for reading,</div>
<div>Billy.<br>
</div>
</div>
<br>
<fieldset></fieldset>
<pre>_______________________________________________
cfe-dev mailing list
<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a>
</pre>
</blockquote>
<br>
</div>
</blockquote>
</div>
</div>
</blockquote>
<br>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</blockquote></div>