[cfe-dev] GSoC 2012: Static Function Blacklisting

Sun Apr 1 17:56:14 PDT 2012

> > - Come up with a framework for storing the intermediate results between the analyses of different translation units.
> >  Most likely, this is the direction the current clang static analyzer will take at some point; however, this is a challenging problem.
> This, by itself, would make a good GSoC project.  A persistence framework for adding annotations to functions, so that the analyser could run in two passes, one collecting metadata and another applying it, would not be a massively complex project by itself, but would be hugely useful for a whole range of analyses.

After letting the comments digest for a while, I am fairly sure that
the scope of my original proposal should be expanded somewhat.
It sounds like it would be feasible to take some steps in the
direction of what David mentioned, though I am not exactly sure which
ones.
I am very hesitant to generalize persistence past function (and static
data?) annotation, as I do want to have a coherent result at the end
of GSoC.

After reading through some portions of the documentation on
/clang/lib/StaticAnalyzer, I am not too sure where a persistence
framework would fit relative to the clang code base.
From what I understand, persistence work would need to live at least
within /clang/lib, but I am not entirely sure of that given my
current knowledge of how clang is structured.
Is it safe to assume that the callgraph/annotation analysis should
only interface with clang through libclang?

> One could imagine a simple database (sqlite ?) to record all this information, and perhaps a simple python script on the side to query it once the analysis is complete.

Is there some standard way for recording this information or is sqlite
just what comes to mind?
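
To make the question concrete, here is a minimal sketch of what such a
store might look like, using Python's standard sqlite3 module. The
table layout and the function names (open_store,
record_translation_unit, load_call_graph, callers_of) are my own
assumptions for illustration; nothing here reflects an existing clang
format.

```python
import sqlite3

# Hypothetical schema: one row per function, one row per call edge.
SCHEMA = """
CREATE TABLE IF NOT EXISTS functions (
    name       TEXT PRIMARY KEY,
    annotation TEXT
);
CREATE TABLE IF NOT EXISTS calls (
    caller TEXT NOT NULL,
    callee TEXT NOT NULL,
    PRIMARY KEY (caller, callee)
);
"""


def open_store(path=":memory:"):
    """Open (or create) the store and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_translation_unit(conn, annotations, edges):
    """First pass: save what one translation unit contributed.

    annotations maps a function name to its annotation string;
    edges is an iterable of (caller, callee) pairs.
    """
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO functions (name, annotation) VALUES (?, ?)",
            list(annotations.items()),
        )
        conn.executemany(
            "INSERT OR IGNORE INTO calls (caller, callee) VALUES (?, ?)",
            list(edges),
        )


def load_call_graph(conn):
    """Second pass: rebuild the whole-program call graph as a dict."""
    graph = {}
    for caller, callee in conn.execute("SELECT caller, callee FROM calls"):
        graph.setdefault(caller, set()).add(callee)
    return graph


def callers_of(conn, callee):
    """Example query a side script might run once the analysis is done."""
    rows = conn.execute(
        "SELECT caller FROM calls WHERE callee = ? ORDER BY caller", (callee,)
    )
    return [row[0] for row in rows]
```

Each translation unit would call record_translation_unit once, and the
second pass would start from load_call_graph, so edges that cross
translation units only become visible after every unit has been
recorded.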

As for the rest of the semantics proposed by Matthieu, I find them to
be excellent specs to work by.

> There is an attribute called "annotate" that enables you to embed string literals like:

Well, that eliminates the need for another attribute, assuming that
reusing it for this purpose does not create any issues.
Thanks Michael.

> You don't necessarily need the full AST, just having a call graph...

I had assumed that the call graph information was not readily accessible.

> (clang has rudimentary call graph support already)

Could you point me to where that support may be?
After reading through some of the internals, I started to find
warnings about the "gore of the internal analysis engine".

> Here is a rough algorithm for solving this, taking the "reentrant" annotation as an example.
> 1) Build a call graph (clang has rudimentary call graph support already).
> 2) For each node (representing a function) internally mark it with "reentrant", "non-reentrant", "don't know".
> 3) Iterate through the nodes in the graph and propagate the annotations:
>       If at least one of the callees is "non-reentrant", the caller becomes "non-reentrant".
>       If all of the callees are "reentrant", the caller becomes "reentrant".
>       Validate: If a function becomes "non-reentrant" and it has "reentrant" user annotation, raise a warning.
> 4) Repeat Step #3 until no change. (To optimize performance, you'd iterate in topological order starting from the callees.)

That is fairly close to what I had intended to do, but seeing it
formalized so concisely makes me think that the scope of the project
should be expanded somewhat.
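
For reference, the quoted steps can be sketched in a few lines of
Python. This is only a rough illustration under my own assumptions:
the call graph is a plain dict rather than anything clang produces,
the names Status and propagate are made up, functions with no known
callees keep their initial marking, and a "non-reentrant" marking is
never upgraded back to "reentrant".

```python
from enum import Enum


class Status(Enum):
    REENTRANT = "reentrant"
    NON_REENTRANT = "non-reentrant"
    UNKNOWN = "don't know"


def propagate(call_graph, annotations):
    """Propagate reentrancy markings from callees to callers.

    call_graph maps a caller name to the set of its callees.
    annotations maps a function name to its user-supplied Status.
    Returns (final status per function, list of warning strings).
    """
    functions = set(call_graph)
    for callees in call_graph.values():
        functions |= set(callees)

    # Step 2: initial marking, "don't know" unless the user annotated it.
    status = {f: annotations.get(f, Status.UNKNOWN) for f in functions}
    warnings = []

    # Steps 3 and 4: repeat the propagation until nothing changes.
    changed = True
    while changed:
        changed = False
        for caller in sorted(functions):
            callees = call_graph.get(caller, ())
            # Assumption: leaves keep their initial marking, and
            # "non-reentrant" is final once assigned.
            if not callees or status[caller] is Status.NON_REENTRANT:
                continue
            if any(status[c] is Status.NON_REENTRANT for c in callees):
                new = Status.NON_REENTRANT
            elif all(status[c] is Status.REENTRANT for c in callees):
                new = Status.REENTRANT
            else:
                continue
            if new is status[caller]:
                continue
            # Validate against the user annotation before overwriting it.
            if (new is Status.NON_REENTRANT
                    and annotations.get(caller) is Status.REENTRANT):
                warnings.append(
                    caller + " is annotated reentrant but calls "
                    "a non-reentrant function"
                )
            status[caller] = new
            changed = True
    return status, warnings
```

The loop terminates because a function can only move from "don't know"
to one of the other two markings, or from "reentrant" to
"non-reentrant", never back. Visiting nodes in topological order, as
suggested in the quoted step 4, would reduce the number of passes but
is left out here to keep the sketch short.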

So, with all of that said, is it reasonable to extend this project into
a restricted two-pass persistence framework for clang's static
analysis, with the previously described property checking as
the first use of the new functionality?
Hopefully this work can be built upon in future clang development.

--Mark