[cfe-dev] GSoC 2012: Static Function Blacklisting

Sat Mar 31 14:26:44 PDT 2012

On Mar 30, 2012, at 10:05 AM, Mark McCurry wrote:

> I do want to avoid manual annotations and I would hope to have some
> basic support for determining if code in another translation unit is
> "safe".
> For this to be done, I should only need the AST from multiple
> translation units, which I had thought would not be too complex within
> clang's static analysis.
> Would this be simplified greatly by making this analysis a separate tool?
> 

You don't necessarily need the full AST, just having a call graph with information about the functions attributes is probably enough. After you have it, you just propagate the attributes essentially solving a data flow problem.

Here is a rough algorithm for solving this, taking the "reentrant" annotation as an example.
1) Build a call graph (clang has rudimentary call graph support already).
2) For each node(representing a function) internally mark it with "reentrant", "non reentrant", "don't know".
3) Iterate through the nodes in the graph and propagate the annotations:
       If at least one of of the callees is "non-reentrant", the caller becomes "non-reentrant".
       If all of the callees are "reentrant", the caller becomes "reentrant".
       Validate: If a function becomes "non-reentrant" and it has "reentrant" user annotation, raise a warning.
4) Repeat Step #4 until no change. (To optimize performance, you'd iterate in topological order starting from the callees.)

This algorithm assumes that if we don't know about a function, it is doing the right thing. This approach is the easiest for users to adapt. In addition, it allows you to deal with imprecision of the analyses, like functions defined in other translation units, dynamically linked libraries, system functions the analyzer does not reason about, calls via function pointers (though, these could be dropped at call graph construction). You can always strengthen your analyzes afterwards to make it more precise.

You could use the same algorithm even when handling multiple translation units. However, you'd need to build a global call graph by linking the information from different translation units. I am not sure how easy it is to resolve possible  ambiguities here.. In the worst case, you can always assume the function is marked as "don't know".

I'd start by implementing the analyzes for a single TU. Then, come up with a serialization scheme for the generated data. Last, we can either feed the serialized output back into the tool or have another tool/script to do the processing of the global graph.

I think it is very important to embed the knowledge about existing standard/system functions into the analyzes, so that someone could use it off-the-shelf.

> If not, I think the logical path is to work within each translation
> unit to check for safety as that process should be extendable once
> clang's static analysis has incorporated support for shifting
> information between translation units.
> Is any work currently being done in cross translation unit analysis or
> it is just sitting on a todo list?
> 

It's on a todo list. The general analyzer support is more involved than what you'd need for this project and I am not sure when we'll have it. 
Adding the support required for your project would be a first step in the right direction here and might be useful for other analyses.

> This analysis would definitely be more useful to users if it could do
> those checks, but it should still be of use without them.
> As per me working on some of the cross translation code, I might be
> able to get enough working for a proof of concept, but I have a poor
> gauge of how much time it would consume.
> 
> --Mark