[cfe-dev] Two pass analysis framework: AST merging approach

Wed May 4 06:09:24 PDT 2016

Hi!

This e-mail is a proposal based on the work done by Yury Gibrov et al.:
http://lists.llvm.org/pipermail/cfe-dev/2015-December/046299.html

They implemented a two pass analysis: the first pass serializes the AST of
every translation unit and creates an index of functions; the second pass
does the real analysis and can load the AST of function bodies on demand.
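
As a rough illustration, the two passes can be modelled as below. This is a
minimal sketch and not the code of the linked framework: TranslationUnit,
FunctionIndex, buildIndex, findDefinition and the ".ast" file suffix are all
hypothetical names chosen for the example, and no real Clang API is used.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one translation unit: its source file name and
// the names of the functions it defines.
struct TranslationUnit {
  std::string SourceFile;
  std::vector<std::string> DefinedFunctions;
};

// Maps a function name to the file holding the serialized AST that defines it.
using FunctionIndex = std::map<std::string, std::string>;

// Pass 1: record, for every defined function, which AST dump contains its
// body. The ".ast" suffix is an illustrative naming convention only.
FunctionIndex buildIndex(const std::vector<TranslationUnit> &TUs) {
  FunctionIndex Index;
  for (const TranslationUnit &TU : TUs)
    for (const std::string &Name : TU.DefinedFunctions)
      Index.emplace(Name, TU.SourceFile + ".ast");
  return Index;
}

// Pass 2: when the analysis meets a call to a function with no body in the
// current translation unit, ask the index which dump to load. Returns
// nullptr if no translation unit defines the function.
const std::string *findDefinition(const FunctionIndex &Index,
                                  const std::string &Name) {
  auto It = Index.find(Name);
  return It == Index.end() ? nullptr : &It->second;
}
```

In this model the second pass only touches a serialized AST when
findDefinition returns a file name, which is the "on demand" part.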

This approach can be used to achieve cross translation unit analysis for
the Clang Static Analyzer to some extent, but a similar approach could be
applicable to Clang Tidy and other Clang based tools.

While this method is not likely to be a silver bullet for the Static
Analyzer, I did some benchmarks to see how feasible this approach is. The
baseline was running the Static Analyzer without the two pass analysis; the
second configuration was running it with the framework linked above.

For a 150k LOC C project I got the following results:
The size of the serialized ASTs was: 140MB
The size of the indexes (textual representation): 4.4MB
The analysis time was below 4X the baseline
The memory consumption was below 2X the baseline

All in all it looks like a feasible approach for some use cases.

I also tried to do a benchmark on the LLVM+Clang codebase. Unfortunately I
was not able to run the analysis due to some missing features in the AST
Importer. But I was able to serialize the ASTs and generate the indices:
The size of the serialized ASTs: 45.4 GB
The size of the function index: 1.6 GB

While these numbers are less promising, I think there are some
opportunities to reduce them significantly.

I propose the introduction of an analysis mode for exporting ASTs. In
analysis mode the AST exporter would not emit the body of a function in
several cases:
- In case a function is defined in a header, do not emit the body.
- In case the function was defined in an implicit template specialisation,
do not emit the body.
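
The two rules above can be sketched as a single predicate. This is only an
illustration under assumptions: FunctionInfo and shouldEmitBody are
hypothetical names, and the flags stand in for whatever the exporter would
actually query on the AST; nothing here is an existing Clang interface.

```cpp
// Hypothetical summary of the properties the exporter would need to know
// about a function declaration.
struct FunctionInfo {
  bool HasBody;                          // a definition, not just a prototype
  bool DefinedInHeader;                  // the body comes from a header file
  bool IsImplicitTemplateSpecialization; // implicitly instantiated template
};

// Decide whether the exporter writes out the function body.
// Outside analysis mode every available body is emitted; in analysis mode
// the two cases listed in the proposal are skipped.
bool shouldEmitBody(const FunctionInfo &F, bool AnalysisMode) {
  if (!F.HasBody)
    return false;
  if (!AnalysisMode)
    return true;
  return !F.DefinedInHeader && !F.IsImplicitTemplateSpecialization;
}
```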

I think after similar optimizations it might be feasible to use this
approach on LLVM scale projects as well, and it would be much easier to
implement Clang based tools that can utilize cross translation unit
capabilities.

In case the analyzer gets a new interprocedural analysis method that would
increase the performance, the users of this framework would profit from
that approach immediately.

Is a framework like this worth mainlining and working on? What do you
think?

(Note that AST Importer related improvements are already being mainlined
by Yury et al. My question is about the "analysis mode" for exporting ASTs,
and a general framework to consume those exported ASTs.)

Regards,
Gábor