[cfe-dev] Two pass analysis framework: AST merging approach
Gábor Horváth via cfe-dev
cfe-dev at lists.llvm.org
Tue Aug 9 02:03:14 PDT 2016
A side note, it might be worth to consider IDDFS instead of DFS (
get more optimal coverage patterns regardless of the "shape" of the
What do you think?
On 26 July 2016 at 10:07, Gábor Horváth <xazax.hun at gmail.com> wrote:
> On 15 July 2016 at 08:17, Anna Zaks <ganna at apple.com> wrote:
>> On Jul 12, 2016, at 4:59 AM, Gábor Horváth <xazax.hun at gmail.com> wrote:
>> (As a ping), I would like to summarize the measurements I done since the
>> original e-mail:
>> The approach is to first serialize all the translation units to the
>> storage, create an index of the functions, and then load them lazily on
>> demand to achieve cross translation unit support. This does not modify the
>> inter-procedural analysis of the Static Analyzer and could be used for
>> Clang Tidy as well. Once a new inter-procedural analysis is introduced for
>> the Static Analyzer, the cross tu support would profit from it immediately.
>> rAthena, a 150k LOC C project:
>> The size of the serialized ASTs was: 140MB
>> The size of the indexes: 4.4MB
>> The time of the analysis was bellow 4X
>> The amount of memory consumed was bellow 2X
>> The number of reports is 3 times more
>> Xerces, 150k LOC C++ project:
>> The size of the serialized ASTs was: 800MB
>> The size of the indexes: 90MB
>> The analysis time using CTU was the half of the one without CTU
>> LLVM + Clang + Clang tools extra:
>> The size of the serialized ASTs was: 45.4 GB
>> The size of the indexes: 1,6GB
>> Some optimization effort to reduce te size of the CFG:
>> TU ASTs after omitting function bodies from headers: 42.7 GB
>> TU ASTs after omitting STL: 34.0 GB
>> TU ASTs after skipping implicit instantiations: 21.5 GB
>> TU ASTs after omitting STL and implicit instantiations: 16.0 GB
>> Considering that the build directory of a debug build is also about 40 GB
>> on my platform, I do not consider the size of the serialized ASTs a
>> blocking issue. However, in case it is a concern, according to the attached
>> statistics about the serialized AST dumps, there are some optimization
>> possibilities in the binary format.
>> This solution also works in a multi process build/analysis environment.
>> Some of the necessary framework, for example ASTImporter code is being
>> accepted into mainline clang right now.
>> All in all, this approach:
>> - Can discover new bug reports as is.
>> Hi Gabor,
>> I’d like to get more data on this point. We know that inlining-style
>> inter-procedural analysis does not scale. That is the main issue with
>> adding the cross-translation unit support to the clang static analyzer.
>> I suspect that the only way the suggested approach works is that we time
>> out more often when the TU support is enabled. Since the static analyzer
>> performs DFS of the paths, this means that we trade catching bugs localized
>> within a single TU to bugs that require deep cross-TU analysis. I suspect
>> those bugs are much harder to understand, but more, importantly, we trade
>> coverage in a lot of cases instead of gaining it.
> Hi Anna,
> Good point. Using the cross TU analysis, the analyzer has very different
> coverage pattern. I did not have time to look into the reports yet, but
> will do so soon.
>> Looks like you clam that the number of bugs reported increased for one of
>> the projects, but there is no data for all the other projects. Also, had
>> that project ever been analyzed with the clang static analyzer before and
>> had some of the intra-procedural bugs been fixed before you made the
> There is no trace of fixed static analyzer warnings in the commit logs of
> rAthena project. But in case issues found w/o cross tu and was fixed it can
> be still useful to analyze the project with a different coverage pattern.
>> I’d like to see the absolute number of bugs reported in both cases. Also,
>> it would be good to measure coverage and the number of times the analyzer
>> times out in both cases. (Those statistics should be easy to generate as
>> the tracking is already implemented by the analyzer and we used it when
>> bringing up inter procedural analysis.)
> I did some measurements, and here are the results for the rAthena project:
> Coverage results without cross translation unit:
> 105499 AnalysisConsumer - The # of basic blocks in the analyzed functions.4167 AnalysisConsumer - The # of functions and blocks analyzed (as top level with inlining turned on).10084 AnalysisConsumer - The # of functions at top level.60.0244651798 AnalysisConsumer - The % of reachable basic blocks.3360 AnalysisConsumer - The maximum number of basic blocks in a function.960440 CoreEngine - The # of paths explored by the analyzer.140425349 CoreEngine - The # of steps executed.646201 ExprEngine - The # of aborted paths due to reaching the maximum block count in a top level function24317519 ExprEngine - The # of times RemoveDeadBindings is called489956 ExprEngine - The # of times we inlined a call20300 ExprEngine - The # of times we re-evaluated a call without inlining
> Coverage results with cross translation unit:
> 150406 AnalysisConsumer - The # of basic blocks in the analyzed functions.3308 AnalysisConsumer - The # of functions and blocks analyzed (as top level with inlining turned on).9829 AnalysisConsumer - The # of functions at top level.48.9545495971 AnalysisConsumer - The % of reachable basic blocks.3360 AnalysisConsumer - The maximum number of basic blocks in a function.2088751 CoreEngine - The # of paths explored by the analyzer.345875017 CoreEngine - The # of steps executed.464409 ExprEngine - The # of aborted paths due to reaching the maximum block count in a top level function64756893 ExprEngine - The # of times RemoveDeadBindings is called2176922 ExprEngine - The # of times we inlined a call44215 ExprEngine - The # of times we re-evaluated a call without inlining
> Note that, the results are not fully precise for the following reasons:
> * I simply collected and merged the results for each translation unit. In
> the cross translation unit case a function might be analyzed from multiple
> translation unit, that might include duplicates in the number of analyzed
> basic blocks. Inline functions in headers might cause the same effects even
> when cross translation unit is turned off.
> * The coverage % is also imprecise. Basic blocks not reached in one TU
> might reached from another TU during the analysis.
> Even though these numbers are just hints, they show about 50% increase in
> the number of analyzed blocks. This makes me think that the overall
> coverage is increased (however as you pointed out, the pattern is
> different). It is also worth noting that the number of functions analyzed
> as top level decreased.
> In case the issues reported this way are too hard to understand, the
> heuristics might be tuned in case this feature is turned on. This is no
> silver bullet of course, but once a new interprocedural analysis is
> introduced, this would provide an easy way to evaluate it in a cross tu
> What do you think?
>> Thank you,
>> - Feasible to implement, does not need sever modifications to the Static
>> Analyzer or Clang Tidy.
>> - Has acceptable performance for lots of the real world projects.
>> I think, this would be a useful addition to the clang source tree. Do you
>> On 4 May 2016 at 15:09, Gábor Horváth <xazax.hun at gmail.com> wrote:
>>> This e-mail is a proposal based on the work done by Yury Gibrov et al.:
>>> They accomplished a two pass analysis, the first pass is serializing the
>>> AST of every translation unit and creates an index of functions, the second
>>> pass does the real analysis, which can load the AST of function bodies on
>>> This approach can be used to achieve cross translation unit analysis for
>>> the clang Static Analyzer to some extent, but similar approach could be
>>> applicable to Clang Tidy and other clang based tools.
>>> While this method is not likely to be a silver bullet for the Static
>>> Analyzer, I did some benchmarks to see how feasible this approach is. The
>>> baseline was running the Static Analyzer without the two pass analyis, the
>>> second one was running using the framework linked above.
>>> For a 150k LOC C projects I got the following results:
>>> The size of the serialized ASTs was: 140MB
>>> The size of the indexes (textual representation): 4.4MB
>>> The time of the analysis was bellow 4X
>>> The amount of memory consumed was bellow 2X
>>> All in all it looks like a feasible approach for some use cases.
>>> I also tried to do a benchmark on the LLVM+Clang codebase. Unfortunately
>>> I was not able to run the analysis due to some missing features in the AST
>>> Importer. But I was able to serialize the ASTs and generate the indices:
>>> The siye of the serialized ASTs: 45.4 GB
>>> The siye of the function index: 1,6GB
>>> While these numbers are less promising, I think there are some
>>> opportunities to reduce them significantly.
>>> I propose the introduction of an analysis mode for exporting ASTs. In
>>> analysis mode the AST exporter would not emit the function body of a
>>> function for several cases:
>>> - In case a function is defined in a header, do not emit the body.
>>> - In case the function was defined in an implicit template
>>> specialisation, do not emit the body.
>>> I think after similar optimizations it might be feasible to use this
>>> approach on LLVM scale projects as well, and it would be much easier to
>>> implement Clang based tools that can utilize cross translation unit
>>> In case the analyzer gets a new interprocedural analysis method that
>>> would increase the performance the users of this framework would profit
>>> from that approach immediately.
>>> Does a framework like this worth mainlining and working on? What do you
>>> (Note that, AST Importer related improvements are already being
>>> mainlined by Yury et al. My question is about the "analysis mode" for
>>> exporting ASTs, and a general framework to consume those exported ASTs.)
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the cfe-dev