[cfe-dev] Two pass analysis framework: AST merging approach
Aleksei Sidorin via cfe-dev
cfe-dev at lists.llvm.org
Wed May 4 07:05:02 PDT 2016
Hello Gabor,
thank you for your proposal. In our approach, we introduced the
ASTContext::getXTUDefinition() method to allow clients to import the
functions they need. Did you follow the same approach, or did you add
something else on top?
> - In case a function is defined in a header, do not emit the body.
There may be issues with this approach. Consider the following code:

1.cpp:
    #include "1.h"

1.h:
    void f() {...}

2.cpp:
    void f();
    ...
    f();

Here the header-located function has to be imported for the analysis of 2.cpp.
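
To make the concern concrete, here is a toy model of the function index in C++. It is only a sketch under assumptions: FunctionDef, TranslationUnit, buildIndex, findDefinition and exampleProject are hypothetical names and not part of Clang, a header is recognized simply by a ".h" suffix, and functions are keyed by plain name where a real index would need something like a mangled name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model only: none of these types are Clang APIs.
struct FunctionDef {
    std::string name;       // plain function name (assumption: no overloads)
    std::string definedIn;  // file that textually contains the body, e.g. "1.h"
};

struct TranslationUnit {
    std::string mainFile;                // e.g. "1.cpp"
    std::vector<FunctionDef> functions;  // bodies visible after preprocessing
};

// Hypothetical stand-in for "the function is defined in a header".
bool isHeader(const std::string &file) {
    return file.size() >= 2 && file.compare(file.size() - 2, 2, ".h") == 0;
}

// Function name -> main file of the TU whose serialized AST holds the body.
using FunctionIndex = std::map<std::string, std::string>;

// First pass: index every function body that the exporter would emit.
FunctionIndex buildIndex(const std::vector<TranslationUnit> &tus,
                         bool skipHeaderBodies) {
    FunctionIndex index;
    for (const TranslationUnit &tu : tus) {
        assert(!tu.mainFile.empty() && "every TU needs a main file");
        for (const FunctionDef &fn : tu.functions) {
            if (skipHeaderBodies && isHeader(fn.definedIn))
                continue;  // body not emitted, so it cannot be indexed
            index.emplace(fn.name, tu.mainFile);
        }
    }
    return index;
}

// Second pass: returns the TU to load the body from, or nullptr if unknown.
const std::string *findDefinition(const FunctionIndex &index,
                                  const std::string &name) {
    auto it = index.find(name);
    return it == index.end() ? nullptr : &it->second;
}

// The 1.cpp / 1.h / 2.cpp layout from the example above.
std::vector<TranslationUnit> exampleProject() {
    return {
        TranslationUnit{"1.cpp", {FunctionDef{"f", "1.h"}}},
        TranslationUnit{"2.cpp", {}},
    };
}
```

With exampleProject(), buildIndex(tus, false) maps f to 1.cpp, while buildIndex(tus, true) leaves f out of the index, so findDefinition returns nullptr when the analysis of 2.cpp asks for it.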
Also, in general, somebody may need to look up non-function entities:
types or variables, for example. CSA may need to look up a global
variable initializer, too (we have not implemented this yet). That is a
more general task, but we need to start with something.
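
One possible shape for such a more general lookup is an index keyed by entity kind and name. This is a minimal sketch, not what we implemented: EntityKind, EntityIndex, addEntity and findEntity are hypothetical names, and the AST file names used with it are made up.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical entity kinds; this is not a Clang enum.
enum class EntityKind { Function, Type, Variable };

// (kind, name) -> file holding the serialized AST with the definition.
using EntityIndex =
    std::map<std::pair<EntityKind, std::string>, std::string>;

// Records where the definition of an entity can be loaded from.
// An existing entry for the same (kind, name) is left unchanged.
void addEntity(EntityIndex &index, EntityKind kind, const std::string &name,
               const std::string &astFile) {
    assert(!name.empty() && "entities must be named to be indexed");
    index.emplace(std::make_pair(kind, name), astFile);
}

// Returns the AST file for the entity, or nullptr if it is not indexed.
const std::string *findEntity(const EntityIndex &index, EntityKind kind,
                              const std::string &name) {
    auto it = index.find(std::make_pair(kind, name));
    return it == index.end() ? nullptr : &it->second;
}
```

Keeping the kind in the key lets a function and a variable with the same name coexist, and a client that only needs functions can keep passing EntityKind::Function.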
> Hi!
>
> This e-mail is a proposal based on the work done by Yury Gibrov et al.:
> http://lists.llvm.org/pipermail/cfe-dev/2015-December/046299.html
>
> They implemented a two-pass analysis: the first pass serializes the
> AST of every translation unit and creates an index of functions; the second
> pass does the real analysis, which can load the AST of function bodies on
> demand.
>
> This approach can be used to achieve cross translation unit analysis for
> the Clang Static Analyzer to some extent, but a similar approach could be
> applicable to Clang Tidy and other Clang-based tools.
>
> While this method is not likely to be a silver bullet for the Static
> Analyzer, I did some benchmarks to see how feasible this approach is. The
> baseline was running the Static Analyzer without the two-pass analysis; the
> second run used the framework linked above.
>
> For a 150k LOC C project I got the following results:
> The size of the serialized ASTs was: 140 MB
> The size of the indexes (textual representation): 4.4 MB
> The analysis time was below 4x the baseline
> The amount of memory consumed was below 2x the baseline
>
> All in all it looks like a feasible approach for some use cases.
>
> I also tried to do a benchmark on the LLVM+Clang codebase. Unfortunately I
> was not able to run the analysis due to some missing features in the AST
> Importer. But I was able to serialize the ASTs and generate the indices:
> The size of the serialized ASTs: 45.4 GB
> The size of the function index: 1.6 GB
>
> While these numbers are less promising, I think there are some
> opportunities to reduce them significantly.
>
> I propose the introduction of an analysis mode for exporting ASTs. In
> analysis mode the AST exporter would not emit the body of a
> function in several cases:
> - In case a function is defined in a header, do not emit the body.
> - In case the function was defined in an implicit template specialisation,
> do not emit the body.
>
> I think after similar optimizations it might be feasible to use this
> approach on LLVM-scale projects as well, and it would be much easier to
> implement Clang-based tools that can utilize cross translation unit
> capabilities.
>
> In case the analyzer gets a new interprocedural analysis method that would
> increase the performance, the users of this framework would profit from that
> approach immediately.
>
> Is a framework like this worth mainlining and working on? What do you
> think?
>
> (Note that AST Importer related improvements are already being mainlined
> by Yury et al. My question is about the "analysis mode" for exporting ASTs,
> and a general framework to consume those exported ASTs.)
>
> Regards,
> Gábor
--
Best regards,
Aleksei Sidorin
Software Engineer,
IMSWL-IMCG, SRR, Samsung Electronics