<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On May 4, 2016, at 6:09 AM, Gábor Horváth &lt;<a href="mailto:xazax.hun@gmail.com" class="">xazax.hun@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class=""><div class=""><div class=""><div class=""><div class=""><div class="">Hi!<br class=""><br class=""></div>This e-mail is a proposal based on the work done by Yury Gibrov et al.: <a href="http://lists.llvm.org/pipermail/cfe-dev/2015-December/046299.html" class="">http://lists.llvm.org/pipermail/cfe-dev/2015-December/046299.html</a><b class=""><br class=""><br class=""></b></div>They implemented a two-pass analysis: the first pass serializes the AST of every translation unit and creates an index of functions; the second pass performs the actual analysis and loads the ASTs of function bodies on demand.<br class=""><br class=""></div>This approach can be used to achieve cross-translation-unit analysis for the Clang Static Analyzer to some extent, and a similar approach could be applied to Clang Tidy and other Clang-based tools.<br class=""><br class=""></div>While this method is unlikely to be a silver bullet for the Static Analyzer, I ran some benchmarks to see how feasible the approach is. The baseline was the Static Analyzer without the two-pass analysis; the comparison run used the framework linked above.<br class=""><br class=""></div></div></div></div></blockquote><div><br class=""></div>Can you explain what the “two-pass analysis” does? For example, does it loop through each TU a second time and “inline” every call from other TUs? In which order are the other TUs loaded? In which order are the call sites processed? Do you repeat until nothing changes? 
Did you measure coverage in some way? Did you run path-sensitive checks? (An analysis time of under 4X seems much lower than I would expect, given that we now explore much deeper paths and the analyzer has exponential running time.)</div><div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class="">For a 150k LOC C project I got the following results:<br class=""></div><div class="">The size of the serialized ASTs: 140 MB<br class=""></div><div class="">The size of the indexes (textual representation): 4.4 MB<br class=""></div><div class="">The analysis time was below 4X the baseline<br class=""></div><div class="">Memory consumption was below 2X the baseline<br class=""><br class=""></div><div class="">All in all, it looks like a feasible approach for some use cases.<br class=""><br class=""></div><div class="">I also tried to benchmark the LLVM+Clang codebase. Unfortunately, I was not able to run the analysis due to some missing features in the AST Importer, but I was able to serialize the ASTs and generate the indices:<br class=""></div><div class="">The size of the serialized ASTs: 45.4 GB<br class=""></div><div class="">The size of the function index: 1.6 GB<br class=""><br class=""></div><div class="">While these numbers are less promising, I think there are some opportunities to reduce them significantly.<br class=""><br class=""></div><div class="">I propose introducing an analysis mode for exporting ASTs. 
In analysis mode, the AST exporter would not emit function bodies in the following cases:<br class=""></div><div class="">- If a function is defined in a header, do not emit its body.<br class=""></div><div class="">- If a function is defined in an implicit template specialisation, do not emit its body.<br class=""><br class=""></div><div class="">I think that after similar optimizations it might be feasible to use this approach on LLVM-scale projects as well, and it would become much easier to implement Clang-based tools that utilize cross-translation-unit capabilities.<br class=""><br class=""></div><div class="">If the analyzer gets a new interprocedural analysis method that improves performance, users of this framework would benefit from it immediately.<br class=""><br class=""></div></div></div></div></blockquote><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class="">Is a framework like this worth mainlining and working on? What do you think?<br class=""><br class=""></div><div class="">(Note that AST Importer-related improvements are already being mainlined by Yury et al. My question is about the "analysis mode" for exporting ASTs and a general framework to consume those exported ASTs.)<br class=""></div><div class=""><br class=""></div><div class="">Regards,<br class=""></div><div class="">Gábor<br class=""></div><div class=""><br class=""></div><div class=""><br class=""><br class=""></div></div></div>

</div></blockquote></div><br class=""></body></html>