[cfe-dev] Two pass analysis framework: AST merging approach

Wed May 4 09:56:02 PDT 2016

On Wed, May 4, 2016 at 4:12 PM, Gábor Horváth <cfe-dev at lists.llvm.org>
wrote:

> Hi!
>
> On 4 May 2016 at 16:05, Aleksei Sidorin <a.sidorin at samsung.com> wrote:
>
>> Hello Gabor,
>>
>> Thank you for your proposal. In our approach, we introduced the
>> ASTContext::getXTUDefinition() method to allow clients to import the
>> functions they need. Did you follow the same approach, or did you add
>> something on top of it?
>>
>
> I used that implementation.
>
>
>>
>> > - In case a function is defined in a header, do not emit the body.
>>
>> There may be issues with this approach. Consider the following code:
>>
>> // 1.cpp
>> #include "1.h"
>>
>> // 1.h
>> void f() { /* ... */ }
>>
>> // 2.cpp
>> void f();
>>
>> void g() {  // hypothetical caller
>>   // ...
>>   f();
>> }
>>
>>
>> Analyzing 2.cpp requires importing f(), whose definition is located in a
>> header.
>>
>
> You are right. This method would not work for those scenarios. But losing
> some definitions might be worth it to make this solution feasible. Another
> possible solution would be to come up with a protocol that exports the
> definition of such functions only once (that is, for only one translation
> unit).
>

A better solution would be possible with modularized builds
<http://clang.llvm.org/docs/Modules.html>: each header's AST would then be
serialized exactly once instead of being copied into every translation unit.
That requires serious preparatory work to make the analyzed code compatible
with modules, but it seems to be the most promising long-term approach.
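
Until modules are usable, the "export the definition only once" protocol suggested in the quoted reply could, for example, assign each header-defined function to a single owning translation unit. The sketch below is only an illustration under stated assumptions: functions are keyed by an identifier such as a USR (clang's Unified Symbol Resolution string), a hypothetical pre-pass reports which header-defined functions each TU sees, and ties are broken by picking the lexicographically smallest TU path. The names TUHeaderDefs, assignOwners and shouldExportHeaderBody are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-TU record from a pre-pass: identifiers (e.g. USRs) of all
// functions whose definitions come from headers included by this TU.
struct TUHeaderDefs {
  std::string tuPath;
  std::set<std::string> headerDefinedUSRs;
};

// Map each header-defined function to exactly one owning TU.
// Assumed tie-breaking rule: the lexicographically smallest TU path wins,
// so the result does not depend on the order in which TUs are listed.
std::map<std::string, std::string>
assignOwners(const std::vector<TUHeaderDefs> &tus) {
  std::map<std::string, std::string> owner;
  for (const auto &tu : tus) {
    for (const auto &usr : tu.headerDefinedUSRs) {
      auto it = owner.find(usr);
      if (it == owner.end())
        owner.emplace(usr, tu.tuPath);
      else if (tu.tuPath < it->second)
        it->second = tu.tuPath;
    }
  }
  return owner;
}

// During export, a TU emits a header-defined body only if it owns it.
bool shouldExportHeaderBody(const std::map<std::string, std::string> &owner,
                            const std::string &usr,
                            const std::string &tuPath) {
  auto it = owner.find(usr);
  return it != owner.end() && it->second == tuPath;
}
```

For example, if b.cpp sees header-defined functions usr_f and usr_g and a.cpp sees only usr_f, then usr_f is owned by a.cpp and usr_g by b.cpp, so each body is exported exactly once. The price is a global pass over all translation units before any AST is exported.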

>
>>
>> Also, in general, clients may need to look up entities other than
>> functions, such as types or variables. The CSA may also need to find the
>> initializer of a global variable (we have not implemented that yet). This
>> is a more general task, but we need to start with something.
>>
>
> Right. I did not plan to omit that information from the AST dump.
>
>
>>
>>
>>> Hi!
>>>
>>> This e-mail is a proposal based on the work done by Yury Gibrov et al.:
>>> http://lists.llvm.org/pipermail/cfe-dev/2015-December/046299.html
>>>
>>> They implemented a two-pass analysis: the first pass serializes the AST
>>> of every translation unit and creates an index of functions; the second
>>> pass does the actual analysis and can load the ASTs of function bodies on
>>> demand.
>>>
>>> This approach can be used to achieve cross-translation-unit analysis for
>>> the Clang Static Analyzer to some extent, and a similar approach could be
>>> applied to clang-tidy and other Clang-based tools.
>>>
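The two-pass scheme described above (an index built in the first pass, ASTs loaded on demand in the second) can be sketched as follows. This is a minimal illustration, not the framework's code: the textual index format (one "<USR> <serialized AST path>" pair per line, identifiers without spaces) is an assumption, and LoadedAST is a hypothetical stand-in for a real clang::ASTUnit, which a real tool would presumably obtain via ASTUnit::LoadFromASTFile() (its exact signature depends on the Clang version). The point is that each serialized AST file is loaded lazily and cached, so several functions defined in one TU share a single load.

```cpp
#include <functional>
#include <istream>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Parse the hypothetical textual index: "<USR> <AST file>" per line.
std::map<std::string, std::string> parseFunctionIndex(std::istream &in) {
  std::map<std::string, std::string> index;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string usr, astPath;
    if (fields >> usr >> astPath)
      index.emplace(usr, astPath);  // keep the first entry if a USR repeats
  }
  return index;
}

// Hypothetical stand-in for a loaded AST (a clang::ASTUnit in a real tool).
struct LoadedAST {
  std::string path;
};

// Second pass: look up a function in the index and load its AST lazily,
// caching each successfully loaded file.
class OnDemandASTLoader {
public:
  using LoadFn =
      std::function<std::unique_ptr<LoadedAST>(const std::string &)>;

  OnDemandASTLoader(std::map<std::string, std::string> index, LoadFn load)
      : index_(std::move(index)), load_(std::move(load)) {}

  // Returns the AST holding the definition of `usr`, or nullptr if the
  // index has no entry for it.
  const LoadedAST *findDefinitionAST(const std::string &usr) {
    auto entry = index_.find(usr);
    if (entry == index_.end())
      return nullptr;
    auto &slot = cache_[entry->second];
    if (!slot)
      slot = load_(entry->second);
    return slot.get();
  }

private:
  std::map<std::string, std::string> index_;
  LoadFn load_;
  std::map<std::string, std::unique_ptr<LoadedAST>> cache_;
};
```

Keeping the loader as an injected callback separates the lookup and caching policy from the (version-dependent) Clang deserialization call.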
>>> While this method is not likely to be a silver bullet for the Static
>>> Analyzer, I ran some benchmarks to see how feasible this approach is. The
>>> baseline was running the Static Analyzer without the two-pass analysis;
>>> the second run used the framework linked above.
>>>
>>> For a 150k LOC C project I got the following results:
>>> - Size of the serialized ASTs: 140 MB
>>> - Size of the indexes (textual representation): 4.4 MB
>>> - Analysis time: below 4x the baseline
>>> - Memory consumption: below 2x the baseline
>>>
>>> All in all it looks like a feasible approach for some use cases.
>>>
>>> I also tried to run a benchmark on the LLVM+Clang codebase.
>>> Unfortunately, I was not able to run the analysis due to some missing
>>> features in the AST Importer, but I was able to serialize the ASTs and
>>> generate the indices:
>>> - Size of the serialized ASTs: 45.4 GB
>>> - Size of the function index: 1.6 GB
>>>
>>> While these numbers are less promising, I think there are some
>>> opportunities to reduce them significantly.
>>>
>>> I propose introducing an analysis mode for exporting ASTs. In analysis
>>> mode, the AST exporter would not emit the body of a function in the
>>> following cases:
>>> - In case a function is defined in a header, do not emit the body.
>>> - In case the function was defined in an implicit template
>>>   specialisation, do not emit the body.
>>>
>>> I think that after similar optimizations it might be feasible to use this
>>> approach on LLVM-scale projects as well, and it would become much easier
>>> to implement Clang-based tools that utilize cross-translation-unit
>>> capabilities.
>>>
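The two export rules proposed for analysis mode amount to a simple predicate over each function definition. The sketch below is hypothetical: FunctionDefInfo and SpecializationKind are simplified stand-ins, not Clang's API, although a real exporter would presumably derive the two inputs from Clang, for example via SourceManager::isInMainFile() on the definition's location and FunctionDecl::getTemplateSpecializationKind(). Only implicit instantiations are skipped, because the proposal names only those; explicit specializations keep their bodies unless they are themselves in a header.

```cpp
#include <string>

// Hypothetical, simplified view of a function definition seen by an AST
// exporter. The enumerators loosely mirror clang's TemplateSpecializationKind.
enum class SpecializationKind {
  NotATemplate,
  ImplicitInstantiation,
  ExplicitSpecialization,
  ExplicitInstantiation,
};

struct FunctionDefInfo {
  std::string name;
  bool definedInHeader;  // definition lies outside the main source file
  SpecializationKind specKind;
};

// Proposed analysis-mode rule: skip bodies of header-defined functions and
// of implicit template specializations; export every other body.
bool shouldEmitBody(const FunctionDefInfo &fn, bool analysisMode) {
  if (!analysisMode)
    return true;  // a normal export keeps every body
  if (fn.definedInHeader)
    return false;
  if (fn.specKind == SpecializationKind::ImplicitInstantiation)
    return false;
  return true;
}
```

Under this rule, the f() defined in 1.h in the example earlier in the thread would have no exported body in any translation unit, which is exactly the limitation raised in the replies.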
>>> If the analyzer gets a new interprocedural analysis method that improves
>>> performance, the users of this framework would benefit from it
>>> immediately.
>>>
>>> Is a framework like this worth mainlining and working on? What do you
>>> think?
>>>
>>> (Note that AST Importer-related improvements are already being mainlined
>>> by Yury et al. My question is about the "analysis mode" for exporting
>>> ASTs and a general framework to consume those exported ASTs.)
>>>
>>> Regards,
>>> Gábor
>>> -------------- next part --------------
>>> An HTML attachment was scrubbed...
>>> URL: <
>>> http://lists.llvm.org/pipermail/cfe-dev/attachments/20160504/2198d988/attachment.html
>>> >
>>>
>>> ------------------------------
>>>
>>> Subject: Digest Footer
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>>
>>> ------------------------------
>>>
>>> End of cfe-dev Digest, Vol 107, Issue 9
>>> ***************************************
>>>
>>
>>
>> --
>> Best regards,
>> Aleksei Sidorin
>> Software Engineer,
>> IMSWL-IMCG, SRR, Samsung Electronics
>>
>>
>