<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, May 4, 2016 at 4:12 PM, Gábor Horváth <span dir="ltr"><<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi!<br><div><div class="gmail_extra"><br><div class="gmail_quote"><span class="gmail-">On 4 May 2016 at 16:05, Aleksei Sidorin <span dir="ltr"><<a href="mailto:a.sidorin@samsung.com">a.sidorin@samsung.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Hello Gabor,<br>

<br>

thank you for your proposal. In our approach, we introduced the ASTContext::getXTUDefinition() method to allow clients to import the functions they need. Did you take the same approach, or did you add something on top?<span><br></span></blockquote><div><br></div></span><div>I used that implementation.<br> <br></div><span class="gmail-"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span>

<br>

> - In case a function is defined in a header, do not emit the body.<br>

<br></span>

There may be issues with this approach. Code:<br>

<br>

1.cpp<br>

#include "1.h"<br>

<br>

1.h<br>

void f() {...}<br>

<br>

2.cpp<br>

void f();<br>

<br>

...<br>

f();<br>

<br>

<br>

This needs a header-located function to be imported for the analysis of 2.cpp.<br></blockquote><div><br></div></span><div>You are right. This method would not work in those scenarios. But losing some definitions might be worth it to make this solution feasible. Another option would be a protocol that exports the definition of each such function only once (that is, for only one translation unit).  <br></div></div></div></div></div></blockquote><div><br></div><div>A better solution would be possible with <a href="http://clang.llvm.org/docs/Modules.html">modularized builds</a>: each header's AST would then be serialized exactly once instead of being copied into every translation unit. That requires serious preparatory work to make the analyzed code compatible with modules, but it seems to be the most promising long-term approach.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div class="gmail_extra"><div class="gmail_quote"><div></div><span class="gmail-"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<br>

Also, in general, somebody may need to look up non-function entities, such as types or variables. CSA may need to look up a global variable initializer, too (we have not implemented this yet). That is a more general task, but we need to start with something.<br></blockquote><div><br></div></span><div>Right. I did not plan to omit that information from the AST dump.<br></div><div><div class="gmail-h5"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<br>

<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div>

Hi!<br>

<br>

This e-mail is a proposal based on the work done by Yury Gibrov et al.:<br>

<a href="http://lists.llvm.org/pipermail/cfe-dev/2015-December/046299.html" rel="noreferrer">http://lists.llvm.org/pipermail/cfe-dev/2015-December/046299.html</a><br>

<br>

They implemented a two-pass analysis: the first pass serializes the<br>

AST of every translation unit and creates an index of functions; the second<br>

pass does the real analysis and can load the ASTs of function bodies on<br>

demand.<br>

<br>
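The two-pass scheme described above can be sketched roughly as follows. This is<br>
only an illustration: the names FunctionIndex, parseIndex and findAstFile, and<br>
the "name path" line format of the textual index, are assumptions and are not<br>
taken from the framework linked above.<br>

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical index: maps a function's lookup name to the serialized AST
// file that contains its body.
using FunctionIndex = std::map<std::string, std::string>;

// First pass (sketch): build the index from a textual dump that holds one
// whitespace-separated "name path" pair per line (assumed format).
inline FunctionIndex parseIndex(const std::string &text) {
  FunctionIndex index;
  std::istringstream in(text);
  std::string name, astFile;
  while (in >> name >> astFile)
    index.emplace(name, astFile);  // emplace keeps the first entry for a name
  return index;
}

// Second pass (sketch): when the analysis reaches a call to a function with
// no body in the current translation unit, look up which AST file to load.
// Returns nullptr when the function is not in the index.
inline const std::string *findAstFile(const FunctionIndex &index,
                                      const std::string &name) {
  auto it = index.find(name);
  return it == index.end() ? nullptr : &it->second;
}
```

With an index like this, the second pass only loads an AST file when a lookup<br>
succeeds, which is what keeps the loading "on demand".<br>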

This approach can be used to achieve cross-translation-unit analysis for<br>

the Clang Static Analyzer to some extent, and a similar approach could be<br>

applicable to Clang Tidy and other Clang-based tools.<br>

<br>

While this method is not likely to be a silver bullet for the Static<br>

Analyzer, I did some benchmarks to see how feasible this approach is. The<br>

baseline was the Static Analyzer without the two-pass analysis; the<br>

second configuration used the framework linked above.<br>

<br>

For a 150k LOC C project I got the following results:<br>

The size of the serialized ASTs: 140 MB<br>

The size of the indexes (textual representation): 4.4 MB<br>

The analysis time was below 4x the baseline<br>

The amount of memory consumed was below 2x the baseline<br>

<br>

All in all it looks like a feasible approach for some use cases.<br>

<br>

I also tried to do a benchmark on the LLVM+Clang codebase. Unfortunately I<br>

was not able to run the analysis due to some missing features in the AST<br>

Importer. But I was able to serialize the ASTs and generate the indices:<br>

The size of the serialized ASTs: 45.4 GB<br>

The size of the function index: 1.6 GB<br>

<br>

While these numbers are less promising, I think there are some<br>

opportunities to reduce them significantly.<br>

<br>

I propose the introduction of an analysis mode for exporting ASTs. In<br>

analysis mode, the AST exporter would not emit the body of a<br>

function in the following cases:<br>

- In case a function is defined in a header, do not emit the body.<br>

- In case the function was defined in an implicit template specialisation,<br>

do not emit the body.<br>

<br>
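The two rules above amount to a simple filter on function definitions. A minimal<br>
sketch, assuming a hypothetical FunctionDefInfo summary and a hypothetical<br>
shouldEmitBody helper (neither exists in Clang; the flags stand in for whatever<br>
the real exporter would query from the AST):<br>

```cpp
#include <cassert>

// Hypothetical summary of one function definition as seen by the exporter.
struct FunctionDefInfo {
  bool hasBody;                         // the declaration is a definition
  bool definedInHeader;                 // the body comes from an included header
  bool implicitTemplateSpecialization;  // the body was implicitly instantiated
};

// Analysis-mode rule (sketch): emit the body only for functions defined in
// the main source file that are not implicit template specializations.
inline bool shouldEmitBody(const FunctionDefInfo &fn) {
  if (!fn.hasBody)
    return false;  // nothing to emit
  if (fn.definedInHeader)
    return false;  // rule 1: header-defined functions are skipped
  if (fn.implicitTemplateSpecialization)
    return false;  // rule 2: implicit specializations are skipped
  return true;
}
```

Note that the first rule is exactly the one that drops f() in the 1.h example<br>
discussed in this thread.<br>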

I think that after similar optimizations it might be feasible to use this<br>

approach on LLVM-scale projects as well, and it would be much easier to<br>

implement Clang-based tools that can utilize cross-translation-unit<br>

capabilities.<br>

<br></div></div><span>

If the analyzer gets a new interprocedural analysis method that<br>

improves performance, the users of this framework would benefit from it<br>

immediately.<br>

<br>

Is a framework like this worth mainlining and working on? What do you<br>

think?<br>

<br>

(Note that AST Importer-related improvements are already being mainlined<br>

by Yury et al. My question is about the "analysis mode" for exporting ASTs<br>

and a general framework to consume those exported ASTs.)<br>

<br>

Regards,<br>

Gábor<br></span>


_______________________________________________<br>

cfe-dev mailing list<br>

<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>

<span><font color="#888888"><br>

</font></span></blockquote><span><font color="#888888">

<br>

<br>

-- <br>

Best regards,<br>

Aleksei Sidorin<br>

Software Engineer,<br>

IMSWL-IMCG, SRR, Samsung Electronics<br>

<br>

</font></span></blockquote></div></div></div><br></div></div></div>


<br></blockquote></div><br></div></div>
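<div>The "export only once" protocol suggested in the quoted reply could be sketched as a first-claim-wins registry. This is a hedged illustration, not part of any existing implementation: ExportRegistry and claim are hypothetical names, and a real build that processes translation units in parallel would need a shared, synchronized store instead of an in-memory map.</div>

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical registry that decides which translation unit exports the body
// of a header-defined function. The first unit to claim a function owns it.
class ExportRegistry {
public:
  // Returns true if `translationUnit` should export the body of `functionId`.
  // A repeated claim by the owning unit is still answered with true.
  bool claim(const std::string &functionId,
             const std::string &translationUnit) {
    auto result = owner_.emplace(functionId, translationUnit);
    return result.second || result.first->second == translationUnit;
  }

private:
  std::map<std::string, std::string> owner_;  // function id -> owning unit
};
```

<div>With such a registry, the body of f() from 1.h would be exported by exactly one translation unit that includes the header, so the analysis of 2.cpp could still import it.</div>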