<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, May 27, 2015 at 1:57 PM, James Widman <span dir="ltr"><<a href="mailto:james.widman@gmail.com" target="_blank">james.widman@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class=""><div class="h5">On Wed, May 27, 2015 at 4:11 AM, James Widman <<a href="mailto:james.widman@gmail.com">james.widman@gmail.com</a>> wrote:<br>

> On Tue, May 26, 2015 at 1:38 PM, David Blaikie <<a href="mailto:dblaikie@gmail.com">dblaikie@gmail.com</a>> wrote:<br>

>> On Mon, May 25, 2015 at 12:37 PM, Yaron Keren <<a href="mailto:yaron.keren@gmail.com">yaron.keren@gmail.com</a>> wrote:<br>

>>><br>

>>> zapcc maintains as much as possible from previous compilations: AST, IR,<br>

>>> MC and DebugInfo.  I'm not sure that module support goes that far.<br>

>><br>

>><br>

>> ASTs are preserved in modules, that's all they're for (parsing time tends to<br>

>> dominate, at least in our world/experiments/data as I understand it, so<br>

>> that's the first thing to fix). Duplicate IR/MC/DebugInfo is still present<br>

>> though it'd be the next thing to solve - we're talking about deduplicating<br>

>> some of the debug info and Adrian Prantl is working on that at the moment -<br>

>> putting debug info for types into the module files themselves and<br>

>> referencing it directly as a split DWARF file.<br>

>><br>

>> Duplicate IR/MC comes from comdat/linkonce_odr functions - and at some point<br>

>> it'd be nice to put those in a module too, if there's a clear single<br>

>> ownership (oh, you have an inline function in your modular header - OK,<br>

>> we'll IRGen it, make an available_externally copy of it in the module to be<br>

>> linked into any users of the module, and a standard external definition will<br>

>> be codegen'd down to object code and put in the module to be passed to the<br>

>> linker). This wouldn't solve the problems with templates that have no 'home'<br>

>> to put their definition.<br>

><br>

> I guess it depends on the build setup:  if you spread the build across<br>

> multiple machines then... never mind.<br>

><br>

> But if the whole build is on one machine and it has enough memory, and<br>

> as long as something like zapcc is retaining the whole program's AST<br>

> anyway, it could be a win for it to complete that whole-program AST<br>

> before any IR is generated.  Presumably, the compiler could then<br>

> invent the 'home' and do each instantiation exactly once in the entire<br>

> build.<br>

><br>

> Or... it might still help the multi-machine setup.  In the worst case,<br>

> an instantiated function would get instantiated once per machine.<br>

><br>

> But in that case it might be nice to get a fix-it hint from the linker<br>

> to automatically extern-templateize all such instantiations. (:<br>

<br>

<br>

</div></div>That reminds me: is there any public data that shows the percentage of<br>

build time spent doing IRGen/opt/CodeGen for duplicates that end up<br>

getting discarded?<br></blockquote><div><br></div><div>I have information on a couple of large (1-10 MLOC) codebases indicating that time spent outside of parsing is typically ~20% of total CPU time at -O2/-O3. IIRC, with lower optimization levels, I saw 10-15%.</div><div><br></div><div>So that ~20% number is a rough upper bound for the time spent in the LLVM optimizers and code generation, and hence an upper bound on the time spent there on duplicates.</div><div><br></div><div>The fact that clang does IRGen as it parses (hence it fell under "parsing time" in my measurements) makes it somewhat difficult to pinpoint how much time is spent on duplicates during IRGen. If you want to measure this, you could do it similarly to how I describe measuring per-file time in <a href="http://permalink.gmane.org/gmane.comp.compilers.clang.devel/42127">http://permalink.gmane.org/gmane.comp.compilers.clang.devel/42127</a> but with extra probes tracking calls into IRGen. You would also add probes inside the middle end and back end to track per-function time.</div><div><br></div><div>By combining this information with information from the linker about which functions end up becoming "duplicates", you should have a decent empirical estimate for the data that you want. You might do this by placing probes in the linker as well, so that you can measure any project just by building it with the instrumented toolchain and using DTrace to funnel out all the data, which can then be fed into a script.</div><div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<div class=""><div class="h5"><br>

--James<br>

_______________________________________________<br>

cfe-dev mailing list<br>

<a href="mailto:cfe-dev@cs.uiuc.edu">cfe-dev@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br>

</div></div></blockquote></div><br></div></div>