[cfe-dev] Controlling instantiation of templates from PCH

Tue May 28 02:31:19 PDT 2019

On Tuesday 28 of May 2019, David Blaikie wrote:
> So I'm not sure I understand this comment:
>
> "And, if you look carefully, 4 seconds more to generate code, most of it
> for those templates. And after the compiler spends all this time on
> templates in all the source files, it gets all passed to the linker, which
> will shrug and then throw most of it away (and that will too take a load of
> time, if you still happen to use the BFD linker instead of gold/lld
> <https://lists.freedesktop.org/archives/libreoffice/2018-July/080484.html>
>  with -gsplit-dwarf -Wl,--gdb-index
> <https://lists.freedesktop.org/archives/libreoffice/2018-June/080437.html>)
>. What a marvel."
>
> What extra code generation occurred with the PCH? Any change in generated
> code with a PCH would surprise me.

 If I understand it correctly, my small testcase shows that adding a PCH
generally does not change the resulting object file; it only makes Clang
spend more time processing something that it throws away as unused at some
later stage of creating the object file. So there's no extra code generation
caused by the PCH. To make clearer what I meant there, it's more like saying
that there's a missed opportunity:

- Let's say that I have a library built from a.cpp and b.cpp, and both those 
sources use std::vector< int >. As in, they really use it, so both a.o and 
b.o end up with weak copies of std::vector< int > code.
- That seems to be basically inevitable with the normal non-PCH code, as the
Clang instance compiling a.cpp cannot know that std::vector< int > code will
also be present in b.o, and so compiling both a.cpp and b.cpp results in
generating std::vector< int > code in each, even though we can clearly see
it's unnecessary.
- I say it's basically inevitable in the non-PCH case, because I don't know a
reasonable way to avoid that in practice. There is extern template, which
would work in this minimal testcase, but for a real-world large codebase I
find that impractical, tedious and whatnot (please correct me if I'm wrong
and there is a reasonable way, but beware that I've already tried that and
decided that writing a compiler patch was an easier way of going about it).
- However, in the PCH case, both Clang instances do know that they share all
the template instantiations from the PCH. And that's where my patch steps in:
-fpch-template-instantiation=force tells one instance "take care of it
all" and -fpch-template-instantiation=skip tells all the other
instances "don't bother with those, somebody else will take care of that". So
all but one of the Clang instances can skip all those numerous
Sema::InstantiateFunctionDefinition() calls and also the code generation for
those instantiations that actually are used in that TU.
- To put it differently, you can also view -fpch-template-instantiation=skip 
as automatic extern template for whatever is used by the PCH, 
and -fpch-template-instantiation=force as explicit instantiation for it, 
where all the hassle of extern template is replaced by just putting all the 
template stuff in the PCH. (To be precise, it's not exactly like explicit
instantiation, because it involves only what is instantiated by the PCH, but
if wanted, that can be handled by actually explicitly instantiating in the
PCH, without having to bother with the extern template stuff.)
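
For reference, the manual extern template approach that the analogy above
refers to could look roughly like the sketch below. It is only an
illustration, not code from the patch: Accumulator, sum_in_a(), sum_in_b()
and check_sketch() are hypothetical names, a small custom template stands in
for std::vector< int >, and the three "files" are collapsed into one so that
it compiles on its own. In a real build the header part would be shared (or
put in the PCH) and the explicit instantiation definition would live in
exactly one source file.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// --- hypothetical shared header (the part that would go into the PCH) ---
template <typename T>
class Accumulator {
public:
    void add(const T& value);
    T total() const;

private:
    std::vector<T> values_;
};

template <typename T>
void Accumulator<T>::add(const T& value) { values_.push_back(value); }

template <typename T>
T Accumulator<T>::total() const {
    return std::accumulate(values_.begin(), values_.end(), T{});
}

// Explicit instantiation declaration ("extern template"): asks the compiler
// not to implicitly instantiate the members of Accumulator<int> in TUs that
// see this line. This is roughly the role of the "skip" side.
extern template class Accumulator<int>;

// --- hypothetical a.cpp ---
int sum_in_a() {
    Accumulator<int> acc;
    acc.add(1);
    acc.add(2);
    return acc.total();
}

// --- hypothetical b.cpp ---
int sum_in_b() {
    Accumulator<int> acc;
    acc.add(10);
    acc.add(20);
    return acc.total();
}

// --- hypothetical owner.cpp ---
// Explicit instantiation definition: the one place that emits the members
// of Accumulator<int>. This is roughly the role of the "force" side.
template class Accumulator<int>;

// Sanity check that both users work with the single explicit instantiation.
void check_sketch() {
    assert(sum_in_a() == 3);
    assert(sum_in_b() == 30);
}
```

Calling check_sketch() from a main() is enough to try it out. The tedious
part in a large codebase is that every such template/type combination needs
its own extern template line plus a matching definition somewhere, which is
the bookkeeping the PCH-based approach is meant to avoid.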

-- 
 Lubos Lunak