[cfe-dev] Controlling instantiation of templates from PCH

Lubos Lunak via cfe-dev cfe-dev at lists.llvm.org
Sat May 25 12:32:39 PDT 2019


 Hello,

 I'm working on a Clang patch that can make C++ builds noticeably faster in 
some setups by allowing control over how templates are instantiated, but I 
have some problems finishing it and need advice.

 Background: I am a LibreOffice developer. Precompiled headers save ~2/3 of 
the build time for e.g. LO Calc when MSVC is used, but only ~10% with Clang. 
Moreover, the larger the PCH, the more time is saved with MSVC, but this is 
not so with Clang; in fact larger PCHs often make things slower.

 The recent -ftime-trace feature allowed me to investigate this and it turns 
out that the time saved by having to parse less is outweighed by having to 
instantiate (many) more templates. You can see -ftime-trace graphs at 
http://llunak.blogspot.com/2019/05/why-precompiled-headers-do-not-improve.html 
(1st row - no PCH, 2nd row - small PCH, 3rd row - large PCH); the .json files 
are at http://ge.tt/7RHeLHw2 if somebody wants to see them.

 Specifically, the time is spent in Sema::PerformPendingInstantiations() and 
Sema::InstantiateFunctionDefinition(). The vast majority of the 
instantiations come from the PCH itself. This means that the work is repeated 
for every TU using the PCH, and it also means that it's useless work, as the 
linker will discard all but one copy of each instantiation.

 My WIP patch implements a new option to avoid that. The idea is that all 
sources using the PCH will be built with -fpch-template-instantiation=skip, 
which will prevent Sema::InstantiateFunctionDefinition() from actually 
instantiating templates coming from the PCH if they would be unneeded 
duplicates (note that this means almost all PCH template instantiations in 
the case of a developer build with -O0 -g, which is my primary use case). 
Then one extra source file is built with -fpch-template-instantiation=force, 
which will provide one copy of the instantiations. I assume that this is 
similar to how MSVC manages to have much better gains with PCH: the .obj 
created during PCH creation presumably contains single instantiations.
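
 As a rough analogy in standard C++ (this is not what the patch does 
internally), the skip/force split resembles explicit instantiation 
declarations and definitions. Below is a minimal single-file sketch; twice() 
and use_int() are made-up names, and the comments mark which part would 
belong to which kind of TU:

```cpp
#include <string>

// Hypothetical template that would live in a header covered by the PCH.
template <typename T>
T twice(const T& v) { return v + v; }

// Analogue of the "skip" TUs: explicit instantiation declarations tell the
// compiler not to instantiate these definitions implicitly here, and to
// rely on a definition provided elsewhere in the program.
extern template int twice<int>(const int&);
extern template std::string twice<std::string>(const std::string&);

// An ordinary use; with the declarations above no local copy is required.
int use_int() { return twice(21); }

// Analogue of the single "force" TU: explicit instantiation definitions
// provide the one copy that all other TUs link against. In a real build
// these two lines would sit in one separate source file.
template int twice<int>(const int&);
template std::string twice<std::string>(const std::string&);
```

 The difference is that extern templates must be listed by hand for every 
specialization, while the option is meant to apply to whatever the PCH 
happens to instantiate.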

 In the -ftime-trace graphs linked above, the 4th row is the large PCH with 
my patch. The compilation time saved by this is 50% and 60% for the two 
examples (and I think moving some templates into the PCH might get it to 
70-75% for the second file).

 As I said, I have some problems that prevent the patch from being fully 
usable, so in order to finish it, could somebody help me with the following:

- I don't understand what controls which kind of ctor/dtor is emitted 
(complete ctor vs base ctor, i.e. C1 vs C2 type in the Itanium ABI). I get 
many undefined references because the TU built with instances does not have 
both types, yet other TUs refer to them. How can I force both of them to be 
emitted?

- I have an undefined reference to one template function that should be 
included in the TU with instances, but it isn't. The Sema part instantiates 
it and I could track it as far as getting generated in CodeGen, but then I'm 
lost. I assume that it gets discarded because something in CodeGen or LLVM 
considers it unused. Is there a place like that and where is it? Are there 
other places in CodeGen/LLVM where I could check to see why this function 
doesn't get generated in the object file?

- In Sema::InstantiateFunctionDefinition() the code for extern templates still 
instantiates a function if it has getContainedAutoType(), so my code should 
probably also check that. But I'm not sure what that is (is that 'auto foo() 
{ return 1; }' ?) or why that would need an instance in every TU.

- I used BENIGN_ENUM_LANGOPT because Clang otherwise complains that the PCH is 
used with a different option value than the one it was generated with, and 
using a different value is necessary in this case. I'm not sure if this is 
the correct handling of the option, though.

- Is there a simple rule for what decides that a template needs to be 
instantiated? As far as I can tell, even using a template as a class member 
or having an inline member function manipulating it doesn't trigger an 
instantiation. When I mentioned moving some templates into the PCH in order 
to get the possible 70% savings, I actually don't know how to cause an 
instantiation from the PCH, even though the templates and their uses are 
included there.

 Thank you.

-- 
 Lubos Lunak
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pch-instantiate-templates.patch
Type: text/x-diff
Size: 10725 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190525/00d1964a/attachment.patch>

