[cfe-dev] [RFC] Identifying wasteful template function bodies

David Blaikie via cfe-dev cfe-dev at lists.llvm.org
Mon Dec 2 12:13:57 PST 2019

On Mon, Dec 2, 2019 at 2:42 PM Reid Kleckner via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> I like this idea. I'm not sure a warning is the best way to surface it.
> The first alternative that occurs to me would be the -ftime-trace JSON
> file, so you can dump out complete info about the most often instantiated
> templates, how many nodes they contained, and how many of them were
> dependent.
> Which brings me to wonder, if clang already tracks
> <https://github.com/llvm/llvm-project/blob/master/clang/include/clang/AST/Expr.h#L183>
> whether an Expr is instantiation dependent, why can't it do this
> optimization itself? I assume there are good reasons. I recall there are
> some invariants about Decl or Expr pointer identity, but maybe those are
> handled by AlwaysRebuild
> <https://github.com/llvm/llvm-project/blob/master/clang/lib/Sema/TreeTransform.h#L140>.
> Another thing that occurs to me is that some semantic analysis or warnings
> may not fire if the DeclContext of an Expr is dependent. I wonder what the
> savings would be if we could enumerate those checks, store them in the
> template pattern, and run only the checks that matter without re-traversing
> the entire AST. The last reason I can think of is ADL. I think that would
> defer name lookup for almost every CallExpr in a template.

I /think/ ADL still allows calls to be resolved up-front if the arguments
aren't dependent. Yeah, here the non-dependent call is resolved and the
dependent call is left with an UnresolvedLookupExpr:
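
A minimal sketch of that distinction, not the original example: the names
lib::Widget, describe and both are made up for illustration. Dumping the AST
of the template pattern (for instance with 'clang++ -fsyntax-only -Xclang
-ast-dump') should show the first call referring directly to a function
declaration and the second one holding an UnresolvedLookupExpr.

```cpp
namespace lib {
struct Widget {};
// Found through ADL whenever a lib::Widget is passed.
inline int describe(Widget) { return 1; }
}  // namespace lib

inline int describe(int) { return 2; }

template <typename T>
int both(T value) {
  lib::Widget w;
  // The argument is not type-dependent, so the callee can be chosen
  // when the template is defined (ADL finds lib::describe).
  int fixed = describe(w);
  // The argument depends on T, so lookup is completed only at
  // instantiation time.
  int deferred = describe(value);
  return fixed * 10 + deferred;
}
```

Instantiating both<int> picks ::describe(int) for the second call, while
both<lib::Widget> picks lib::describe; the first call is the same in either
case.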

> On Mon, Dec 2, 2019 at 8:01 AM Brian Gesiak <modocache at gmail.com> wrote:
>> I work on a C++ project for which compilation time is a significant
>> concern. One of my colleagues was able to significantly shorten the
>> time Clang took to compile our project, by manually outlining
>> independently-typed code from large template functions.
>> This makes intuitive sense to me, because when instantiating a
>> template function, Clang traverses the body of the function. The
>> longer the function body, the more nodes in the AST Clang has to
>> traverse, and the more time it takes. Programmers can read the
>> function and see that some statements in the function body remain the
>> same no matter what types the function is instantiated with. By
>> extracting these statements into a separate, non-template function,
>> programmers can reduce the number of nodes Clang must traverse.
>> I created a contrived example that demonstrates how splitting up a
>> long template function can improve compile time. (Beware, the files
>> are large, I needed something that would take Clang a hefty amount of
>> time to process.)
>> https://gist.github.com/modocache/77b8ac09280c08bd88f84b92ff43a28b
>> In the example above, 'example.cpp' defines a template function
>> 'foo<T, U, V, W, X, Y, Z>', whose body is ~46k LoC. It then
>> instantiates 'foo' 10 times, with 10 different combinations of
>> template type parameters. 'clang -c -O1 example.cpp -Xclang
>> -disable-llvm-passes -Xclang -emit-llvm' takes ~35 seconds in total to
>> compile. Each additional instantiation of 'foo' adds ~3 seconds to the
>> total compile time.
>> Only the last statement in 'foo' is dependent upon the template type
>> parameters to 'foo'. 'example-outlined.cpp' moves ~46k LoC of
>> independently-typed statements out of 'foo' and into a function named
>> 'foo_prologue_outlined', and has 'foo' call 'foo_prologue_outlined'.
>> 'foo_prologue_outlined' is not a template function. The result is
>> identical program behavior, but a total compile time of just ~5
>> seconds (~85% faster). Additional instantiations of 'foo' in
>> 'example-outlined.cpp' cost almost no additional compile time.
>> Although the functions in our project are not as long, some of them
>> take significantly longer than 35 seconds to compile. By outlining
>> independently-typed statements, we've been able to reduce compile time
>> of some functions, from 300s to 200s (1/3rd faster). So, my colleagues
>> and I are looking for other functions we can manually outline in order
>> to reduce the amount of time Clang takes to compile our project. To
>> this end, it would be handy if Clang could tell us, for example, “hey,
>> I just instantiated 'bar<int, float, double>', but X% of the
>> statements in that function did not require transformation,” where
>> 'X%' is some threshold that could be set in the compiler invocation.
>> For now I'm thinking the option to set this warning threshold could be
>> called '-Wwasteful-template-threshold=' -- but I'm aware that sounds
>> awkward, and I'd love suggestions for a better name.
>> I think implementing this feature is possible by adding some state to
>> TreeTransform, or the Clang template instantiators that derive from
>> that class. But before I send a patch to do so, I'm curious if anyone
>> has attempted such a thing before, or if anyone has thoughts or
>> comments on this feature. I'd prefer not to spend time implementing
>> this diagnostic in Clang if it's predestined to be rejected in code
>> review, so please let me know what you think!
>> (I've cc'ed some contributors who I think have worked in this space
>> like @rnk, or those who might have better naming suggestions like
>> @rtrieu.)
>> - Brian Gesiak
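
For readers skimming the thread, the manual outlining described in the quoted
message can be sketched roughly as follows. This is an illustration under
assumptions, not the code from the gist: only the name foo_prologue_outlined
comes from the message, while the function bodies and signatures are invented
and kept tiny.

```cpp
#include <vector>

// Non-template helper: holds the statements that do not depend on any
// template parameter, so the compiler processes this body once.
inline int foo_prologue_outlined(const std::vector<int>& data) {
  int sum = 0;
  for (int x : data) sum += x;
  return sum;
}

// Only the type-dependent statement stays in the template, so each
// instantiation has very little to transform.
template <typename T>
T foo(const std::vector<int>& data, T scale) {
  int base = foo_prologue_outlined(data);
  return static_cast<T>(base) * scale;
}
```

Each new instantiation of foo now re-processes two statements instead of the
whole prologue, which is the effect the 'example-outlined.cpp' numbers above
describe.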