[cfe-dev] [RFC] Identifying wasteful template function bodies

Reid Kleckner via cfe-dev cfe-dev at lists.llvm.org
Mon Dec 2 11:42:12 PST 2019


I like this idea. I'm not sure a warning is the best way to surface it. The
first alternative that occurs to be would be the -ftime-trace JSON file, so
you can dump out complete info about the most often instantiated templates,
how many nodes they contained, and how many of them were dependent.

Which brings me to wonder, if clang already tracks
<https://github.com/llvm/llvm-project/blob/master/clang/include/clang/AST/Expr.h#L183>
whether an Expr is instantiation dependent, why can't it do this
optimization itself? I assume there are good reasons. I recall there are
some invariants about Decl or Expr pointer identity, but maybe those are
handled by AlwaysRebuild
<https://github.com/llvm/llvm-project/blob/master/clang/lib/Sema/TreeTransform.h#L140>.
Another thing that occurs to me is that some semantic analysis or warnings
may not fire if the DeclContext of an Expr is dependent. I wonder what the
savings would be if we could enumerate those checks, store them in the
template pattern, and run only the checks that matter without re-traversing
the entire AST. The last reason I can think of is ADL. I think that would
defer name lookup for almost every CallExpr in a template.

On Mon, Dec 2, 2019 at 8:01 AM Brian Gesiak <modocache at gmail.com> wrote:

> I work on a C++ project for which compilation time is a significant
> concern. One of my colleagues was able to significantly shorten the
> time Clang took to compile our project, by manually outlining
> independently-typed code from large template functions.
>
> This makes intuitive sense to me, because when instantiating a
> template function, Clang traverses the body of the function. The
> longer the function body, the more nodes in the AST Clang has to
> traverse, and the more time it takes. Programmers can read the
> function and see that some statements in the function body remain the
> same no matter what types the function is instantiated with. By
> extracting these statements into a separate, non-template function,
> programmers can reduce the amount of nodes Clang must traverse.
>
> I created a contrived example that demonstrates how splitting up a
> long template function can improve compile time. (Beware, the files
> are large, I needed something that would take Clang a hefty amount of
> time to process.)
> https://gist.github.com/modocache/77b8ac09280c08bd88f84b92ff43a28b
>
> In the example above, 'example.cpp' defines a template function
> 'foo<T, U, V, W, X, Y, Z>', whose body is ~46k LoC. It then
> instantiates 'foo' 10 times, with 10 different combinations of
> template type parameters. In total, 'clang -c -O1 example.cpp -Xclang
> -disable-llvm-passes -Xclang -emit-llvm' takes ~35 seconds in total to
> compile. Each additional instantiation of 'foo' adds an additional ~3
> seconds to the total compile time.
>
> Only the last statement in 'foo' is dependent upon the template type
> parameters to 'foo'. 'example-outlined.cpp' moves ~46k LoC of
> independently-typed statements out of 'foo' and into a function named
> 'foo_prologue_outlined', and has 'foo' call 'foo_prologue_outlined'.
> 'foo_prologue_outlined' is not a template function. The result is
> identical program behavior, but a total compile time of just ~5
> seconds (~85% faster). Additional instantiations of 'foo' in
> 'example-outlined.cpp' cost almost no additional compile time.
>
> Although the functions in our project are not as long, some of them
> take significantly longer than 35 seconds to compile. By outlining
> independently-typed statements, we've been able to reduce compile time
> of some functions, from 300s to 200s (1/3rd faster). So, my colleagues
> and I are looking for other functions we can manually outline in order
> to reduce the amount of time Clang takes to compile our project. To
> this end, it would be handy if Clang could tell us, for example, “hey,
> I just instantiated 'bar<int, float, double>', but X% of the
> statements in that function did not require transformation,” where
> 'X%' is some threshold that could be set in the compiler invocation.
> For now I'm thinking the option to set this warning threshold could be
> called '-Wwasteful-template-threshold=' -- but I'm aware that sounds
> awkward, and I'd love suggestions for a better name.
>
> I think implementing this feature is possible by adding some state to
> TreeTransform, or the Clang template instantiators that derive from
> that class. But before I send a patch to do so, I'm curious if anyone
> has attempted such a thing before, or if anyone has thoughts or
> comments on this feature. I'd prefer not to spend time implementing
> this diagnostic in Clang if it's predestined to be rejected in code
> review, so please let me know what you think!
>
> (I've cc'ed some contributors who I think have worked in this space
> like @rnk, or those who might have better naming suggestions like
> @rtrieu.)
>
> - Brian Gesiak
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20191202/ec5c1751/attachment.html>


More information about the cfe-dev mailing list