[cfe-dev] [RFC] Identifying wasteful template function bodies

Brian Gesiak via cfe-dev cfe-dev at lists.llvm.org
Mon Dec 2 08:48:32 PST 2019


Oops, sorry, I meant to send this to cfe-dev! Looking forward to any
and all advice/opinions, though. Thanks! - Brian

On Mon, Dec 2, 2019 at 8:00 AM Brian Gesiak <modocache at gmail.com> wrote:
>
> I work on a C++ project for which compilation time is a significant
> concern. One of my colleagues was able to significantly shorten the
> time Clang took to compile our project, by manually outlining
> independently-typed code from large template functions.
>
> This makes intuitive sense to me, because when instantiating a
> template function, Clang traverses the body of the function. The
> longer the function body, the more nodes in the AST Clang has to
> traverse, and the more time it takes. Programmers can read the
> function and see that some statements in the function body remain the
> same no matter what types the function is instantiated with. By
> extracting these statements into a separate, non-template function,
> programmers can reduce the number of nodes Clang must traverse.
>
> I created a contrived example that demonstrates how splitting up a
> long template function can improve compile time. (Beware, the files
> are large, I needed something that would take Clang a hefty amount of
> time to process.)
> https://gist.github.com/modocache/77b8ac09280c08bd88f84b92ff43a28b
>
> In the example above, 'example.cpp' defines a template function
> 'foo<T, U, V, W, X, Y, Z>', whose body is ~46k LoC. It then
> instantiates 'foo' 10 times, with 10 different combinations of
> template type parameters. 'clang -c -O1 example.cpp -Xclang
> -disable-llvm-passes -Xclang -emit-llvm' takes ~35 seconds in total to
> compile. Each additional instantiation of 'foo' adds ~3 seconds to
> the total compile time.
>
> Only the last statement in 'foo' is dependent upon the template type
> parameters to 'foo'. 'example-outlined.cpp' moves ~46k LoC of
> independently-typed statements out of 'foo' and into a function named
> 'foo_prologue_outlined', and has 'foo' call 'foo_prologue_outlined'.
> 'foo_prologue_outlined' is not a template function. The result is
> identical program behavior, but a total compile time of just ~5
> seconds (~85% less time). Additional instantiations of 'foo' in
> 'example-outlined.cpp' cost almost no additional compile time.
>
> Although the functions in our project are not as long, some of them
> take significantly longer than 35 seconds to compile. By outlining
> independently-typed statements, we've been able to reduce the compile
> time of some functions from 300s to 200s (one-third less time). So, my
> colleagues and I are looking for other functions we can manually
> outline in order to reduce the amount of time Clang takes to compile
> our project. To
> this end, it would be handy if Clang could tell us, for example, “hey,
> I just instantiated 'bar<int, float, double>', but X% of the
> statements in that function did not require transformation,” where
> 'X%' is some threshold that could be set in the compiler invocation.
> For now I'm thinking the option to set this warning threshold could be
> called '-Wwasteful-template-threshold=' -- but I'm aware that sounds
> awkward, and I'd love suggestions for a better name.
>
> I think implementing this feature is possible by adding some state to
> TreeTransform, or the Clang template instantiators that derive from
> that class. But before I send a patch to do so, I'm curious if anyone
> has attempted such a thing before, or if anyone has thoughts or
> comments on this feature. I'd prefer not to spend time implementing
> this diagnostic in Clang if it's predestined to be rejected in code
> review, so please let me know what you think!
>
> (I've cc'ed some contributors who I think have worked in this space
> like @rnk, or those who might have better naming suggestions like
> @rtrieu.)
>
> - Brian Gesiak
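
To illustrate the manual outlining described in the quoted message, here
is a minimal sketch. It is not taken from the gist; the names
'sum_of_squares_outlined' and 'scaled_sum' are hypothetical, and the
bodies are tiny stand-ins for the much larger functions discussed above.
The type-independent work lives in a non-template function, so the
template body that has to be instantiated for each 'T' shrinks to the one
statement that actually depends on 'T'.

```cpp
#include <cstddef>

// Hypothetical non-template helper. Nothing in this body depends on a
// template parameter, so it is not re-instantiated when scaled_sum<T>
// below is instantiated with a new T.
inline int sum_of_squares_outlined(std::size_t n) {
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += static_cast<int>(i * i);
  return sum;
}

// Hypothetical template function. Only the return statement depends on T,
// so it is the only statement left in the template body.
template <typename T>
T scaled_sum(std::size_t n, T scale) {
  return static_cast<T>(sum_of_squares_outlined(n)) * scale;
}
```

Calling 'scaled_sum<int>(4, 2)' and 'scaled_sum<double>(4, 0.5)' creates
two instantiations, but each one only contains the single dependent
statement; the loop is shared between them.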

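The threshold check proposed in the quoted message can be modeled as a
simple ratio. The sketch below is only an illustration of that
arithmetic: 'InstantiationStats', 'unchanged_percent', and 'should_warn'
are hypothetical names, not Clang APIs, and it assumes the instantiator
can count how many statements it visited and how many of those needed no
transformation.

```cpp
// Hypothetical per-instantiation counters (not part of Clang).
struct InstantiationStats {
  unsigned total_statements;      // statements visited during instantiation
  unsigned unchanged_statements;  // statements that needed no transformation
};

// Percentage of visited statements that did not require transformation.
// An empty body is treated as 0% so that it never triggers a warning.
inline double unchanged_percent(const InstantiationStats &s) {
  if (s.total_statements == 0)
    return 0.0;
  return 100.0 * s.unchanged_statements / s.total_statements;
}

// True when the unchanged share meets or exceeds the user-chosen
// threshold, expressed as a percentage.
inline bool should_warn(const InstantiationStats &s, double threshold_percent) {
  return unchanged_percent(s) >= threshold_percent;
}
```

With a threshold of 90, an instantiation in which 990 of 1000 statements
were unchanged would be reported, while one in which 1 of 10 statements
was unchanged would not.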

