[cfe-dev] JumboSupport: making unity builds easier in Clang

Wed Apr 11 10:53:58 PDT 2018

See also: https://www.llvm.org/devmtg/2014-04/PDFs/Talks/Tenseconds.pdf

I started experimenting with a unity build of an LLVM/Clang-sized
proprietary project at my previous employer, and I found the basics
easy to get going. The hard part was massaging the code base to avoid
collisions, as the work by Mostyn & co. also shows.
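For anyone who hasn't tried it, "the basics" of a unity (jumbo) build mostly amount to generating source files that `#include` several ordinary translation units, so shared headers are parsed once per chunk rather than once per file. A minimal Python sketch, assuming a flat list of source paths; the chunk size and the `unity_N.cc` naming are hypothetical choices, not what any particular build system emits:

```python
def make_unity_sources(sources, chunk_size):
    """Group `sources` into chunks and return {unity_file_name: file_text}.

    Each generated file just #includes the original translation units, so
    headers they share are parsed once per chunk instead of once per file.
    The "unity_N.cc" naming scheme is hypothetical.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    unity_files = {}
    for start in range(0, len(sources), chunk_size):
        chunk = sources[start:start + chunk_size]
        lines = [f'#include "{src}"' for src in chunk]
        name = f"unity_{start // chunk_size}.cc"
        unity_files[name] = "\n".join(lines) + "\n"
    return unity_files


if __name__ == "__main__":
    srcs = ["a.cc", "b.cc", "c.cc", "d.cc", "e.cc"]
    for name, text in make_unity_sources(srcs, chunk_size=2).items():
        print(f"// {name}")
        print(text, end="")
```

The collisions show up exactly here: two `.cc` files that each define, say, a file-local helper with the same name now share one translation unit and no longer compile.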

I left the job before I had a chance to fully evaluate it, but if
I'd had something like `#pragma jumbo` to reduce the friction, it
might have been easier to get more data for less effort.

Mostyn/Daniel, do you have any gut feel/data on how much of the
problem a #pragma would solve? I suppose there are still constructs
that `#pragma jumbo` can't help with and that require manual
intervention?

Also, Chromium is hardly a typical codebase; from the little I've
looked at it, it's *extremely* clean and consistent, so it might be
interesting to try this on something else. LLVM itself might be an
interesting candidate.

- Kim

On Wed, Apr 11, 2018 at 7:08 PM, via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> If you want to share ASTs (an ephemeral structure), clang would need to do
> the distributing.  If you want to share IR of instantiated templates, you
> can use a shared database where clang is much less involved in managing the
> distribution.  Maybe the database key could be a hash of the token stream
> of the template definition plus the template arguments?  Then you can pull
> precompiled IR out of the database (if you want to do optimizations) or
> make a reference to it (if you're doing LTO).
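A minimal sketch of that keying idea, assuming the template definition is available as a list of token spellings and the arguments as canonical strings (both hypothetical simplifications; clang exposes neither in this form). SHA-256 and the plain dict standing in for the shared database are likewise illustrative choices:

```python
import hashlib


def _feed(h, items):
    # Length-prefix the list and every item so that distinct inputs cannot
    # serialize to the same bytes (e.g. ["ab", "c"] vs ["a", "bc"]).
    items = list(items)
    h.update(len(items).to_bytes(8, "little"))
    for item in items:
        data = item.encode("utf-8")
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)


def instantiation_key(definition_tokens, template_args):
    """Key for one instantiation: the definition's tokens plus its arguments."""
    h = hashlib.sha256()
    _feed(h, definition_tokens)
    _feed(h, template_args)
    return h.hexdigest()


def lookup_or_compile(db, key, compile_fn):
    """Return cached IR for `key`; on a miss, run compile_fn and store it."""
    if key not in db:
        db[key] = compile_fn()
    return db[key]


if __name__ == "__main__":
    db = {}  # stand-in for the shared, possibly remote, database
    vec_def = ["template", "<", "class", "T", ">", "struct", "Vec", "{", "}", ";"]
    irgen_calls = []

    def fake_irgen():
        irgen_calls.append("Vec<int>")
        return "<IR for Vec<int>>"

    key = instantiation_key(vec_def, ["int"])
    lookup_or_compile(db, key, fake_irgen)
    lookup_or_compile(db, key, fake_irgen)  # second lookup is a hit
    print(len(irgen_calls))  # 1
```

Such a key is only sound if the program obeys the One Definition Rule: identical tokens that resolve to different entities in different translation units would wrongly share one entry.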
>
> --paulr
>
>
>
> From: cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] On Behalf Of David
> Blaikie via cfe-dev
> Sent: Wednesday, April 11, 2018 11:09 AM
> To: David Chisnall
> Cc: Bruce Dawson; Daniel Cheng; richard at metafoo.co.uk;
> cfe-dev at lists.llvm.org; Daniel Bratell; Jens Widell
> Subject: Re: [cfe-dev] JumboSupport: making unity builds easier in Clang
>
>
>
> This would have issues with distributed builds, though, right? Unless clang
> then took on the burden of doing the distribution too, which might be a bit
> much.
>
> On Wed, Apr 11, 2018 at 12:43 AM David Chisnall via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
>
> On 10 Apr 2018, at 21:28, Daniel Bratell via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
>>
>> I've heard (hearsay, I admit) that profiling shows the single largest
>> time consumer in clang is template instantiation, something I assume
>> can't easily be prepared in advance.
>>
>> One example is Chromium's chrome/browser/browser target, which is 732
>> files that normally need 6220 CPU seconds to compile, an average of 8.5
>> seconds per file. Combining them all into a single translation unit gives
>> one that takes 400 seconds to compile, a mere 0.55 seconds per file on
>> average. That indicates that about 8 seconds per compiled file is related
>> to the processing of headers.
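The per-file figures follow directly from the three totals quoted above; a quick check using only those numbers:

```python
FILES = 732
SEPARATE_CPU_SECONDS = 6220  # each file compiled as its own translation unit
UNITY_CPU_SECONDS = 400      # all 732 files combined into one translation unit

per_file_separate = SEPARATE_CPU_SECONDS / FILES
per_file_unity = UNITY_CPU_SECONDS / FILES
# The difference is the per-file cost that disappears once work repeated in
# every translation unit (parsing shared headers, etc.) is done only once.
overhead = per_file_separate - per_file_unity

print(f"{per_file_separate:.2f} {per_file_unity:.2f} {overhead:.2f}")  # 8.50 0.55 7.95
```

The roughly 8 s difference covers everything repeated per translation unit, not only parsing; repeated instantiation of templates from those headers is part of it too.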
>
> It sounds as if there are two things here:
>
> 1. The time taken to parse the headers
> 2. The time taken to repeatedly instantiate templates that the linker will
> then discard
>
> Assuming a command line where all of the relevant source files are provided
> to the compiler invocation:
>
> Solving the first one is relatively easy if the files have a common prefix
> (which can be determined by simple string comparison).  Find the common
> prefix in the source files, build the clang AST, and then do a clone for
> each compilation unit.  Hopefully, the clone is a lot cheaper than
> re-parsing (and can ideally share source locations).
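A minimal sketch of the "simple string comparison" step, assuming files are compared line by line (a real implementation would more likely compare preprocessor tokens). It also trims the shared prefix back to just after its last `#include`, on the assumption that the split has to land where parsing could resume. The function names are hypothetical, and the AST build and clone are not shown:

```python
def common_line_prefix(files):
    """Return the longest list of leading lines shared by every file."""
    if not files:
        return []
    split = [text.splitlines() for text in files]
    prefix = []
    for lines in zip(*split):  # zip stops at the shortest file
        # Stop at the first position where any file differs.
        if any(line != lines[0] for line in lines[1:]):
            break
        prefix.append(lines[0])
    return prefix


def shareable_prefix(files):
    """Trim the common prefix back to just after its last #include line,
    so the shared part ends at a boundary where parsing could resume."""
    prefix = common_line_prefix(files)
    last_include = -1
    for i, line in enumerate(prefix):
        if line.lstrip().startswith("#include"):
            last_include = i
    return prefix[:last_include + 1]


if __name__ == "__main__":
    a = '#include <vector>\n#include "base.h"\nint f() { return 1; }\n'
    b = '#include <vector>\n#include "base.h"\nint g() { return 2; }\n'
    print(shareable_prefix([a, b]))
```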
>
> The second is slightly more difficult, because it relies on sharing parts of
> the AST across notional compilation units.
>
> To make this work well with incremental builds, ideally you’d spit out all
> of the common template instantiations into a separate IR file, which could
> then be used with ThinLTO.
>
> Personally, I would prefer to have an interface where a build system can
> invoke clang with all of the files that need building, plus the degree of
> parallelism to use, and let it share as much state as it wants across builds.
> In an ideal world, clang would record which templates have been instantiated
> in a prior build (or a previous build step in the current build) and avoid
> any IRGen for them, at the very least.
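A rough sketch of that batch interface, with everything clang-specific replaced by hypothetical stand-ins: each "file" is just the list of instantiations it needs, and `emit_ir` is a placeholder for IRGen. A lock-protected set shared by all workers means each instantiation is generated once per invocation, and seeding it with `already_done` models results recorded by a prior build:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedInstantiations:
    """Records which instantiations already have IR, across all files."""

    def __init__(self, already_done=()):
        self._done = set(already_done)  # e.g. loaded from a prior build
        self._lock = threading.Lock()

    def claim(self, key):
        """Return True if the caller should run IRGen for `key`."""
        with self._lock:
            if key in self._done:
                return False
            self._done.add(key)
            return True


def emit_ir(key):
    # Placeholder for IRGen of one instantiation (hypothetical).
    return f"<IR for {key}>"


def compile_batch(files, jobs, shared):
    """Compile every file with `jobs` workers; return {file: [IR it emitted]}."""
    def compile_one(name):
        needed = files[name]
        return name, [emit_ir(k) for k in needed if shared.claim(k)]

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(pool.map(compile_one, files))


if __name__ == "__main__":
    files = {
        "a.cc": ["std::vector<int>", "std::string"],
        "b.cc": ["std::vector<int>", "std::map<int, int>"],
    }
    shared = SharedInstantiations(already_done={"std::string"})
    out = compile_batch(files, jobs=2, shared=shared)
    print(sum(len(v) for v in out.values()))  # 2
```

Files that don't win the claim would still need to reference the instantiation, and its body would only be available for inlining through something like ThinLTO.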
>
> Old C++ compilers, predating linker support for COMDATs, emitted templates
> lazily, simply emitting references to them, then parsing the linker errors
> and generating missing implementations until the linker errors went away.
> Modern C++ compilers generate many instantiations of the same templates and
> then discard most of them.  It would be nice to find an intermediate point,
> which worked well with ThinLTO, where templates could be emitted once and be
> available for inlining everywhere.
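The old link-error-driven scheme can be sketched as a fixed-point loop. This is a simulation under stated assumptions: the "linker" only reports referenced symbols that nothing defines, and `instantiate` is a hypothetical stand-in for going back to the compiler to emit one missing instantiation, which may itself reference further templates:

```python
def link(objects):
    """Simulated linker: return referenced symbols that no object defines."""
    defined, referenced = set(), set()
    for obj in objects:
        defined |= obj["defines"]
        referenced |= obj["references"]
    return referenced - defined


def lazy_instantiate(objects, instantiate, max_rounds=100):
    """Link, generate whatever is missing, and relink until nothing is missing.

    Returns the final object list and the number of link attempts made.
    """
    objects = list(objects)
    for attempt in range(1, max_rounds + 1):
        missing = link(objects)
        if not missing:
            return objects, attempt
        for symbol in sorted(missing):
            obj = instantiate(symbol)
            if obj is None:  # not a template we know how to instantiate
                raise RuntimeError(f"undefined symbol: {symbol}")
            objects.append(obj)
    raise RuntimeError("instantiation did not converge")


# Hypothetical template graph: push_back's body needs string's copy constructor.
TEMPLATES = {
    "vector<string>::push_back": {"string::string(const string&)"},
    "string::string(const string&)": set(),
}


def instantiate(symbol):
    """Stand-in for re-invoking the compiler to emit one instantiation."""
    if symbol not in TEMPLATES:
        return None
    return {"defines": {symbol}, "references": TEMPLATES[symbol]}


if __name__ == "__main__":
    main_o = {"defines": {"main"}, "references": {"vector<string>::push_back"}}
    objs, attempts = lazy_instantiate([main_o], instantiate)
    print(attempts, len(objs))  # 3 3
```

Each iteration costs a full link, which is one reason compilers moved to emitting instantiations in every object file as COMDATs and letting the linker keep a single copy.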
>
> David
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>