[cfe-dev] JumboSupport: making unity builds easier in Clang

Richard Smith via cfe-dev cfe-dev at lists.llvm.org
Tue Apr 10 12:13:21 PDT 2018

On 10 April 2018 at 10:05, Nico Weber via cfe-dev <cfe-dev at lists.llvm.org>
wrote:

> On Tue, Apr 10, 2018 at 1:01 PM, David Blaikie <dblaikie at gmail.com> wrote:
>> On Tue, Apr 10, 2018 at 9:58 AM Nico Weber <thakis at chromium.org> wrote:
>>> On Tue, Apr 10, 2018 at 11:56 AM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>>> On Tue, Apr 10, 2018 at 8:52 AM Mostyn Bramley-Moore <mostynb at vewd.com>
>>>> wrote:
>>>>> On Tue, Apr 10, 2018 at 4:27 PM, David Blaikie <dblaikie at gmail.com>
>>>>> wrote:
>>>>>> I haven't looked at the patches in detail - but generally a jumbo
>>>>>> build feels like a bit of a workaround & maybe there are better long-term
>>>>>> solutions that might fit into the compiler. A few sort of background
>>>>>> questions:
>>>>>> * Have you tried Clang header modules (
>>>>>> https://clang.llvm.org/docs/Modules.html )? (explicit (granted,
>>>>>> explicit might only be practical at the moment using Google's internal
>>>>>> version of Bazel - but you /might/ get some comparison numbers from a
>>>>>> Google Chrome developer) and implicit)
>>>>>>   * The doc talks about maybe disabling jumbo builds for a single
>>>>>> target for developer efficiency, with the risk that a header edit would
>>>>>> maybe be worse for the developer than the jumbo build - this is where
>>>>>> modules would help as well, since it doesn't have this tradeoff property of
>>>>>> two different dimensions of "more work" you have to choose from.
>>>>> There are ways to minimise this- an earlier proprietary jumbo build
>>>>> system used at Opera would detect when you're modifying and rebuilding
>>>>> files, and compile these in "normal" mode.  This gave fast full/clean build
>>>>> times but also short modify+rebuild times.  We have not attempted to
>>>>> implement this in the Chromium Jumbo build configuration.
>>>> Building that kind of infrastructure seems like a pretty big hammer
>>>> compared to modularizing the codebase...
>>> Modularizing the codebase doesn't give you the same build time impact,
>>> linearizes your build more,
>> Not sure I follow - it partially linearizes (as you say, due to the
>> module dependency rather than header dependency issue), as does the jumbo
>> build.
> The jumbo build just needs to append a bunch of files, that's fast.
> Compiling a module isn't.

Well, compiling a module is just appending a bunch of headers and compiling
them. It's just at a different layer of the graph.

> and slows down incremental builds.
>> Compared to a traditional build? I wouldn't think so (I mean, yes,
>> reading/writing modules has some overhead - but also some gains) on
>> average. I'd expect slower builds if you modify a header at the very base
>> of the dependency (the STL), but beyond that I would've thought the
>> reading/writing modules overhead would be saved by reusing modules for
>> infrequently modified files (like the STL).
> Say you touch some header foo.h. Previously, you needed to rebuild all cc
> files including it. Now you need to instead rebuild the module, and since
> the module has changed you now need to rebuild all cc files using any
> header in the module, not just the users of foo.h. That's potentially way
> more cc files.

But say you touch some source file foo.cc. Previously, and with modules,
you just need to rebuild that cc file. With a unity build, you now instead
need to rebuild the concatenation of that .cc file and a bunch of others.
That's also potentially way more cc files. :)

But measurements beat speculation here.

>> (wonder what the combination would be like - modularizing headers, and
>> also jumbo-ifying .cpp files together... - whether there's much to be saved
>> in the reading modules part of the work, reading them in fewer times - that
>> gets into some of the ideas of compiler as a service I guess)
>>> Even if it wasn't a lot more work to get modules going, it's not
>>> completely clear to me that that would address the use case that the people
>>> working on the jumbo build have.
>>>> (maybe still less work - but a lot of work to workaround things &
>>>> produce some rather quirky behavior (in terms of how the build functions
>>>> based on looking at exactly how the source files have changed & changing
>>>> the build action graph depending on that) - but enough that I'd be inclined
>>>> to reconsider going in the modular direction again)
>>>>>> * I was going to ask about the lack of parallelism in a jumbo build -
>>>>>> but reading the doc I see it's not a 'full' jumbo build, but chunkifying
>>>>>> the build - so there's still some/enough parallelism. Cool :)
>>>>> I have heard rumours of some codebases in the games industry using a
>>>>> single jumbo source file for the entire build, but this is generally
>>>>> considered to be taking things too far and not our intended use case.
>>>> Ah, my understanding was that jumbo builds were often/mainly used for
>>>> optimized builds to get cross-module optimizations (LTO-esque) & so it'd be
>>>> likely to be the whole program.
>>>>> The size of Chromium's jumbo compilation units is tunable- you can
>>>>> simply #include fewer real source files per jumbo source file- the bigger
>>>>> your build farm is, the smaller you want this number to be.  The optimal
>>>>> setup depends on things like the shape of the dependency graph and the
>>>>> relative costs of the original source files.  IIRC we currently only have
>>>>> build-wide "jumbo_file_merge_limit" setting, though that might have changed
>>>>> since I last looked (V8 would benefit from this, since its source files
>>>>> compile more slowly than most Chromium source files).
>>>>> -Mostyn.
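
A minimal sketch of the chunking described above, for illustration only. The
function name make_jumbo_files and its signature are hypothetical; only the
idea of a per-build merge limit ("jumbo_file_merge_limit") comes from the
thread. It splits a target's source list into groups of at most merge_limit
files and returns the text of one generated jumbo file per group.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not Chromium's actual generator): returns the text of
// each jumbo source file, with at most `merge_limit` #include lines per file.
std::vector<std::string> make_jumbo_files(
    const std::vector<std::string>& sources, std::size_t merge_limit) {
  std::vector<std::string> jumbo_files;
  if (merge_limit == 0) {
    return jumbo_files;  // A limit of zero has no sensible meaning here.
  }
  for (std::size_t begin = 0; begin < sources.size(); begin += merge_limit) {
    const std::size_t end = std::min(begin + merge_limit, sources.size());
    std::string text;
    for (std::size_t i = begin; i < end; ++i) {
      text += "#include \"" + sources[i] + "\"\n";
    }
    jumbo_files.push_back(text);
  }
  return jumbo_files;
}
```

A smaller merge_limit yields more, smaller translation units (more
parallelism, more repeated header processing); a larger one yields the
opposite, which is the tradeoff described in the message above.
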
>>>>>> On Tue, Apr 10, 2018 at 5:12 AM Mostyn Bramley-Moore via cfe-dev <
>>>>>> cfe-dev at lists.llvm.org> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am a member of a small group of Chromium developers who are
>>>>>>> working on adding a unity build[1] setup to Chromium[2], in order to
>>>>>>> reduce the project's long and ever-increasing compile times.  We're
>>>>>>> calling these "jumbo" builds, because this term is not as overloaded
>>>>>>> as "unity".
>>>>>>>
>>>>>>> We're slowly making progress, but find that a lot of our time is spent
>>>>>>> renaming things in anonymous namespaces- it would be much simpler if
>>>>>>> it was possible to automatically treat these as if they were
>>>>>>> file-local.  Jens Widell has put together a proof-of-concept which
>>>>>>> appears to work reasonably well, it consists of a clang plugin and a
>>>>>>> small clang patch:
>>>>>>>
>>>>>>> https://github.com/jensl/llvm-project-20170507/tree/wip/jumbo-support/v1
>>>>>>> https://github.com/jensl/llvm-project-20170507/commit/a00d5ce3f20bf1c7a41145be8b7a3a478df9935f
>>>>>>>
>>>>>>> After building clang and the plugin, you generate jumbo source files
>>>>>>> that look like:
>>>>>>>
>>>>>>> jumbo_source_1.cc:
>>>>>>>
>>>>>>>   #pragma jumbo
>>>>>>>   #include "real_source_file_1.cc"
>>>>>>>   #include "real_source_file_2.cc"
>>>>>>>   #include "real_source_file_3.cc"
>>>>>>>
>>>>>>> Then, you compile something like this:
>>>>>>>
>>>>>>>   clang++ -c jumbo_source_1.cc -Xclang -load -Xclang lib/JumboSupport.so -Xclang -add-plugin -Xclang jumbo-support
>>>>>>>
>>>>>>> The plugin gives unique names[3] to the anonymous namespaces without
>>>>>>> otherwise changing their semantics, and also #undef's macros defined
>>>>>>> in each top-level source file before processing the next top-level
>>>>>>> source file.  That way header files can still define macros that are
>>>>>>> used in multiple source files in the jumbo translation unit.
>>>>>>> Collisions between macros defined in header files and names used in
>>>>>>> other headers and other source files are still possible, but less
>>>>>>> likely.
>>>>>>>
>>>>>>> To show how much these two changes help, here's a patch to make
>>>>>>> Chromium's network code build in jumbo mode:
>>>>>>>
>>>>>>> https://chromium-review.googlesource.com/c/chromium/src/+/966523
>>>>>>> (+352/-377 lines)
>>>>>>>
>>>>>>> And here's the corresponding patch using the proof-of-concept
>>>>>>> JumboSupport plugin:
>>>>>>>
>>>>>>> https://chromium-review.googlesource.com/c/chromium/src/+/962062
>>>>>>> (+53/-52 lines)
>>>>>>>
>>>>>>> It seems clear that the version using the JumboSupport plugin would
>>>>>>> require less effort to create, review and merge into the codebase.  We
>>>>>>> have a few other feature ideas, but these two changes seem to do most
>>>>>>> of the work for us.
>>>>>>>
>>>>>>> So now we're trying to figure out the best way forward- would a
>>>>>>> feature like this be welcome to the Clang project?  And if so, how
>>>>>>> would you recommend that we go about it? We would prefer to do this in
>>>>>>> a way that does not require a locally patched Clang and could live
>>>>>>> with building a custom plugin, although implementing this entirely in
>>>>>>> Clang would be even better.
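
To make the macro half of the proposal concrete, here is a small hand-written
sketch, not taken from the patch. It imitates a jumbo translation unit by
placing the contents of two hypothetical source files in one file; the names
kLimit, limit_from_file_1 and limit_from_file_2 are invented for the example.
The explicit #undef stands in for what the plugin is described as doing
automatically between top-level source files.

```cpp
// --- contents of a hypothetical real_source_file_1.cc ---
#define kLimit 4  // Macro intended to be local to this source file.
int limit_from_file_1() { return kLimit; }

// Written by hand here; the proposal is for the plugin to do this implicitly
// for macros defined in a top-level source file.
#undef kLimit

// --- contents of a hypothetical real_source_file_2.cc ---
// Without the #undef above, the next line would be preprocessed into
// `const int 4 = 8;` and the jumbo translation unit would fail to compile,
// even though each file compiles fine on its own.
const int kLimit = 8;
int limit_from_file_2() { return kLimit; }
```

Compiled as separate files, neither definition sees the other; the #undef
restores that isolation inside the combined translation unit.
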
I've been thinking about ways to get the benefits of unity builds without
the semantic changes. With the functionality we introduced for
-fmodules-local-submodule-visibility, we have the ability to parse one
file, then make it "invisible" and parse another file, skipping all the
repeated parts from the two parses, which would give us some (maybe most)
of the performance benefit of unity builds without the semantic changes.
(This is not quite as good as a unity build: you'd still repeatedly lex and
preprocess the files #included into both source files. We could implicitly
treat header files with include guards as being "modular" to get the
performance back, but then you also get back some of the semantic changes.)

>>>>>>> Thanks,
>>>>>>> -Mostyn.
>>>>>>>
>>>>>>> [1] If you're not familiar with unity builds, the idea is to compile
>>>>>>> multiple source files per compiler invocation, reducing the overhead
>>>>>>> of processing header files (which can be surprisingly high).  We do
>>>>>>> this by taking a list of the source files in a target and generating
>>>>>>> "jumbo" source files that #include multiple "real" source files, and
>>>>>>> then we feed these jumbo files to the compiler one at a time.  This
>>>>>>> way, we don't prevent the usage of valuable build tools like ccache
>>>>>>> and icecc that only support a single source file on the command line.
>>>>>>>
>>>>>>> [2] Daniel Bratell has a summary of our progress jumbo-ifying the
>>>>>>> Chromium codebase here:
>>>>>>> https://docs.google.com/document/d/19jGsZxh7DX8jkAKbL1nYBa5rcByUL2EeidnYsoXfsYQ/edit#
>>>>>>>
>>>>>>> [3] The JumboSupport plugin assigns names to the anonymous namespaces
>>>>>>> in a given file:  foo::(anonymous namespace)::bar is replaced with a
>>>>>>> symbol name of the form foo::__anonymous_<number>::bar where <number>
>>>>>>> is unique to the file within the jumbo translation unit.  Due to the
>>>>>>> internal linkage of these symbols, <number> does not need to be unique
>>>>>>> across multiple object files/jumbo source files.
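
The renaming in footnote [3] can be approximated by hand, which is roughly
what the manual jumbo patches do today. The sketch below is an illustration
under assumptions, not the plugin's output: the namespace names anonymous_1
and anonymous_2 are hypothetical stand-ins for foo::__anonymous_<number>
(identifiers containing a double underscore are reserved for the
implementation, so user code should not spell them), and `static` is used to
keep the internal linkage that a real anonymous namespace would provide.

```cpp
namespace foo {

// --- stands in for real_source_file_1.cc ---
// Originally `namespace { int bar() { return 1; } }`.
namespace anonymous_1 {
static int bar() { return 1; }
}  // namespace anonymous_1
int call_bar_from_file_1() { return anonymous_1::bar(); }

// --- stands in for real_source_file_2.cc ---
// Originally also `namespace { int bar() { return 2; } }`. In a plain unity
// build both definitions would land in the same unnamed namespace of the
// same translation unit and collide as a redefinition of foo::bar.
namespace anonymous_2 {
static int bar() { return 2; }
}  // namespace anonymous_2
int call_bar_from_file_2() { return anonymous_2::bar(); }

}  // namespace foo
```

Unlike this hand-written version, the plugin is described as leaving the
source untouched: callers keep writing the unqualified bar() and each file
sees only its own definition.
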
>>>>>>> --
>>>>>>> Mostyn Bramley-Moore
>>>>>>> Vewd Software
>>>>>>> mostynb at vewd.com <mostynb at opera.com>
>>>>>>> _______________________________________________
>>>>>>> cfe-dev mailing list
>>>>>>> cfe-dev at lists.llvm.org
>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>> --
>>>>> Mostyn Bramley-Moore
>>>>> Vewd Software
>>>>> mostynb at vewd.com <mostynb at opera.com>