[cfe-dev] JumboSupport: making unity builds easier in Clang

Tue Apr 10 12:40:30 PDT 2018

> you'd still repeatedly lex and preprocess the files #included into both
> source files

That is where the high cost of translation units comes from, so I don't
think the 'ability to parse one file, then make it "invisible"' will help
build performance. To be clear, the per-translation-unit cost is not from
firing up the compiler; it's from lexing, preprocessing, and parsing millions
of lines of header files, and the associated code generation.
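
To put that in concrete terms, here is a minimal sketch of the cost model I
have in mind. The struct, the function names, and the numbers in the usage
note are all made up for illustration; they are not measurements. It assumes
every source file pulls in roughly the same headers, which is optimistic.

```cpp
#include <cassert>

// Hypothetical cost model; units are arbitrary (say, seconds).
struct BuildCosts {
    double header_cost;  // lexing/preprocessing/parsing the shared headers, paid once per TU
    double source_cost;  // work specific to one .cc file's own code
};

// N files compiled as N separate translation units:
// every unit pays the header cost again.
double separate_build_cost(const BuildCosts& c, int num_files) {
    assert(num_files >= 0);
    return num_files * (c.header_cost + c.source_cost);
}

// The same N files #included into one jumbo translation unit:
// the shared headers are processed only once.
double jumbo_build_cost(const BuildCosts& c, int num_files) {
    assert(num_files >= 0);
    if (num_files == 0) return 0.0;
    return c.header_cost + num_files * c.source_cost;
}
```

With an assumed header cost of 9 and a per-file cost of 1, 50 separate
compiles come to 500 units while one 50-file jumbo unit comes to 59. Real
builds will not be this clean, since some code generation is not shared.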

> With a unity build, you now instead need to rebuild the concatenation of
> that .cc file and a bunch of others.

True. But a pragmatic unity/jumbo build system understands and manages this
risk by keeping the number of source files that are #included down to a
reasonable level. Even when jumbo concatenates 50 source files together, the
compilation cost for that blob is *far* less than 50 times the cost of
compiling one file. It's an issue, to be sure, but not a fatal flaw.
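
A rough sketch of what "keeping the number down" can look like: split a
target's source list into groups of at most some limit and emit one jumbo
file per group. The function name and the `merge_limit` parameter are
hypothetical (loosely echoing the "jumbo_file_merge_limit" setting mentioned
below); this is not Chromium's actual build logic.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns the text of one jumbo source file per group of at most
// `merge_limit` entries from `sources`. Hypothetical helper for illustration.
std::vector<std::string> make_jumbo_files(const std::vector<std::string>& sources,
                                          std::size_t merge_limit) {
    std::vector<std::string> jumbo_files;
    if (merge_limit == 0) return jumbo_files;  // no sensible grouping exists

    for (std::size_t start = 0; start < sources.size();) {
        // Take up to merge_limit files, or whatever is left.
        const std::size_t count = std::min(merge_limit, sources.size() - start);
        std::string text;
        for (std::size_t i = start; i < start + count; ++i) {
            text += "#include \"" + sources[i] + "\"\n";
        }
        jumbo_files.push_back(text);
        start += count;
    }
    return jumbo_files;
}
```

A smaller limit means more, smaller blobs: more parallelism and cheaper
rebuilds after touching one .cc file, at the price of re-processing the
headers more often.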

On Tue, Apr 10, 2018 at 12:13 PM Richard Smith <richard at metafoo.co.uk>
wrote:

> On 10 April 2018 at 10:05, Nico Weber via cfe-dev <cfe-dev at lists.llvm.org>
> wrote:
>
>> On Tue, Apr 10, 2018 at 1:01 PM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Apr 10, 2018 at 9:58 AM Nico Weber <thakis at chromium.org> wrote:
>>>
>>>> On Tue, Apr 10, 2018 at 11:56 AM, David Blaikie <dblaikie at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Apr 10, 2018 at 8:52 AM Mostyn Bramley-Moore <mostynb at vewd.com>
>>>>> wrote:
>>>>>
>>>>>> On Tue, Apr 10, 2018 at 4:27 PM, David Blaikie <dblaikie at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I haven't looked at the patches in detail - but generally a jumbo
>>>>>>> build feels like a bit of a workaround & maybe there are better long-term
>>>>>>> solutions that might fit into the compiler. A few sort of background
>>>>>>> questions:
>>>>>>>
>>>>>>> * Have you tried Clang header modules (
>>>>>>> https://clang.llvm.org/docs/Modules.html )? (explicit (granted,
>>>>>>> explicit might only be practical at the moment using Google's internal
>>>>>>> version of Bazel - but you /might/ get some comparison numbers from a
>>>>>>> Google Chrome developer) and implicit)
>>>>>>>   * The doc talks about maybe disabling jumbo builds for a single
>>>>>>> target for developer efficiency, with the risk that a header edit would
>>>>>>> maybe be worse for the developer than the jumbo build - this is where
>>>>>>> modules would help as well, since it doesn't have this tradeoff property of
>>>>>>> two different dimensions of "more work" you have to choose from.
>>>>>>>
>>>>>>
>>>>>> There are ways to minimise this- an earlier proprietary jumbo build
>>>>>> system used at Opera would detect when you're modifying and rebuilding
>>>>>> files, and compile these in "normal" mode.  This gave fast full/clean build
>>>>>> times but also short modify+rebuild times.  We have not attempted to
>>>>>> implement this in the Chromium Jumbo build configuration.
>>>>>>
>>>>>
>>>>> Building that kind of infrastructure seems like a pretty big hammer
>>>>> compared to modularizing the codebase...
>>>>>
>>>>
>>>> Modularizing the codebase doesn't give you the same build time impact,
>>>> linearizes your build more,
>>>>
>>>
>>> Not sure I follow - it partially linearizes (as you say, due to the
>>> module dependency rather than header dependency issue), as does the jumbo
>>> build.
>>>
>>
>> The jumbo build just needs to append a bunch of files, that's fast.
>> Compiling a module isn't.
>>
>
> Well, compiling a module is just appending a bunch of headers and
> compiling them. It's just at a different layer of the graph.
>
>
>>>> and slows down incremental builds.
>>>>
>>>
>>> Compared to a traditional build? I wouldn't think so (I mean, yes,
>>> reading/writing modules has some overhead - but also some gains) on
>>> average. I'd expect slower builds if you modify a header at the very base
>>> of the dependency (the STL), but beyond that I would've thought the
>>> reading/writing modules overhead would be saved by reusing modules for
>>> infrequently modified files (like the STL).
>>>
>>
>> Say you touch some header foo.h. Previously, you needed to rebuild all cc
>> files including it. Now you need to instead rebuild the module, and since
>> the module has changed you now need to rebuild all cc files using any
>> header in the module, not just the users of foo.h. That's potentially way
>> more cc files.
>>
>
> But say you touch some source file foo.cc. Previously, and with modules,
> you just need to rebuild that cc file. With a unity build, you now instead
> need to rebuild the concatenation of that .cc file and a bunch of others.
> That's also potentially way more cc files. :)
>
> But measurements beat speculation here.
>
>
>>> (wonder what the combination would be like - modularizing headers, and
>>> also jumbo-ifying .cpp files together... - whether there's much to be saved
>>> in the reading modules part of the work, reading them in fewer times - that
>>> gets into some of the ideas of compiler as a service I guess)
>>>
>>>
>>>> Even if it wasn't a lot more work to get modules going, it's not
>>>> completely clear to me that that would address the use case that the people
>>>> working on the jumbo build have.
>>>>
>>>>
>>>>> (maybe still less work - but a lot of work to workaround things &
>>>>> produce some rather quirky behavior (in terms of how the build functions
>>>>> based on looking at exactly how the source files have changed & changing
>>>>> the build action graph depending on that) - but enough that I'd be inclined
>>>>> to reconsider going in the modular direction again)
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> * I was going to ask about the lack of parallelism in a jumbo build
>>>>>>> - but reading the doc I see it's not a 'full' jumbo build, but chunkifying
>>>>>>> the build - so there's still some/enough parallelism. Cool :)
>>>>>>>
>>>>>>
>>>>>> I have heard rumours of some codebases in the games industry using a
>>>>>> single jumbo source file for the entire build, but this is generally
>>>>>> considered to be taking things too far and not our intended use case.
>>>>>>
>>>>>
>>>>> Ah, my understanding was that jumbo builds were often/mainly used for
>>>>> optimized builds to get cross-module optimizations (LTO-esque) & so it'd be
>>>>> likely to be the whole program.
>>>>>
>>>>>
>>>>>> The size of Chromium's jumbo compilation units is tunable- you can
>>>>>> simply #include fewer real source files per jumbo source file- the bigger
>>>>>> your build farm is, the smaller you want this number to be.  The optimal
>>>>>> setup depends on things like the shape of the dependency graph and the
>>>>>> relative costs of the original source files.  IIRC we currently only have
>>>>>> build-wide "jumbo_file_merge_limit" setting, though that might have changed
>>>>>> since I last looked (V8 would benefit from this, since its source files
>>>>>> compile more slowly than most Chromium source files).
>>>>>>
>>>>>>
>>>>>> -Mostyn.
>>>>>>
>>>>>>
>>>>>>> On Tue, Apr 10, 2018 at 5:12 AM Mostyn Bramley-Moore via cfe-dev <
>>>>>>> cfe-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am a member of a small group of Chromium developers who are
>>>>>>>> working on adding a unity build[1] setup to Chromium[2], in order
>>>>>>>> to reduce the project's long and ever-increasing compile times.
>>>>>>>> We're calling these "jumbo" builds, because this term is not as
>>>>>>>> overloaded as "unity".
>>>>>>>>
>>>>>>>> We're slowly making progress, but find that a lot of our time is
>>>>>>>> spent renaming things in anonymous namespaces- it would be much
>>>>>>>> simpler if it was possible to automatically treat these as if they
>>>>>>>> were file-local.   Jens Widell has put together a proof-of-concept
>>>>>>>> which appears to work reasonably well, it consists of a clang
>>>>>>>> plugin and a small clang patch:
>>>>>>>>
>>>>>>>> https://github.com/jensl/llvm-project-20170507/tree/wip/jumbo-support/v1
>>>>>>>> https://github.com/jensl/llvm-project-20170507/commit/a00d5ce3f20bf1c7a41145be8b7a3a478df9935f
>>>>>>>>
>>>>>>>> After building clang and the plugin, you generate jumbo source
>>>>>>>> files that look like:
>>>>>>>>
>>>>>>>> jumbo_source_1.cc:
>>>>>>>>
>>>>>>>>   #pragma jumbo
>>>>>>>>   #include "real_source_file_1.cc"
>>>>>>>>   #include "real_source_file_2.cc"
>>>>>>>>   #include "real_source_file_3.cc"
>>>>>>>>
>>>>>>>> Then, you compile something like this:
>>>>>>>>
>>>>>>>>   clang++ -c jumbo_source_1.cc -Xclang -load -Xclang lib/JumboSupport.so -Xclang -add-plugin -Xclang jumbo-support
>>>>>>>>
>>>>>>>> The plugin gives unique names[3] to the anonymous namespaces
>>>>>>>> without otherwise changing their semantics, and also #undef's
>>>>>>>> macros defined in each top-level source file before processing the
>>>>>>>> next top-level source file.  That way header files can still define
>>>>>>>> macros that are used in multiple source files in the jumbo
>>>>>>>> translation unit.  Collisions between macros defined in header
>>>>>>>> files and names used in other headers and other source files are
>>>>>>>> still possible, but less likely.
>>>>>>>>
>>>>>>>> To show how much these two changes help, here's a patch to make
>>>>>>>> Chromium's network code build in jumbo mode:
>>>>>>>> https://chromium-review.googlesource.com/c/chromium/src/+/966523
>>>>>>>> (+352/-377 lines)
>>>>>>>>
>>>>>>>> And here's the corresponding patch using the proof-of-concept
>>>>>>>> JumboSupport plugin:
>>>>>>>> https://chromium-review.googlesource.com/c/chromium/src/+/962062
>>>>>>>> (+53/-52 lines)
>>>>>>>>
>>>>>>>> It seems clear that the version using the JumboSupport plugin would
>>>>>>>> require less effort to create, review and merge into the codebase.
>>>>>>>> We have a few other feature ideas, but these two changes seem to do
>>>>>>>> most of the work for us.
>>>>>>>>
>>>>>>>> So now we're trying to figure out the best way forward- would a
>>>>>>>> feature like this be welcome to the Clang project?  And if so, how
>>>>>>>> would you recommend that we go about it? We would prefer to do this
>>>>>>>> in a way that does not require a locally patched Clang and could
>>>>>>>> live with building a custom plugin, although implementing this
>>>>>>>> entirely in Clang would be even better.
>>>>>>>>
>>>>>>>
> I've been thinking about ways to get the benefits of unity builds without
> the semantic changes. With the functionality we introduced for
> -fmodules-local-submodule-visibility, we have the ability to parse one
> file, then make it "invisible" and parse another file, skipping all the
> repeated parts from the two parses, which would give us some (maybe most)
> of the performance benefit of unity builds without the semantic changes.
> (This is not quite as good as a unity build: you'd still repeatedly lex and
> preprocess the files #included into both source files. We could implicitly
> treat header files with include guards as being "modular" to get the
> performance back, but then you also get back some of the semantic changes.)
>
>
>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -Mostyn.
>>>>>>>>
>>>>>>>> [1] If you're not familiar with unity builds, the idea is to
>>>>>>>> compile multiple source files per compiler invocation, reducing the
>>>>>>>> overhead of processing header files (which can be surprisingly
>>>>>>>> high).  We do this by taking a list of the source files in a target
>>>>>>>> and generating "jumbo" source files that #include multiple "real"
>>>>>>>> source files, and then we feed these jumbo files to the compiler
>>>>>>>> one at a time.  This way, we don't prevent the usage of valuable
>>>>>>>> build tools like ccache and icecc that only support a single source
>>>>>>>> file on the command line.
>>>>>>>>
>>>>>>>> [2] Daniel Bratell has a summary of our progress jumbo-ifying the
>>>>>>>> Chromium codebase here:
>>>>>>>> https://docs.google.com/document/d/19jGsZxh7DX8jkAKbL1nYBa5rcByUL2EeidnYsoXfsYQ/edit#
>>>>>>>>
>>>>>>>> [3] The JumboSupport plugin assigns names to the anonymous
>>>>>>>> namespaces in a given file:  foo::(anonymous namespace)::bar is
>>>>>>>> replaced with a symbol name of the form
>>>>>>>> foo::__anonymous_<number>::bar where <number> is unique to the file
>>>>>>>> within the jumbo translation unit.  Due to the internal linkage of
>>>>>>>> these symbols, <number> does not need to be unique across multiple
>>>>>>>> object files/jumbo source files.
>>>>>>>> --
>>>>>>>> Mostyn Bramley-Moore
>>>>>>>> Vewd Software
>>>>>>>> mostynb at vewd.com <mostynb at opera.com>
>>>>>>>> _______________________________________________
>>>>>>>> cfe-dev mailing list
>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Mostyn Bramley-Moore
>>>>>> Vewd Software
>>>>>> mostynb at vewd.com <mostynb at opera.com>
>>>>>>
>>>>>
>>
>
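
For anyone skimming the thread: the anonymous-namespace problem from the
original message is that two .cc files can each define, say,
foo::(anonymous namespace)::helper, and once both are #included into one
jumbo file those become redefinitions of the same entity. The sketch below
shows by hand roughly what per-file naming achieves. The names `anonymous_1`
and `anonymous_2` are stand-ins (the real plugin is described as using
foo::__anonymous_<number>), and the explicit qualification at the call sites
is only needed because this is written manually, not by the plugin.

```cpp
namespace foo {

// Stand-in for the anonymous namespace of real_source_file_1.cc.
namespace anonymous_1 {
static int helper() { return 1; }  // static: keeps internal linkage
}  // namespace anonymous_1

int from_file_1() { return anonymous_1::helper(); }

// Stand-in for the anonymous namespace of real_source_file_2.cc.
// Same function name as above, but now a distinct entity.
namespace anonymous_2 {
static int helper() { return 2; }
}  // namespace anonymous_2

int from_file_2() { return anonymous_2::helper(); }

}  // namespace foo
```

If both helpers lived in a plain `namespace { ... }` inside `foo`, this single
translation unit would fail to compile with a redefinition error, which is
the renaming work the original message wants to avoid doing by hand.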