[cfe-dev] JumboSupport: making unity builds easier in Clang

Tue Apr 10 23:59:02 PDT 2018

On Wed, Apr 11, 2018 at 1:41 AM, Mostyn Bramley-Moore <mostynb at vewd.com>
wrote:

> On Tue, Apr 10, 2018 at 9:13 PM, Richard Smith via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> On 10 April 2018 at 10:05, Nico Weber via cfe-dev <cfe-dev at lists.llvm.org
>> > wrote:
>>
>>> On Tue, Apr 10, 2018 at 1:01 PM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Apr 10, 2018 at 9:58 AM Nico Weber <thakis at chromium.org> wrote:
>>>>
>>>>> On Tue, Apr 10, 2018 at 11:56 AM, David Blaikie <dblaikie at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 10, 2018 at 8:52 AM Mostyn Bramley-Moore <
>>>>>> mostynb at vewd.com> wrote:
>>>>>>
>>>>>>> On Tue, Apr 10, 2018 at 4:27 PM, David Blaikie <dblaikie at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I haven't looked at the patches in detail - but generally a jumbo
>>>>>>>> build feels like a bit of a workaround & maybe there are better long-term
>>>>>>>> solutions that might fit into the compiler. A few sort of background
>>>>>>>> questions:
>>>>>>>>
>>>>>>>> * Have you tried Clang header modules (
>>>>>>>> https://clang.llvm.org/docs/Modules.html ), both explicit and implicit?
>>>>>>>> (Granted, explicit modules might only be practical at the moment using
>>>>>>>> Google's internal version of Bazel - but you /might/ get some comparison
>>>>>>>> numbers from a Google Chrome developer.)
>>>>>>>>   * The doc talks about maybe disabling jumbo builds for a single
>>>>>>>> target for developer efficiency, with the risk that a header edit would
>>>>>>>> maybe be worse for the developer than the jumbo build - this is where
>>>>>>>> modules would help as well, since it doesn't have this tradeoff property of
>>>>>>>> two different dimensions of "more work" you have to choose from.
>>>>>>>>
>>>>>>>
>>>>>>> There are ways to minimise this- an earlier proprietary jumbo build
>>>>>>> system used at Opera would detect when you're modifying and rebuilding
>>>>>>> files, and compile these in "normal" mode.  This gave fast full/clean build
>>>>>>> times but also short modify+rebuild times.  We have not attempted to
>>>>>>> implement this in the Chromium Jumbo build configuration.
>>>>>>>
>>>>>>
>>>>>> Building that kind of infrastructure seems like a pretty big hammer
>>>>>> compared to modularizing the codebase...
>>>>>>
>>>>>
>>>>> Modularizing the codebase doesn't give you the same build time impact,
>>>>> linearizes your build more,
>>>>>
>>>>
>>>> Not sure I follow - it partially linearizes (as you say, due to the
>>>> module dependency rather than header dependency issue), as does the jumbo
>>>> build.
>>>>
>>>
>>> The jumbo build just needs to append a bunch of files, that's fast.
>>> Compiling a module isn't.
>>>
>>
>> Well, compiling a module is just appending a bunch of headers and
>> compiling them. It's just at a different layer of the graph.
>>
>>
>>>>> and slows down incremental builds.
>>>>>
>>>>
>>>> Compared to a traditional build? I wouldn't think so (I mean, yes,
>>>> reading/writing modules has some overhead - but also some gains) on
>>>> average. I'd expect slower builds if you modify a header at the very base
>>>> of the dependency (the STL), but beyond that I would've thought the
>>>> reading/writing modules overhead would be saved by reusing modules for
>>>> infrequently modified files (like the STL).
>>>>
>>>
>>> Say you touch some header foo.h. Previously, you needed to rebuild all
>>> cc files including it. Now you need to instead rebuild the module, and
>>> since the module has changed you now need to rebuild all cc files using any
>>> header in the module, not just the users of foo.h. That's potentially way
>>> more cc files.
>>>
>>
>> But say you touch some source file foo.cc. Previously, and with modules,
>> you just need to rebuild that cc file. With a unity build, you now instead
>> need to rebuild the concatenation of that .cc file and a bunch of others.
>> That's also potentially way more cc files. :)
>>
>> But measurements beat speculation here.
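
The two arguments above (touching a header under modules, touching a .cc
file under a unity build) can be made concrete with a toy model. This is
only a sketch: the include map, the file names and the function names are
all hypothetical, and it counts recompiled files rather than measuring time.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical project description: which headers each .cc file includes.
using IncludeMap = std::map<std::string, std::set<std::string>>;

// Traditional build: touching a header rebuilds every .cc that includes it.
std::size_t rebuilt_traditional(const IncludeMap& includes,
                                const std::string& header) {
    std::size_t n = 0;
    for (const auto& entry : includes) n += entry.second.count(header);
    return n;
}

// Header modules: touching a header rebuilds its module, and then every .cc
// that includes any header belonging to that module.
std::size_t rebuilt_with_module(const IncludeMap& includes,
                                const std::set<std::string>& module_headers) {
    std::size_t n = 0;
    for (const auto& entry : includes) {
        for (const auto& h : entry.second) {
            if (module_headers.count(h)) { ++n; break; }
        }
    }
    return n;
}

// Unity build: touching one .cc recompiles every .cc in the same jumbo unit.
std::size_t recompiled_in_unity(
        const std::vector<std::vector<std::string>>& units,
        const std::string& source) {
    for (const auto& unit : units)
        for (const auto& f : unit)
            if (f == source) return unit.size();
    return 0;  // source is not part of any jumbo unit
}
```

Which effect dominates depends on how headers are grouped into modules and
how sources are grouped into jumbo units, which is why measurement is needed.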
>>
>
> Here's one data point: on a non-ccache, non-distributed build on a fairly
> high end machine (20 CPU cores, 40 threads), I built a subset of Chromium
> (content_shell) in both jumbo and non-jumbo mode.  Then I picked a single
> source file that is in part of the tree that we have previously made
> jumbo-capable (content/public/renderer/browser_plugin_delegate.cc),
> touched it and timed how long the rebuilds would take in both jumbo and
> non-jumbo mode.  The target that this source file is part of has 16 source
> files in total, which is smaller than the default jumbo_file_merge_limit
> value of 50, so to rebuild this one source file in jumbo mode requires that
> we also rebuild the other 15 source files in this target, which will not be
> done in parallel since they're all in a single jumbo compilation unit- in
> other words this is a moderately bad scenario for jumbo.
>
> The non-jumbo rebuild + relink time on this machine was between 9 and 10
> seconds, and the jumbo rebuild + relink time was 23-24 seconds- a little
> more than double, but still nowhere near "time to grab a coffee while I
> wait" territory.  This time is easily won back in jumbo mode if you need to
> rebase on master, or build another target or configuration.
>
> If you find yourself in a modify/rebuild/retest loop in this code, you can
> try a workflow optimisation mentioned in Daniel Bratell's doc (and earlier
> in this thread): turn jumbo off for just this target but on for all others,
> and you only have a one-time overhead of regenerating ninja files (which is
> quick) plus rebuilding 15 source files once in parallel.  Then you only
> need to rebuild a single source file each time around the loop.
>
> I am currently running the same benchmark on a lower-specced machine, one
> which is more realistic for many developers: a 4 core / 8 thread CPU
> workstation, but the test setup is excruciatingly slow to prepare so I will
> have to report back tomorrow with the numbers.  I expect the rebuild times
> to be comparable, since this test cannot make use of multiple CPU cores
> simultaneously (other than maybe parallel linking).  But the clean-build
> time speedup for this configuration is known to be a big net win in terms
> of absolute time saved (jumbo builds are roughly 3x faster than non-jumbo
> builds, which take several hours).
>

Results on my 4c/8t reference machine:
non-jumbo rebuild + relink time: about 7 seconds
jumbo rebuild + relink time: about 18 seconds

So a slightly higher percentage increase than on my larger machine, but a
lower absolute time increase.
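
As a quick check of that comparison, the sketch below computes the slowdown
ratio and the extra seconds for both machines. It assumes the midpoints of
the reported ranges (9.5 s and 23.5 s) for the larger machine and takes
"about 7" and "about 18" seconds at face value; the struct and function
names are made up for illustration.

```cpp
#include <cassert>

// Rebuild + relink times in seconds for one touched source file.
struct Timing {
    double non_jumbo_s;
    double jumbo_s;
};

// How many times longer the jumbo rebuild takes.
double slowdown_ratio(const Timing& t) { return t.jumbo_s / t.non_jumbo_s; }

// Extra wall-clock seconds per rebuild in jumbo mode.
double extra_seconds(const Timing& t) { return t.jumbo_s - t.non_jumbo_s; }

// Assumed inputs: midpoints of "between 9 and 10" and "23-24" seconds.
const Timing kLargeMachine = {9.5, 23.5};
// "about 7" and "about 18" seconds, taken at face value.
const Timing kSmallMachine = {7.0, 18.0};

// Checks the two observations: higher ratio, lower absolute increase.
void check_observations() {
    assert(slowdown_ratio(kSmallMachine) > slowdown_ratio(kLargeMachine));
    assert(extra_seconds(kSmallMachine) < extra_seconds(kLargeMachine));
}
```

With these assumed inputs the ratio is roughly 2.6x on the 4c/8t machine
versus roughly 2.5x on the 20-core machine, while the extra wait per rebuild
is 11 seconds versus 14 seconds.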

The storage systems in these two machines are wildly different, but I
suspect the main difference in this benchmark is the frequency of the cores
(higher in the lower-specced machine).

-Mostyn.

> Jumbo builds are not a solution that you should use blindly without
> confirming that they work for your codebase and workflow, but in some cases
> they clearly have enormous benefits.
>
> -Mostyn.
>
>
>>>> (wonder what the combination would be like - modularizing headers, and
>>>> also jumbo-ifying .cpp files together... - whether there's much to be saved
>>>> in the reading modules part of the work, reading them in fewer times - that
>>>> gets into some of the ideas of compiler as a service I guess)
>>>>
>>>>
>>>>> Even if it wasn't a lot more work to get modules going, it's not
>>>>> completely clear to me that that would address the use case that the people
>>>>> working on the jumbo build have.
>>>>>
>>>>>
>>>>>> (maybe still less work - but a lot of work to workaround things &
>>>>>> produce some rather quirky behavior (in terms of how the build functions
>>>>>> based on looking at exactly how the source files have changed & changing
>>>>>> the build action graph depending on that) - but enough that I'd be inclined
>>>>>> to reconsider going in the modular direction again)
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> * I was going to ask about the lack of parallelism in a jumbo build
>>>>>>>> - but reading the doc I see it's not a 'full' jumbo build, but chunkifying
>>>>>>>> the build - so there's still some/enough parallelism. Cool :)
>>>>>>>>
>>>>>>>
>>>>>>> I have heard rumours of some codebases in the games industry using a
>>>>>>> single jumbo source file for the entire build, but this is generally
>>>>>>> considered to be taking things too far and not our intended use case.
>>>>>>>
>>>>>>
>>>>>> Ah, my understanding was that jumbo builds were often/mainly used for
>>>>>> optimized builds to get cross-module optimizations (LTO-esque) & so it'd be
>>>>>> likely to be the whole program.
>>>>>>
>>>>>>
>>>>>>> The size of Chromium's jumbo compilation units is tunable- you can
>>>>>>> simply #include fewer real source files per jumbo source file- the bigger
>>>>>>> your build farm is, the smaller you want this number to be.  The optimal
>>>>>>> setup depends on things like the shape of the dependency graph and the
>>>>>>> relative costs of the original source files.  IIRC we currently only have a
>>>>>>> build-wide "jumbo_file_merge_limit" setting, though that might have changed
>>>>>>> since I last looked (V8 would benefit from this, since its source files
>>>>>>> compile more slowly than most Chromium source files).
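
The tunable chunk size described above can be sketched as a small generator
that splits a target's source list into jumbo files of at most `merge_limit`
entries each. This is an illustration under assumptions, not Chromium's
actual generator: the function name and signature are hypothetical, and a
real implementation might balance chunk sizes differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the text of each generated jumbo source file. `merge_limit` plays
// the role of the build-wide limit mentioned above (hypothetical helper).
std::vector<std::string> make_jumbo_files(
        const std::vector<std::string>& sources, std::size_t merge_limit) {
    assert(merge_limit > 0);
    std::vector<std::string> jumbo_files;
    for (std::size_t begin = 0; begin < sources.size(); begin += merge_limit) {
        const std::size_t end = std::min(begin + merge_limit, sources.size());
        std::string text;
        for (std::size_t i = begin; i < end; ++i) {
            // Each real source file is pulled in with a plain #include.
            text += "#include \"" + sources[i] + "\"\n";
        }
        jumbo_files.push_back(text);
    }
    return jumbo_files;
}
```

With a 16-file target, a limit of 50 yields a single jumbo file (no
parallelism within the target), while a limit of 5 yields four smaller ones
that can be compiled in parallel.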
>>>>>>>
>>>>>>>
>>>>>>> -Mostyn.
>>>>>>>
>>>>>>>
>>>>>>>> On Tue, Apr 10, 2018 at 5:12 AM Mostyn Bramley-Moore via cfe-dev <
>>>>>>>> cfe-dev at lists.llvm.org> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am a member of a small group of Chromium developers who are
>>>>>>>>> working on adding a unity build[1] setup to Chromium[2], in order to
>>>>>>>>> reduce the project's long and ever-increasing compile times.  We're
>>>>>>>>> calling these "jumbo" builds, because this term is not as overloaded
>>>>>>>>> as "unity".
>>>>>>>>>
>>>>>>>>> We're slowly making progress, but find that a lot of our time is
>>>>>>>>> spent renaming things in anonymous namespaces- it would be much
>>>>>>>>> simpler if it was possible to automatically treat these as if they
>>>>>>>>> were file-local.  Jens Widell has put together a proof-of-concept
>>>>>>>>> which appears to work reasonably well, it consists of a clang plugin
>>>>>>>>> and a small clang patch:
>>>>>>>>>
>>>>>>>>> https://github.com/jensl/llvm-project-20170507/tree/wip/jumbo-support/v1
>>>>>>>>> https://github.com/jensl/llvm-project-20170507/commit/a00d5ce3f20bf1c7a41145be8b7a3a478df9935f
>>>>>>>>>
>>>>>>>>> After building clang and the plugin, you generate jumbo source files
>>>>>>>>> that look like:
>>>>>>>>>
>>>>>>>>> jumbo_source_1.cc:
>>>>>>>>>
>>>>>>>>>     #pragma jumbo
>>>>>>>>>     #include "real_source_file_1.cc"
>>>>>>>>>     #include "real_source_file_2.cc"
>>>>>>>>>     #include "real_source_file_3.cc"
>>>>>>>>>
>>>>>>>>> Then, you compile something like this:
>>>>>>>>>
>>>>>>>>>     clang++ -c jumbo_source_1.cc -Xclang -load -Xclang lib/JumboSupport.so -Xclang -add-plugin -Xclang jumbo-support
>>>>>>>>>
>>>>>>>>> The plugin gives unique names[3] to the anonymous namespaces without
>>>>>>>>> otherwise changing their semantics, and also #undef's macros defined
>>>>>>>>> in each top-level source file before processing the next top-level
>>>>>>>>> source file.  That way header files can still define macros that are
>>>>>>>>> used in multiple source files in the jumbo translation unit.
>>>>>>>>> Collisions between macros defined in header files and names used in
>>>>>>>>> other headers and other source files are still possible, but less
>>>>>>>>> likely.
>>>>>>>>>
>>>>>>>>> To show how much these two changes help, here's a patch to make
>>>>>>>>> Chromium's network code build in jumbo mode:
>>>>>>>>>
>>>>>>>>> https://chromium-review.googlesource.com/c/chromium/src/+/966523
>>>>>>>>> (+352/-377 lines)
>>>>>>>>>
>>>>>>>>> And here's the corresponding patch using the proof-of-concept
>>>>>>>>> JumboSupport plugin:
>>>>>>>>>
>>>>>>>>> https://chromium-review.googlesource.com/c/chromium/src/+/962062
>>>>>>>>> (+53/-52 lines)
>>>>>>>>>
>>>>>>>>> It seems clear that the version using the JumboSupport plugin would
>>>>>>>>> require less effort to create, review and merge into the codebase.
>>>>>>>>> We have a few other feature ideas, but these two changes seem to do
>>>>>>>>> most of the work for us.
>>>>>>>>>
>>>>>>>>> So now we're trying to figure out the best way forward- would a
>>>>>>>>> feature like this be welcome to the Clang project?  And if so, how
>>>>>>>>> would you recommend that we go about it?  We would prefer to do this
>>>>>>>>> in a way that does not require a locally patched Clang and could live
>>>>>>>>> with building a custom plugin, although implementing this entirely in
>>>>>>>>> Clang would be even better.
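
To make the two changes concrete, here is a hand-written approximation of
what a jumbo translation unit looks like once both are applied. It is a
sketch under assumptions: the contents of the two "real" source files, the
namespace names file1_anon/file2_anon and the macro SCALE are hypothetical,
and the explicit qualification stands in for what the plugin does inside
the compiler without touching the sources.

```cpp
#include <cassert>

// ---- contents of a hypothetical real_source_file_1.cc ----
namespace foo {
namespace file1_anon {  // originally: namespace {
int helper() { return 1; }
}  // namespace file1_anon
#define SCALE 10  // macro defined at the top level of this source file
int first() { return file1_anon::helper() * SCALE; }
}  // namespace foo
#undef SCALE  // emulates the #undef performed between top-level source files

// ---- contents of a hypothetical real_source_file_2.cc ----
namespace foo {
namespace file2_anon {  // originally: namespace {
int helper() { return 2; }  // same name as in file 1, but no redefinition
}  // namespace file2_anon
#define SCALE 100  // same macro name, different value, no conflict
int second() { return file2_anon::helper() * SCALE; }
}  // namespace foo
#undef SCALE

// Both files keep their own helper() and their own SCALE.
void check_jumbo_unit() {
    assert(foo::first() == 10);
    assert(foo::second() == 200);
}
```

If both files kept a plain `namespace {` in one translation unit, the second
helper() would be a redefinition, which is the renaming work the plugin is
meant to avoid.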
>>>>>>>>>
>>>>>>>>
>> I've been thinking about ways to get the benefits of unity builds without
>> the semantic changes. With the functionality we introduced for
>> -fmodules-local-submodule-visibility, we have the ability to parse one
>> file, then make it "invisible" and parse another file, skipping all the
>> repeated parts from the two parses, which would give us some (maybe most)
>> of the performance benefit of unity builds without the semantic changes.
>> (This is not quite as good as a unity build: you'd still repeatedly lex and
>> preprocess the files #included into both source files. We could implicitly
>> treat header files with include guards as being "modular" to get the
>> performance back, but then you also get back some of the semantic changes.)
>>
>>
>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> -Mostyn.
>>>>>>>>>
>>>>>>>>> [1] If you're not familiar with unity builds, the idea is to compile
>>>>>>>>> multiple source files per compiler invocation, reducing the overhead
>>>>>>>>> of processing header files (which can be surprisingly high).  We do
>>>>>>>>> this by taking a list of the source files in a target and generating
>>>>>>>>> "jumbo" source files that #include multiple "real" source files, and
>>>>>>>>> then we feed these jumbo files to the compiler one at a time.  This
>>>>>>>>> way, we don't prevent the usage of valuable build tools like ccache
>>>>>>>>> and icecc that only support a single source file on the command line.
>>>>>>>>>
>>>>>>>>> [2] Daniel Bratell has a summary of our progress jumbo-ifying the
>>>>>>>>> Chromium codebase here:
>>>>>>>>> https://docs.google.com/document/d/19jGsZxh7DX8jkAKbL1nYBa5rcByUL2EeidnYsoXfsYQ/edit#
>>>>>>>>>
>>>>>>>>> [3] The JumboSupport plugin assigns names to the anonymous namespaces
>>>>>>>>> in a given file:  foo::(anonymous namespace)::bar is replaced with a
>>>>>>>>> symbol name of the form foo::__anonymous_<number>::bar where <number>
>>>>>>>>> is unique to the file within the jumbo translation unit.  Due to the
>>>>>>>>> internal linkage of these symbols, <number> does not need to be
>>>>>>>>> unique across multiple object files/jumbo source files.
>>>>>>>>> --
>>>>>>>>> Mostyn Bramley-Moore
>>>>>>>>> Vewd Software
>>>>>>>>> mostynb at vewd.com <mostynb at opera.com>
>>>>>>>>> _______________________________________________
>>>>>>>>> cfe-dev mailing list
>>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>
>>>
>>
>>
>>
>
>
>

-- 
Mostyn Bramley-Moore
Vewd Software
mostynb at vewd.com <mostynb at opera.com>