[cfe-dev] JumboSupport: making unity builds easier in Clang

Thu Apr 26 16:23:31 PDT 2018

Hans, Richard, and I spent some more time discussing this today, and we
came to the conclusion that this could absolutely be built partially with
existing modules functionality. In this case, by "module" I'm not referring
to a chunk of serialized AST, I'm just referring to the in-memory data
structures that clang uses to control name lookup.

The idea is that each .cpp file can be its own module, and all headers
would be part of a global module. Each .cpp file is only allowed to look up
names in the global module. My understanding is that this is where
-fmodules-local-submodules-visibility comes into play, although I'm not
clear on the details. This symbol hiding is the first part of what jumbo
needs, and it's actually implemented similarly to the way it was done in
the JumboSupport patch on github. It's basically filtering out declarations
that aren't supposed to be visible during name lookup.
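
As a rough illustration of that filtering, here is a toy model of the lookup rule. It is not Clang's actual data structures: `Decl`, `lookup`, and `kGlobalModule` are made-up names, and it assumes a .cpp file can see its own declarations plus everything in the global module.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy model; none of these names exist in Clang.
constexpr int kGlobalModule = 0;  // assumed: all headers live in module 0

struct Decl {
    std::string name;
    int owner;  // kGlobalModule, or the id of the .cpp file that declared it
    int value;  // stand-in for whatever entity the name refers to
};

// Return the first declaration of `name` that is visible from
// `current_module`: either owned by that module or by the global module.
// Declarations owned by other .cpp files are filtered out.
std::optional<Decl> lookup(const std::vector<Decl>& decls,
                           const std::string& name,
                           int current_module) {
    // Lookups are always performed on behalf of some .cpp file.
    assert(current_module != kGlobalModule);
    for (const Decl& d : decls) {
        if (d.name != name) continue;
        if (d.owner == kGlobalModule || d.owner == current_module) return d;
    }
    return std::nullopt;
}
```

With two .cpp files that each declare their own `helper`, a lookup from file 1 finds only file 1's declaration, and a lookup from a third file finds neither.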

The second part is avoiding name mangling collisions. It seemed pretty
simple to us to extend both name manglers to include a unique module id in
the names of all internal linkage symbols, so 'static int f() { return 42;
}' becomes _ZL1fv.1 (add .1, .2, etc). c++filt already knows how to
demangle those, so that will just work. This wouldn't break any existing
users: these are things with internal linkage, so their names shouldn't
matter as long as they look nice in the debugger.
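
A minimal sketch of the suffixing rule, assuming module ids start at 1 and that the mangler already knows whether a symbol has internal linkage. `mangleForJumbo` is a hypothetical helper for illustration, not an existing mangler entry point.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper; not part of Clang's manglers.
// Appends ".<module_id>" to the mangled name of an internal linkage symbol
// so that identically named statics from different .cpp files in one jumbo
// translation unit get distinct symbol names. Names with external linkage
// are returned unchanged.
std::string mangleForJumbo(const std::string& mangled,
                           bool internal_linkage,
                           unsigned module_id) {
    assert(module_id > 0 && "module ids are assumed to start at 1");
    if (!internal_linkage) return mangled;
    return mangled + "." + std::to_string(module_id);
}
```

So `static int f()` in the first and second .cpp files would come out as `_ZL1fv.1` and `_ZL1fv.2`, while an external `g()` stays `_Z1gv`.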

The last thing is to make it so that all included headers not listed in the
jumbo file (or perhaps on the command line) are in one global module. We
weren't able to find a way to express this today with module maps, but I
don't think it would be too hard to do.

---

We also discussed how we could, in the long run, get the compile time
benefits of jumbo builds without the semantic changes. The basic idea is
that every "modular header", i.e. a header that can successfully parse by
itself with only command line macros defined, could be its own module.
Again, we're not talking about AST serialization; these are modules only
for the purposes of name lookup. In order for this to work, all code needs
to follow very strict include-what-you-use (IWYU) rules: transitive
includes wouldn't be visible from indirect users of a header.
Obviously, we are not in this world today, but it's one we could work
towards.
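
To make the visibility rule concrete, here is a toy model that contrasts what a file can see today (everything reachable through includes) with what it could see under the strict rule (only what it includes directly). The include graph representation and both function names are assumptions made for this sketch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: file name -> set of files it includes directly.
using IncludeGraph = std::map<std::string, std::set<std::string>>;

// Strict rule: only directly included headers are visible.
std::set<std::string> visibleStrict(const IncludeGraph& g,
                                    const std::string& file) {
    assert(!file.empty());
    auto it = g.find(file);
    if (it == g.end()) return {};
    return it->second;
}

// Today's behavior: every header reachable through includes is visible.
std::set<std::string> visibleTransitive(const IncludeGraph& g,
                                        const std::string& file) {
    std::set<std::string> seen;
    std::vector<std::string> stack{file};
    while (!stack.empty()) {
        std::string cur = stack.back();
        stack.pop_back();
        auto it = g.find(cur);
        if (it == g.end()) continue;
        for (const std::string& h : it->second) {
            // Only visit each header once, even if included many times.
            if (seen.insert(h).second) stack.push_back(h);
        }
    }
    return seen;
}
```

If a.cpp includes a.h and a.h includes b.h, then b.h is in the transitive set for a.cpp but not in the strict set, so a.cpp would have to include b.h itself to use its names.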

Once the codebase follows IWYU, then it shouldn't matter (barring bugs, of
which I'm sure there will be many) what the jumbo factor is. Ignoring
resource exhaustion, a build that succeeds with a jumbo factor of 50 should
also succeed with a jumbo factor of 1. Devs can work locally with jumbo and
not worry about forgetting includes that they happen to get transitively.

On Tue, Apr 10, 2018 at 5:12 AM Mostyn Bramley-Moore via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hi,
>
> I am a member of a small group of Chromium developers who are working
> on adding a unity build[1] setup to Chromium[2], in order to reduce the
> project's long and ever-increasing compile times.  We're calling these
> "jumbo" builds, because this term is not as overloaded as "unity".
>
> We're slowly making progress, but find that a lot of our time is spent
> renaming things in anonymous namespaces - it would be much simpler if it
> was possible to automatically treat these as if they were file-local.
> Jens Widell has put together a proof-of-concept which appears to work
> reasonably well, it consists of a clang plugin and a small clang patch:
>
> https://github.com/jensl/llvm-project-20170507/tree/wip/jumbo-support/v1
> https://github.com/jensl/llvm-project-20170507/commit/a00d5ce3f20bf1c7a41145be8b7a3a478df9935f
>
> After building clang and the plugin, you generate jumbo source files that
> look like:
>
> jumbo_source_1.cc:
>
>   #pragma jumbo
>   #include "real_source_file_1.cc"
>   #include "real_source_file_2.cc"
>   #include "real_source_file_3.cc"
>
> Then, you compile something like this:
>
>   clang++ -c jumbo_source_1.cc -Xclang -load -Xclang lib/JumboSupport.so -Xclang -add-plugin -Xclang jumbo-support
>
> The plugin gives unique names[3] to the anonymous namespaces without
> otherwise changing their semantics, and also #undef's macros defined in
> each top-level source file before processing the next top-level source
> file.  That way header files can still define macros that are used in
> multiple source files in the jumbo translation unit.  Collisions between
> macros defined in header files and names used in other headers and other
> source files are still possible, but less likely.
>
> To show how much these two changes help, here's a patch to make Chromium's
> network code build in jumbo mode:
>
> https://chromium-review.googlesource.com/c/chromium/src/+/966523
> (+352/-377 lines)
>
> And here's the corresponding patch using the proof-of-concept JumboSupport
> plugin:
>
> https://chromium-review.googlesource.com/c/chromium/src/+/962062
> (+53/-52 lines)
>
> It seems clear that the version using the JumboSupport plugin would
> require less effort to create, review and merge into the codebase.  We have
> a few other feature ideas, but these two changes seem to do most of the
> work for us.
>
> So now we're trying to figure out the best way forward - would a feature
> like this be welcome to the Clang project?  And if so, how would you
> recommend that we go about it?  We would prefer to do this in a way that
> does not require a locally patched Clang and could live with building a
> custom plugin, although implementing this entirely in Clang would be even
> better.
>
> Thanks,
> -Mostyn.
>
> [1] If you're not familiar with unity builds, the idea is to compile
> multiple source files per compiler invocation, reducing the overhead of
> processing header files (which can be surprisingly high).  We do this by
> taking a list of the source files in a target and generating "jumbo"
> source files that #include multiple "real" source files, and then we feed
> these jumbo files to the compiler one at a time.  This way, we don't
> prevent the usage of valuable build tools like ccache and icecc that only
> support a single source file on the command line.
>
> [2] Daniel Bratell has a summary of our progress jumbo-ifying the Chromium
> codebase here:
> https://docs.google.com/document/d/19jGsZxh7DX8jkAKbL1nYBa5rcByUL2EeidnYsoXfsYQ/edit#
>
> [3] The JumboSupport plugin assigns names to the anonymous namespaces in a
> given file:  foo::(anonymous namespace)::bar is replaced with a symbol name
> of the form foo::__anonymous_<number>::bar where <number> is unique to the
> file within the jumbo translation unit.  Due to the internal linkage of
> these symbols, <number> does not need to be unique across multiple object
> files/jumbo source files.
>
> --
> Mostyn Bramley-Moore
> Vewd Software
> mostynb at vewd.com <mostynb at opera.com>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>