[cfe-dev] JumboSupport: making unity builds easier in Clang

Jens Widell via cfe-dev cfe-dev at lists.llvm.org
Wed May 2 02:53:38 PDT 2018


On Fri, Apr 27, 2018 at 1:23 AM, Reid Kleckner <rnk at google.com> wrote:
> Hans, Richard, and I spent some more time discussing this today, and we came
> to the conclusion that this could absolutely be built partially with
> existing modules functionality. In this case, by "module" I'm not referring
> to a chunk of serialized AST, I'm just referring to the in-memory data
> structures that clang uses to control name lookup.
>
> The idea is that each .cpp file can be its own module, and all headers would
> be part of a global module. Each .cpp file is only allowed to look up names
> in the global module. My understanding is that this is where
> -fmodules-local-submodule-visibility comes into play, although I'm not
> clear on the details. This symbol hiding is the first part of what jumbo
> needs, and it's actually implemented similarly to the way it was done in the
> JumboSupport patch on github. It's basically filtering out declarations that
> aren't supposed to be visible during name lookup.
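To make the "filtering out declarations during name lookup" idea concrete, here is a toy model of it. It is a minimal sketch, not clang's actual data structures: `Decl`, `ToyScope`, and `kGlobalModule` are hypothetical names. It assumes module 0 stands for the global module holding all headers, that each .cpp file in the jumbo unit gets its own id starting at 1, and that a file can see its own declarations plus the global module's.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy model: every declaration records the module that owns it.
struct Decl {
  std::string name;
  int owningModule;
};

class ToyScope {
 public:
  // Module 0 plays the role of the "global module" containing all headers.
  static constexpr int kGlobalModule = 0;

  void declare(const std::string& name, int owningModule) {
    assert(owningModule >= kGlobalModule && "module ids are non-negative");
    decls_[name].push_back(Decl{name, owningModule});
  }

  // Name lookup that drops declarations not visible from `currentModule`:
  // only declarations in the global module or in the module currently
  // being parsed survive the filter.
  std::vector<Decl> lookup(const std::string& name, int currentModule) const {
    std::vector<Decl> visible;
    auto it = decls_.find(name);
    if (it == decls_.end()) return visible;
    for (const Decl& d : it->second) {
      if (d.owningModule == kGlobalModule || d.owningModule == currentModule)
        visible.push_back(d);
    }
    return visible;
  }

 private:
  // All declarations of a name, regardless of owner, as in a single jumbo TU.
  std::unordered_map<std::string, std::vector<Decl>> decls_;
};
```

With `f` declared once in module 1 and once in module 2, `lookup("f", 2)` returns only module 2's declaration, so the two definitions never conflict, while a header declaration in module 0 stays visible everywhere.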

So, I've been playing around a bit. (Keep in mind here that I'm very
new to the clang code base, and even newer when it comes to anything
to do with modules.)

First off, general discussion:

I realize that, of clang's existing features, modules is the one best
suited to be extended for unity building, rather than adding
something completely new. But given that unity building could be seen
as something of a hack, applied to "legacy" code bases not yet taking
advantage of the shiny future where modules are usable... is it a good
idea to add complexity to the modules feature, rather than treating
unity building support as a separate small feature?

It seems the module feature can be tricked into doing some seemingly
useful things for unity building already (see below) but I have no
idea how much more would be needed. And I also don't know if "abusing"
a compiler feature this way is good long-term. Will this break as
clang's modules implementation evolves, because we depend on its
internals? Will developing the modules feature in clang
become more difficult because it is being abused by the Chromium
project to do unity building?

(These are honest questions; I'm not familiar enough with any of this
to claim to know the answers.)


Experiment report:

My test case is `test.cc`, which includes `test1.cc` and `test2.cc`,
where `test1.cc` and `test2.cc` each define a function named `f` in
the anonymous namespace, plus a differently named function in the
global namespace that calls `f`. Normally, this of course fails
to compile due to conflicting definitions of `f`. They also contain
conflicting definitions of a macro, and both define `enum Foo { FOO
};` in the anonymous namespace, which of course also normally leads to
warnings/errors.

In my test, I'm calling clang with `-Xclang
-fmodules-local-submodule-visibility -fmodule-name=test
-fmodule-map-file=test.modulemap`.

My `test.modulemap` contains

 module test {
  module test1 {
   header "test1.cc"
  }
  module test2 {
   header "test2.cc"
  }
 }

And in `test.cc`, I've surrounded the includes with `#pragma clang
module begin test.testX`/`#pragma clang module end`.

So far, all of this seems doable in Chromium's jumbo mechanism; adding
compiler arguments is fine, and the module map file could be generated
alongside the source file that includes the real source files. And
generating some pragmas there is fine, of course.
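Since the module map and the pragmas would be generated by the jumbo tooling, here is a minimal sketch of such a generator that reproduces the layout of `test.modulemap` and `test.cc` above. The function names `submoduleName`, `makeModuleMap`, and `makeJumboSource` are hypothetical and not part of Chromium's actual jumbo scripts; the sketch assumes every source file ends in `.cc` and that its base name is a valid module identifier.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: derive a submodule name by stripping a trailing
// ".cc", e.g. "test1.cc" -> "test1". Other names are returned unchanged.
std::string submoduleName(const std::string& file) {
  const std::string ext = ".cc";
  if (file.size() > ext.size() &&
      file.compare(file.size() - ext.size(), ext.size(), ext) == 0)
    return file.substr(0, file.size() - ext.size());
  return file;
}

// Emit a module map with one submodule per source file, each listing that
// file as its only header.
std::string makeModuleMap(const std::string& top,
                          const std::vector<std::string>& files) {
  std::ostringstream out;
  out << "module " << top << " {\n";
  for (const auto& f : files) {
    out << "  module " << submoduleName(f) << " {\n"
        << "    header \"" << f << "\"\n"
        << "  }\n";
  }
  out << "}\n";
  return out.str();
}

// Emit the jumbo source file: each #include is wrapped in
// "#pragma clang module begin/end" naming the matching submodule.
std::string makeJumboSource(const std::string& top,
                            const std::vector<std::string>& files) {
  std::ostringstream out;
  for (const auto& f : files) {
    out << "#pragma clang module begin " << top << "." << submoduleName(f)
        << "\n"
        << "#include \"" << f << "\"\n"
        << "#pragma clang module end\n";
  }
  return out.str();
}
```

Called with `"test"` and `{"test1.cc", "test2.cc"}`, the two functions produce the module map and the pragma-wrapped includes described above; the results would then be passed to clang via `-fmodule-map-file` and compiled as the jumbo source.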

The test result is that in `test2.cc`, lookup of `f` fails.
Specifically, `error: use of undeclared identifier 'f'`. But there are
no complaints about conflicting declarations.

However, if I expand this minimal test case a bit, by including a
common header (with an include guard) that declares a function
(`print()`) that both `test1.cc` and `test2.cc` call, things seem to
fall apart a bit. (But you did note that includes needed to be
handled.)

I then get errors like

  In file included from test.cc:15:
  ./test2.cc:12:3: error: declaration of 'print' must be imported from module 'test.test1' before it is required
    print("second f()");
    ^
  ./test.h:4:6: note: previous declaration is here
  void print(const char*);
       ^

and

  In file included from test.cc:15:
  ./test2.cc:22:3: warning: ambiguous use of internal linkage declaration 'f' defined in multiple modules [-Wmodules-ambiguous-internal-linkage]
    f();
    ^
  ./test1.cc:11:6: note: declared here in module 'test.test1'
  void f() {
       ^
  ./test2.cc:11:6: note: declared here in module 'test.test2'
  void f() {
       ^

Oddly enough, `f` went from being undeclared in `test2.cc` to now
having an ambiguous declaration. But maybe I just triggered an
"earlier" error that hid that one.


> The second part is avoiding name mangling collisions. It seemed pretty
> simple to us to extend both name manglers to include a unique module id in
> the names of all internal linkage symbols, so 'static int f() { return 42;
> }' becomes _ZL1fv.1 (add .1, .2, etc). c++filt already knows how to demangle
> those, so that will just work. This wouldn't break any existing users,
> because after all, these are things with internal linkage, the names
> shouldn't matter as long as they look nice in the debugger.
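The suffixing scheme described above can be sketched as follows. This is a minimal illustration, not clang's mangler API: `Symbol` and `emittedName` are hypothetical names, and the sketch assumes the mangler already knows each declaration's linkage and that each .cpp file in the jumbo unit is assigned a small positive id.

```cpp
#include <cassert>
#include <string>

// Hypothetical input: a mangled name plus the linkage the mangler already
// knows from the declaration it is mangling.
struct Symbol {
  std::string mangled;   // Itanium-mangled name, e.g. "_ZL1fv"
  bool internalLinkage;  // true for static functions, anonymous namespaces
};

// Append ".<moduleId>" to internal-linkage names only. External names must
// match across translation units, so they are returned unchanged.
std::string emittedName(const Symbol& sym, unsigned moduleId) {
  if (!sym.internalLinkage) return sym.mangled;
  assert(moduleId > 0 && "hypothetical convention: module ids start at 1");
  return sym.mangled + "." + std::to_string(moduleId);
}
```

For example, `static int f()` from two different files becomes `_ZL1fv.1` and `_ZL1fv.2`, while an external symbol such as `_Z5printPKc` (`void print(const char*)`) is untouched, which is why existing users linking against external names would not notice the change.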

That seems like a nicer approach than mine, for sure.

--
Jens


