r241620 - Wrap clang modules and pch files in an object file container.

Wed Jul 15 13:16:59 PDT 2015

On Wed, Jul 15, 2015 at 12:54 PM, Adrian Prantl <aprantl at apple.com> wrote:

> Here is a patch that implements a -fmodule-format=[obj,raw] option. I have
> not yet implemented adding the module format to the module hash or filename.
>

Generally, we only have a -cc1 flag to switch to the non-default state.
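
On the not-yet-implemented step mentioned in the quoted paragraph (adding the module format to the module hash or filename), here is a minimal sketch of the filename variant. Everything in it is hypothetical and for illustration only: ModuleFormat, moduleCacheFileName, and the ".pcm" naming scheme are assumptions, not Clang's actual code.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical enum for illustration; not Clang's actual type.
enum class ModuleFormat { Raw, Obj };

// Hypothetical helper: derives a cache file name from the module name,
// a string describing the other relevant options, and the module format.
// The format tag is appended after the hash, so raw and obj builds of the
// same module always land in different cache entries.
std::string moduleCacheFileName(const std::string &moduleName,
                                const std::string &otherOptions,
                                ModuleFormat format) {
  assert(!moduleName.empty() && "module name must not be empty");
  const std::string formatTag = (format == ModuleFormat::Obj) ? "obj" : "raw";
  const std::size_t optionsHash = std::hash<std::string>{}(otherOptions);
  return moduleName + "-" + std::to_string(optionsHash) + "-" + formatTag +
         ".pcm";
}
```

With a scheme like this, a client that needs obj-wrapped modules never picks up a raw module written by another client sharing the same cache.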

> The default setting is obj (based on the assumption that this is most
> beneficial to platforms with a shared module cache) and the driver adds an
> explicit -fmodule-format=raw on Linux.
>

I think this is backwards; I think putting more stuff into the precompiled
module format should be opt-in rather than opt-out.

> I am not sure whether having the driver emit this option for Linux is a
> good idea: Are explicit module builds a feature of the Google build system,
> or are they a Linux platform feature?
>

Neither; they're a Clang feature. Implicit module builds are a
compatibility feature for legacy build systems, and should be avoided
wherever possible because they introduce a performance hit, interact poorly
with distributed builds, behave badly if the cache gets cleared (especially
once we put debug info in modules), and so on.

> Currently the only way to enable obj-wrapped modules on Linux is to pass a
> cc1 option. Even with -fmodule-format=raw specified, clang can still read
> obj-wrapped modules.
>

OK, but presumably we'll add -gmodules or similar at some point to resolve
that issue.

> I’m not in love with the actual implementation, so suggestions and feedback
> are very welcome!
>

I assume the "if (1 || ..." was not intentional?

It doesn't seem ideal to have the top-level driver create the
wrapper-format handler, and then ignore that from the frontend code. That's
also not going to scale to module formats other than obj and raw. Is there
any other way we can deal with this without breaking the layering?
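
One possible shape for this, as a rough sketch only: the frontend looks up a handler by format name in a registry, so the driver never has to construct one, and adding a third format is just another registration. All names here (ContainerFormat, FormatRegistry, selectFormat) are hypothetical and are not Clang's actual interfaces. The sketch also assumes raw is the default and obj is opt-in, as suggested above.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a module container handler.
struct ContainerFormat {
  virtual ~ContainerFormat() = default;
  virtual std::string name() const = 0;
};

struct RawFormat : ContainerFormat {
  std::string name() const override { return "raw"; }
};

struct ObjFormat : ContainerFormat {
  std::string name() const override { return "obj"; }
};

// Hypothetical registry mapping a format name to a factory for its handler.
class FormatRegistry {
public:
  using Factory = std::function<std::unique_ptr<ContainerFormat>()>;

  void add(const std::string &name, Factory factory) {
    assert(factory && "factory must be callable");
    factories_[name] = std::move(factory);
  }

  // Returns nullptr when no handler is registered under this name.
  std::unique_ptr<ContainerFormat> create(const std::string &name) const {
    auto it = factories_.find(name);
    if (it == factories_.end())
      return nullptr;
    return it->second();
  }

private:
  std::map<std::string, Factory> factories_;
};

// An empty request means the flag was not passed; fall back to raw.
std::unique_ptr<ContainerFormat> selectFormat(const FormatRegistry &registry,
                                              const std::string &requested) {
  return registry.create(requested.empty() ? "raw" : requested);
}
```

The caller would treat a null result as an unknown format name and report a diagnostic.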

> — adrian
>
>
>
> On Jul 13, 2015, at 7:25 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Mon, Jul 13, 2015 at 6:02 PM, Adrian Prantl <aprantl at apple.com> wrote:
>
>>
>> On Jul 13, 2015, at 5:47 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>
>> On Mon, Jul 13, 2015 at 3:06 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>
>>> > On Jul 13, 2015, at 2:00 PM, Eric Christopher <echristo at gmail.com>
>>> wrote:
>>> >
>>> > Hi Adrian,
>>> >
>>> > Finally getting around to looking at some of this and I think it's
>>> going in slightly the wrong direction. In general I think being -able- to
>>> put modules in object files to simplify wrapping, use, etc is a good thing.
>>> I think being required to do so is somewhat problematic.
>>> >
>>>
>>> Let me start by noting that the current infrastructure already allows
>>> selecting whether you want wrapped modules or not by passing the
>>> appropriate PCHContainerOperations object to CompilerInstance. Clang
>>> currently unconditionally uses an object file wrapper; all of
>>> clang-tools-extra doesn’t. We could easily control the behavior of clang
>>> based on a (new) command line option.
>>>
>>> But on a platform with a shared module cache you always have to assume
>>> that a module once built will eventually be used by a client that wants to
>>> read the debug info. Think llvm-dsymutil — it does not know and does not
>>> want to know how to build clang modules, but does want to read all the
>>> debug info from a clang module.
>>>
>>> > Imagine, for example, you have a giant distributed build system...
>>> >
>>> > You'd want to create a pile of modules (that may reference/include/etc
>>> other modules) that don't or may not have debug information as part
>>> of them (because you might want to build without it or have the debug info
>>> alongside it as a separate compilation). Waiting on the full build of the
>>> module including debug info is going to adversely affect your overall build
>>> time and so shouldn't be necessary - especially if you ultimately want to
>>> be able to keep the debug information separate.
>>> >
>>> > Make sense?
>>>
>>> Not sure if you would be saving much by having the debug info
>>> separately; from what I’ve measured so far the debug info for a module
>>> makes up less than 10% of the total size. Admittedly, build-time-wise going
>>> through the backend to emit the object file is a lot more expensive than
>>> just dumping the raw PCH. [1]
>>>
>>> Yeah, I think wanting to be able to control the behavior is reasonable;
>>> we just need to be careful what the implications for consumers are. If we
>>> add, e.g., an “-fraw-modules” [2] switch to clang to turn off the
>>> object file wrapping, I’d strongly suggest that we add the value of this
>>> switch to the module hash (or add an optional “-g” to the module file
>>> name after the hash or something like that) to avoid ugly race conditions
>>> between debug info and non-debug-info builds of the same module. This way
>>> we’d have essentially two separate module caches, with and without debug
>>> info.
>>>
>>
>> That's fine, I think (we don't use a module cache at all in our build
>> system; it doesn't really make much sense for a distributed build) and most
>> command-line flag changes already have this effect.
>>
>>
>> Great!
>>
>>
>>
>>> would that work for you?
>>> -- adrian
>>>
>>> [1] If you want to be serious about building the module debug info in
>>> parallel to the rest of the build, you could even have a clang-based tool
>>> import the just-built raw clang module and emit the debug info without
>>> having to parse the headers again :-)
>>>
>>
>> That is what we intend to do :) (Assuming this turns out to actually be
>> faster than re-parsing; faulting in the entire contents of a module has
>> much worse locality than parsing.)
>>
>> [2] -fraw-modules, -fmodule-format-raw, -fmodule-debug-info, ...?
>>>     I would imagine that the driver enables module debug info when
>>> “-gmodules” is present and by default on Darwin.
>>
>>
>> That seems reasonable to me. For the frontend flag, I think a flag to
>> turn this on or to select the module format makes more sense than a flag to
>> switch to the raw format.
>>
>>
>> Okay then let’s narrow this down. Other possibilities in that direction
>> include (sorted from subjectively best to worst)
>>
>> -fmodule-format=obj
>> -fmodule-debug-info
>> -ffat-modules
>> -fmodule-container
>> -fmodule-container-object
>>
>
> It's a -cc1 flag, so it doesn't really matter much. If this will
> eventually govern whether we put code for inline functions into the module,
> then I think we should avoid names like -fmodule-debug-info. Other than
> that, I don't really have a preference.
>
>> [One other thing... I think we may have made a mistake by putting the
>> reader and writer code behind the same interface: it forces tools that want
>> to read the module format to link against all of LLVM IR, code generation,
>> and so on, when all they really need is something like libObject.]
>>
>>
>> We can always split it into two implementations of the interface or two
>> interfaces, that’s not a very big deal. My assumption was that every tool
>> that wants to read the clang module format also wants to create modules
>> (because module cache... but as you noted that’s a Darwin-centric view) and
>> more low-level tools like llvm-bcanalyzer could be piped through
>> llvm-objdump.
>>
>> -- adrian
>>
>
>
>
>