r241620 - Wrap clang modules and pch files in an object file container.

Tue Jul 14 08:25:52 PDT 2015

On Mon, Jul 13, 2015 at 7:25 PM, Richard Smith <richard at metafoo.co.uk>
wrote:

> On Mon, Jul 13, 2015 at 6:02 PM, Adrian Prantl <aprantl at apple.com> wrote:
>
>>
>> On Jul 13, 2015, at 5:47 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>
>> On Mon, Jul 13, 2015 at 3:06 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>
>>> > On Jul 13, 2015, at 2:00 PM, Eric Christopher <echristo at gmail.com> wrote:
>>> >
>>> > Hi Adrian,
>>> >
>>> > Finally getting around to looking at some of this and I think it's
>>> > going in slightly the wrong direction. In general I think being -able- to
>>> > put modules in object files to simplify wrapping, use, etc. is a good thing.
>>> > I think being required to do so is somewhat problematic.
>>> >
>>>
>>> Let me start by saying that the current infrastructure already allows
>>> selecting whether you want wrapped modules or not by passing the
>>> appropriate PCHContainerOperations object to CompilerInstance. Clang
>>> currently uses an object file wrapper unconditionally; none of
>>> clang-tools-extra does. We could easily control the behavior of clang
>>> based on a (new) command line option.
>>>
>>> But... on a platform with a shared module cache you always have to assume
>>> that a module, once built, will eventually be used by a client that wants to
>>> read the debug info. Think llvm-dsymutil: it does not know and does not
>>> want to know how to build clang modules, but does want to read all the
>>> debug info from a clang module.
>>>
>>> > Imagine, for example, you have a giant distributed build system...
>>> >
>>> > You'd want to create a pile of modules (that may reference/include/etc.
>>> > other modules) that don't or may not have debug information as part
>>> > of them (because you might want to build without it or have the debug info
>>> > alongside it as a separate compilation). Waiting on the full build of the
>>> > module including debug info is going to adversely affect your overall build time
>>> > and so shouldn't be necessary - especially if you want to be able to have
>>> > the information separate ultimately.
>>> >
>>> > Make sense?
>>>
>>> Not sure if you would be saving much by having the debug info
>>> separately; from what I’ve measured so far, the debug info for a module
>>> makes up less than 10% of the total size. Admittedly, build-time-wise, going
>>> through the backend to emit the object file is a lot more expensive than
>>> just dumping the raw PCH. [1]
>>>
>>> Yeah, I think wanting to be able to control the behavior is reasonable;
>>> we just need to be careful what the implications for consumers are. If we
>>> add, e.g., an “-fraw-modules” [2] switch to clang to turn off the
>>> object file wrapping, I’d strongly suggest that we add the value of this
>>> switch to the module hash (or add an optional “-g” to the module file
>>> name after the hash or something like that) to avoid ugly race conditions
>>> between debug info and non-debug-info builds of the same module. This way
>>> we’d have essentially two separate module caches, with and without debug
>>> info.
>>>
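
As a rough illustration of the suggestion above (adding the value of the
switch to the module hash so that wrapped and raw builds never share a
cache slot), here is a minimal Python sketch. It is not clang's actual
hashing code: the function names, the "raw"/"obj" format strings, and the
cache directory layout are all hypothetical and chosen only for the example.

```python
import hashlib


def module_cache_key(module_name, flags, module_format):
    """Hypothetical cache key: hash of module name, container format, and flags."""
    h = hashlib.sha256()
    for part in [module_name, module_format, *sorted(flags)]:
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") hashes differently from ("a", "bc")
    return h.hexdigest()[:16]


def module_cache_path(cache_dir, module_name, flags, module_format):
    """Place each module under a directory named after its key (assumed layout)."""
    key = module_cache_key(module_name, flags, module_format)
    return f"{cache_dir}/{key}/{module_name}.pcm"


# Same module and flags, differing only in the (hypothetical) container format.
raw_path = module_cache_path("/tmp/cache", "Foo", ["-O2"], "raw")
obj_path = module_cache_path("/tmp/cache", "Foo", ["-O2"], "obj")
print(raw_path)
print(obj_path)
```

Because the format participates in the key, the two builds land in different
directories, so a debug-info consumer never picks up a raw module written by
a concurrent non-debug build of the same module.
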
>>
>> That's fine, I think (we don't use a module cache at all in our build
>> system; it doesn't really make much sense for a distributed build) and most
>> command-line flag changes already have this effect.
>>
>>
>> Great!
>>
>>
>>
>>> Would that work for you?
>>> -- adrian
>>>
>>> [1] If you want to be serious about building the module debug info in
>>> parallel to the rest of the build, you could even have a clang-based tool
>>> import the just-built raw clang module and emit the debug info without
>>> having to parse the headers again :-)
>>>
>>
>> That is what we intend to do :) (Assuming this turns out to actually be
>> faster than re-parsing; faulting in the entire contents of a module has
>> much worse locality than parsing.)
>>
>>> [2] -fraw-modules, -fmodule-format-raw, -fmodule-debug-info, ...?
>>>     I would imagine that the driver enables module debug info when
>>> “-gmodules” is present and by default on Darwin.
>>
>>
>> That seems reasonable to me. For the frontend flag, I think a flag to
>> turn this on or to select the module format makes more sense than a flag to
>> switch to the raw format.
>>
>>
>> Okay, then let’s narrow this down. Other possibilities in that direction
>> include (sorted from subjectively best to worst):
>>
>> -fmodule-format=obj
>> -fmodule-debug-info
>> -ffat-modules
>> -fmodule-container
>> -fmodule-container-object
>>
>
> It's a -cc1 flag, so it doesn't really matter much. If this will
> eventually govern whether we put code for inline functions into the module,
> then I think we should avoid names like -fmodule-debug-info. Other than
> that, I don't really have a preference.
>

What you're picturing there is essentially a flag that would indicate whether
we should build all module-related-object-things into the module, or not? That
seems like a useful broad flag (with an eventual corresponding compiler
mode where we pass another flag and explicitly pass just the module and say
"build a separate object with all the module-related-object-things", for
use in a non-implicit-cache build).

(Hmm, we're going to have a weird middle ground in here - where the IR for
the inline functions needs to go in the module itself (as an
available_externally definition for use in non-LTO compilations of
dependent object files) and then the
build-separate-module-related-object-things mode would turn those into (weak?)
definitions and compile them (& the debug info) into a separate object file,
to be linked in at the end.)

Should this just be keyed/defaulted off implicit/explicit modules, or
orthogonal to that choice?

>> [One other thing... I think we may have made a mistake by putting the
>> reader and writer code behind the same interface: it forces tools that want
>> to read the module format to link against all of LLVM IR, code generation,
>> and so on, when all they really need is something like libObject.]
>>
>>
>> We can always split it into two implementations of the interface or two
>> interfaces, that’s not a very big deal. My assumption was that every tool
>> that wants to read the clang module format also wants to create modules
>> (because module cache... but as you noted that’s a Darwin-centric view) and
>> more low-level tools like llvm-bcanalyzer could be piped through
>> llvm-objdump.
>>
>> -- adrian
>>
>
>