r241620 - Wrap clang modules and pch files in an object file container.

Mon Jul 13 19:25:14 PDT 2015

On Mon, Jul 13, 2015 at 6:02 PM, Adrian Prantl <aprantl at apple.com> wrote:

>
> On Jul 13, 2015, at 5:47 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Mon, Jul 13, 2015 at 3:06 PM, Adrian Prantl <aprantl at apple.com> wrote:
>
>> > On Jul 13, 2015, at 2:00 PM, Eric Christopher <echristo at gmail.com>
>> wrote:
>> >
>> > Hi Adrian,
>> >
>> > Finally getting around to looking at some of this and I think it's
>> going in slightly the wrong direction. In general I think being -able- to
>> put modules in object files to simplify wrapping, use, etc is a good thing.
>> I think being required to do so is somewhat problematic.
>> >
>>
>> Let me start by noting that the current infrastructure already allows
>> selecting whether you want wrapped modules or not by passing the
>> appropriate PCHContainerOperations object to CompilerInstance. Clang
>> currently unconditionally uses an object file wrapper; all of
>> clang-tools-extra doesn’t. We could easily control the behavior of clang
>> based on a (new) command line option.
>>
>> But.. on a platform with a shared module cache you always have to assume
>> that a module once built will eventually be used by a client that wants to
>> read the debug info. Think llvm-dsymutil — it does not know and does not
>> want to know how to build clang modules, but does want to read all the
>> debug info from a clang module.
>>
>> > Imagine, for example, you have a giant distributed build system...
>> >
>> > You'd want to create a pile of modules (that may reference/include/etc
>> other modules) that don't or may not have debug information as part
>> of them (because you might want to build without it or have the debug info
>> alongside it as a separate compilation). Waiting on the full build of the
>> module including debug is going to adversely affect your overall build time
>> and so shouldn't be necessary - especially if you want to be able to have
>> information separate ultimately.
>> >
>> > Make sense?
>>
>> Not sure if you would be saving much by having the debug info separately;
>> from what I’ve measured so far the debug info for a module makes up less
>> than 10% of the total size. Admittedly, build-time-wise going through the
>> backend to emit the object file is a lot more expensive than just dumping
>> the raw PCH. [1]
>>
>> Yeah, I think wanting to be able to control the behavior is reasonable,
>> we just need to be careful what the implications for consumers are. If we
>> add, e.g., an “-fraw-modules” [2] switch to clang to turn off the
>> object file wrapping, I’d strongly suggest that we add the value of this
>> switch to the module hash (or add an optional “-g” to the module file
>> name after the hash or something like that) to avoid ugly race conditions
>> between debug info and non-debug-info builds of the same module. This way
>> we’d have essentially two separate module caches, with and without debug
>> info.
>>
>
> That's fine, I think (we don't use a module cache at all in our build
> system; it doesn't really make much sense for a distributed build) and most
> command-line flag changes already have this effect.
>
>
> Great!
>
>
>
>> would that work for you?
>> -- adrian
>>
>> [1] If you want to be serious about building the module debug info in
>> parallel to the rest of the build, you could even have a clang-based tool
>> import the just-built raw clang module and emit the debug info without
>> having to parse the headers again :-)
>>
>
> That is what we intend to do :) (Assuming this turns out to actually be
> faster than re-parsing; faulting in the entire contents of a module has
> much worse locality than parsing.)
>
> [2] -fraw-modules, -fmodule-format-raw, -fmodule-debug-info, ...?
>>     I would imagine that the driver enables module debug info when
>> “-gmodules” is present and by default on Darwin.
>
>
> That seems reasonable to me. For the frontend flag, I think a flag to turn
> this on or to select the module format makes more sense than a flag to
> switch to the raw format.
>
>
> Okay then let’s narrow this down. Other possibilities in that direction
> include (sorted from subjectively best to worst)
>
> -fmodule-format=obj
> -fmodule-debug-info
> -ffat-modules
> -fmodule-container
> -fmodule-container-object
>

It's a -cc1 flag, so it doesn't really matter much. If this will eventually
govern whether we put code for inline functions into the module, then I
think we should avoid names like -fmodule-debug-info. Other than that, I
don't really have a preference.

> [One other thing... I think we may have made a mistake by putting the
> reader and writer code behind the same interface: it forces tools that want
> to read the module format to link against all of LLVM IR, code generation,
> and so on, when all they really need is something like libObject.]
>
>
> We can always split it into two implementations of the interface or two
> interfaces, that’s not a very big deal. My assumption was that every tool
> that wants to read the clang module format also wants to create modules
> (because module cache... but as you noted that’s a Darwin-centric view) and
> more low-level tools like llvm-bcanalyzer could be piped through
> llvm-objdump.
>
> -- adrian
>
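
The module-hash suggestion discussed above (folding the container format into the cache key so wrapped and raw builds never share a file) can be sketched roughly as follows. This is only an illustration and is not clang's actual hashing code: ModuleKeyInputs, fnv1a, and cacheFileName are hypothetical names, and the FNV-1a hash stands in for whatever hash the module cache really uses.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the inputs that determine a cached module's name.
struct ModuleKeyInputs {
  std::string moduleName;
  std::string languageOptions; // assumed: some serialized form of the options
  bool objectContainer;        // true: object-file wrapper, false: raw format
};

// 64-bit FNV-1a, used here only as a simple, well-known hash.
std::uint64_t fnv1a(const std::string &bytes,
                    std::uint64_t hash = 14695981039346656037ULL) {
  for (unsigned char c : bytes) {
    hash ^= c;
    hash *= 1099511628211ULL;
  }
  return hash;
}

// Mix the container format into the hash so that the two formats map to
// different cache entries and cannot overwrite each other.
std::string cacheFileName(const ModuleKeyInputs &in) {
  std::uint64_t hash = fnv1a(in.languageOptions);
  hash = fnv1a(std::string(1, in.objectContainer ? 'o' : 'r'), hash);
  return in.moduleName + "-" + std::to_string(hash) + ".pcm";
}
```

With this scheme, two builds of the same module that differ only in the container format get distinct file names, which is the "two separate module caches" effect described in the thread.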