r241620 - Wrap clang modules and pch files in an object file container.

Wed Jul 15 12:54:14 PDT 2015

Here is a patch that implements a -fmodule-format=[obj,raw] option. I have not yet implemented adding the module format to the module hash or filename.
The default setting is obj (based on the assumption that this is most beneficial to platforms with a shared module cache) and the driver adds an explicit -fmodule-format=raw on Linux. I am not sure whether having the driver emit this option for Linux is a good idea: Are explicit module builds a feature of the Google build system, or are they a Linux platform feature? Currently the only way to enable obj-wrapped modules on Linux is to pass a cc1 option. Even with -fmodule-format=raw specified, clang can still read obj-wrapped modules. 
I’m not in love with the actual implementation, so suggestions and feedback are very welcome!
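
To make the behavior described above concrete, here is a minimal sketch. It is not the code from the attached patch, and every name in it (ModuleFormat, parseModuleFormat, driverDefaultFormat, moduleCacheFileName) is hypothetical. It assumes the format is a two-valued enum, that the driver picks raw only on Linux, and that the not-yet-implemented cache separation could be done with the optional "-g" filename suffix floated earlier in this thread.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical names throughout; none of this is taken from clang itself.
enum class ModuleFormat { Obj, Raw };

// Parse the value of -fmodule-format=<value>.
// Returns std::nullopt for anything other than "obj" or "raw".
std::optional<ModuleFormat> parseModuleFormat(std::string_view value) {
  if (value == "obj") return ModuleFormat::Obj;
  if (value == "raw") return ModuleFormat::Raw;
  return std::nullopt;
}

// Default as described in this mail: obj everywhere, except that the
// driver passes an explicit raw on Linux. The OS string is an assumption.
ModuleFormat driverDefaultFormat(std::string_view os) {
  return os == "linux" ? ModuleFormat::Raw : ModuleFormat::Obj;
}

// Not implemented in the patch: one possible way to keep obj and raw
// builds of the same module apart in a shared cache, by appending an
// optional "-g" after the hash. The "<name>-<hash>.pcm" layout is assumed.
std::string moduleCacheFileName(std::string_view moduleName,
                                std::string_view contextHash,
                                ModuleFormat format) {
  std::string name(moduleName);
  name += '-';
  name += contextHash;
  if (format == ModuleFormat::Obj) name += "-g";
  name += ".pcm";
  return name;
}
```

With this scheme an obj-wrapped and a raw build of the same module would land in different files, so a consumer that needs debug info never picks up a raw module by accident. Folding the format into the hash itself would achieve the same separation.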

— adrian

> On Jul 13, 2015, at 7:25 PM, Richard Smith <richard at metafoo.co.uk> wrote:
> 
> On Mon, Jul 13, 2015 at 6:02 PM, Adrian Prantl <aprantl at apple.com> wrote:
> 
>> On Jul 13, 2015, at 5:47 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>> 
>> On Mon, Jul 13, 2015 at 3:06 PM, Adrian Prantl <aprantl at apple.com> wrote:
>> > On Jul 13, 2015, at 2:00 PM, Eric Christopher <echristo at gmail.com> wrote:
>> >
>> > Hi Adrian,
>> >
>> > Finally getting around to looking at some of this and I think it's going in slightly the wrong direction. In general I think being -able- to put modules in object files to simplify wrapping, use, etc. is a good thing. I think being required to do so is somewhat problematic.
>> >
>> 
>> Let me start by noting that the current infrastructure already allows selecting whether you want wrapped modules or not by passing the appropriate PCHContainerOperations object to CompilerInstance. Clang currently uses an object file wrapper unconditionally, whereas none of clang-tools-extra does. We could easily control the behavior of clang based on a (new) command line option.
>> 
>> But.. on a platform with a shared module cache you always have to assume that a module once built will eventually be used by a client that wants to read the debug info. Think llvm-dsymutil — it does not know and does not want to know how to build clang modules, but does want to read all the debug info from a clang module.
>> 
>> > Imagine, for example, you have a giant distributed build system...
>> >
>> > You'd want to create a pile of modules (that may reference/include/etc. other modules) that don't or may not have debug information as part of them (because you might want to build without it or have the debug info alongside it as a separate compilation). Waiting on the full build of the module including debug info is going to adversely affect your overall build time and so shouldn't be necessary, especially if you ultimately want to be able to keep the information separate.
>> >
>> > Make sense?
>> 
>> Not sure if you would be saving much by having the debug info separately; from what I’ve measured so far, the debug info for a module makes up less than 10% of the total size. Admittedly, build-time-wise, going through the backend to emit the object file is a lot more expensive than just dumping the raw PCH. [1]
>> 
>> Yeah, I think wanting to be able to control the behavior is reasonable; we just need to be careful about what the implications for consumers are. If we add, e.g., an “-fraw-modules” [2] switch to clang to turn off the object file wrapping, I’d strongly suggest that we add the value of this switch to the module hash (or add an optional “-g” to the module file name after the hash, or something like that) to avoid ugly race conditions between debug info and non-debug-info builds of the same module. This way we’d have essentially two separate module caches, with and without debug info.
>> 
>> That's fine, I think (we don't use a module cache at all in our build system; it doesn't really make much sense for a distributed build) and most command-line flag changes already have this effect.
> 
> Great!
>>  
>> Would that work for you?
>> -- adrian
>> 
>> [1] If you want to be serious about building the module debug info in parallel to the rest of the build, you could even have a clang-based tool import the just-built raw clang module and emit the debug info without having to parse the headers again :-)
>> 
>> That is what we intend to do :) (Assuming this turns out to actually be faster than re-parsing; faulting in the entire contents of a module has much worse locality than parsing.)
>> 
>> [2] -fraw-modules, -fmodule-format-raw, -fmodule-debug-info, ...?
>>     I would imagine that the driver enables module debug info when “-gmodules” is present and by default on Darwin.
>> 
>> That seems reasonable to me. For the frontend flag, I think a flag to turn this on or to select the module format makes more sense than a flag to switch to the raw format.
> 
> Okay, then let’s narrow this down. Other possibilities in that direction include (sorted from subjectively best to worst):
> 
> -fmodule-format=obj
> -fmodule-debug-info
> -ffat-modules
> -fmodule-container
> -fmodule-container-object
> 
> It's a -cc1 flag, so it doesn't really matter much. If this will eventually govern whether we put code for inline functions into the module, then I think we should avoid names like -fmodule-debug-info. Other than that, I don't really have a preference.
>> [One other thing... I think we may have made a mistake by putting the reader and writer code behind the same interface: it forces tools that want to read the module format to link against all of LLVM IR, code generation, and so on, when all they really need is something like libObject.]
> 
> We can always split it into two implementations of the interface or two interfaces, that’s not a very big deal. My assumption was that every tool that wants to read the clang module format also wants to create modules (because module cache... but as you noted that’s a Darwin-centric view) and more low-level tools like llvm-bcanalyzer could be piped through llvm-objdump.
> 
> -- adrian
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: module-format-select.diff
Type: application/octet-stream
Size: 28592 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150715/ff269e70/attachment.obj>