[cfe-dev] [PATCH] Wrap clang modules inside Mach-O/ELF/COFF containers

Fri Jan 9 17:27:37 PST 2015

On Fri, Jan 9, 2015 at 5:09 PM, Adrian Prantl <aprantl at apple.com> wrote:

>
> On Jan 9, 2015, at 5:02 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Fri, Jan 9, 2015 at 4:03 PM, Adrian Prantl <aprantl at apple.com> wrote:
>
>> On Jan 9, 2015, at 3:57 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>
>> On Tue, Jan 6, 2015 at 10:07 AM, Adrian Prantl <aprantl at apple.com> wrote:
>>
>>>
>>> > On Dec 12, 2014, at 8:47 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>> >
>>> >
>>> >> On Dec 12, 2014, at 5:37 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com> wrote:
>>> >>
>>> >>
>>> >>> On Dec 12, 2014, at 4:33 PM, Eric Christopher <echristo at gmail.com> wrote:
>>> >>>
>>> >>> Debug info for types isn't inherently a code generation concept. If
>>> you think about it, debug info for types is a stable (if lossy)
>>> serialization method for a module file. The line number etc for when
>>> there's code generated is a separate issue.
>>> >>
>>> >> I see what you mean, but it is traditionally a codegen product with a
>>> particular use case, and it’s not reasonable to force it on every clang
>>> client that only wants to parse code (libclang, static analyzers,
>>> migrators, refactoring tools, etc.) or on builds that didn’t ask for it.
>>> >
>>> > Good point, I tend to forget about non-compiler users of clang modules.
>>> >
>>> > If we do decide that having clang modules without debug info is
>>> desirable, and we want debug info to be generated lazily (only when needed)
>>> then putting it into a separate file is preferable, because it then can be
>>> captured as a dependency by build systems.
>>> >
>>> > It looks like at this point everyone’s argument really depends on
>>> the assumption that emitting debug info is either expensive or really
>>> cheap, so my suggestion is to revisit this thread once I actually
>>> have some numbers on how long it takes to emit debug info and how much
>>> space it takes up. I’ll try to get that done soon.
>>>
>>> Hi Argyrios,
>>>
>>> back from the break, here are the promised numbers to make our decision
>>> easier:
>>>
>>> I did an experiment where I patched clang to emit debug type info for
>>> each type (patch attached for the curious), and compiled an empty program
>>> that imports the Cocoa.h header. To compare the sizes I emitted the DWARF
>>> to a separate file:
>>>
>>> -rw-r--r--  1 adrian  staff  2151068 Dec 19 16:30 Foundation-3QM1BFEPXW18W.pcm
>>> -rw-r--r--  1 adrian  staff   110772 Dec 19 16:30 Foundation-3QM1BFEPXW18W.pcm.o
>>>
>>> here’s AppKit:
>>>
>>> -rw-r--r--  1 adrian  staff  3302744 Dec 19 16:40 AppKit-5HXLHEH4UB4M.pcm
>>> -rw-r--r--  1 adrian  staff   279080 Dec 19 16:40 AppKit-5HXLHEH4UB4M.pcm.o
>>>
>>> The median of the size of the DWARF compared to the size of the pcm over
>>> all the modules pulled in by Cocoa.h is 5%; i.e., the DWARF would take up
>>> roughly 5% of the size of the individual modules.
>>>
>>> From these numbers I would argue that DWARF emission is comparatively
>>> cheap. To keep the implementation simple, I’d prefer to have everything in
>>> one file; this way we won’t have to introduce another layer of locking for
>>> creating the pcm.o files lazily, but if someone wants to point out that
>>> this is a lame excuse, be my guest ;-)
>>> [Another reason to argue for separate .pcm.o files is if we ever want to
>>> put something target-specific in there, such as code. Currently this is not
>>> the case,
>>
>>
>> I certainly have plans to do this, as mentioned previously on this thread.
>>
>>
>>> and even if we did this, we would still benefit from having the DWARF
>>> type information shared between the several .pcm.o files]
>>>
>>
>> Is there any disadvantage to having the debug information for a module
>> split over two .o files (one for the types and another for the inline
>> functions / template instantiations)?
>>
>>
>> I think that having it split is actually an advantage. By split I mean
>> having the .pcm, which contains the AST and the DWARF for the types, and
>> then several .pcm.o’s, one for each target, each containing e.g. IR for
>> inline functions plus debug info, where the debug info in the various
>> targets refers to the shared DWARF type info in the .pcm. As far as the
>> debug info is concerned, we would use the same mechanisms for the .pcm.o
>> files as we would for any other object that imports the module.
>>
>
> OK, I'm fine with that (though in our case I think we'll want to turn this
> feature off and put all the DWARF output into the same file as the inline
> functions etc). Do you have a plan for supporting debug fission with this
> mode?
>
>
> Do you mean one .pcm that contains AST+DWARF+IR or one .pcm that contains
> the AST and one .pcm.o that contains IR+DWARF? The first one is
> straightforward. The latter would need some design because clang would need
> to know how to find the .pcm.o file when compiling a file that uses types
> defined in it.
>

I mean: suppose I want to use (implicit) modules and fission at the same
time. How does that work once these changes land and we put DWARF for types
into the .pcm file?
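
As a side note on the size figures quoted above, the per-module ratio of
DWARF to pcm size can be recomputed from the two listings. This is only a
rough sketch in Python: the byte counts are copied from the quoted `ls -l`
output, the names `sizes`, `dwarf_ratio` and `ratios` are made up for
illustration, and it assumes the .pcm.o size is a fair proxy for the DWARF
size (it also includes whatever overhead the object-file container adds).

```python
# Sizes in bytes, copied from the `ls -l` listings quoted in this thread.
# Assumption: the .pcm.o file size approximates the size of the DWARF alone.
sizes = {
    "Foundation": {"pcm": 2151068, "pcm_o": 110772},
    "AppKit": {"pcm": 3302744, "pcm_o": 279080},
}


def dwarf_ratio(pcm_bytes, pcm_o_bytes):
    """Return the .pcm.o size as a fraction of the .pcm size."""
    return pcm_o_bytes / pcm_bytes


# Ratio per module; only the two modules listed in the thread are covered.
ratios = {name: dwarf_ratio(s["pcm"], s["pcm_o"]) for name, s in sizes.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1%}")
```

Foundation comes out close to the 5% median mentioned above and AppKit
somewhat higher; the median itself was taken over all modules pulled in by
Cocoa.h and cannot be reproduced from these two data points alone.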