[cfe-dev] [PATCH] Wrap clang modules inside Mach-O/ELF/COFF containers

Richard Smith richard at metafoo.co.uk
Mon Jan 12 19:40:09 PST 2015


On Mon, Jan 12, 2015 at 2:11 PM, David Blaikie <dblaikie at gmail.com> wrote:

> On Mon, Jan 12, 2015 at 1:56 PM, Richard Smith <richard at metafoo.co.uk>
> wrote:
>
>> On Fri, Jan 9, 2015 at 8:26 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>> On Fri, Jan 9, 2015 at 5:02 PM, Richard Smith <richard at metafoo.co.uk>
>>> wrote:
>>>
>>>> On Fri, Jan 9, 2015 at 4:03 PM, Adrian Prantl <aprantl at apple.com>
>>>> wrote:
>>>>
>>>>> On Jan 9, 2015, at 3:57 PM, Richard Smith <richard at metafoo.co.uk>
>>>>> wrote:
>>>>>
>>>>> On Tue, Jan 6, 2015 at 10:07 AM, Adrian Prantl <aprantl at apple.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> > On Dec 12, 2014, at 8:47 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>>>>> >
>>>>>> >
>>>>>> >> On Dec 12, 2014, at 5:37 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com> wrote:
>>>>>> >>
>>>>>> >>
>>>>>> >>> On Dec 12, 2014, at 4:33 PM, Eric Christopher <echristo at gmail.com> wrote:
>>>>>> >>>
>>>>>> >>> Debug info for types isn't inherently a code generation concept.
>>>>>> >>> If you think about it, debug info for types is a stable (if lossy)
>>>>>> >>> serialization method for a module file. The line number etc for when
>>>>>> >>> there's code generated is a separate issue.
>>>>>> >>
>>>>>> >> I see what you mean, but it is a traditionally codegen product
>>>>>> >> with a particular use-case, and it’s not reasonable to force it on every
>>>>>> >> clang client that only wants to parse code, like libclang, static
>>>>>> >> analyzers, migrators, refactoring tools, etc., or builds that didn’t ask
>>>>>> >> for it.
>>>>>> >
>>>>>> > Good point, I tend to forget about non-compiler users of clang
>>>>>> > modules.
>>>>>> >
>>>>>> > If we do decide that having clang modules without debug info is
>>>>>> > desirable, and we want debug info to be generated lazily (only when needed)
>>>>>> > then putting it into a separate file is preferable, because it then can be
>>>>>> > captured as a dependency by build systems.
>>>>>> >
>>>>>> > It looks like at this point everyone’s argument is really depending
>>>>>> > on an assumption that emitting debug info is expensive (or really cheap!,
>>>>>> > respectively), so my suggestion is to revisit this thread once I actually
>>>>>> > have some numbers on how long it takes to emit debug info and how much
>>>>>> > space it takes up. I’ll try to get that done soon.
>>>>>>
>>>>>> Hi Argyrios,
>>>>>>
>>>>>> back from the break, here are the promised numbers to make our
>>>>>> decision easier:
>>>>>>
>>>>>> I did an experiment where I patched clang to emit debug type info for
>>>>>> each type (patch attached for the curious), and compiled an empty program
>>>>>> that imports the Cocoa.h header. To compare the sizes I emitted the DWARF
>>>>>> to a separate file:
>>>>>>
>>>>>> -rw-r--r--  1 adrian  staff  2151068 Dec 19 16:30 Foundation-3QM1BFEPXW18W.pcm
>>>>>> -rw-r--r--  1 adrian  staff   110772 Dec 19 16:30 Foundation-3QM1BFEPXW18W.pcm.o
>>>>>>
>>>>>> here’s AppKit:
>>>>>>
>>>>>> -rw-r--r--  1 adrian  staff  3302744 Dec 19 16:40 AppKit-5HXLHEH4UB4M.pcm
>>>>>> -rw-r--r--  1 adrian  staff   279080 Dec 19 16:40 AppKit-5HXLHEH4UB4M.pcm.o
>>>>>>
>>>>>> The median of the size of the DWARF compared to the size of the pcm
>>>>>> over all the modules pulled in by Cocoa.h is 5%; i.e., the DWARF would take
>>>>>> up roughly 5% of the size of the individual modules.
>>>>>>
>>>>>> From these numbers I would argue that DWARF emission is comparatively
>>>>>> cheap. To keep the implementation simple, I’d prefer to have everything in
>>>>>> one file; this way we won’t have to introduce another layer of locking for
>>>>>> creating the pcm.o files lazily, but if someone wants to point out that
>>>>>> this is a lame excuse, be my guest ;-)
>>>>>> [Another reason to argue for separate .pcm.o files is if we ever want
>>>>>> to put something target-specific in there, such as code. Currently this is
>>>>>> not the case,
>>>>>
>>>>>
>>>>> I certainly have plans to do this, as mentioned previously on this
>>>>> thread.
>>>>>
>>>>>
>>>>>> and even if we did this, we would still benefit from having the DWARF
>>>>>> type information shared between the several .pcm.o files]
>>>>>>
>>>>>
>>>>> Is there any disadvantage to having the debug information for a module
>>>>> split over two .o files (one for the types and another for the inline
>>>>> functions / template instantiations)?
>>>>>
>>>>>
>>>>> I think that having it split is actually an advantage. By split I mean
>>>>> having the .pcm which contains AST and the DWARF for the types and then
>>>>> several .pcm.o’s for each target that contains e.g., IR for inline
>>>>> functions+debug info and the debug info in the various targets refers to
>>>>> the shared DWARF type info in the .pcm. As far as the debug info is
>>>>> concerned, we would use the same mechanisms for the .pcm.o files as we
>>>>> would for any other object that imports the module.
>>>>>
>>>>
>>>> OK, I'm fine with that (though in our case I think we'll want to turn
>>>> this feature off and put all the DWARF output into the same file as the
>>>> inline functions etc). Do you have a plan for supporting debug fission with
>>>> this mode?
>>>>
>>>
>>> The way I was thinking is that this is, in some sense, fission already.
>>>
>>> We would put a simple module skeleton compile unit that represents the
>>> module in each object file compiled using that module - comdat it so it's
>>> dedup'd by the linker, and that would reference the pcm.o file just like we
>>> reference .dwo files today - and in there we'd have all the usual
>>> debug_types.dwo, etc.
>>>
>>> So this /is/ fission.
>>>
>>> If we wanted to split the debug info out from the module, I don't think
>>> this would really change - we'd just point at that other file instead.
>>>
>>> (& when we eventually have inline functions and their debug info in the
>>> module, we could drop the comdat and just put the skeleton CU in that
>>> object file to be linked in directly (and to contain the debug info for
>>> those inline functions, etc))
>>>
>>> Does that sound reasonable/make sense? I can flesh out some of the
>>> DWARF terminology I've used if it's unclear.
>>>
>>
>> This is the answer I was hoping for / expecting, I just wanted to make
>> sure that this had been considered. To my mind, this means that it's
>> neither relevant nor necessary that the .pcm file is an ELF / Mach-O / COFF
>> / etc. object file, all that matters is that it's a file that DWARF readers
>> are able to read DWARF from (and a format that we can read Clang's PCM
>> information from). Does that give us any additional flexibility regarding
>> the format?
>>
>
> My guess would be that this doesn't give us any additional flexibility
> today - I think GDB is the only implementation of Fission today and, while
> I don't know for sure, I don't have any reason to believe it can handle
> .dwo files in any format other than ELF (or perhaps generalized to any
> object file GDB can cope with on each platform it supports).
>
>
>> One other change that I would like to be made with this one: fix
>> llvm-bcanalyzer so that it can read whatever file format we end up using
>> for .pcm files. We get several fringe benefits such as this from using
>> bitcode, and it would be unfortunate to lose them.
>>
>
> Would it be sufficient to teach llvm-readelf or something to have options
> (if it doesn't have them already) to dump a specific section to stdout and
> you'd just pipe that to bcanalyzer?
>

That seems reasonable to me.
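
To make the "dump a specific section and hand it to bcanalyzer" idea concrete, here is a minimal Python sketch of the extraction half. It is only an illustration under stated assumptions: the container is assumed to be a little-endian ELF64 file, the function name `read_elf64_section` is hypothetical, and nothing here corresponds to an existing llvm-readelf option.

```python
import struct

# ELF64 little-endian layouts from the generic ELF specification.
ELF64_HEADER = "<16sHHIQQQIHHHHHH"   # 64 bytes
ELF64_SECTION = "<IIQQQQIIQQ"        # 64 bytes


def read_elf64_section(data: bytes, wanted: str) -> bytes:
    """Return the raw contents of the section named `wanted`.

    Hypothetical helper for illustration; handles only little-endian ELF64.
    """
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64 files")

    header = struct.unpack_from(ELF64_HEADER, data, 0)
    shoff = header[6]                       # offset of the section header table
    shentsize, shnum, shstrndx = header[11:14]

    def section_header(index):
        return struct.unpack_from(ELF64_SECTION, data, shoff + index * shentsize)

    # The section-name string table is itself a section.
    strtab_hdr = section_header(shstrndx)
    strtab = data[strtab_hdr[4]:strtab_hdr[4] + strtab_hdr[5]]

    for index in range(shnum):
        hdr = section_header(index)
        name_start = hdr[0]
        name_end = strtab.index(b"\0", name_start)
        if strtab[name_start:name_end].decode("ascii") == wanted:
            offset, size = hdr[4], hdr[5]
            return data[offset:offset + size]

    raise KeyError(f"no section named {wanted!r}")
```

The returned bytes could be written to a file or to stdout for llvm-bcanalyzer to consume. Which section name would hold the PCM bitcode is not settled in this thread, so any name passed as `wanted` is an assumption.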

>> Adrian: have you looked at the file size increase for an empty module from
>> adding this wrapper format and skeleton/empty DWARF information? That'd be
>> an interesting data point (mostly just to assuage my concern here -- some
>> builds will have thousands of these files loaded, and a few dozen KiB per
>> PCM file adds up to a lot of address space).
>>
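
The size questions in this thread come down to simple arithmetic, sketched below in Python. The Foundation and AppKit byte counts are the ones quoted earlier in the thread; the module count and per-module wrapper overhead in the second part are purely hypothetical placeholders, since no measurement of the wrapper overhead has been posted yet.

```python
KIB = 1024
MIB = 1024 * KIB

# (pcm bytes, pcm.o DWARF bytes) as quoted in the thread.
quoted_sizes = {
    "Foundation": (2151068, 110772),
    "AppKit": (3302744, 279080),
}


def dwarf_share(pcm_bytes: int, dwarf_bytes: int) -> float:
    """DWARF size as a fraction of the .pcm size."""
    return dwarf_bytes / pcm_bytes


shares = {name: dwarf_share(pcm, dwarf) for name, (pcm, dwarf) in quoted_sizes.items()}
for name, share in shares.items():
    print(f"{name}: DWARF is {share:.1%} of the .pcm size")


def wrapper_overhead_mib(module_count: int, overhead_kib_per_module: float) -> float:
    """Total extra mapped size, in MiB, if every loaded module pays a fixed overhead."""
    return module_count * overhead_kib_per_module * KIB / MIB


# Hypothetical inputs: 5000 loaded modules, 32 KiB of container overhead each.
print(f"{wrapper_overhead_mib(5000, 32):.2f} MiB of extra address space")
```

For the two quoted modules the share works out to about 5% for Foundation and about 8% for AppKit, which is consistent with the 5% median reported over all modules pulled in by Cocoa.h.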
>
>