[PATCH] Have clang list the imported modules in the debug info

Wed Mar 18 16:41:10 PDT 2015

On Wed, Mar 18, 2015 at 4:31 PM, Adrian Prantl <aprantl at apple.com> wrote:

>
> On Mar 18, 2015, at 4:02 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Wed, Mar 18, 2015 at 3:50 PM, Adrian Prantl <aprantl at apple.com> wrote:
>
>>
>> On Mar 17, 2015, at 6:44 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>
>>
>> On Tue, Mar 17, 2015 at 3:47 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>
>>>
>>> > On Mar 17, 2015, at 10:03 AM, Greg Clayton <gclayton at apple.com> wrote:
>>> >
>>> >
>>> >> On Mar 17, 2015, at 9:46 AM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>> >>
>>> >>
>>> >>
>>> >> On Tue, Mar 17, 2015 at 9:42 AM, Greg Clayton <gclayton at apple.com>
>>> wrote:
>>> >>
>>> >>> On Mar 16, 2015, at 6:47 PM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Mon, Mar 16, 2015 at 5:14 PM, Adrian Prantl <aprantl at apple.com>
>>> wrote:
>>> >>>
>>> >>> Thanks for the explanation David, I missed that it is entirely the
>>> linker's (or some dwarf post-processor's) responsibility to find the module
>>> files and link in the debug info from the .pcm files, so the debugger doesn’t
>>> notice a difference.
>>> >>>
>>> >>> I think there's still some confusion here. Sorry if I'm rehashing
>>> something, but I'll try to explain how this all works.
>>> >>>
>>> >>> Normal split DWARF:
>>> >>>
>>> >>> Compiler generates two files: .o and .dwo.
>>> >>> .dwo has static, non-relocatable debug info.
>>> >>> .o has a skeleton compile_unit that has the name of the .dwo file
>>> and a hash to verify that the .dwo file isn't stale when the debugger reads
>>> it.
>>> >>> The .o files are all linked together, the .dwo files stay where they
>>> are.
>>> >>> The debugger reads the linked executable, finds the skeleton
>>> compile_units contained therein, and finds/loads the .dwo files.
>>> >>>
>>> >>> The scenario I have in mind for module debug info is this:
>>> >>> Module is compiled as an object file with debug info (this file is
>>> actually a .dwo file, even if it has some other extension - it has the
>>> non-relocatable debug info in it)
>>> >>> .o file has a comdat'd skeleton compile_unit describing the
>>> .dwo/module file
>>> >>> <from here on no extra work is required, the linker and debugger
>>> just act as normal>
>>> >>> The .o files are linked together, the skeleton compile_units get
>>> deduplicated by the linker (comdat sections)
>>> >>
>>> >> One issue I can think of is we will need to figure out a way to make
>>> COMDAT work with mach-o. COMDAT requires a large number of sections and
>>> mach-o can only have 255.
>>> >>
>>> >> Ah, fair enough - how does MachO handle inline functions (the most
>>> common use of comdat) currently, then?
>>> >
>>> > Currently mach-o relies on symbols in the symbol table being marked as
>>> weak and I believe the data for these symbols are in special sections that
>>> are marked as containing items that can be coalesced.
>>> >
>>> That’s not necessarily an issue that needs to be solved on Darwin, or am
>>> I maybe missing something? The linker leaves all debug info in the .o (as
>>> it currently does) and llvm-dsymutil is resolving all the external module
>>> type references while creating the .dSYM bundle.
>>>
>>
>> Yeah, with a debug aware linker (or in the case of dsymutil, a debug-only
>> linker) you would just know that since you're looking at object files,
>> module references will be redundant across objects and should be
>> deduplicated (by the dwo hash, most likely).
>>
>> If you're not teaching your debugger to read modules, and want to link
>> the debug info in from the .dwos - at that point you can probably drop the
>> skeleton stuff entirely (you'd still need to teach your debugger about .dwo
>> sections and some of the esoteric things there - like str_index and the
>> extra/special line table just for file names (decl_file, etc, uses this))
>> and just put the contents of the module debug info straight in the dsym.
>> It'd be a bit weird, but do-able without too much work, I'd imagine. You
>> could move them back into the original sections, if you wanted to avoid the
>> weird .dwo + non-.dwo sections together... *shrug* not sure what exactly
>> you'd want there.
>>
>>
>> My plan was to have -gmodules behave like the latter variant
>> unless -gsplit-dwarf is also present; this way there wouldn't be any weird
>> Darwin-specific code paths.
>>
>
> Not sure I quite follow (mostly my fault given the rambling paragraph up
> there) - given the lack of a dsymutil-like tool on other platforms as part
> of the common tool path for debug info, I'm not sure module debug info
> without split dwarf is viable in that world. There's no tool to read these
> extra files at any point.
>
>
> In theory someone could port llvm-dsymutil to a different platform, but
> that scenario is a little far-fetched. I’m not sure what will happen if
> LLDB is presented with linked, non-split debug info that contains module
> references.
>

Linked non-split debug info should come out for free - all the debug info
would be is a bunch of TUs in a single comdat - no skeleton CU, nothing
else. It would look just like normal DWARF, except with one comdat instead
of multiple, for each set of types from a module. (& there would be no real
size gains - since you'd be redundantly including all the type information
in every object file)
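
As a rough illustration of the point above (one comdat per module's set of
types, duplicated in every object file, deduplicated only at link time), here
is a toy Python model. It is a sketch under stated assumptions, not how any
real linker is implemented: the ComdatGroup class, the link() helper, the
"module:A"/"module:B" keys, and the byte sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComdatGroup:
    key: str           # group signature (hypothetical: one per module)
    payload_size: int  # made-up size in bytes of the type debug info


def link(objects):
    """Keep the first group seen for each key, dropping later duplicates."""
    kept = {}
    for obj in objects:
        for group in obj:
            kept.setdefault(group.key, group)
    return list(kept.values())


module_a = ComdatGroup("module:A", 4000)
module_b = ComdatGroup("module:B", 6000)

# Each object file redundantly carries the type info of every module it uses.
a_o = [module_a]
b_o = [module_a, module_b]

linked = link([a_o, b_o])
size_in_objects = sum(g.payload_size for obj in (a_o, b_o) for g in obj)
size_linked = sum(g.payload_size for g in linked)
```

In this model the object files still pay for module A twice, which is the
"no real size gains" part; only the linked output ends up with a single copy.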

>
>
> I suppose we could be creating one giant comdat for the module's debug
> info (no skeleton unit, no distinct type unit comdats, just one big
> comdat). But we'd probably want/need a tool to do the merging at compile
> time (like the objcopy feature for split-dwarf, but in reverse - we'd
> compile, then run a tool to smoosh all the comdats from the modules onto
> the object we just generated). It wouldn't provide much in the way of space
> savings, a little less stress on the linker (fewer comdats to handle), etc.
> Not sure if there's a default mode of objcopy that would cope with this
> straight out, or whether we'd need a new feature there (which wouldn't be a
> priority for Google to implement, since we use fission, nor a priority for
> you to implement since you have dsymutil, etc - so I'm not sure anyone
> would bother)
>
> Long story short: maybe just error on -gmodules if -gsplit-dwarf isn't
> specified or the platform isn't darwin? (& if it's darwin, dsymutil could
> read the module skeletons to find which modules to link into the .dSYM?)
>
>
> That’s reasonable, too :-)
> The plan is for llvm-dsymutil to follow the references in the module
> skeletons, copy the module CUs
>

TUs for now

> into the .dSYM, and fix up the external type references to become
> DW_FORM_ref_addrs.
>

Sounds good for you guys - the fixup work will be a bit non-trivial, since
it'll need to remove the type skeletons in the CUs, move all the extra
members from the skeletons into the type unit (& resolve any duplicates),
etc... - does that make sense? (otherwise I can provide some DWARF snippets
to explain better)
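
To make the shape of that fixup a little more concrete, here is a minimal
Python sketch. It assumes a deliberately simplified data model (a type unit
is a list of member names keyed by a type signature, and each CU carries a
skeleton listing only its extra members); the merge_skeletons() helper, the
signature value, and the member names are hypothetical, and real DWARF
rewriting would involve much more than this.

```python
def merge_skeletons(type_units, compile_units):
    """Fold per-CU skeleton members into the matching type unit.

    type_units:    {signature: [member names]}
    compile_units: list of {signature: [extra member names]}
    Returns a new {signature: [member names]} mapping with duplicates
    resolved; the skeletons themselves would then be dropped from the CUs.
    """
    merged = {sig: list(members) for sig, members in type_units.items()}
    for cu in compile_units:
        for sig, extras in cu.items():
            target = merged[sig]
            for member in extras:
                if member not in target:  # resolve duplicates across CUs
                    target.append(member)
    return merged


type_units = {0x1: ["x", "f()"]}
compile_units = [
    {0x1: ["f<int>()"]},                 # e.g. a template instantiation
    {0x1: ["f<int>()", "operator=()"]},  # same instantiation plus one more
]
merged = merge_skeletons(type_units, compile_units)
```

Here the second CU's copy of f<int>() is recognized as a duplicate, so the
type unit gains each extra member exactly once.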

>
> -- adrian
>