[PATCH] Have clang list the imported modules in the debug info

Wed Mar 18 16:53:05 PDT 2015

> On Mar 18, 2015, at 4:41 PM, David Blaikie <dblaikie at gmail.com> wrote:
> 
> 
> 
> On Wed, Mar 18, 2015 at 4:31 PM, Adrian Prantl <aprantl at apple.com> wrote:
> 
>> On Mar 18, 2015, at 4:02 PM, David Blaikie <dblaikie at gmail.com> wrote:
>> 
>> 
>> 
>> On Wed, Mar 18, 2015 at 3:50 PM, Adrian Prantl <aprantl at apple.com> wrote:
>> 
>>> On Mar 17, 2015, at 6:44 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>> 
>>> 
>>> 
>>> On Tue, Mar 17, 2015 at 3:47 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>> 
>>> > On Mar 17, 2015, at 10:03 AM, Greg Clayton <gclayton at apple.com> wrote:
>>> >
>>> >
>>> >> On Mar 17, 2015, at 9:46 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>> >>
>>> >>
>>> >>
>>> >> On Tue, Mar 17, 2015 at 9:42 AM, Greg Clayton <gclayton at apple.com> wrote:
>>> >>
>>> >>> On Mar 16, 2015, at 6:47 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Mon, Mar 16, 2015 at 5:14 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>> >>>
>>> >>> Thanks for the explanation, David. I missed that it is entirely the linker's (or some DWARF post-processor's) responsibility to find the module files and link in the debug info from the .pcm files, so the debugger doesn’t notice a difference.
>>> >>>
>>> >>> I think there's still some confusion here. Sorry if I'm rehashing something, but I'll try to explain how this all works.
>>> >>>
>>> >>> Normal split DWARF:
>>> >>>
>>> >>> Compiler generates two files: .o and .dwo.
>>> >>> .dwo has static, non-relocatable debug info.
>>> >>> .o has a skeleton compile_unit that has the name of the .dwo file and a hash to verify that the .dwo file isn't stale when the debugger reads it.
>>> >>> The .o files are all linked together, the .dwo files stay where they are.
>>> >>> The debugger reads the linked executable, finds the skeleton compile_units contained therein, and finds/loads the .dwo files.
>>> >>>
>>> >>> The scenario I have in mind for module debug info is this:
>>> >>> Module is compiled as an object file with debug info (this file is actually a .dwo file, even if it has some other extension - it has the non-relocatable debug info in it)
>>> >>> .o file has a comdat'd skeleton compile_unit describing the .dwo/module file
>>> >>> <from here on no extra work is required, the linker and debugger just act as normal>
>>> >>> The .o files are linked together, the skeleton compile_units get deduplicated by the linker (comdat sections)
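A minimal sketch of the deduplication step described above, written in Python rather than taken from any real linker. The names `SkeletonCU`, `dwo_name`, `dwo_id`, and `dedup_skeletons` are hypothetical. The only assumptions taken from the thread are that each object file carries a skeleton compile_unit naming the module file plus a hash, and that identical skeletons from different objects collapse to one after linking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkeletonCU:
    """Hypothetical stand-in for a skeleton compile_unit."""
    dwo_name: str  # name of the .dwo/module file the skeleton points at
    dwo_id: int    # hash used to detect a stale .dwo/module file


def dedup_skeletons(objects):
    """Keep one skeleton per dwo_id, in first-seen order.

    `objects` is a list of (object_file_name, [SkeletonCU, ...]) pairs.
    This only models the effect of comdat deduplication; it is not how
    a linker implements it.
    """
    seen = {}
    for _obj_name, skeletons in objects:
        for cu in skeletons:
            seen.setdefault(cu.dwo_id, cu)
    return list(seen.values())


objects = [
    ("a.o", [SkeletonCU("Foo.pcm", 0x1111), SkeletonCU("Bar.pcm", 0x2222)]),
    ("b.o", [SkeletonCU("Foo.pcm", 0x1111)]),
]
linked = dedup_skeletons(objects)
```

Here both a.o and b.o import the same (made-up) module Foo, so only one skeleton for it survives.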
>>> >>
>>> >> One issue I can think of is we will need to figure out a way to make COMDAT work with mach-o. COMDAT requires a large number of sections and mach-o can only have 255.
>>> >>
>>> >> Ah, fair enough - how does MachO handle inline functions (the most common use of comdat) currently, then?
>>> >
>>> > Currently mach-o relies on symbols in the symbol table being marked as weak and I believe the data for these symbols is in special sections that are marked as containing items that can be coalesced.
>>> >
>>> That’s not necessarily an issue that needs to be solved on Darwin, or am I maybe missing something? The linker leaves all debug info in the .o (as it currently does) and llvm-dsymutil is resolving all the external module type references while creating the .dSYM bundle.
>>> 
>>> Yeah, with a debug aware linker (or in the case of dsymutil, a debug-only linker) you would just know that since you're looking at object files, module references will be redundant across objects and should be deduplicated (by the dwo hash, most likely).
>>> 
>>> If you're not teaching your debugger to read modules, and want to link the debug info in from the .dwos - at that point you can probably drop the skeleton stuff entirely (you'd still need to teach your debugger about .dwo sections and some of the esoteric things there - like str_index and the extra/special line table just for file names (decl_file, etc, uses this)) and just put the contents of the module debug info straight in the dsym. It'd be a bit weird, but do-able without too much work, I'd imagine. You could move them back into the original sections, if you wanted to avoid the weird .dwo +non-.dwo sections together... *shrug* not sure what exactly you'd want there.
>> 
>> My plan was to have -gmodules behave like the latter variant unless -gsplit-dwarf is also present; this way there wouldn't be any weird Darwin-specific code paths.
>> 
>> Not sure I quite follow (mostly my fault given the rambling paragraph up there) - given the lack of a dsymutil-like tool on other platforms as part of the common tool path for debug info, I'm not sure module debug info without split dwarf is viable in that world. There's no tool to read these extra files at any point.
> 
> In theory someone could port llvm-dsymutil to a different platform, but that scenario is a little far-fetched. I’m not sure what will happen if LLDB is presented with linked, non-split debug info that contains module references.
> 
> Linked non-split debug info should come out for free - all the debug info would be is a bunch of TUs in a single comdat - no skeleton CU, nothing else. It would look just like normal DWARF, except with one comdat instead of multiple, for each set of types from a module. (& there would be no real size gains - since you'd be redundantly including all the type information in every object file)
>  
> 
>> 
>> I suppose we could be creating one giant comdat for the module's debug info (no skeleton unit, no distinct type unit comdats, just one big comdat). But we'd probably want/need a tool to do the merging at compile time (like the objcopy feature for split-dwarf, but in reverse - we'd compile, then run a tool to smoosh all the comdats from the modules onto the object we just generated). It wouldn't provide much in the way of space savings, a little less stress on the linker (fewer comdats to handle), etc. Not sure if there's a default mode of objcopy that would cope with this straight out, or whether we'd need a new feature there (which wouldn't be a priority for Google to implement, since we use fission, nor a priority for you to implement since you have dsymutil, etc - so I'm not sure anyone would bother)
>> 
>> Long story short: maybe just error on -gmodules if -gsplit-dwarf isn't specified or the platform isn't darwin? (& if it's darwin, dsymutil could read the module skeletons to find which modules to link into the .dSYM?)
> 
> That’s reasonable, too :-)
> The plan is for llvm-dsymutil to follow the references in the module skeletons, copy the module CUs
> 
> TUs for now
>  
> into the .dSYM, and fix up the external type references to become DW_FORM_ref_addrs.
> 
> Sounds good for you guys - the fixup work will be a bit non-trivial, since it'll need to remove the type skeletons in the CUs, move all the extra members from the skeletons into the type unit (& resolve any duplicates), etc... - does that make sense? (otherwise I can provide some DWARF snippets to explain better)

Or we use a weird Darwin-specific code path to not emit the modules with -generate-type-units in the first place (bag of DWARF+index mapping hash to DIE), which would make dsymutil's job really easy. As much as I’d like to get rid of platform-specific behavior, due to the automatic way that modules are generated on Darwin I don’t see an elegant way of making this switchable by the user.
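To illustrate why a "bag of DWARF + index" would make dsymutil's job easy, here is a rough Python sketch. Everything in it is hypothetical (`resolve_external_refs`, the hash values, the offsets); it only assumes that the index maps a type hash to a DIE offset inside the module's debug info, and that an external reference can be rewritten to an absolute offset once the module's DWARF has been copied to a known position in the output.

```python
def resolve_external_refs(module_index, module_base, external_refs):
    """Turn type-hash references into absolute DIE offsets.

    module_index:  dict mapping type hash -> DIE offset within the module
    module_base:   offset at which the module's DWARF was copied into
                   the output (hypothetical)
    external_refs: type hashes referenced from an object file
    """
    resolved = []
    for type_hash in external_refs:
        if type_hash not in module_index:
            raise KeyError(f"no DIE for type hash {type_hash:#x}")
        # A DW_FORM_ref_addr-style reference is an offset from the start
        # of the debug info section, so add the module's base offset.
        resolved.append(module_base + module_index[type_hash])
    return resolved


module_index = {0xAAAA: 0x0B, 0xBBBB: 0x40}
offsets = resolve_external_refs(module_index, 0x1000, [0xBBBB, 0xAAAA])
```

With a plain lookup table like this there are no type skeletons to remove or merge, which is the simplification I have in mind.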

-- adrian
>  
> 
> -- adrian
> 
