[PATCH] Have clang list the imported modules in the debug info

Wed Mar 18 18:51:32 PDT 2015

On Wed, Mar 18, 2015 at 5:21 PM, Adrian Prantl <aprantl at apple.com> wrote:

>
> On Mar 18, 2015, at 5:03 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Wed, Mar 18, 2015 at 4:53 PM, Adrian Prantl <aprantl at apple.com> wrote:
>
>>
>> On Mar 18, 2015, at 4:41 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>
>>
>> On Wed, Mar 18, 2015 at 4:31 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>
>>>
>>> On Mar 18, 2015, at 4:02 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>>
>>>
>>> On Wed, Mar 18, 2015 at 3:50 PM, Adrian Prantl <aprantl at apple.com>
>>> wrote:
>>>
>>>>
>>>> On Mar 17, 2015, at 6:44 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On Tue, Mar 17, 2015 at 3:47 PM, Adrian Prantl <aprantl at apple.com>
>>>> wrote:
>>>>
>>>>>
>>>>> > On Mar 17, 2015, at 10:03 AM, Greg Clayton <gclayton at apple.com>
>>>>> wrote:
>>>>> >
>>>>> >
>>>>> >> On Mar 17, 2015, at 9:46 AM, David Blaikie <dblaikie at gmail.com>
>>>>> wrote:
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Tue, Mar 17, 2015 at 9:42 AM, Greg Clayton <gclayton at apple.com>
>>>>> wrote:
>>>>> >>
>>>>> >>> On Mar 16, 2015, at 6:47 PM, David Blaikie <dblaikie at gmail.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On Mon, Mar 16, 2015 at 5:14 PM, Adrian Prantl <aprantl at apple.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>> Thanks for the explanation David, I missed that it is entirely the
>>>>> linker's (or some dwarf post-processor's) responsibility to find the module
>>>>> files and link in the debug info from the .pcm files, so the debugger doesn’t
>>>>> notice a difference.
>>>>> >>>
>>>>> >>> I think there's still some confusion here. Sorry if I'm rehashing
>>>>> something, but I'll try to explain how this all works.
>>>>> >>>
>>>>> >>> Normal split DWARF:
>>>>> >>>
>>>>> >>> Compiler generates two files: .o and .dwo.
>>>>> >>> .dwo has static, non-relocatable debug info.
>>>>> >>> .o has a skeleton compile_unit that has the name of the .dwo file
>>>>> and a hash to verify that the .dwo file isn't stale when the debugger reads
>>>>> it.
>>>>> >>> The .o files are all linked together, the .dwo files stay where
>>>>> they are.
>>>>> >>> The debugger reads the linked executable, finds the skeleton
>>>>> compile_units contained therein, and finds/loads the .dwo files.
>>>>> >>>
>>>>> >>> The scenario I have in mind for module debug info is this:
>>>>> >>> Module is compiled as an object file with debug info (this file is
>>>>> actually a .dwo file, even if it has some other extension - it has the
>>>>> non-relocatable debug info in it)
>>>>> >>> .o file has a comdat'd skeleton compile_unit describing the
>>>>> .dwo/module file
>>>>> >>> <from here on no extra work is required, the linker and debugger
>>>>> just act as normal>
>>>>> >>> The .o files are linked together, the skeleton compile_units get
>>>>> deduplicated by the linker (comdat sections)
>>>>> >>
>>>>> >> One issue I can think of is we will need to figure out a way to
>>>>> make COMDAT work with mach-o. COMDAT requires a large number of sections and
>>>>> mach-o can only have 255.
>>>>> >>
>>>>> >> Ah, fair enough - how does MachO handle inline functions (the most
>>>>> common use of comdat) currently, then?
>>>>> >
>>>>> > Currently mach-o relies on symbols in the symbol table being marked
>>>>> as weak and I believe the data for these symbols are in special sections
>>>>> that are marked as containing items that can be coalesced.
>>>>> >
>>>>> That’s not necessarily an issue that needs to be solved on Darwin, or
>>>>> am I maybe missing something? The linker leaves all debug info in the .o
>>>>> (as it currently does) and llvm-dsymutil is resolving all the external
>>>>> module type references while creating the .dSYM bundle.
>>>>>
>>>>
>>>> Yeah, with a debug aware linker (or in the case of dsymutil, a
>>>> debug-only linker) you would just know that since you're looking at object
>>>> files, module references will be redundant across objects and should be
>>>> deduplicated (by the dwo hash, most likely).
>>>>
>>>> If you're not teaching your debugger to read modules, and want to link
>>>> the debug info in from the .dwos - at that point you can probably drop the
>>>> skeleton stuff entirely (you'd still need to teach your debugger about .dwo
>>>> sections and some of the esoteric things there - like str_index and the
>>>> extra/special line table just for file names (decl_file, etc, uses this))
>>>> and just put the contents of the module debug info straight in the dsym.
>>>> It'd be a bit weird, but do-able without too much work, I'd imagine. You
>>>> could move them back into the original sections, if you wanted to avoid the
>>>> weird .dwo +non-.dwo sections together... *shrug* not sure what exactly
>>>> you'd want there.
>>>>
>>>>
>>>> My plan was to have -gmodules behave like the latter variant
>>>> unless -gsplit-dwarf is also present; this way there wouldn't be any weird
>>>> Darwin-specific code paths.
>>>>
>>>
>>> Not sure I quite follow (mostly my fault given the rambling paragraph up
>>> there) - given the lack of a dsymutil-like tool on other platforms as part
>>> of the common tool path for debug info, I'm not sure module debug info
>>> without split dwarf is viable in that world. There's no tool to read these
>>> extra files at any point.
>>>
>>>
>>> In theory someone could port llvm-dsymutil to a different platform, but
>>> that scenario is a little far-fetched. I’m not sure what will happen if
>>> LLDB is presented with linked, non-split debug info that contains module
>>> references.
>>>
>>
>> Linked non-split debug info should come out for free - all the debug info
>> would be is a bunch of TUs in a single comdat - no skeleton CU, nothing
>> else. It would look just like normal DWARF, except with one comdat instead
>> of multiple, for each set of types from a module. (& there would be no real
>> size gains - since you'd be redundantly including all the type information
>> in every object file)
>>
>>
>>>
>>>
>>> I suppose we could be creating one giant comdat for the module's debug
>>> info (no skeleton unit, no distinct type unit comdats, just one big
>>> comdat). But we'd probably want/need a tool to do the merging at compile
>>> time (like the objcopy feature for split-dwarf, but in reverse - we'd
>>> compile, then run a tool to smoosh all the comdats from the modules onto
>>> the object we just generated). It wouldn't provide much in the way of space
>>> savings, a little less stress on the linker (fewer comdats to handle), etc.
>>> Not sure if there's a default mode of objcopy that would cope with this
>>> straight out, or whether we'd need a new feature there (which wouldn't be a
>>> priority for Google to implement, since we use fission, nor a priority for
>>> you to implement since you have dsymutil, etc - so I'm not sure anyone
>>> would bother)
>>>
>>> Long story short: maybe just error on -gmodules if -gsplit-dwarf isn't
>>> specified or the platform isn't darwin? (& if it's darwin, dsymutil could
>>> read the module skeletons to find which modules to link into the .dSYM?)
>>>
>>>
>>> That’s reasonable, too :-)
>>> The plan is for llvm-dsymutil to follow the references in the module
>>> skeletons, copy the module CUs
>>>
>>
>> TUs for now
>>
>>
>>> into the .dSYM, and fix up the external type references to become
>>> DW_FORM_ref_addrs.
>>>
>>
>> Sounds good for you guys - the fixup work will be a bit non-trivial,
>> since it'll need to remove the type skeletons in the CUs, move all the
>> extra members from the skeletons into the type unit (& resolve any
>> duplicates), etc... - does that make sense? (otherwise I can provide some
>> DWARF snippets to explain better)
>>
>>
>> Or we use a weird Darwin-specific code path to not emit the modules with
>> -generate-type-units in the first place (bag of DWARF+index mapping hash to
>> DIE),
>>
>
> bag-o-dwarf still doesn't address all the issues with type member merging
> I described above. Certain things can't go in the type in the module
> because they depend on context - most importantly/obviously, implicit
> special members and member function template instantiations.
>
>
> I suppose you could still have type references reference the type in the
> bag-o-dwarf/type unit directly (DW_AT_type with DW_FORM_ref_sig8) while
> having the partial type (the type declaration with its extra CU-specific
> members) which would simplify the dwarf in the easy cases.
>
>
> Yes, something along these lines would make a good first iteration.
>

Given that you'd need to support the partial type anyway - this might be a
good second iteration. The first can use the current type unit stuff, and
keep the dsymutil support even simpler - it shouldn't need to fix up any
type references, since they're all already CU-local (they reference the
stub type declaration/skeleton, and it references the type unit via hash,
not via relocation or anything).

That's all free/how it works today & mostly code you're going to have to
cope with anyway (the debugger's going to need to be able to find all the
stubs and collect all the extra members from them to present the full story
anyway)
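
To make the stub-collection step concrete, here is a minimal sketch in
Python. It is only an illustration, not how any debugger or dsymutil
actually does it: the names TypeUnit, StubType and full_members are
hypothetical, members are modelled as plain strings, and the signature
value is made up. It assumes each CU-local stub carries the 64-bit
signature of the type unit it refers to, plus whatever extra members
that CU needed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TypeUnit:
    """Hypothetical stand-in for a type unit found via its signature hash."""
    signature: int
    name: str
    members: List[str] = field(default_factory=list)


@dataclass
class StubType:
    """Hypothetical CU-local type declaration that points at a type unit."""
    signature: int
    extra_members: List[str] = field(default_factory=list)


def full_members(signature: int,
                 type_units: Dict[int, TypeUnit],
                 compile_units: List[List[StubType]]) -> List[str]:
    """Merge the type unit's members with the extras from every stub."""
    merged = list(type_units[signature].members)
    seen = set(merged)
    for stubs in compile_units:
        for stub in stubs:
            if stub.signature != signature:
                continue
            for member in stub.extra_members:
                # The same implicit member can show up in several CUs.
                if member not in seen:
                    seen.add(member)
                    merged.append(member)
    return merged


# Example: two CUs each add context-dependent members to the same type.
units = {0xABCD: TypeUnit(0xABCD, "Foo", ["x", "f()"])}
cus = [
    [StubType(0xABCD, ["Foo(const Foo&)"])],
    [StubType(0xABCD, ["Foo(const Foo&)", "g<int>()"])],
]
print(full_members(0xABCD, units, cus))
```

Note that nothing in the stubs has to be rewritten here: the lookup goes
through the signature, which is the point about not needing to touch the
type references.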

>
>
>> which would make dsymutil's job really easy. As much as I’d like to get
>> rid of platform-specific behavior, due to the automatic way that modules
>> are generated on Darwin I don’t see an elegant way of making this
>> switchable by the user.
>>
>
> Not sure I quite follow here how implicit modules impact this
> functionality. We can still have a flag that you pass to the compiler that
> dictates how debug info in modules is created/what schema we use.
>
>
> The problem is the combination of implicit generation and a global module
> cache. I guess we could treat a module with the wrong kind of debug info as
> out of date, but I’m not excited.
>
> -- adrian
>
>
> - David
>
>
>>
>> -- adrian
>>
>>
>>
>>>
>>> -- adrian
>>>
>>
>>
>>
>
>