[PATCH] Have clang list the imported modules in the debug info

Wed Mar 18 17:31:57 PDT 2015

On Wed, Mar 18, 2015 at 5:21 PM, Adrian Prantl <aprantl at apple.com> wrote:

>
> On Mar 18, 2015, at 5:03 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Wed, Mar 18, 2015 at 4:53 PM, Adrian Prantl <aprantl at apple.com> wrote:
>
>>
>> On Mar 18, 2015, at 4:41 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>
>>
>> On Wed, Mar 18, 2015 at 4:31 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>
>>>
>>> On Mar 18, 2015, at 4:02 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>>
>>>
>>> On Wed, Mar 18, 2015 at 3:50 PM, Adrian Prantl <aprantl at apple.com>
>>> wrote:
>>>
>>>>
>>>> On Mar 17, 2015, at 6:44 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On Tue, Mar 17, 2015 at 3:47 PM, Adrian Prantl <aprantl at apple.com>
>>>> wrote:
>>>>
>>>>>
>>>>> > On Mar 17, 2015, at 10:03 AM, Greg Clayton <gclayton at apple.com>
>>>>> wrote:
>>>>> >
>>>>> >
>>>>> >> On Mar 17, 2015, at 9:46 AM, David Blaikie <dblaikie at gmail.com>
>>>>> wrote:
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Tue, Mar 17, 2015 at 9:42 AM, Greg Clayton <gclayton at apple.com>
>>>>> wrote:
>>>>> >>
>>>>> >>> On Mar 16, 2015, at 6:47 PM, David Blaikie <dblaikie at gmail.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On Mon, Mar 16, 2015 at 5:14 PM, Adrian Prantl <aprantl at apple.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>> Thanks for the explanation David, I missed that it is entirely the
>>>>> linker's (or some dwarf post-processor's) responsibility to find the module
>>>>> files and link in the debug info from the .pcm files, so the debugger doesn’t
>>>>> notice a difference.
>>>>> >>>
>>>>> >>> I think there's still some confusion here. Sorry if I'm rehashing
>>>>> something, but I'll try to explain how this all works.
>>>>> >>>
>>>>> >>> Normal split DWARF:
>>>>> >>>
>>>>> >>> Compiler generates two files: .o and .dwo.
>>>>> >>> .dwo has static, non-relocatable debug info.
>>>>> >>> .o has a skeleton compile_unit that has the name of the .dwo file
>>>>> and a hash to verify that the .dwo file isn't stale when the debugger reads
>>>>> it.
>>>>> >>> The .o files are all linked together, the .dwo files stay where
>>>>> they are.
>>>>> >>> The debugger reads the linked executable, finds the skeleton
>>>>> compile_units contained therein, and finds/loads the .dwo files
>>>>> >>>
>>>>> >>> The scenario I have in mind for module debug info is this:
>>>>> >>> Module is compiled as an object file with debug info (this file is
>>>>> actually a .dwo file, even if it has some other extension - it has the
>>>>> non-relocatable debug info in it)
>>>>> >>> .o file has a comdat'd skeleton compile_unit describing the
>>>>> .dwo/module file
>>>>> >>> <from here on no extra work is required, the linker and debugger
>>>>> just act as normal>
>>>>> >>> The .o files are linked together, the skeleton compile_units get
>>>>> deduplicated by the linker (comdat sections)
>>>>> >>
>>>>> >> One issue I can think of is we will need to figure out a way to
>>>>> make COMDAT work with mach-o. COMDAT requires a large number of sections and
>>>>> mach-o can only have 255.
>>>>> >>
>>>>> >> Ah, fair enough - how does MachO handle inline functions (the most
>>>>> common use of comdat) currently, then?
>>>>> >
>>>>> > Currently mach-o relies on symbols in the symbol table being marked
>>>>> as weak and I believe the data for these symbols are in special sections
>>>>> that are marked as containing items that can be coalesced.
>>>>> >
>>>>> That’s not necessarily an issue that needs to be solved on Darwin, or
>>>>> am I maybe missing something? The linker leaves all debug info in the .o
>>>>> (as it currently does) and llvm-dsymutil is resolving all the external
>>>>> module type references while creating the .dSYM bundle.
>>>>>
>>>>
>>>> Yeah, with a debug aware linker (or in the case of dsymutil, a
>>>> debug-only linker) you would just know that since you're looking at object
>>>> files, module references will be redundant across objects and should be
>>>> deduplicated (by the dwo hash, most likely).
>>>>
>>>> If you're not teaching your debugger to read modules, and want to link
>>>> the debug info in from the .dwos - at that point you can probably drop the
>>>> skeleton stuff entirely (you'd still need to teach your debugger about .dwo
>>>> sections and some of the esoteric things there - like str_index and the
>>>> extra/special line table just for file names (decl_file, etc, uses this))
>>>> and just put the contents of the module debug info straight in the dsym.
>>>> It'd be a bit weird, but do-able without too much work, I'd imagine. You
>>>> could move them back into the original sections, if you wanted to avoid the
>>>> weird .dwo +non-.dwo sections together... *shrug* not sure what exactly
>>>> you'd want there.
>>>>
>>>>
>>>> My plan was to have -gmodules behave like the latter variant
>>>> unless -gsplit-dwarf is also present; this way there wouldn't be any weird
>>>> Darwin-specific code paths.
>>>>
>>>
>>> Not sure I quite follow (mostly my fault given the rambling paragraph up
>>> there) - given the lack of a dsymutil-like tool on other platforms as part
>>> of the common tool path for debug info, I'm not sure module debug info
>>> without split dwarf is viable in that world. There's no tool to read these
>>> extra files at any point.
>>>
>>>
>>> In theory someone could port llvm-dsymutil to a different platform, but
>>> that scenario is a little far-fetched. I’m not sure what will happen if
>>> LLDB is presented with linked, non-split debug info that contains module
>>> references.
>>>
>>
>> Linked non-split debug info should come out for free - all the debug info
>> would be is a bunch of TUs in a single comdat - no skeleton CU, nothing
>> else. It would look just like normal DWARF, except with one comdat instead
>> of multiple, for each set of types from a module. (& there would be no real
>> size gains - since you'd be redundantly including all the type information
>> in every object file)
>>
>>
>>>
>>>
>>> I suppose we could be creating one giant comdat for the module's debug
>>> info (no skeleton unit, no distinct type unit comdats, just one big
>>> comdat). But we'd probably want/need a tool to do the merging at compile
>>> time (like the objcopy feature for split-dwarf, but in reverse - we'd
>>> compile, then run a tool to smoosh all the comdats from the modules onto
>>> the object we just generated). It wouldn't provide much in the way of space
>>> savings, a little less stress on the linker (fewer comdats to handle), etc.
>>> Not sure if there's a default mode of objcopy that would cope with this
>>> straight out, or whether we'd need a new feature there (which wouldn't be a
>>> priority for Google to implement, since we use fission, nor a priority for
>>> you to implement since you have dsymutil, etc - so I'm not sure anyone
>>> would bother)
>>>
>>> Long story short: maybe just error on -gmodules if -gsplit-dwarf isn't
>>> specified or the platform isn't darwin? (& if it's darwin, dsymutil could
>>> read the module skeletons to find which modules to link into the .dSYM?)
>>>
>>>
>>> That’s reasonable, too :-)
>>> The plan is for llvm-dsymutil to follow the references in the module
>>> skeletons, copy the module CUs
>>>
>>
>> TUs for now
>>
>>
>>> into the .dSYM, and fixup the external type references to become
>>> DW_FORM_ref_addrs.
>>>
>>
>> Sounds good for you guys - the fixup work will be a bit non-trivial,
>> since it'll need to remove the type skeletons in the CUs, move all the
>> extra members from the skeletons into the type unit (& resolve any
>> duplicates), etc... - does that make sense? (otherwise I can provide some
>> DWARF snippets to explain better)
>>
>>
>> Or we use a weird Darwin-specific code path to not emit the modules with
>> -generate-type-units in the first place (bag of DWARF+index mapping hash to
>> DIE),
>>
>
> bag-o-dwarf still doesn't address all the issues with type member merging
> I described above. Certain things can't go in the type in the module
> because they depend on context - most importantly/obviously, implicit
> special members and member function template instantiations.
>
>
> I suppose you could still have type references reference the type in the
> bag-o-dwarf/type unit directly (DW_AT_type with DW_FORM_ref_sig8) while
> having the partial type (the type declaration with its extra CU-specific
> members) which would simplify the dwarf in the easy cases.
>
>
> Yes, something along these lines would make a good first iteration.
>
>
>
>> which would make dsymutil's job really easy. As much as I’d like to get
>> rid of platform-specific behavior, due to the automatic way that modules
>> are generated on Darwin I don’t see an elegant way of making this
>> switchable by the user.
>>
>
> Not sure I quite follow here how implicit modules impact this
> functionality. We can still have a flag that you pass to the compiler that
> dictates how debug info in modules is created/what schema we use.
>
>
> The problem is the combination of implicit generation and a global module
> cache. I guess we could treat a module with the wrong kind of debug info as
> out of date, but I’m not excited.
>

I'm assuming the global module cache already has to factor in command line
arguments to the compiler (things as simple as configuration macros, for
example) - so this would be another property to the module cache key.

>
> -- adrian
>
>
> - David
>
>
>>
>> -- adrian
>>
>>
>>
>>>
>>> -- adrian
>>>
>>
>>
>>
>
>