[PATCH] Have clang list the imported modules in the debug info

Mon May 4 10:53:11 PDT 2015

On Fri, May 1, 2015 at 8:52 PM, Adrian Prantl <aprantl at apple.com> wrote:

>
> On May 1, 2015, at 5:25 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Fri, May 1, 2015 at 5:19 PM, Adrian Prantl <aprantl at apple.com> wrote:
>
>>
>> On May 1, 2015, at 4:55 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>
>>
>> On Fri, May 1, 2015 at 4:39 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>
>>>
>>> > On May 1, 2015, at 10:01 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>> >
>>> >
>>> >
>>> > On Fri, May 1, 2015 at 9:52 AM, Adrian Prantl <aprantl at apple.com>
>>> wrote:
>>> >>
>>> >>> On May 1, 2015, at 9:23 AM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Thu, Apr 30, 2015 at 5:21 PM, Adrian Prantl <aprantl at apple.com>
>>> wrote:
>>> >>>
>>> >>> > On Apr 30, 2015, at 4:55 PM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > On Thu, Apr 30, 2015 at 4:31 PM, Adrian Prantl <aprantl at apple.com>
>>> wrote:
>>> >>> >>
>>> >>> >> > On Mar 19, 2015, at 5:37 PM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>> >>> >> >
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > On Thu, Mar 19, 2015 at 5:24 PM, Adrian Prantl <
>>> aprantl at apple.com> wrote:
>>> >>> >> >>
>>> >>> >> >> > On Mar 16, 2015, at 2:55 PM, David Blaikie <
>>> dblaikie at gmail.com> wrote:
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> >
>>> >>> >> >> >> On Mon, Mar 16, 2015 at 2:45 PM, Robinson, Paul <
>>> Paul_Robinson at playstation.sony.com> wrote:
>>> >>> >> >> > Beyond the above (that using a new tag would mean this would
>>> go from 'free' to 'not free' for GDB) having a new top level tag is pretty
>>> substantial (we only have two at the moment, and with our talk of modules
>>> being a "bag of dwarf" might go back to having one top level tag? (it's not
>>> clear to me from DWARF4 whether DW_TAG_module is currently a top-level tag,
>>> I don't think it is?)
>>> >>> >> >> >
>>> >>> >> >> >> The .debug_info section contains one or more compilation
>>> units, partial units, or in DWARF 5, type units.  DW_TAG_module isn't a
>>> unit, if you want it to be handled independently then it would need to be
>>> wrapped in a DW_TAG_partial_unit.  You would probably then use
>>> DW_TAG_imported_unit to refer to it, rather than DW_TAG_imported_module.
>>> >>> >> >> >>
>>> >>> >> >> >
>>> >>> >> >> > This makes a fair bit of sense - though the terminology's
>>> never going to quite line up with modules, I suspect, and this would still
>>> require modifying existing consumers (well, GDB) that can handle
>>> split-dwarf today, I suspect (not sure how it'd handle partial_unit - maybe
>>> that does work? - and still don't know how existing consumers would handle
>>> imported_unit either - could be worth some testing, as it sounds sort of
>>> right out of several less right options).
>>> >>> >> >>
>>> >>> >> >> Thanks for all the input so far!
>>> >>> >> >> To make this end of the discussion concrete, let’s sketch some
>>> DWARF showing what this could look like in practice.
>>> >>> >> >>
>>> >>> >> >> ELF (no imports)
>>> >>> >> >> ----------------
>>> >>> >> >>
>>> >>> >> >> On ELF or COFF a foo.c referencing types from the module
>>> Foundation looks like this:
>>> >>> >> >>
>>> >>> >> >> .debug_info:
>>> >>> >> >>   DW_TAG_compile_unit
>>> >>> >> >>     DW_AT_name(“foo.c”)
>>> >>> >> >>
>>> >>> >> >> .debug_info.dwo (on ELF: group 0x1234ABCDE, comdat)
>>> >>> >> >>   DW_TAG_partial_unit
>>> >>> >> >
>>> >>> >> > For now I'd suggest we use compile_unit - that way it'll just
>>> work with existing split-dwarf consumers. We can see about standardizing a
>>> top-level DW_TAG_module or using DW_TAG_partial_unit here later, perhaps?
>>> I'm not sure.
>>> >>> >> >
>>> >>> >> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>>> >>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> Side question: Is .debug_info.dwo the right section to put the
>>> module skeleton in, or should it be a .debug_info section like normal
>>> fission skeletons?
>>> >>> >> >
>>> >>> >> > Skeletons go in .debug_info, the dwo sections are just for the
>>> .dwo file (or the module file, in our new case - the extension isn't
>>> actually important).
>>> >>> >> >
>>> >>> >> > It might be worth you compiling an example or two of
>>> split-dwarf to see how this all works hands-on.
>>> >>> >> >
>>> >>> >> >> Mach-O (no comdat, no imports)
>>> >>> >> >> ------------------------------
>>> >>> >> >>
>>> >>> >> >> Mach-O doesn’t do comdat, so with -split-dwarf=Disable (not
>>> sure if that option is the best discriminator) this could look like:
>>> >>> >> >>
>>> >>> >> >> .debug_info:
>>> >>> >> >>   DW_TAG_compile_unit
>>> >>> >> >>     DW_AT_name(“foo.c”)
>>> >>> >> >>   DW_TAG_partial_unit
>>> >>> >> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>>> >>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> Mach-O (no comdat, with imports)
>>> >>> >> >> ------------------------------
>>> >>> >> >>
>>> >>> >> >> If we add the module import information to this, we get:
>>> >>> >> >>
>>> >>> >> >> .debug_info:
>>> >>> >> >>   DW_TAG_compile_unit
>>> >>> >> >>     DW_AT_name(“foo.c”)
>>> >>> >> >>     DW_TAG_imported_module
>>> >>> >> >>       DW_AT_import(DW_FORM_ref_addr 0x10)
>>> >>> >> >
>>> >>> >> > Since we went down the tangent of explaining split-dwarf
>>> many emails ago, I've forgotten (& can't readily find) what we were
>>> discussing about what ways the imported_module could work.
>>> >>> >> >
>>> >>> >> > The simplest representation I can think of would be to have it
>>> reference, by signature, the module unit (whatever tag it uses) -
>>> DW_FORM_ref_sig8, seems the simplest thing to do.
>>> >>> >> >
>>> >>> >> >>
>>> >>> >> >>   DW_TAG_partial_unit
>>> >>> >> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>>> >>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>>> >>> >> >>
>>> >>> >> >> 0x10:
>>> >>> >> >
>>> >>> >> > This is inside the partial unit? I figured we'd just put these
>>> attributes on the top level (compile_unit, or whatever it might be later) -
>>> potentially conditionalized on platform, sure.
>>> >>> >> >
>>> >>> >> >>     DW_TAG_module
>>> >>> >> >>       DW_AT_name(“Foundation”)
>>> >>> >> >>       DW_AT_LLVM_sysroot(“/”)
>>> >>> >> >>       DW_AT_LLVM_include_dir(“”)
>>> >>> >> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
>>> >>> >> >>       ...
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> ELF (comdat, with imports)
>>> >>> >> >> --------------------------
>>> >>> >> >>
>>> >>> >> >> But now let’s go back to ELF. Since the skeleton with the
>>> partial unit is comdat'd, I assume that this breaks the FORM_ref_addr used
>>> in the DW_AT_import. We could reuse the module hash as a signature for the
>>> module:
>>> >>> >> >>
>>> >>> >> >> .debug_info:
>>> >>> >> >>   DW_TAG_compile_unit
>>> >>> >> >>     DW_AT_name(“foo.c”)
>>> >>> >> >>     DW_TAG_imported_module
>>> >>> >> >>       DW_AT_import(DW_FORM_ref_addr 0x1234ABCDE)
>>> >>> >> >
>>> >>> >> > Still only really need these imported_modules for lldb, right?
>>> I'd consider having them off-by-default for non-darwin, but I'm not
>>> strictly wedded to that notion. Wouldn't mind seeing size impact numbers of
>>> some kind - if it's really fractional % increase & GDB doesn't fall over
>>> when it sees them (in whatever FORM/tag/etc we decide on) then that's not
>>> the end of the world.
>>> >>> >> >
>>> >>> >> > Just seems nice if the default mode is the nice, standard,
>>> split-dwarf output. Doesn't need anything fancy.
>>> >>> >> >
>>> >>> >> >
>>> >>> >> >> .debug_info.dwo (group 0x1234ABCDE, comdat)
>>> >>> >> >>   DW_TAG_partial_unit
>>> >>> >> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>>> >>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>>> >>> >> >>
>>> >>> >> >>     DW_TAG_module
>>> >>> >> >>       DW_AT_signature(“0x1234ABCDE”)
>>> >>> >> >>       DW_AT_name(“Foundation”)
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > The thing you haven't covered is the actual .dwo sections
>>> (.debug_info.dwo (we'll probably need a simple stub compile_unit to make
>>> this correct split-dwarf) and .debug_types.dwo being important - but all
>>> the supporting .dwo sections will be necessary) that go in the module file.
>>> >>> >> >
>>> >>> >> >> This is bending the definition of DW_AT_signature, but I guess
>>> it could be made to work. Or we could say that for now, users have to
>>> choose between the comdat optimization and having the module imports
>>> recorded in Dwarf, since GDB wouldn’t know what to do with that information
>>> anyway.
>>> >>> >>
>>> >>> >> Sorry for the long delay. Here’s a more complete example that
>>> should include all the suggestions made so far. For context I also included
>>> external type references in the example although admittedly this is a bit
>>> out of scope for this thread:
>>> >>> >>
>>> >>> >> ELF (typeunits, comdats, with imports)
>>> >>> >> --------------------------------------
>>> >>> >>
>>> >>> >> On ELF or COFF a bar.c referencing type Foo from the module
>>> FooLib looks like this:
>>> >>> >>
>>> >>> >> bar.o
>>> >>> >> ~~~~~
>>> >>> >>
>>> >>> >> // To keep this example focussed/readable, I'm assuming that
>>> bar.o itself was not compiled with fission.
>>> >>> >> .debug_info:
>>> >>> >>   DW_TAG_compile_unit
>>> >>> >>     DW_AT_name(“bar.c”)
>>> >>> >>     ...
>>> >>> >>
>>> >>> >>     DW_TAG_imported_module // <- This could be optional on ELF.
>>> >>> >>       DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234)
>>> >>> >>
>>> >>> >>     DW_TAG_variable
>>> >>> >>       DW_AT_name(“MyFoo”)
>>> >>> >>       DW_AT_type [DW_FORM_ref4] 0x20
>>> >>> >> 0x20:
>>> >>> >>     DW_TAG_structure_type
>>> >>> >>       DW_AT_declaration (true)
>>> >>> >>       DW_AT_signature [DW_FORM_ref_sig8] (0xF00)
>>> >>> >>
>>> >>> >>
>>> >>> >> // Split DWARF skeleton CU for the module Foo.
>>> >>> >>   DW_TAG_compile_unit
>>> >>> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>>> >>> >>     DW_AT_dwo_id(“0xFEDB9876”)
>>> >>> >>     ...
>>> >>> >>
>>> >>> >> // Comdat’d partial unit containing the optional module
>>> descriptor.
>>> >>> >> .debug_info, group 0xABCD1234, comdat
>>> >>> >>   DW_TAG_partial_unit
>>> >>> >>     DW_TAG_module
>>> >>> >>       DW_AT_name(“FooLib”)
>>> >>> >>       DW_AT_LLVM_sysroot(“/”)
>>> >>> >>       DW_AT_LLVM_include_dirs(“-I/path”)
>>> >>> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
>>> >>> >>       ...
>>> >>> >>
>>> >>> >> FooLib-XYZ.pcm
>>> >>> >> ~~~~~~~~~~~~~~
>>> >>> >>
>>> >>> >> .debug_info.dwo
>>> >>> >>   DW_TAG_compile_unit
>>> >>> >>     DW_AT_dwo_id(“0xFEDB9876”)
>>> >>> >>     ...
>>> >>> >>
>>> >>> >> // Type unit for the type Foo.
>>> >>> >> .debug_types.dwo, group 0xF00, comdat
>>> >>> >>   DW_TAG_type_unit
>>> >>> >>     DW_TAG_structure_type
>>> >>> >>       DW_AT_name (“Foo”)
>>> >>> >>       ...
>>> >>> >>
>>> >>> >>
>>> >>> >> I think it is awkward to have both the skeleton compile_unit in
>>> .debug_info and the partial_unit containing the TAG_module. Personally I’d
>>> prefer putting the TAG_module into the skeleton CU and then just refer to
>>> it via a FORM_ref_addr; but if we want to put the TAG_module into a comdat
>>> section, it looks like that’s what’s necessary.
>>> >>> >
>>> >>> > It's been a while & I've probably lost all the context, but I
>>> think my original theory was to have the skeleton compile_unit be comdat'd
>>> so they'd deduplicate on linking (so we'd only have one reference to the
>>> module.dwo in the linked binary). I don't recall there being a need for a
>>> separate partial_unit - I imagine we'd just put the LLDB/LLVM extension
>>> attributes on the skeleton compile_unit and expect debuggers that didn't
>>> understand them, to ignore them.
>>> >>> >
>>> >>> > Was there some reason this didn't work/make sense? Because you
>>> need a DW_TAG_module to import with DW_TAG_imported_module?
>>> >>> Using DW_TAG_module was the best practice that was recommended on
>>> dwarf-discuss.
>>> >>>
>>> >>> Did they have any ideas on how to reference it without duplicating
>>> it in every CU?
>>> >>
>>> >> We didn’t touch the deduplication issue.
>>> >>
>>> >>> Once we've got the "Bag O Dwarf" stuff (rather than the narrower
>>> type units) this would be easier - (I suppose we could do a partial
>>> solution/abuse of type units - use a type unit header (perhaps with Eric's
>>> merged type/compile unit work) and a DW_FORM_ref_sig8 value for the
>>> DW_AT_module in the DW_TAG_imported_module.
>>> >>>
>>> >>> Though I suppose if we're going to have DW_TAG_imported_module in
>>> every CU that references a module, it might not be that big of a deal to
>>> include the DW_TAG_module itself there too... while I don't care about this
>>> scheme immediately, Google's growing LLDB investment in various platforms,
>>> so I am vaguely concerned about getting this right & it's not immediately
>>> obvious to me what that right answer is.
>>> >>
>>> >> Maybe the best path forward is to stage this by initially putting the
>>> DW_TAG_module into the main CU and leave the deduplication as an
>>> optimization to be implemented once the bag’o dwarf is more fleshed out.
>>> This way we won’t do anything that would confuse consumers (assuming they
>>> ignore unknown tags) and the extra overhead is likely not even going to be
>>> noticeable, since all the string attributes inside the TAG_module can
>>> already be deduplicated by traditional means.
>>> >
>>> > Perhaps. I'd still like to think through/document what this looks like
>>> a bit more. Where the data ends up, what it's used for, etc. Sorry to draw
>>> this out.
>>> >
>>> > :/ *ponders*
>>>
>>>
>>> Let’s construct this:
>>>
>>> The most straightforward representation is to not unique the TAG_module
>>> and place it into the main CU.
>>>
>>> bar.o
>>> ~~~~~
>>>
>>> .debug_info:
>>>   DW_TAG_compile_unit
>>>     ...
>>>     DW_TAG_imported_module
>>>       DW_AT_import [DW_FORM_ref4] (0x20)
>>> 0x20:
>>>     DW_TAG_module
>>>       DW_AT_name(“FooLib”)
>>>       DW_AT_LLVM_sysroot(“/”)
>>>       DW_AT_LLVM_include_dirs(“-I/path”)
>>>       DW_AT_LLVM_macros(“-DNDEBUG”)
>>>
>>
>> Might as well put all these LLVM attributes on the skeleton CU, though -
>> so they can be deduplicated (& just put the dwo_id in this module
>> somewhere, perhaps just using the DW_AT_dwo_id attribute - possibly that's
>> the only attribute the DW_TAG_module would need, ideally). Unless we need
>> to consider the submodule issue (in which case the skeleton unit would
>> reference the whole module but the submodules would reference/describe the
>> respective submodules?)?
>>
>>
>> We cannot put them into the skeleton CU if the skeleton CU is going to be
>> comdat’d, because we’d then have to refer to it via a signature and that
>> leads us directly to the can of worms discussed in the next paragraph :-)
>>
>>
>>
>>>       ...
>>>
>>> // Split DWARF skeleton, comdat'd.
>>> .debug_info, group 0xFEDB9876, comdat
>>>   DW_TAG_compile_unit
>>>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>>>     DW_AT_dwo_id(“0xFEDB9876”)
>>>     ...
>>>
>>> On Mach-O the split DWARF skeleton would not be comdat’d, but
>>> llvm-dsymutil can just ignore it.
>>>
>>>
>>> If we want to dedup the TAG_module we need to refer to it via signature.
>>> This means we need to wrap it in a type_unit or a DWARF5 TAG_type_unit. We
>>> might as well throw it in with the skeleton CU.
>>>
>>> .debug_info:
>>>   DW_TAG_compile_unit
>>>     ...
>>>     DW_TAG_imported_module
>>>       DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234)
>>>
>>> // Split DWARF skeleton, comdat'd.
>>> .debug_info, group 0xFEDB9876, comdat
>>>   DW_TAG_compile_unit
>>>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>>>     DW_AT_dwo_id(“0xFEDB9876”)
>>>     ...
>>>     DW_TAG_type_unit (signature: 0xABCD1234)
>>>
>>
>> Can't really put a type_unit inside a compile_unit - it'd need to be
>> top-level with an appropriate type unit header, etc. & then we'd need two
>> different units/headers, could still comdat them, but it's a weird abuse of
>> type units & would probably confuse consumers. I don't know whether that's
>> worth the effort.
>>
>> Oh right.
>>
>>
>>
>>>       DW_TAG_module
>>>         DW_AT_name(“FooLib”)
>>>         DW_AT_LLVM_sysroot(“/”)
>>>         DW_AT_LLVM_include_dirs(“-I/path”)
>>>         DW_AT_LLVM_macros(“-DNDEBUG”)
>>>         ...
>>>
>>> Now that raises the question about what happens with multiple modules
>>> within one PCM.
>>
>>
>> Is the right term "submodule"? it's sort of confusing to talk about
>> multiple modules within a pcm.
>>
>>
>> Yes, a module with nested submodules.
>> http://clang.llvm.org/docs/Modules.html#submodule-declaration
>>
>>
>>
>>> Assuming that the ELF linker is linking and deduping all the non-.dwo
>>> sections, we may lose some of the TAG_modules (if not every CU imports all
>>> submodules) in the binary, but that wouldn’t matter because the consumer
>>> would find all TAG_modules by signature in the .pcm
>>
>>
>> Is there any reason we need to reference the submodules individually,
>> rather than just reference the whole module
>>
>>
>> My assumption is that an AST-aware debugger will want to import the exact
>> submodules that were imported by the CU before dropping into the expression
>> evaluator to replicate the environment of the CU as much as possible.
>>
>
> I'm just not picturing that. It seems pretty likely that a debugger user
> is more likely to treat the whole set of names in the program, not just
> those syntactically valid at that point in the source file.
>
>
> Module imports only work if the debugger has the precise list of modules
> imported by the current CU. Clang modules are not namespaces, and any two
> modules may conflict.
>

Right, as you say - ODR & C languages. (& I've no idea if file-scoped
static/anonymous namespace things can go in C++ modules and what happens if
you have conflicting modules in that regard - I guess they can conflict
too? Dunno - maybe anon namespaces in C++ modules aren't allowed)

> The cool thing is that with the imported modules the debugger effectively
> becomes clang and has the entire world visible to the current CU
> available, including any types and functions that never made it into the
> debug info because they were optimized out, or because there were
> uninstantiated templates that cannot be represented by DWARF.
>
> A simple example would be if I'm debugging LLVM and I'm in some generic
> optimization pass, but I want to cast my Instruction pointer to some
> specific instruction type to examine it in more detail - even though this
> pass doesn't care about that specific Instruction type nor include the
> header in which it's declared.
>
>
> If, however, the type lookup fails, the debugger can still fall back to
> the traditional behavior, find the type in the accelerator tables and
> reconstruct it from DWARF (if it is there).
>

So you're going to need to implement fission (to at least some degree)
support in LLDB, then? (to support the case where you haven't linked debug
info with llvm-dsymutil, but you've hit one of these lookup problems where
you need to cross possibly-conflicting modules)

OK, so I think it's probably reasonable for now to just add DW_TAG_modules
to the CU for each referenced module (or does it have to be each referenced
submodule? (can two submodules within a single module be
contradictory/conflicting?)). Since we don't have any good way to reference
the module in a foreign unit while deduplicating that unit... there's not
much point having the imported_module - but if you think it adds anything,
I'm open to ideas. Maybe later (when we have Bag O' DWARF) we can do that.
& only do this when targeting lldb (on by default on Darwin, off by default
elsewhere).
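
As a rough illustration of that default, here is a minimal Python sketch of the policy. The function name and the `debugger_tuning` parameter are hypothetical and are not clang options; the only thing taken from the discussion is "on by default on Darwin, off by default elsewhere, unless someone explicitly tunes for LLDB".

```python
def should_emit_module_dies(target_os, debugger_tuning=None):
    """Decide whether to emit DW_TAG_module / DW_TAG_imported_module DIEs.

    target_os: lowercase OS name such as "darwin" or "linux" (assumed spelling).
    debugger_tuning: hypothetical explicit override, e.g. "lldb" or "gdb".
    """
    if debugger_tuning is not None:
        # An explicit tuning choice wins over the platform default.
        return debugger_tuning == "lldb"
    # No explicit choice: on by default on Darwin, off elsewhere.
    return target_os == "darwin"
```

With this sketch, `should_emit_module_dies("linux", "lldb")` is true and `should_emit_module_dies("linux")` is false.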

& LLDB, once it's got the Fission support it'll need for this anyway, will
fall back gracefully if these special modules are omitted.
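
The fallback order could be sketched like this in Python. It is only a model of the lookup order discussed in this thread, not LLDB code: `imported_modules` and `dwarf_types` are hypothetical stand-ins for the modules imported by the current CU and for the types reachable through the accelerator tables.

```python
def find_type(name, imported_modules, dwarf_types):
    """Look a type up in the CU's imported modules first, then in DWARF.

    imported_modules: list of dicts (type name -> declaration), one per
                      module imported by the current CU; empty if the
                      module DIEs were omitted.
    dwarf_types: dict (type name -> declaration) modelling what the
                 accelerator tables plus DWARF can reconstruct.
    Returns a (source, declaration) pair, or None if nothing is found.
    """
    for module in imported_modules:
        decl = module.get(name)
        if decl is not None:
            return ("module", decl)
    # Traditional path: reconstruct the type from DWARF, if it is there.
    decl = dwarf_types.get(name)
    if decl is not None:
        return ("dwarf", decl)
    return None
```

If the module list is empty, every lookup goes straight to the DWARF path, which is the graceful degradation meant above.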

- David

>
>
>>  (& have just a single, whole module in the pcm)?
>>
>>
>> That’s probably not what you meant, but just to be sure: The pcm will
>> always have the entire module with all submodules in it. But the debugger
>> may choose to import only a subset of those.
>>
>>
>>
>>> file referred to by whichever skeleton CU makes it into the binary:
>>>
>>> FooLib-XYZ.pcm
>>> ~~~~~~~~~~~~~~
>>>
>>> .debug_info.dwo
>>>  DW_TAG_compile_unit
>>>    DW_AT_dwo_id(“0xFEDB9876”)
>>>    ...
>>>
>>>  DW_TAG_type_unit (signature: 0xABCD1234)
>>>    DW_TAG_module
>>>      DW_AT_name(“FooLib”)
>>>      ...
>>>  DW_TAG_type_unit (signature: 0xCDEF3456)
>>>    DW_TAG_module
>>>      DW_AT_name(“FooLib”)
>>>      DW_TAG_module
>>>        DW_AT_name(“SubFoo”)
>>>        ...
>>>
>>> So.. this should work as long as nobody points out that a module isn’t
>>> really a type.
>>>
>>
>> Yeah, probably worth waiting for "Bag O DWARF".
>>
>> For now, as you mentioned earlier, maybe just putting the imported_module
>> and the module into the compile_unit when tuning for LLDB (so Darwin by
>> default, and anywhere else where someone tunes for LLDB in the future) &
>> leave them out otherwise.
>>
>>
>> Sounds perfectly reasonable.
>>
>>
>> Could you remind me why LLDB wants to know which modules are referenced
>> from a CU? (rather than just all the modules used by a program overall?)
>>
>>
>> LLDB uses clang for the expression evaluation. Traditionally it would
>> look up a type in DWARF, build a clang AST out of it and then import it.
>> With this it could directly import the clang modules and have access to
>> everything in the module. But, clang modules are not namespaces, so modules
>> can conflict (and that would probably manifest as a crash in libclang).
>>
>
> What's an example of such a conflict? Is that valid (or is it just in ODR
> violations) - as mentioned above, it seems to me that only importing the
> things lexically available in this source file isn't what a debugger user
> would really want. I certainly think I'd trip over that a lot.
>
>
> Keep in mind that Objective-C (and C) do not have an ODR, so it’s not just
> “just” :-)
> Being able to import modules does not mean that the debugger cannot still
> fall back to loading types from DWARF; in fact it will have to do that for
> all local types anyway.
>
> -- adrian
>
>
>
>> It therefore needs to know which modules are imported in the current CU
>> before dropping into the expression evaluator.
>>
>> - adrian
>>
>>
>>
>>>
>>>
>>>
>>> On Mach-O, in the absence of comdats, we have:
>>>
>>> bar.o
>>> ~~~~~
>>>
>>> .debug_info:
>>>   DW_TAG_compile_unit
>>>     ...
>>>     DW_TAG_imported_module
>>>       DW_AT_import [DW_FORM_ref4] (0x20)
>>>
>>>     DW_TAG_module           // uniqued by dsymutil.
>>>       DW_AT_name(“FooLib”)
>>>       DW_AT_LLVM_sysroot(“/”)
>>>       DW_AT_LLVM_include_dirs(“-I/path”)
>>>       DW_AT_LLVM_macros(“-DNDEBUG”)
>>>       ...
>>>
>>> // Split DWARF skeleton, thrown out by dsymutil.
>>>
>>
>> Thrown out? Because it's going to read everything in from the module and
>> merge it in to a single linked debug info blob, I take it?
>>
>>
>>> .debug_info, group 0xFEDB9876, comdat
>>>   DW_TAG_compile_unit
>>>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>>>     DW_AT_dwo_id(“0xFEDB9876”)
>>>     ...
>>>
>>> FooLib-XYZ.pcm
>>> ~~~~~~~~~~~~~~
>>>
>>> .debug_info:
>>>   DW_TAG_compile_unit
>>>     DW_AT_dwo_id(“0xFEDB9876”)
>>>     ...
>>>
>>>     DW_TAG_module
>>>       DW_AT_name(“FooLib”)
>>>       DW_TAG_module
>>>         DW_AT_name(“SubFoo”)
>>>         ...
>>>
>>> -- adrian
>>>
>>> >
>>> >>
>>> >>>
>>> >>> > If it turns out that's the right way to get a target for the
>>> imported_module, we could put both the skeleton CU and the partial unit in
>>> the same comdat and dedup them both together.
>>> >>>
>>> >>> I think this works as long as we only have one TAG_module per .pcm
>>> file (because we need to refer to it via signature).
>>> >>>
>>> >>> Not quite following here - why would we have more than one module
>>> per pcm - a pcm is a module, right?
>>> >>
>>> >> Clang modules may have submodules and a compile unit could import two
>>> submodules that live in the same .pcm file. For example on Darwin there is
>>> a module Darwin.pcm that contains a submodule “C” that contains the
>>> submodule “stdio”.
>>> >
>>> > OK, so this bit's relevant to your use case in LLDB of loading the
>>> right things for the right context, but not relevant to the context-less
>>> debuggers like GDB that will just treat everything as one big namespace
>>> (except for file-local things, etc). So it's important for your imported
>>> modules but not for the basic Fission style debug reference.
>>> >
>>> > Well, maybe - I'm not sure what you're picturing in terms of the DWARF
>>> in the module for submodules? If you want that granularity we'll have to
>>> talk about how to split the DWARF in the module into chunks per submodule?
>>> >
>>> >>
>>> >>>
>>> >>> But if we don’t mind having duplicate dwo_* references in the same
>>> .o file this would also work with more than one TAG_module (or submodules).
>>> >>>
>>> >>>
>>> >>> .debug_info:
>>> >>>  DW_TAG_compile_unit
>>> >>>    DW_AT_name(“bar.c”)
>>> >>>    ...
>>> >>>
>>> >>>    DW_TAG_imported_module // <- This could be optional on ELF.
>>> >>>      DW_AT_import [DW_FORM_ref_sig8] (0xFEDB9876)
>>> >>>
>>> >>>    ...
>>> >>>
>>> >>> // Comdat’d split DWARF skeleton CU for the module Foo.
>>> >>> .debug_info, group 0xFEDB9876, comdat
>>> >>>  DW_TAG_compile_unit
>>> >>>    DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>>> >>>    DW_AT_dwo_id(“0xFEDB9876”)
>>> >>>    ...
>>> >>>
>>> >>>    DW_TAG_module
>>> >>>      DW_AT_name(“FooLib”)
>>> >>>      DW_AT_LLVM_sysroot(“/”)
>>> >>>      DW_AT_LLVM_include_dirs(“-I/path”)
>>> >>>      DW_AT_LLVM_macros(“-DNDEBUG”)
>>> >>>      ...
>>> >>>
>>> >>>
>>> >>> >
>>> >>> > But this gets into complicated territory when the original binary
>>> is built with fission... which will be relevant for modules on ELF with
>>> LLDB. Hmm, maybe it's not too complicated - the partial_unit would end up
>>> in the .dwo file (maybe we'd have to teach the .dwo file to deduplicate
>>> these too - the same way it does for type units... - might require a new
>>> header to include the hash, etc :/)... would be tricky to have the dwp tool
>>> resolve the relocations to these things. Cross-unit references as you've
>>> got there aren't something that every DWARF consumer is totally cool with,
>>> I don't think?
>>> >>>
>>> >>> Ah. I thought the deduplication happens because all ELF sections
>>> sharing the same group are uniqued based on the group id.
>>> >>>
>>> >>> COMDAT groups deduplicate for a normal non-fission build, but
>>> fission linking doesn't require the .dwo file to use/contain COMDATs as it
>>> uses a DWARF-aware tool (so you don't bother putting the type units in
>>> COMDAT groups, for example - the fission linker knows how to parse
>>> debug_types, find the type unit headers and their hashes and deduplicates
>>> them that way).
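
The two deduplication mechanisms contrasted here can be sketched roughly as follows. This is a simplified Python model under stated assumptions, not linker or dwp code: sections and type units are plain dicts, a COMDAT group is reduced to a single `group` key, and both function names are made up for illustration.

```python
def dedup_comdat_sections(sections):
    """Model of a regular (non-fission) link: keep the first section seen
    for each COMDAT group id, and keep every ungrouped section."""
    seen_groups = set()
    kept = []
    for section in sections:
        group = section.get("group")
        if group is None:
            kept.append(section)      # not in a COMDAT group: always kept
        elif group not in seen_groups:
            seen_groups.add(group)    # first copy of this group wins
            kept.append(section)
    return kept


def dedup_type_units(type_units):
    """Model of a DWARF-aware fission linker: parse the type unit headers
    and keep one unit per signature, with no COMDAT groups involved."""
    by_signature = {}
    for unit in type_units:
        by_signature.setdefault(unit["signature"], unit)
    return list(by_signature.values())
```

The point of the contrast is that the first function needs no knowledge of DWARF at all, while the second has to read unit headers to find the signature.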
>>> >>
>>> >> Ok that makes sense.
>>> >>
>>> >> -- adrian
>>> >>
>>> >>>
>>> >>> It certainly would be nice if we could avoid introducing a new
>>> .debug_info header...
>>> >>>
>>> >>> >
>>> >>> > Sort of inclined to have the imported module stuff just for LLDB,
>>> but I've lost some of the context for that in the ensuing weeks.
>>> >>>
>>> >>> -- adrian
>>> >>>
>>> >>> >
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >> MachO (no typeunits, no comdats, with imports)
>>> >>> >> ----------------------------------------------
>>> >>> >>
>>> >>> >> Since we don’t have comdat sections in Mach-O and we don’t have
>>> the tool support for type units, the way that external types can be
>>> referenced necessarily needs to be a bit different. The design that Greg
>>> and I came up with for Mach-O relies on llvm-dsymutil to fix up the DWARF
>>> for non-module-aware consumers. Just as ELF DWARF consumers need not be
>>> able to tell the difference between module debugging and split DWARF, on
>>> Mach-O the .dSYM bundle generated by llvm-dsymutil looks like traditional
>>> DWARF.
>>> >>> >>
>>> >>> >> There are three differences in the DWARF output that make this
>>> possible:
>>> >>> >>   - Refer to external types by UID rather than by type signature.
>>> >>> >>     (This doubles as the key that allows a debugger to
>>> import the type
>>> >>> >>      directly from the AST and protects us against hash
>>> collisions)
>>> >>> >>   - Add an index to the .o file that maps UID -> module file.
>>> >>> >>     (Fast lookup + UIDs for C and ObjC are only unique within a
>>> module)
>>> >>> >>   - Add an entry for each type’s UID to the types accelerator
>>> table.
>>> >>> >>     (Fast lookup)
>>> >>> >>
>>> >>> >> bar.o
>>> >>> >> ~~~~~
>>> >>> >>
>>> >>> >> .debug_info:
>>> >>> >>   DW_TAG_compile_unit
>>> >>> >>     DW_AT_name(“bar.c”)
>>> >>> >>     DW_TAG_imported_module
>>> >>> >>       DW_AT_import(DW_FORM_ref_addr 0x40)
>>> >>> >>
>>> >>> >>     DW_TAG_variable
>>> >>> >>       DW_AT_name(“MyFoo”)
>>> >>> >>       DW_AT_type [DW_FORM_strp] (“_ZTS3Foo”)  // We could use a
>>> custom FORM here
>>> >>> >>
>>> >>> >>   // Skeleton unit.
>>> >>> >>   DW_TAG_compile_unit
>>> >>> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>>> >>> >>     DW_AT_dwo_id(“0xFEDB9876”)
>>> >>> >>     ...
>>> >>> >> 0x40:
>>> >>> >>     DW_TAG_module
>>> >>> >>       DW_AT_name(“FooLib”)
>>> >>> >>       DW_AT_LLVM_sysroot(“/”)
>>> >>> >>       DW_AT_LLVM_include_dirs(“-I/path”)
>>> >>> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
>>> >>> >>
>>> >>> >> // This index uses the usual accelerator table format.
>>> >>> >> .apple_exttypes:
>>> >>> >> { “_ZTS3Foo” => debug_str offset of
>>> “/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm” }
>>> >>> >>
>>> >>> >> FooLib-XYZ.pcm
>>> >>> >> ~~~~~~~~~~~~~~
>>> >>> >>
>>> >>> >> .debug_info
>>> >>> >>   DW_TAG_compile_unit
>>> >>> >>     DW_AT_dwo_id(“0xFEDB9876”)
>>> >>> >>
>>> >>> >> 0x80:
>>> >>> >>   DW_TAG_structure_type
>>> >>> >>     DW_AT_name (“Foo”)
>>> >>> >>     DW_AT_signature
>>> >>> >>     ...
>>> >>> >>
>>> >>> >> // In addition to the entry for “Foo”, there is also an entry for
>>> the type’s UID “_ZTS3Foo” pointing to the type definition DIE.
>>> >>> >> .apple_types
>>> >>> >> { “Foo” => 0x80 }
>>> >>> >> { “_ZTS3Foo” => 0x80 }
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >> When the debug info linker (llvm-dsymutil) is run, it first pulls
>>> in the .debug_info section from the clang module and fixes up all the
>>> DW_FORM_strp external type references by turning them into a
>>> DW_FORM_ref_addr that references the type in the DW_TAG_compile_unit pulled
>>> in from the module. To find the correct type DIE it looks up the UID in the
>>> .apple_exttypes index, finds the module, looks up the UID in the regular
>>> .apple_types accelerator table and replaces the temporary DW_FORM_strp with
>>> a DW_FORM_ref_addr (which incidentally takes up the same amount of space in
>>> the DIE).
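
The fix-up step described above could be modelled roughly like this. It is a simplified Python sketch, not llvm-dsymutil code: DIEs are plain dicts, `exttypes` stands in for the .apple_exttypes index, `module_types` for each module's .apple_types table, and `module_base_offsets` (where each module's .debug_info landed in the linked output) is an assumption added for the example.

```python
def fix_up_external_type_refs(dies, exttypes, module_types, module_base_offsets):
    """Turn DW_FORM_strp external type references into DW_FORM_ref_addr.

    dies: DIEs from the .o file; an external type reference looks like
          {"form": "DW_FORM_strp", "value": uid}.
    exttypes: UID -> module file (models .apple_exttypes).
    module_types: module file -> {name or UID -> DIE offset in that module}
                  (models each module's .apple_types).
    module_base_offsets: module file -> offset of the module's .debug_info
                         in the linked output (hypothetical bookkeeping).
    """
    for die in dies:
        ref = die.get("DW_AT_type")
        if not ref or ref["form"] != "DW_FORM_strp":
            continue                      # not an external type reference
        uid = ref["value"]
        module = exttypes.get(uid)        # step 1: UID -> module file
        if module is None:
            continue                      # unknown UID: leave untouched
        offset = module_types[module].get(uid)  # step 2: UID -> type DIE
        if offset is None:
            continue
        die["DW_AT_type"] = {             # step 3: rewrite the reference
            "form": "DW_FORM_ref_addr",
            "value": module_base_offsets[module] + offset,
        }
    return dies
```

For the bar.o example, a `MyFoo` variable whose type is the string “_ZTS3Foo” would end up pointing at the `Foo` DIE at 0x80 inside the unit pulled in from the module, shifted by wherever that unit was placed.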
>>> >>> >>
>>> >>> >>
>>> >>> >> Thoughts?
>>> >>> >> --
>>> >>> >> adrian
>>> >>> >>
>>> >>> >
>>> >>>
>>> >>
>>> >>
>>> >
>>>
>>
>