[PATCH] Have clang list the imported modules in the debug info

Mon May 4 13:31:24 PDT 2015

On Mon, May 4, 2015 at 12:27 PM, Adrian Prantl <aprantl at apple.com> wrote:

> >
> >> On May 4, 2015, at 11:38 AM, David Blaikie <dblaikie at gmail.com> wrote:
> >>
> >>
> >>
> >> On Mon, May 4, 2015 at 11:24 AM, Adrian Prantl <aprantl at apple.com>
> wrote:
> >>
> >>> On May 4, 2015, at 10:53 AM, David Blaikie <dblaikie at gmail.com> wrote:
> >>>
> >>>
> >>>
> >>> On Fri, May 1, 2015 at 8:52 PM, Adrian Prantl <aprantl at apple.com>
> wrote:
> >>>>
> >>>>> On May 1, 2015, at 5:25 PM, David Blaikie <dblaikie at gmail.com>
> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Fri, May 1, 2015 at 5:19 PM, Adrian Prantl <aprantl at apple.com>
> wrote:
> >>>>>
> >>>>>> On May 1, 2015, at 4:55 PM, David Blaikie <dblaikie at gmail.com>
> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Fri, May 1, 2015 at 4:39 PM, Adrian Prantl <aprantl at apple.com>
> wrote:
> >>>>>>
> >>>>>> > On May 1, 2015, at 10:01 AM, David Blaikie <dblaikie at gmail.com>
> wrote:
> >>>>>> >
> >>>>>> >
> >>>>>> >
> >>>>>> > On Fri, May 1, 2015 at 9:52 AM, Adrian Prantl <aprantl at apple.com>
> wrote:
> >>>>>> >>
> >>>>>> >>> On May 1, 2015, at 9:23 AM, David Blaikie <dblaikie at gmail.com>
> wrote:
> >>>>>> >>>
> >>>>>> >>>
> >>>>>> >>>
> >>>>>> >>> On Thu, Apr 30, 2015 at 5:21 PM, Adrian Prantl <
> aprantl at apple.com> wrote:
> >>>>>> >>>
> >>>>>> >>> > On Apr 30, 2015, at 4:55 PM, David Blaikie <
> dblaikie at gmail.com> wrote:
> >>>>>> >>> >
> >>>>>> >>> >
> >>>>>> >>> >
> >>>>>> >>> > On Thu, Apr 30, 2015 at 4:31 PM, Adrian Prantl <
> aprantl at apple.com> wrote:
> >>>>>> >>> >>
> >>>>>> >>> >> > On Mar 19, 2015, at 5:37 PM, David Blaikie <
> dblaikie at gmail.com> wrote:
> >>>>>> >>> >> >
> >>>>>> >>> >> >
> >>>>>> >>> >> >
> >>>>>> >>> >> > On Thu, Mar 19, 2015 at 5:24 PM, Adrian Prantl <
> aprantl at apple.com> wrote:
> >>>>>> >>> >> >>
> >>>>>> >>> >> >> > On Mar 16, 2015, at 2:55 PM, David Blaikie <
> dblaikie at gmail.com> wrote:
> >>>>>> >>> >> >> >
> >>>>>> >>> >> >> >
> >>>>>> >>> >> >> >
> >>>>>> >>> >> >> >> On Mon, Mar 16, 2015 at 2:45 PM, Robinson, Paul <
> Paul_Robinson at playstation.sony.com> wrote:
> >>>>>> >>> >> >> > Beyond the above (that using a new tag would mean this
> would go from 'free' to 'not free' for GDB) having a new top level tag is
> pretty substantial (we only have two at the moment, and with our talk of
> modules being a "bag of dwarf" might go back to having one top level tag?
> (it's not clear to me from DWARF4 whether DW_TAG_module is currently a
> top-level tag, I don't think it is?)
> >>>>>> >>> >> >> >
> >>>>>> >>> >> >> >> The .debug_info section contains one or more
> compilation units, partial units, or in DWARF 5, type units.  DW_TAG_module
> isn't a unit, if you want it to be handled independently then it would need
> to be wrapped in a DW_TAG_partial_unit.  You would probably then use
> DW_TAG_imported_unit to refer to it, rather than DW_TAG_imported_module.
> >>>>>> >>> >> >> >>
> >>>>>> >>> >> >> >
> >>>>>> >>> >> >> > This makes a fair bit of sense - though the
> terminology's never going to quite line up with modules, I suspect, and
> this would still require modifying existing consumers (well, GDB) that can
> handle split-dwarf today, I suspect (not sure how it'd handle partial_unit
> - maybe that does work? - and still don't know how existing consumers would
> handle imported_unit either - could be worth some testing, as it sounds
> sort of right out of several less right options).
> >>>>>> >>> >> >>
> >>>>>> >>> >> >> Thanks for all the input so far!
> >>>>>> >>> >> >> To make this end of the discussion concrete, let’s sketch
> some DWARF showing what this could look like in practice.
> >>>>>> >>> >> >>
> >>>>>> >>> >> >> ELF (no imports)
> >>>>>> >>> >> >> ----------------
> >>>>>> >>> >> >>
> >>>>>> >>> >> >> On ELF or COFF a foo.c referencing types from the module
> Foundation looks like this:
> >>>>>> >>> >> >>
> >>>>>> >>> >> >> .debug_info:
> >>>>>> >>> >> >>   DW_TAG_compile_unit
> >>>>>> >>> >> >>     DW_AT_name(“foo.c”)
> >>>>>> >>> >> >>
> >>>>>> >>> >> >> .debug_info.dwo (on ELF: group 0x1234ABCDE, comdat)
> >>>>>> >>> >> >>   DW_TAG_partial_unit
> >>>>>> >>> >> >
> >>>>>> >>> >> > For now I'd suggest we use compile_unit - that way it'll
> just work with existing split-dwarf consumers. We can see about
> standardizing a top-level DW_TAG_module or using DW_TAG_partial_unit here
> later, perhaps? I'm not sure.
> >>>>>> >>> >> >
> >>>>>> >>> >> >>
>  DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
> >>>>>> >>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
> >>>>>> >>> >> >>
> >>>>>> >>> >> >>
> >>>>>> >>> >> >> Side question: Is .debug_info.dwo the right section to
> put the module skeleton in, or should it be a .debug_info section like
> normal fission skeletons?
> >>>>>> >>> >> >
> >>>>>> >>> >> > Skeletons go in .debug_info, the dwo sections are just for
> the .dwo file (or the module file, in our new case - the extension isn't
> actually important).
> >>>>>> >>> >> >
> >>>>>> >>> >> > It might be worth you compiling an example or two of
> split-dwarf to see how this all works hands-on.
> >>>>>> >>> >> >
> >>>>>> >>> >> >> Mach-O (no comdat, no imports)
> >>>>>> >>> >> >> ------------------------------
> >>>>>> >>> >> >>
> >>>>>> >>> >> >> Mach-O doesn’t do comdat, so with -split-dwarf=Disable
> (not sure if that option is the best discriminator) this could look like:
> >>>>>> >>> >> >>
> >>>>>> >>> >> >> .debug_info:
> >>>>>> >>> >> >>   DW_TAG_compile_unit
> >>>>>> >>> >> >>     DW_AT_name(“foo.c”)
> >>>>>> >>> >> >>   DW_TAG_partial_unit
> >>>>>> >>> >> >>
>  DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
> >>>>>> >>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
> >>>>>> >>> >> >>
> >>>>>> >>> >> >>
> >>>>>> >>> >> >> Mach-O (no comdat, with imports)
> >>>>>> >>> >> >> ------------------------------
> >>>>>> >>> >> >>
> >>>>>> >>> >> >> If we add the module import information to this, we get:
> >>>>>> >>> >> >>
> >>>>>> >>> >> >> .debug_info:
> >>>>>> >>> >> >>   DW_TAG_compile_unit
> >>>>>> >>> >> >>     DW_AT_name(“foo.c”)
> >>>>>> >>> >> >>     DW_TAG_imported_module
> >>>>>> >>> >> >>       DW_AT_import(DW_FORM_ref_addr 0x10)
> >>>>>> >>> >> >
> >>>>>> >>> >> > Since we went down the tangent of explaining
> split-dwarf many emails ago, I've forgotten (& can't readily find) what we
> were discussing about what ways the imported_module could work.
> >>>>>> >>> >> >
> >>>>>> >>> >> > The simplest representation I can think of would be to
> have it reference, by signature, the module unit (whatever tag it uses) -
> DW_FORM_ref_sig8, seems the simplest thing to do.
> >>>>>> >>> >> >
> >>>>>> >>> >> >>
> >>>>>> >>> >> >>   DW_TAG_partial_unit
> >>>>>> >>> >> >>
>  DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
> >>>>>> >>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
> >>>>>> >>> >> >>
> >>>>>> >>> >> >> 0x10:
> >>>>>> >>> >> >
> >>>>>> >>> >> > This is inside the partial unit? I figured we'd just put
> these attributes on the top level (compile_unit, or whatever it might be
> later) - potentially conditionalized on platform, sure.
> >>>>>> >>> >> >
> >>>>>> >>> >> >>     DW_TAG_module
> >>>>>> >>> >> >>       DW_AT_name(“Foundation”)
> >>>>>> >>> >> >>       DW_AT_LLVM_sysroot(“/“)
> >>>>>> >>> >> >>       DW_AT_LLVM_include_dir(“”)
> >>>>>> >>> >> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
> >>>>>> >>> >> >>       ...
> >>>>>> >>> >> >>
> >>>>>> >>> >> >>
> >>>>>> >>> >> >> ELF (comdat, with imports)
> >>>>>> >>> >> >> --------------------------
> >>>>>> >>> >> >>
> >>>>>> >>> >> >> But now let’s go back to ELF. Since the skeleton with the
> partial unit is comdat'd, I assume that this breaks the FORM_ref_addr used
> in the DW_AT_import. We could reuse the module hash as a signature for the
> module:
> >>>>>> >>> >> >>
> >>>>>> >>> >> >> .debug_info:
> >>>>>> >>> >> >>   DW_TAG_compile_unit
> >>>>>> >>> >> >>     DW_AT_name(“foo.c”)
> >>>>>> >>> >> >>     DW_TAG_imported_module
> >>>>>> >>> >> >>       DW_AT_import(DW_FORM_ref_addr 0x1234ABCDE)
> >>>>>> >>> >> >
> >>>>>> >>> >> > Still only really need these imported_modules for lldb,
> right? I'd consider having them off-by-default for non-darwin, but I'm not
> strictly wedded to that notion. Wouldn't mind seeing size impact numbers of
> some kind - if it's really fractional % increase & GDB doesn't fall over
> when it sees them (in whatever FORM/tag/etc we decide on) then that's not
> the end of the world.
> >>>>>> >>> >> >
> >>>>>> >>> >> > Just seems nice if the default mode is the nice, standard,
> split-dwarf output. Doesn't need anything fancy.
> >>>>>> >>> >> >
> >>>>>> >>> >> >
> >>>>>> >>> >> >> .debug_info.dwo (group 0x1234ABCDE, comdat)
> >>>>>> >>> >> >>   DW_TAG_partial_unit
> >>>>>> >>> >> >>
>  DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
> >>>>>> >>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
> >>>>>> >>> >> >>
> >>>>>> >>> >> >>     DW_TAG_module
> >>>>>> >>> >> >>       DW_AT_signature(“0x1234ABCDE”)
> >>>>>> >>> >> >>       DW_AT_name(“Foundation”)
> >>>>>> >>> >> >
> >>>>>> >>> >> >
> >>>>>> >>> >> > The thing you haven't covered is the actual .dwo sections
> (.debug_info.dwo (we'll probably need a simple stub compile_unit to make
> this correct split-dwarf) and .debug_types.dwo being important - but all
> the supporting .dwo sections will be necessary) that go in the module file.
> >>>>>> >>> >> >
> >>>>>> >>> >> >> This is bending the definition of DW_AT_signature, but I
> guess it could be made to work. Or we could say that for now, users have to
> choose between the comdat optimization and having the module imports
> recorded in Dwarf, since GDB wouldn’t know what to do with that information
> anyway.
> >>>>>> >>> >>
> >>>>>> >>> >> Sorry for the long delay. Here’s a more complete example
> that should include all the suggestions made so far. For context I also
> included external type references in the example although admittedly this
> is a bit out of scope for this thread:
> >>>>>> >>> >>
> >>>>>> >>> >> ELF (typeunits, comdats, with imports)
> >>>>>> >>> >> --------------------------------------
> >>>>>> >>> >>
> >>>>>> >>> >> On ELF or COFF a bar.c referencing type Foo from the module
> FooLib looks like this:
> >>>>>> >>> >>
> >>>>>> >>> >> bar.o
> >>>>>> >>> >> ~~~~~
> >>>>>> >>> >>
> >>>>>> >>> >> // To keep this example focussed/readable, I'm assuming that
> bar.o itself was not compiled with fission.
> >>>>>> >>> >> .debug_info:
> >>>>>> >>> >>   DW_TAG_compile_unit
> >>>>>> >>> >>     DW_AT_name(“bar.c”)
> >>>>>> >>> >>     ...
> >>>>>> >>> >>
> >>>>>> >>> >>     DW_TAG_imported_module // <- This could be optional on
> ELF.
> >>>>>> >>> >>       DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234)
> >>>>>> >>> >>
> >>>>>> >>> >>     DW_TAG_variable
> >>>>>> >>> >>       DW_AT_name(“MyFoo”)
> >>>>>> >>> >>       DW_AT_type [DW_FORM_ref4] 0x20
> >>>>>> >>> >> 0x20:
> >>>>>> >>> >>     DW_TAG_structure_type
> >>>>>> >>> >>       DW_AT_declaration (true)
> >>>>>> >>> >>       DW_AT_signature [DW_FORM_ref_sig8] (0xF00)
> >>>>>> >>> >>
> >>>>>> >>> >>
> >>>>>> >>> >> // Split DWARF skeleton CU for the module Foo.
> >>>>>> >>> >>   DW_TAG_compile_unit
> >>>>>> >>> >>
>  DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
> >>>>>> >>> >>     DW_AT_dwo_id(“0xFEDB9876”)
> >>>>>> >>> >>     ...
> >>>>>> >>> >>
> >>>>>> >>> >> // Comdat’d partial unit containing the optional module
> descriptor.
> >>>>>> >>> >> .debug_info, group 0xABCD1234, comdat
> >>>>>> >>> >>   DW_TAG_partial_unit
> >>>>>> >>> >>     DW_TAG_module
> >>>>>> >>> >>       DW_AT_name(“FooLib”)
> >>>>>> >>> >>       DW_AT_LLVM_sysroot(“/“)
> >>>>>> >>> >>       DW_AT_LLVM_include_dirs(“-I/path”)
> >>>>>> >>> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
> >>>>>> >>> >>       ...
> >>>>>> >>> >>
> >>>>>> >>> >> FooLib-XYZ.pcm
> >>>>>> >>> >> ~~~~~~~~~~~~~~
> >>>>>> >>> >>
> >>>>>> >>> >> .debug_info.dwo
> >>>>>> >>> >>   DW_TAG_compile_unit
> >>>>>> >>> >>     DW_AT_dwo_id(“0xFEDB9876”)
> >>>>>> >>> >>     ...
> >>>>>> >>> >>
> >>>>>> >>> >> // Type unit for the type Foo.
> >>>>>> >>> >> .debug_types.dwo, group 0xF00, comdat
> >>>>>> >>> >>   DW_TAG_type_unit
> >>>>>> >>> >>     DW_TAG_structure_type
> >>>>>> >>> >>       DW_AT_name (“Foo”)
> >>>>>> >>> >>       ...
> >>>>>> >>> >>
> >>>>>> >>> >>
> >>>>>> >>> >> I think it awkward to have both the skeleton compile_unit in
> .debug_info and the partial_unit containing the TAG_module. Personally I’d
> prefer putting the TAG_module into the skeleton CU and then just refer to
> it via a FORM_ref_addr; but if we want to put the TAG_module into a comdat
> section, it looks like that’s what’s necessary.
> >>>>>> >>> >
> >>>>>> >>> > It's been a while & I've probably lost all the context, but I
> think my original theory was to have the skeleton compile_unit be comdat'd
> so they'd deduplicate on linking (so we'd only have one reference to the
> module.dwo in the linked binary). I don't recall there being a need for a
> separate partial_unit - I imagine we'd just put the LLDB/LLVM extension
> attributes on the skeleton compile_unit and expect debuggers that didn't
> understand them, to ignore them.
> >>>>>> >>> >
> >>>>>> >>> > Was there some reason this didn't work/make sense? Because
> you need a DW_TAG_module to import with DW_TAG_imported_module?
> >>>>>> >>> Using DW_TAG_module was the best practice that was recommended
> on dwarf-discuss.
> >>>>>> >>>
> >>>>>> >>> Did they have any ideas on how to reference it without
> duplicating it in every CU?
> >>>>>> >>
> >>>>>> >> We didn’t touch the deduplication issue.
> >>>>>> >>
> >>>>>> >>> Once we've got the "Bag O Dwarf" stuff (rather than the
> narrower type units) this would be easier - (I suppose we could do a
> partial solution/abuse of type units - use a type unit header (perhaps with
> Eric's merged type/compile unit work) and a DW_FORM_ref_sig8 value for the
> DW_AT_import in the DW_TAG_imported_module).
> >>>>>> >>>
> >>>>>> >>> Though I suppose if we're going to have DW_TAG_imported_module
> in every CU that references a module, it might not be that big of a deal to
> include the DW_TAG_module itself there too... while I don't care about this
> scheme immediately, Google's growing LLDB investment in various platforms,
> so I am vaguely concerned about getting this right & it's not immediately
> obvious to me what that right answer is.
> >>>>>> >>
> >>>>>> >> Maybe the best path forward is to stage this by initially
> putting the DW_TAG_module into the main CU and leave the deduplication as
> an optimization to be implemented once the bag’o dwarf is more fleshed out.
> This way we won’t do anything that would confuse consumers (assuming they
> ignore unknown tags) and the extra overhead is likely not even going to be
> noticeable, since all the string attributes inside the TAG_module can
> already be deduplicated by traditional means.
> >>>>>> >
> >>>>>> > Perhaps. I'd still like to think through/document what this looks
> like a bit more. Where the data ends up, what it's used for, etc. Sorry to
> draw this out.
> >>>>>> >
> >>>>>> > :/ *ponders*
> >>>>>>
> >>>>>>
> >>>>>> Let’s construct this:
> >>>>>>
> >>>>>> The most straightforward representation is to not unique the
> TAG_module and place it into the main CU.
> >>>>>>
> >>>>>> bar.o
> >>>>>> ~~~~~
> >>>>>>
> >>>>>> .debug_info:
> >>>>>>   DW_TAG_compile_unit
> >>>>>>     ...
> >>>>>>     DW_TAG_imported_module
> >>>>>>       DW_AT_import [DW_FORM_ref4] (0x20)
> >>>>>> 0x20:
> >>>>>>     DW_TAG_module
> >>>>>>       DW_AT_name(“FooLib”)
> >>>>>>       DW_AT_LLVM_sysroot(“/“)
> >>>>>>       DW_AT_LLVM_include_dirs(“-I/path”)
> >>>>>>       DW_AT_LLVM_macros(“-DNDEBUG”)
> >>>>>>
> >>>>>> Might as well put all these LLVM attributes on the skeleton CU,
> though - so they can be deduplicated (& just put the dwo_id in this module
> somewhere, perhaps just using the DW_AT_dwo_id attribute - possibly that's
> the only attribute the DW_TAG_module would need, ideally). Unless we need
> to consider the submodule issue (in which case the skeleton unit would
> reference the whole module but the submodules would reference/describe the
> respective submodules?)?
> >>>>>
> >>>>> We cannot put them into the skeleton CU if the skeleton CU is going
> to be comdat’d, because we’d then have to refer to it via a signature and
> that leads us directly to the can of worms discussed in the next paragraph
> :-)
> >>>>>>
> >>>>>>       ...
> >>>>>>
> >>>>>> // Split DWARF skeleton, comdat'd.
> >>>>>> .debug_info, group 0xFEDB9876, comdat
> >>>>>>   DW_TAG_compile_unit
> >>>>>>
>  DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
> >>>>>>     DW_AT_dwo_id(“0xFEDB9876”)
> >>>>>>     ...
> >>>>>>
> >>>>>> On Mach-O the split DWARF skeleton would not be comdat’d, but
> llvm-dsymutil can just ignore it.
> >>>>>>
> >>>>>>
> >>>>>> If we want to dedup the TAG_module we need to refer to it via
> signature. This means we need to wrap it in a type_unit or a DWARF5
> TAG_type_unit. We might as well throw it in with the skeleton CU.
> >>>>>>
> >>>>>> .debug_info:
> >>>>>>   DW_TAG_compile_unit
> >>>>>>     ...
> >>>>>>     DW_TAG_imported_module
> >>>>>>       DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234)
> >>>>>>
> >>>>>> // Split DWARF skeleton, comdat'd.
> >>>>>> .debug_info, group 0xFEDB9876, comdat
> >>>>>>   DW_TAG_compile_unit
> >>>>>>
>  DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
> >>>>>>     DW_AT_dwo_id(“0xFEDB9876”)
> >>>>>>     ...
> >>>>>>     DW_TAG_type_unit (signature: 0xABCD1234)
> >>>>>>
> >>>>>> Can't really put a type_unit inside a compile_unit - it'd need to
> be top-level with an appropriate type unit header, etc. & then we'd need
> two different units/headers, could still comdat them, but it's a weird
> abuse of type units & would probably confuse consumers. I don't know
> whether that's worth the effort.
> >>>>> Oh right.
> >>>>>
> >>>>>>
> >>>>>>       DW_TAG_module
> >>>>>>         DW_AT_name(“FooLib”)
> >>>>>>         DW_AT_LLVM_sysroot(“/“)
> >>>>>>         DW_AT_LLVM_include_dirs(“-I/path”)
> >>>>>>         DW_AT_LLVM_macros(“-DNDEBUG”)
> >>>>>>         ...
> >>>>>>
> >>>>>> Now that raises the question about what happens with multiple
> modules within one PCM.
> >>>>>>
> >>>>>> Is the right term "submodule"? it's sort of confusing to talk about
> multiple modules within a pcm.
> >>>>>
> >>>>> Yes, a module with nested submodules.
> >>>>> http://clang.llvm.org/docs/Modules.html#submodule-declaration
> >>>>>
> >>>>>>
> >>>>>> Assuming that the ELF linker is linking and deduping all the
> non-.dwo sections, we may lose some of the TAG_modules (if not every CU
> imports all submodules) in the binary, but that wouldn’t matter because the
> consumer would find all TAG_modules by signature in the .pcm
> >>>>>>
> >>>>>> Is there any reason we need to reference the submodules
> individually, rather than just reference the whole module
> >>>>>
> >>>>> My assumption is that an AST-aware debugger will want to import the
> exact submodules that were imported by the CU before dropping into the
> expression evaluator to replicate the environment of the CU as much as
> possible.
> >>>>>
> >>>>> I'm just not picturing that. It seems pretty likely that a debugger
> user is more likely to treat the whole set of names in the program, not
> just those syntactically valid at that point in the source file.
> >>>>
> >>>> Module imports only work if the debugger has the precise list of
> modules imported by the current CU. Clang modules are not namespaces, and
> any two modules may conflict.
> >>>
> >>> Right, as you say - ODR & C languages. (& I've no idea if file-scoped
> static/anonymous namespace things can go in C++ modules and what happens if
> you have conflicting modules in that regard - I guess they can conflict
> too? Dunno - maybe anon namespaces in C++ modules aren't allowed)
> >>
> >> It sounds like a strange concept to put an anonymous namespace into a
> public module, but then again there exists
> clang/test/Modules/anon-namespace.cpp (it only uses an empty anonymous
> namespace, though). I’m not sure how this is meant to be used.
> >>
> >>>>
> >>>> The cool thing is that with the imported modules the debugger
> effectively becomes clang and has the entire world visible to the current
> CU available, including any types and functions that never made it into the
> debug info because they were optimized out, or because there were
> uninstantiated templates that cannot be represented by DWARF.
> >>>>
> >>>>> A simple example would be if I'm debugging LLVM and I'm in some
> generic optimization pass, but I want to cast my Instruction pointer to
> some specific instruction type to examine it in more detail - even though
> this pass doesn't care about that specific Instruction type nor include the
> header in which it's declared.
> >>>>
> >>>> If, however, the type lookup fails, the debugger can still fall back
> to the traditional behavior, find the type in the accelerator tables and
> reconstruct it from DWARF (if it is there).
> >>>
> >>> So you're going to need to implement fission (to at least some degree)
> support in LLDB, then? (to support the case where you haven't linked debug
> info with llvm-dsymutil, but you've hit one of these lookup problems where
> you need to cross possibly-conflicting modules)
> >>
> >> Yes. Specifically, it won’t support type units, and it will look up
> types by name rather than by signature. (cf. the second part of
> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html
> )
> >
> > How are you going to reference the types in the module's fission CU
> without type units/signatures? Are you going to emit type declarations into
> the normal CU and rely on the debugger to know that these declarations can
> be resolved by looking elsewhere? (just without the benefit of constraining
> that search to just looking for a matching TU?)
>
> If you look at the example in
> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html,
> there will be an external type index (using the usual accelerator table
> format) that maps an external type’s UID to a pcm. In the pcm there is an
> extra accelerator table entry that maps UID to DIE offset.
>

I mean I guess that's up to you, but seems like a relatively large
workaround compared to supporting type units... (I mean certainly seems
like strictly less work to do the workaround than implementing type units
in LLDB, but a relatively large amount of work to do/throw away eventually
once LLDB supports type units)
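
To check that I'm reading the quoted scheme correctly, here is a minimal
Python sketch of the two-level lookup: an external type index maps a type's
UID to a pcm, and an accelerator table inside that pcm maps the same UID to a
DIE offset. The table names, the UID string and the offset are all invented
for illustration; this is not LLDB code and not a proposed format.

```python
# Hypothetical sketch of the two-level lookup described in the quoted text.
# None of these names, UIDs, or offsets come from a real tool or file.

# Object file side: external type index, type UID -> pcm that defines the type.
EXTERNAL_TYPE_INDEX = {
    "uid-Foo": "FooLib-XYZ.pcm",
}

# pcm side: accelerator table, type UID -> DIE offset inside that pcm's DWARF.
PCM_ACCEL_TABLES = {
    "FooLib-XYZ.pcm": {"uid-Foo": 0x2A},
}


def resolve_external_type(uid):
    """Return (pcm, die_offset) for uid, or None if either table misses.

    A None result is where a debugger would fall back to reconstructing
    the type from ordinary DWARF, if it is there.
    """
    pcm = EXTERNAL_TYPE_INDEX.get(uid)
    if pcm is None:
        return None
    die_offset = PCM_ACCEL_TABLES.get(pcm, {}).get(uid)
    if die_offset is None:
        return None
    return pcm, die_offset
```

Both tables are keyed by name/UID rather than by an 8-byte type signature,
which is the part that would presumably be thrown away once type units are
supported.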

>
> >>
> >>
> >>>
> >>> OK, so I think it's probably reasonable for now to just add
> DW_TAG_modules to the CU for each referenced module (or does it have to be
> each referenced submodule? (can two submodules within a single module be
> contradictory/conflicting?)). Since we don't have any good way to reference
> the module in a foreign unit while deduplicating that unit... there's not
> much point having the imported_module - but if you think it adds anything,
> I'm open to ideas.
> >> It could help keeping things simpler.
> >> Emitting it doesn’t add much semantic value because module imports
> always occur at the top level, but it will make the transition to the
> deduplicated TAG_modules easier — It could be easier to teach consumers
> once about imported_module({ref to TAG_module}) rather than having them
> also recognize top-level TAG_modules as an intermediate step. It’s also
> slightly easier to implement in LLVM because the imported_module allows us
> to anchor the TAG_module in the CU, but that’s not a very strong argument.
> >
> > Agreed on all counts (not a strong argument, but convenient enough, etc,
> etc).
> >
> > I'm still not entirely sure what the right answer is here, though, which
> is why I'm hesitant to bake anything in too strongly.
> >
> > To come back to one of the outstanding questions: Do you need submodule
> import information, or just module level (if modules cannot have internal
> conflicts and you can't avoid cross-module conflicts just by lack of
> visibility (I have no idea if either of those things are true) then you may
> just need per-module not per-submodule info)?
>
> At the moment I do not think that it makes sense for two submodules to
> conflict, but there is nothing in the clang documentation that explicitly
> forbids this. With this in mind, I think it is reasonable to not support
> submodules (at least initially) and always emit an import for the parent
> module.
> That’s what I wanted to write ... but as I’m browsing through our
> documentation,
> http://clang.llvm.org/docs/Modules.html#conflict-declarations explicitly
> gives an example of two conflicting submodules, so maybe this is not a
> reasonable simplification after all. On the other hand, a quick grep over
> all system module maps on OS X doesn’t show a single conflict declaration.
>
> I still believe we do not need to support submodules right from the start,
> but we should have a story for getting there if we need to.
>

Given the simple example that demonstrates the possibility, it seems fair
to have a story for what that looks like, yes - even if a first
pass/prototype doesn't support it.
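
As a rough illustration of what the first-pass simplification gives up, here
is a small Python sketch. It assumes submodule imports are written as dotted
names (FooLib.SubA); the function names and the example module names are
hypothetical.

```python
# Hypothetical sketch: per-submodule imports vs. collapsing to the parent.

def top_level_module(import_name):
    """'FooLib.SubA.Inner' -> 'FooLib'."""
    return import_name.split(".", 1)[0]


def imports_to_emit(imported, per_submodule):
    """Return the sorted, de-duplicated list of names a CU would record.

    per_submodule=False models the first-pass simplification of always
    emitting an import for the parent module only.
    """
    if per_submodule:
        return sorted(set(imported))
    return sorted({top_level_module(name) for name in imported})
```

With per_submodule=False, a CU importing only FooLib.SubA and a CU importing
only FooLib.SubB both record just FooLib, so a consumer can no longer tell
them apart. That only matters if SubA and SubB carry a conflict declaration,
which is the case the story needs to cover.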

>
> >
> > Also, does each submodule need different special attributes/flags? If
> the special codegen attributes you want are at the module level, it'd
> probably be best to keep those on the Skeleton CU for the module (that will
> be comdat folded, etc, on ELF - and they could be DWARF-aware deduplicated
> by llvm-dsymutil) so they're not duplicated. The DW_TAG_module would then
> just have a DW_AT_signature attribute or something similarly small/trivial
> to point to the skeleton CU.
>
> The attributes are derived from cc1 command line arguments. No two
> submodules imported by one CU can have different attributes. All submodules
> in a pcm also share their attributes. Putting them into the skeleton CU
> appears to be the most efficient place to put them, though perhaps not the
> most logical one.
>

Why not the most logical? It'd be nice if it were a DW_TAG_module instead
of a DW_TAG_compile_unit - but given the limited vocabulary we have in
DWARF top level tags, it seems as good as we can have.
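
For anyone following along who hasn't looked at comdat folding, a minimal
Python sketch of the effect being relied on: the linker keeps one copy of the
sections in each comdat group and drops later copies with the same group
signature. The (group, payload) representation is an assumption made for
illustration and is not how a real linker models sections.

```python
# Hypothetical sketch of comdat folding at link time.

def fold_comdat(sections):
    """sections: list of (group_id_or_None, payload) in link order.

    Sections with no group are always kept. For each group id, only the
    first section seen is kept; later ones are discarded.
    """
    seen = set()
    kept = []
    for group, payload in sections:
        if group is not None:
            if group in seen:
                continue
            seen.add(group)
        kept.append((group, payload))
    return kept


# Two objects, each carrying its own CU plus the same module skeleton CU.
link_input = [
    (None, "CU foo.c"),
    (0xFEDB9876, "skeleton CU FooLib"),
    (None, "CU bar.c"),
    (0xFEDB9876, "skeleton CU FooLib"),
]
linked = fold_comdat(link_input)
```

After folding, linked holds both ordinary CUs and a single skeleton CU, so
attributes placed on the skeleton appear once in the linked binary no matter
how many objects imported the module.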

> I would prefer to stick the attributes on the (top-level) DW_TAG_module
> and later deduplicate the attributes together with the DW_TAG_module.
> Sticking them on the skeleton won’t save any space in the .o files and
> would save 3*4-8=4 bytes (3x FORM_strp for include, macro, and isysroot -
> 1x FORM_ref_sig8) per CU and imported module.

Seems nicer not to duplicate them, especially since not everyone will be
using a debug-aware linker like llvm-dsymutil (LLDB on Windows or Linux
won't have that convenience). Eventually we can use Bag O' DWARF for the
skeleton CU, make it a DW_TAG_module (with more DWARF changes to allow that
as a top-level tag, if desired/useful - I'm not sure it adds a lot) and
have the imported_module reference it that way. (DW_TAG_imported_module,
DW_AT_import, DW_FORM_ref_sig8)

I'm not /hugely/ invested in this, but we do have people caring about LLDB
on Linux and Windows, so avoiding tying the LLDB story to MachO and
dsymutil, etc, seems valuable.
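
For scale, the 3*4-8 estimate quoted above can be worked through as a
back-of-the-envelope Python sketch. It assumes 32-bit DWARF, where a
DW_FORM_strp is a 4-byte offset, and an 8-byte DW_FORM_ref_sig8; the CU and
module counts in the usage note are invented.

```python
# Back-of-the-envelope sketch of the quoted size estimate (32-bit DWARF).

STRP_SIZE = 4      # DW_FORM_strp: 4-byte offset into .debug_str
REF_SIG8_SIZE = 8  # DW_FORM_ref_sig8: 8-byte signature


def bytes_saved_per_import(num_string_attrs=3):
    """Bytes saved per (CU, imported module) pair by replacing the string
    attributes (include dirs, macros, isysroot) with one signature."""
    return num_string_attrs * STRP_SIZE - REF_SIG8_SIZE


def total_bytes_saved(num_cus, modules_per_cu):
    """Total saving if every CU imports modules_per_cu modules."""
    return bytes_saved_per_import() * num_cus * modules_per_cu
```

So a program with 1000 CUs importing 10 modules each would differ by about
40000 bytes on the attribute references alone. The strings themselves are
already shared through .debug_str either way.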

> >

> If you need submodule import lists, then each DW_TAG_module representing a
> submodule would have a name (anything else?) and the signature referring to
> its module skeleton CU.
>
> What I’m envisioning is
>
> .debug_info:
>   DW_TAG_compile_unit
>     ...
>     DW_TAG_imported_module
>      // import FooSubA
>      DW_AT_import [DW_FORM_ref4] (0x60)
>
>     DW_TAG_module
>       DW_AT_name(“FooLib”)
>       DW_AT_LLVM_sysroot(“/“)
>       DW_AT_LLVM_include_dirs(“-I/path”)
>       DW_AT_LLVM_macros(“-DNDEBUG”)
> 0x60:
>       DW_TAG_module
>         DW_AT_name(“FooSubA”)
>         // need not be emitted if not referenced.
>         DW_TAG_module
>           DW_AT_name(“FooSubASubA”)
>
>       // need not be emitted if not referenced.
>       DW_TAG_module
>         DW_AT_name(“FooSubB”)
>
>
>
> -- adrian
> >
>
> >>
> >>> Maybe later (when we have Bag O' DWARF) we can do that. & only do this
> when targeting lldb (on by default on Darwin, off by default elsewhere).
> >>>
> >>> & LLDB, once it's got the Fission support it'll need for this anyway,
> will fallback gracefully if these special modules are omitted.
> >>
> >> Sounds good to me!
> >>
> >> -- adrian
> >>
> >>>
> >>> - David
> >>>
> >>>
> >>>>
> >>>>>  (& have just a single, whole module in the pcm)?
> >>>>
> >>>> That’s probably not what you meant, but just to be sure: The pcm will
> always have the entire module with all submodules in it. But the debugger
> may choose to import only a subset of those.
> >>>>
> >>>>>
> >>>>> file referred to by whichever skeleton CU makes it into the binary:
> >>>>>
> >>>>> FooLib-XYZ.pcm
> >>>>> ~~~~~~~~~~~~~~
> >>>>>
> >>>>> .debug_info.dwo
> >>>>>  DW_TAG_compile_unit
> >>>>>    DW_AT_dwo_id(“0xFEDB9876”)
> >>>>>    ...
> >>>>>
> >>>>>  DW_TAG_type_unit (signature: 0xABCD1234)
> >>>>>    DW_TAG_module
> >>>>>      DW_AT_name(“FooLib”)
> >>>>>      ...
> >>>>>  DW_TAG_type_unit (signature: 0xCDEF3456)
> >>>>>    DW_TAG_module
> >>>>>      DW_AT_name(“FooLib”)
> >>>>>      DW_TAG_module
> >>>>>        DW_AT_name(“SubFoo”)
> >>>>>        ...
> >>>>>
> >>>>> So.. this should work as long as nobody points out that a module
> isn’t really a type.
> >>>>>
> >>>>> Yeah, probably worth waiting for "Bag O DWARF".
> >>>>>
> >>>>> For now, as you mentioned earlier, maybe just putting the
> imported_module and the module into the compile_unit when tuning for LLDB
> (so Darwin by default, and anywhere else where someone tunes for LLDB in
> the future) & leave them out otherwise.
> >>>>
> >>>> Sounds perfectly reasonable.
> >>>>>
> >>>>> Could you remind me why LLDB wants to know which modules are
> referenced from a CU? (rather than just all the modules used by a program
> overall?)
> >>>>
> >>>> LLDB uses clang for the expression evaluation. Traditionally it would
> look up a type in DWARF, build a clang AST out of it and then import it.
> With this it could directly import the clang modules and have access to
> everything in the module. But, clang modules are not namespaces, so modules
> can conflict (and that would probably manifest as a crash in libclang).
> >>>>
> >>>> What's an example of such a conflict? Is that valid (or is it just in
> ODR violations) - as mentioned above, it seems to me that only importing
> the things lexically available in this source file isn't what a debugger
> user would really want. I certainly think I'd trip over that a lot.
> >>>
> >>> Keep in mind that Objective-C (and C) do not have an ODR, so it’s not
> just “just” :-)
> >>> Being able to import modules does not mean that the debugger cannot
> still fall back to loading types from DWARF; in fact it will have to do
> that for all local types anyway.
> >>>
> >>> -- adrian
> >>>
> >>>>
> >>>> It therefore needs to know which modules are imported in the current
> CU before dropping into the expression evaluator.
> >>>>
> >>>> - adrian
> >>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Mach-O, in the absence of comdats, we have:
> >>>>>
> >>>>> bar.o
> >>>>> ~~~~~
> >>>>>
> >>>>> .debug_info:
> >>>>>   DW_TAG_compile_unit
> >>>>>     ...
> >>>>>     DW_TAG_imported_module
> >>>>>       DW_AT_import [DW_FORM_ref4] (0x20)
> >>>>>
> >>>>>     DW_TAG_module           // uniqued by dsymutil.
> >>>>>       DW_AT_name(“FooLib”)
> >>>>>       DW_AT_LLVM_sysroot(“/“)
> >>>>>       DW_AT_LLVM_include_dirs(“-I/path”)
> >>>>>       DW_AT_LLVM_macros(“-DNDEBUG”)
> >>>>>       ...
> >>>>>
> >>>>> // Split DWARF skeleton, thrown out by dsymutil.
> >>>>>
> >>>>> Thrown out? Because it's going to read everything in from the module
> and merge it in to a single linked debug info blob, I take it?
> >>>>>
> >>>>> .debug_info, group 0xFEDB9876, comdat
> >>>>>   DW_TAG_compile_unit
> >>>>>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
> >>>>>     DW_AT_dwo_id(“0xFEDB9876”)
> >>>>>     ...
> >>>>>
> >>>>> FooLib-XYZ.pcm
> >>>>> ~~~~~~~~~~~~~~
> >>>>>
> >>>>> .debug_info:
> >>>>>   DW_TAG_compile_unit
> >>>>>     DW_AT_dwo_id(“0xFEDB9876”)
> >>>>>     ...
> >>>>>
> >>>>>     DW_TAG_module
> >>>>>       DW_AT_name(“FooLib”)
> >>>>>       DW_TAG_module
> >>>>>         DW_AT_name(“SubFoo”)
> >>>>>         ...
> >>>>>
> >>>>> -- adrian
> >>>>>
> >>>>> >
> >>>>> >>
> >>>>> >>>
> >>>>> >>> > If it turns out that's the right way to get a target for the
> imported_module, we could put both the skeleton CU and the partial unit in
> the same comdat and dedup them both together.
> >>>>> >>>
> >>>>> >>> I think this works as long as we only have one TAG_module per
> .pcm file (because we need to refer to it via signature).
> >>>>> >>>
> >>>>> >>> Not quite following here - why would we have more than one
> module per pcm - a pcm is a module, right?
> >>>>> >>
> >>>>> >> Clang modules may have submodules and a compile unit could import
> two submodules that live in the same .pcm file. For example on Darwin there
> is a module Darwin.pcm that contains a submodule “C” that contains the
> submodule “stdio”.
> >>>>> >
> >>>>> > OK, so this bit's relevant to your use case in LLDB of loading the
> right things for the right context, but not relevant to the context-less
> debuggers like GDB that will just treat everything as one big namespace
> (except for file-local things, etc). So it's important for your imported
> modules but not for the basic Fission style debug reference.
> >>>>> >
> >>>>> > Well, maybe - I'm not sure what you're picturing in terms of the
> DWARF in the module for submodules? If you want that granularity we'll have
> to talk about how to split the DWARF in the module into chunks per
> submodule?
> >>>>> >
> >>>>> >>
> >>>>> >>>
> >>>>> >>> But if we don’t mind having duplicate dwo_* references in the
> same .o file this would also work with more than one TAG_module (or
> submodules).
> >>>>> >>>
> >>>>> >>>
> >>>>> >>> .debug_info:
> >>>>> >>>  DW_TAG_compile_unit
> >>>>> >>>    DW_AT_name(“bar.c”)
> >>>>> >>>    ...
> >>>>> >>>
> >>>>> >>>    DW_TAG_imported_module // <- This could be optional on ELF.
> >>>>> >>>      DW_AT_import [DW_FORM_ref_sig8] (0xFEDB9876)
> >>>>> >>>
> >>>>> >>>    ...
> >>>>> >>>
> >>>>> >>> // Comdat’d split DWARF skeleton CU for the module Foo.
> >>>>> >>> .debug_info, group 0xFEDB9876, comdat
> >>>>> >>>  DW_TAG_compile_unit
> >>>>> >>>    DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
> >>>>> >>>    DW_AT_dwo_id(“0xFEDB9876”)
> >>>>> >>>    ...
> >>>>> >>>
> >>>>> >>>    DW_TAG_module
> >>>>> >>>      DW_AT_name(“FooLib”)
> >>>>> >>>      DW_AT_LLVM_sysroot(“/”)
> >>>>> >>>      DW_AT_LLVM_include_dirs(“-I/path”)
> >>>>> >>>      DW_AT_LLVM_macros(“-DNDEBUG”)
> >>>>> >>>      ...
> >>>>> >>>
> >>>>> >>>
> >>>>> >>> >
> >>>>> >>> > But this gets into complicated territory when the original
> binary is built with fission... which will be relevant for modules on ELF
> with LLDB. Hmm, maybe it's not too complicated - the partial_unit would end
> up in the .dwo file (maybe we'd have to teach the .dwo file to deduplicate
> these too - the same way it does for type units... - might require a new
> header to include the hash, etc :/)... would be tricky to have the dwp tool
> resolve the relocations to these things. Cross-unit references as you've
> got there aren't something that every DWARF consumer is totally cool with,
> I don't think?
> >>>>> >>>
> >>>>> >>> Ah. I thought the deduplication happens because all ELF sections
> sharing the same group are uniqued based on the group id.
> >>>>> >>>
> >>>>> >>> COMDAT groups deduplicate for a normal non-fission build, but
> fission linking doesn't require the .dwo file to use/contain COMDATs as it
> uses a DWARF-aware tool (so you don't bother putting the type units in
> COMDAT groups, for example - the fission linker knows how to parse
> debug_types, find the type unit headers and their hashes and deduplicates
> them that way).
> >>>>> >>
> >>>>> >> Ok that makes sense.
> >>>>> >>
> >>>>> >> -- adrian
> >>>>> >>
> >>>>> >>>
> >>>>> >>> It certainly would be nice if we could avoid introducing a new
> .debug_info header...
> >>>>> >>>
> >>>>> >>> >
> >>>>> >>> > Sort of inclined to have the imported module stuff just for
> LLDB, but I've lost some of the context for that in the ensuing weeks.
> >>>>> >>>
> >>>>> >>> -- adrian
> >>>>> >>>
> >>>>> >>> >
> >>>>> >>> >>
> >>>>> >>> >>
> >>>>> >>> >>
> >>>>> >>> >>
> >>>>> >>> >> MachO (no typeunits, no comdats, with imports)
> >>>>> >>> >> ----------------------------------------------
> >>>>> >>> >>
> >>>>> >>> >> Since we don’t have comdat sections in Mach-O and we don’t
> have the tool support for type units, the way that external types can be
> referenced necessarily needs to be a bit different. The design that Greg
> and I came up with for Mach-O relies on llvm-dsymutil to fix up the DWARF
> for non-module-aware consumers. Just as ELF DWARF consumers need not be
> able to tell the difference between module debugging and split DWARF, on
> Mach-O the .dSYM bundle generated by llvm-dsymutil looks like traditional
> DWARF.
> >>>>> >>> >>
> >>>>> >>> >> There are three differences in the DWARF output that make
> this possible:
> >>>>> >>> >>   - Refer to external types by UID rather than by type
> signature.
> >>>>> >>> >>     (This doubles as the key that allows a debugger to look up and
> import the type
> >>>>> >>> >>      directly from the AST and protects us against hash
> collisions)
> >>>>> >>> >>   - Add an index to the .o file that maps UID -> module file.
> >>>>> >>> >>     (Fast lookup + UIDs for C and ObjC are only unique within
> a module)
> >>>>> >>> >>   - Add an entry for each type’s UID to the types accelerator
> table.
> >>>>> >>> >>     (Fast lookup)
> >>>>> >>> >>
> >>>>> >>> >> bar.o
> >>>>> >>> >> ~~~~~
> >>>>> >>> >>
> >>>>> >>> >> .debug_info:
> >>>>> >>> >>   DW_TAG_compile_unit
> >>>>> >>> >>     DW_AT_name(“bar.c”)
> >>>>> >>> >>     DW_TAG_imported_module
> >>>>> >>> >>       DW_AT_import(DW_FORM_ref_addr 0x40)
> >>>>> >>> >>
> >>>>> >>> >>     DW_TAG_variable
> >>>>> >>> >>       DW_AT_name(“MyFoo”)
> >>>>> >>> >>       DW_AT_type [DW_FORM_strp] (“_ZTS3Foo”)  // We could use
> a custom FORM here
> >>>>> >>> >>
> >>>>> >>> >>   // Skeleton unit.
> >>>>> >>> >>   DW_TAG_compile_unit
> >>>>> >>> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
> >>>>> >>> >>     DW_AT_dwo_id(“0xFEDB9876”)
> >>>>> >>> >>     ...
> >>>>> >>> >> 0x40:
> >>>>> >>> >>     DW_TAG_module
> >>>>> >>> >>       DW_AT_name(“FooLib”)
> >>>>> >>> >>       DW_AT_LLVM_sysroot(“/”)
> >>>>> >>> >>       DW_AT_LLVM_include_dirs(“-I/path”)
> >>>>> >>> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
> >>>>> >>> >>
> >>>>> >>> >> // This index uses the usual accelerator table format.
> >>>>> >>> >> .apple_exttypes:
> >>>>> >>> >> { “_ZTS3Foo” => debug_str offset of
> “/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm” }
> >>>>> >>> >>
> >>>>> >>> >> FooLib-XYZ.pcm
> >>>>> >>> >> ~~~~~~~~~~~~~~
> >>>>> >>> >>
> >>>>> >>> >> .debug_info
> >>>>> >>> >>   DW_TAG_compile_unit
> >>>>> >>> >>     DW_AT_dwo_id(“0xFEDB9876”)
> >>>>> >>> >>
> >>>>> >>> >> 0x80:
> >>>>> >>> >>   DW_TAG_structure_type
> >>>>> >>> >>     DW_AT_name (“Foo”)
> >>>>> >>> >>     DW_AT_signature
> >>>>> >>> >>     ...
> >>>>> >>> >>
> >>>>> >>> >> // In addition to the entry for “Foo”, there is also an entry
> for the type’s UID “_ZTS3Foo” pointing to the type definition DIE.
> >>>>> >>> >> .apple_types
> >>>>> >>> >> { “Foo” => 0x80 }
> >>>>> >>> >> { “_ZTS3Foo” => 0x80 }
> >>>>> >>> >>
> >>>>> >>> >>
> >>>>> >>> >>
> >>>>> >>> >> When the debug info linker (llvm-dsymutil) is run, it first
> pulls in the .debug_info section from the clang module and fixes up all the
> DW_FORM_strp external type references by turning them into a
> DW_FORM_ref_addr that references the type in the DW_TAG_compile_unit pulled
> in from the module. To find the correct type DIE it looks up the UID in the
> .apple_exttypes index, finds the module, looks up the UID in the regular
> .apple_types accelerator table and replaces the temporary DW_FORM_strp with
> a DW_FORM_ref_addr (which incidentally takes up the same amount of space in
> the DIE).
> >>>>> >>> >>
> >>>>> >>> >>
> >>>>> >>> >> Thoughts?
> >>>>> >>> >> --
> >>>>> >>> >> adrian
> >>>>> >>> >>
> >>>>> >>> >
> >>>>> >>>
> >>>>> >>
> >>>>> >>
> >>>>> >
> >>>
> >>>
> >>
> >>
> >
>
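
For reference, the llvm-dsymutil fix-up described in the quoted Mach-O design (resolve each UID through .apple_exttypes to a module file, look the UID up again in that module's .apple_types, then rewrite the DW_FORM_strp as a DW_FORM_ref_addr) could be sketched roughly as follows. This is only an illustration in Python, not dsymutil's actual code: the Attr class, the fix_up_external_types function, the dict-based stand-ins for the accelerator tables, and the module_base map (assumed here to give the offset at which each module's compile unit lands in the linked .debug_info) are all hypothetical names made up for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Attr:
    """Hypothetical stand-in for one DW_AT_type attribute in a DIE."""
    form: str      # "DW_FORM_strp" before the fix-up, "DW_FORM_ref_addr" after
    value: object  # UID string before the fix-up, .debug_info offset after


def fix_up_external_types(type_attrs, exttypes, module_types, module_base):
    """Rewrite external type references in place.

    type_attrs:   DW_AT_type attributes collected from the .o file
    exttypes:     UID -> module file (models .apple_exttypes in the .o)
    module_types: module file -> {name or UID -> DIE offset in that module}
                  (models .apple_types in each module)
    module_base:  module file -> assumed offset of the module's CU in the
                  linked .debug_info
    """
    for attr in type_attrs:
        if attr.form != "DW_FORM_strp":
            continue  # already a regular reference, nothing to do
        uid = attr.value
        pcm = exttypes.get(uid)
        if pcm is None:
            continue  # unknown UID: leave the attribute untouched
        die_offset = module_types[pcm][uid]
        attr.form = "DW_FORM_ref_addr"
        attr.value = module_base[pcm] + die_offset
    return type_attrs


# Values taken from the quoted example; the 0x1000 base is an arbitrary
# assumption for where the module's CU was placed by the linker.
pcm = "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm"
my_foo_type = Attr("DW_FORM_strp", "_ZTS3Foo")
fix_up_external_types(
    [my_foo_type],
    exttypes={"_ZTS3Foo": pcm},
    module_types={pcm: {"Foo": 0x80, "_ZTS3Foo": 0x80}},
    module_base={pcm: 0x1000},
)
```

After the call, my_foo_type carries a DW_FORM_ref_addr pointing at the Foo definition pulled in from the module, which is what a non-module-aware consumer of the .dSYM would expect to see.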