[PATCH] Have clang list the imported modules in the debug info

David Blaikie dblaikie at gmail.com
Fri May 1 10:01:44 PDT 2015


On Fri, May 1, 2015 at 9:52 AM, Adrian Prantl <aprantl at apple.com> wrote:

>
> On May 1, 2015, at 9:23 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Thu, Apr 30, 2015 at 5:21 PM, Adrian Prantl <aprantl at apple.com> wrote:
>
>>
>> > On Apr 30, 2015, at 4:55 PM, David Blaikie <dblaikie at gmail.com> wrote:
>> >
>> >
>> >
>> > On Thu, Apr 30, 2015 at 4:31 PM, Adrian Prantl <aprantl at apple.com>
>> wrote:
>> >>
>> >> > On Mar 19, 2015, at 5:37 PM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Thu, Mar 19, 2015 at 5:24 PM, Adrian Prantl <aprantl at apple.com>
>> wrote:
>> >> >>
>> >> >> > On Mar 16, 2015, at 2:55 PM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >> On Mon, Mar 16, 2015 at 2:45 PM, Robinson, Paul <
>> Paul_Robinson at playstation.sony.com> wrote:
>> >> >> > Beyond the above (that using a new tag would mean this would go
>> from 'free' to 'not free' for GDB) having a new top level tag is pretty
>> substantial (we only have two at the moment, and with our talk of modules
>> being a "bag of dwarf" might go back to having one top level tag? (it's not
>> clear to me from DWARF4 whether DW_TAG_module is currently a top-level tag,
>> I don't think it is?))
>> >> >> >
>> >> >> >> The .debug_info section contains one or more compilation units,
>> partial units, or in DWARF 5, type units.  DW_TAG_module isn't a unit, if
>> you want it to be handled independently then it would need to be wrapped in
>> a DW_TAG_partial_unit.  You would probably then use DW_TAG_imported_unit to
>> refer to it, rather than DW_TAG_imported_module.
>> >> >> >>
>> >> >> >
>> >> >> > This makes a fair bit of sense - though the terminology's never
>> going to quite line up with modules, I suspect, and this would still
>> require modifying existing consumers (well, GDB) that can handle
>> split-dwarf today, I suspect (not sure how it'd handle partial_unit - maybe
>> that does work? - and still don't know how existing consumers would handle
>> imported_unit either - could be worth some testing, as it sounds sort of
>> right out of several less right options).
>> >> >>
>> >> >> Thanks for all the input so far!
>> >> >> To make this end of the discussion concrete, let’s sketch some DWARF
>> >> >> showing how this could look in practice.
>> >> >>
>> >> >> ELF (no imports)
>> >> >> ----------------
>> >> >>
>> >> >> On ELF or COFF a foo.c referencing types from the module Foundation
>> looks like this:
>> >> >>
>> >> >> .debug_info:
>> >> >>   DW_TAG_compile_unit
>> >> >>     DW_AT_name(“foo.c”)
>> >> >>
>> >> >> .debug_info.dwo (on ELF: group 0x1234ABCDE, comdat)
>> >> >>   DW_TAG_partial_unit
>> >> >
>> >> > For now I'd suggest we use compile_unit - that way it'll just work
>> with existing split-dwarf consumers. We can see about standardizing a
>> top-level DW_TAG_module or using DW_TAG_partial_unit here later, perhaps?
>> I'm not sure.
>> >> >
>> >> >>
>> >> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>> >> >>
>> >> >>
>> >> >> Side question: Is .debug_info.dwo the right section to put the
>> module skeleton in, or should it be a .debug_info section like normal
>> fission skeletons?
>> >> >
>> >> > Skeletons go in .debug_info, the dwo sections are just for the .dwo
>> file (or the module file, in our new case - the extension isn't actually
>> important).
>> >> >
>> >> > It might be worth you compiling an example or two of split-dwarf to
>> see how this all works hands-on.
>> >> >
>> >> >> Mach-O (no comdat, no imports)
>> >> >> ------------------------------
>> >> >>
>> >> >> Mach-O doesn’t do comdat, so with -split-dwarf=Disable (not sure if
>> that option is the best discriminator) this could look like:
>> >> >>
>> >> >> .debug_info:
>> >> >>   DW_TAG_compile_unit
>> >> >>     DW_AT_name(“foo.c”)
>> >> >>   DW_TAG_partial_unit
>> >> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>> >> >>
>> >> >>
>> >> >> Mach-O (no comdat, with imports)
>> >> >> ------------------------------
>> >> >>
>> >> >> If we add the module import information to this, we get:
>> >> >>
>> >> >> .debug_info:
>> >> >>   DW_TAG_compile_unit
>> >> >>     DW_AT_name(“foo.c”)
>> >> >>     DW_TAG_imported_module
>> >> >>       DW_AT_import(DW_FORM_ref_addr 0x10)
>> >> >
>> >> > Since we went down the tangent of explaining split-dwarf many
>> emails ago, I've forgotten (& can't readily find) what we were discussing
>> about what ways the imported_module could work.
>> >> >
>> >> > The simplest representation I can think of would be to have it
>> reference, by signature, the module unit (whatever tag it uses) -
>> DW_FORM_ref_sig8, seems the simplest thing to do.
>> >> >
>> >> >>
>> >> >>   DW_TAG_partial_unit
>> >> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>> >> >>
>> >> >> 0x10:
>> >> >
>> >> > This is inside the partial unit? I figured we'd just put these
>> attributes on the top level (compile_unit, or whatever it might be later) -
>> potentially conditionalized on platform, sure.
>> >> >
>> >> >>     DW_TAG_module
>> >> >>       DW_AT_name(“Foundation”)
>> >> >>       DW_AT_LLVM_sysroot(“/“)
>> >> >>       DW_AT_LLVM_include_dir(“”)
>> >> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
>> >> >>       ...
>> >> >>
>> >> >>
>> >> >> ELF (comdat, with imports)
>> >> >> --------------------------
>> >> >>
>> >> >> But now let’s go back to ELF. Since the skeleton with the partial
>> unit is comdat'd, I assume that this breaks the FORM_ref_addr used in the
>> DW_AT_import. We could reuse the module hash as a signature for the module:
>> >> >>
>> >> >> .debug_info:
>> >> >>   DW_TAG_compile_unit
>> >> >>     DW_AT_name(“foo.c”)
>> >> >>     DW_TAG_imported_module
>> >> >>       DW_AT_import(DW_FORM_ref_addr 0x1234ABCDE)
>> >> >
>> >> > Still only really need these imported_modules for lldb, right? I'd
>> consider having them off-by-default for non-darwin, but I'm not strictly
>> wedded to that notion. Wouldn't mind seeing size impact numbers of some
>> kind - if it's really fractional % increase & GDB doesn't fall over when it
>> sees them (in whatever FORM/tag/etc we decide on) then that's not the end
>> of the world.
>> >> >
>> >> > Just seems nice if the default mode is the nice, standard,
>> split-dwarf output. Doesn't need anything fancy.
>> >> >
>> >> >
>> >> >> .debug_info.dwo (group 0x1234ABCDE, comdat)
>> >> >>   DW_TAG_partial_unit
>> >> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>> >> >>
>> >> >>     DW_TAG_module
>> >> >>       DW_AT_signature(“0x1234ABCDE”)
>> >> >>       DW_AT_name(“Foundation”)
>> >> >
>> >> >
>> >> > The thing you haven't covered is the actual .dwo sections
>> (.debug_info.dwo (we'll probably need a simple stub compile_unit to make
>> this correct split-dwarf) and .debug_types.dwo being important - but all
>> the supporting .dwo sections will be necessary) that go in the module file.
>> >> >
>> >> >> This is bending the definition of DW_AT_signature, but I guess it
>> could be made to work. Or we could say that for now, users have to choose
>> between the comdat optimization and having the module imports recorded in
>> Dwarf, since GDB wouldn’t know what to do with that information anyway.
>> >>
>> >> Sorry for the long delay. Here’s a more complete example that should
>> include all the suggestions made so far. For context I also included
>> external type references in the example although admittedly this is a bit
>> out of scope for this thread:
>> >>
>> >> ELF (typeunits, comdats, with imports)
>> >> --------------------------------------
>> >>
>> >> On ELF or COFF a bar.c referencing type Foo from the module FooLib
>> looks like this:
>> >>
>> >> bar.o
>> >> ~~~~~
>> >>
>> >> // To keep this example focussed/readable, I'm assuming that bar.o
>> itself was not compiled with fission.
>> >> .debug_info:
>> >>   DW_TAG_compile_unit
>> >>     DW_AT_name(“bar.c”)
>> >>     ...
>> >>
>> >>     DW_TAG_imported_module // <- This could be optional on ELF.
>> >>       DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234)
>> >>
>> >>     DW_TAG_variable
>> >>       DW_AT_name(“MyFoo”)
>> >>       DW_AT_type [DW_FORM_ref4] 0x20
>> >> 0x20:
>> >>     DW_TAG_structure_type
>> >>       DW_AT_declaration (true)
>> >>       DW_AT_signature [DW_FORM_ref_sig8] (0xF00)
>> >>
>> >>
>> >> // Split DWARF skeleton CU for the module Foo.
>> >>   DW_TAG_compile_unit
>> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>> >>     DW_AT_dwo_id(“0xFEDB9876”)
>> >>     ...
>> >>
>> >> // Comdat’d partial unit containing the optional module descriptor.
>> >> .debug_info, group 0xABCD1234, comdat
>> >>   DW_TAG_partial_unit
>> >>     DW_TAG_module
>> >>       DW_AT_name(“FooLib”)
>> >>       DW_AT_LLVM_sysroot(“/“)
>> >>       DW_AT_LLVM_include_dirs(“-I/path”)
>> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
>> >>       ...
>> >>
>> >> FooLib-XYZ.pcm
>> >> ~~~~~~~~~~~~~~
>> >>
>> >> .debug_info.dwo
>> >>   DW_TAG_compile_unit
>> >>     DW_AT_dwo_id(“0xFEDB9876”)
>> >>     ...
>> >>
>> >> // Type unit for the type Foo.
>> >> .debug_types.dwo, group 0xF00, comdat
>> >>   DW_TAG_type_unit
>> >>     DW_TAG_structure_type
>> >>       DW_AT_name (“Foo”)
>> >>       ...
>> >>
>> >>
>> >> I think it is awkward to have both the skeleton compile_unit in
>> .debug_info and the partial_unit containing the TAG_module. Personally I’d
>> prefer putting the TAG_module into the skeleton CU and then just refer to
>> it via a FORM_ref_addr; but if we want to put the TAG_module into a comdat
>> section, it looks like that’s what’s necessary.
>> >
>> > It's been a while & I've probably lost all the context, but I think my
>> original theory was to have the skeleton compile_unit be comdat'd so they'd
>> deduplicate on linking (so we'd only have one reference to the module.dwo
>> in the linked binary). I don't recall there being a need for a separate
>> partial_unit - I imagine we'd just put the LLDB/LLVM extension attributes
>> on the skeleton compile_unit and expect debuggers that didn't understand
>> them, to ignore them.
>> >
>> > Was there some reason this didn't work/make sense? Because you need a
>> DW_TAG_module to import with DW_TAG_imported_module?
>> Using DW_TAG_module was the best practice that was recommended on
>> dwarf-discuss.
>>
>
> Did they have any ideas on how to reference it without duplicating it in
> every CU?
>
>
> We didn’t touch the deduplication issue.
>
> Once we've got the "Bag O Dwarf" stuff (rather than the narrower type
> units) this would be easier. I suppose we could do a partial
> solution/abuse of type units: use a type unit header (perhaps with Eric's
> merged type/compile unit work) and a DW_FORM_ref_sig8 value for the
> DW_AT_import in the DW_TAG_imported_module.
>
> Though I suppose if we're going to have DW_TAG_imported_module in every CU
> that references a module, it might not be that big of a deal to include the
> DW_TAG_module itself there too... while I don't care about this scheme
> immediately, Google's growing LLDB investment in various platforms, so I am
> vaguely concerned about getting this right & it's not immediately obvious
> to me what that right answer is.
>
>
> Maybe the best path forward is to stage this by initially putting the
> DW_TAG_module into the main CU and leave the deduplication as an
> optimization to be implemented once the bag’o dwarf is more fleshed out.
> This way we won’t do anything that would confuse consumers (assuming they
> ignore unknown tags) and the extra overhead is likely not even going to be
> noticeable, since all the string attributes inside the TAG_module can
> already be deduplicated by traditional means.
>

Perhaps. I'd still like to think through/document what this looks like a
bit more. Where the data ends up, what it's used for, etc. Sorry to draw
this out.

:/ *ponders*


>
>
>
>> > If it turns out that's the right way to get a target for the
>> imported_module, we could put both the skeleton CU and the partial unit in
>> the same comdat and dedup them both together.
>>
>> I think this works as long as we only have one TAG_module per .pcm file
>> (because we need to refer to it via signature).
>
>
> Not quite following here - why would we have more than one module per pcm
> - a pcm is a module, right?
>
>
> Clang modules may have submodules and a compile unit could import two
> submodules that live in the same .pcm file. For example on Darwin there is
> a module Darwin.pcm that contains a submodule “C" that contains the
> submodule “stdio".
>

OK, so this bit's relevant to your use case in LLDB of loading the right
things for the right context, but not relevant to the context-less
debuggers like GDB that will just treat everything as one big namespace
(except for file-local things, etc). So it's important for your imported
modules but not for the basic Fission style debug reference.

Well, maybe - I'm not sure what you're picturing in terms of the DWARF in
the module for submodules? If you want that granularity we'll have to talk
about how to split the DWARF in the module into chunks per submodule?


>
>
>
>> But if we don’t mind having duplicate dwo_* references in the same .o
>> file this would also work with more than one TAG_module (or submodules).
>
>
>>
>> .debug_info:
>>  DW_TAG_compile_unit
>>    DW_AT_name(“bar.c”)
>>    ...
>>
>>    DW_TAG_imported_module // <- This could be optional on ELF.
>>      DW_AT_import [DW_FORM_ref_sig8] (0xFEDB9876)
>>
>>    ...
>>
>> // Comdat’d split DWARF skeleton CU for the module Foo.
>> .debug_info, group 0xFEDB9876, comdat
>>  DW_TAG_compile_unit
>>    DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>>    DW_AT_dwo_id(“0xFEDB9876”)
>>    ...
>>
>>    DW_TAG_module
>>      DW_AT_name(“FooLib”)
>>      DW_AT_LLVM_sysroot(“/“)
>>      DW_AT_LLVM_include_dirs(“-I/path”)
>>      DW_AT_LLVM_macros(“-DNDEBUG”)
>>      ...
>>
>>
>> >
>> > But this gets into complicated territory when the original binary is
>> built with fission... which will be relevant for modules on ELF with LLDB.
>> Hmm, maybe it's not too complicated - the partial_unit would end up in the
>> .dwo file (maybe we'd have to teach the .dwo file to deduplicate these too
>> - the same way it does for type units... - might require a new header to
>> include the hash, etc :/)... would be tricky to have the dwp tool resolve
>> the relocations to these things. Cross-unit references as you've got there
>> aren't something that every DWARF consumer is totally cool with, I don't
>> think?
>>
>> Ah. I thought the deduplication happens because all ELF sections sharing
>> the same group are uniqued based on the group id.
>
>
> COMDAT groups deduplicate for a normal non-fission build, but fission
> linking doesn't require the .dwo file to use/contain COMDATs as it uses a
> DWARF-aware tool (so you don't bother putting the type units in COMDAT
> groups, for example - the fission linker knows how to parse debug_types,
> find the type unit headers and their hashes and deduplicates them that way).
>
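As a rough illustration of that DWARF-aware deduplication (a minimal sketch, not how any real fission linker or dwp tool is implemented; `TypeUnit` and `dedup_type_units` are hypothetical names), units can be keyed on the signature from the type unit header, with only the first copy kept:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeUnit:
    signature: int  # 64-bit type signature read from the type unit header
    payload: bytes  # stand-in for the unit's DIE bytes


def dedup_type_units(units):
    """Keep the first unit seen for each signature, preserving input order."""
    seen = set()
    kept = []
    for unit in units:
        if unit.signature in seen:
            continue  # a unit with this signature was already emitted
        seen.add(unit.signature)
        kept.append(unit)
    return kept


# Two .dwo inputs that both carry a type unit for Foo (signature 0xF00).
units = [
    TypeUnit(0xF00, b"Foo from a.dwo"),
    TypeUnit(0xBA5, b"Bar from a.dwo"),
    TypeUnit(0xF00, b"Foo from b.dwo"),
]
merged = dedup_type_units(units)
```

No COMDAT groups are involved here: the tool itself parses the headers and decides which copies to drop.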
>
> Ok that makes sense.
>
> -- adrian
>
>
>
>> It certainly would be nice if we could avoid introducing a new
>> .debug_info header...
>
>
>> >
>> > Sort of inclined to have the imported module stuff just for LLDB, but
>> I've lost some of the context for that in the ensuing weeks.
>>
>> -- adrian
>>
>> >
>> >>
>> >>
>> >>
>> >>
>> >> MachO (no typeunits, no comdats, with imports)
>> >> ----------------------------------------------
>> >>
>> >> Since we don’t have comdat sections in Mach-O and we don’t have the
>> tool support for type units, the way that external types can be referenced
>> necessarily needs to be a bit different. The design that Greg and I came up
>> with for Mach-O relies on llvm-dsymutil to fix up the DWARF for
>> non-module-aware consumers. Just as ELF DWARF consumers need not be able to
>> tell the difference between module debugging and split DWARF, on Mach-O the
>> .dSYM bundle generated by llvm-dsymutil looks like traditional DWARF.
>> >>
>> >> There are three differences in the DWARF output that make this
>> possible:
>> >>   - Refer to external types by UID rather than by type signature.
>> >>     (This doubles as the key that allows a debugger to import the type
>> >>      directly from the AST and protects us against hash collisions)
>> >>   - Add an index to the .o file that maps UID -> module file.
>> >>     (Fast lookup + UIDs for C and ObjC are only unique within a module)
>> >>   - Add an entry for each type’s UID to the types accelerator table.
>> >>     (Fast lookup)
>> >>
>> >> bar.o
>> >> ~~~~~
>> >>
>> >> .debug_info:
>> >>   DW_TAG_compile_unit
>> >>     DW_AT_name(“bar.c”)
>> >>     DW_TAG_imported_module
>> >>       DW_AT_import(DW_FORM_ref_addr 0x40)
>> >>
>> >>     DW_TAG_variable
>> >>       DW_AT_name(“MyFoo”)
>> >>       DW_AT_type [DW_FORM_strp] (“_ZTS3Foo”)  // We could use a custom
>> FORM here
>> >>
>> >>   // Skeleton unit.
>> >>   DW_TAG_compile_unit
>> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>> >>     DW_AT_dwo_id(“0xFEDB9876”)
>> >>     ...
>> >> 0x40:
>> >>     DW_TAG_module
>> >>       DW_AT_name(“FooLib”)
>> >>       DW_AT_LLVM_sysroot(“/“)
>> >>       DW_AT_LLVM_include_dirs(“-I/path”)
>> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
>> >>
>> >> // This index uses the usual accelerator table format.
>> >> .apple_exttypes:
>> >> { “_ZTS3Foo” => debug_str offset of
>> ”/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm” }
>> >>
>> >> FooLib-XYZ.pcm
>> >> ~~~~~~~~~~~~~~
>> >>
>> >> .debug_info
>> >>   DW_TAG_compile_unit
>> >>     DW_AT_dwo_id(“0xFEDB9876”)
>> >>
>> >> 0x80:
>> >>   DW_TAG_structure_type
>> >>     DW_AT_name (“Foo”)
>> >>     DW_AT_signature
>> >>     ...
>> >>
>> >> // In addition to the entry for “Foo”, there is also an entry for the
>> type’s UID “_ZTS3Foo” pointing to the type definition DIE.
>> >> .apple_types
>> >> { “Foo” => 0x80 }
>> >> { “_ZTS3Foo” => 0x80 }
>> >>
>> >>
>> >>
>> >> When the debug info linker (llvm-dsymutil) is run, it first pulls in
>> the .debug_info section from the clang module and fixes up all the
>> DW_FORM_strp external type references by turning them into a
>> DW_FORM_ref_addr that references the type in the DW_TAG_compile_unit pulled
>> in from the module. To find the correct type DIE it looks up the UID in the
>> .apple_exttypes index, finds the module, looks up the UID in the regular
>> .apple_types accelerator table and replaces the temporary DW_FORM_strp with
>> a DW_FORM_ref_addr (which incidentally takes up the same amount of space in
>> the DIE).
>> >>
>> >>
>> >> Thoughts?
>> >> --
>> >> adrian
>> >>
>> >
>>
>
>
>
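The llvm-dsymutil fix-up described in the quoted Mach-O proposal above is a two-step lookup: UID to module file via .apple_exttypes, then UID to type DIE via the module's .apple_types. A minimal sketch under stated assumptions follows. The dictionaries stand in for the accelerator tables, `resolve_external_type` is a hypothetical helper, and `module_base` (the offset at which the module's .debug_info lands in the linked output) is an assumption made for illustration, not something the proposal specifies:

```python
def resolve_external_type(uid, exttypes, module_types, module_base):
    """Turn a type UID into an absolute DIE offset in the linked DWARF.

    exttypes:     UID -> module file path (stand-in for .apple_exttypes)
    module_types: module path -> {name or UID -> DIE offset within the module}
                  (stand-in for each module's .apple_types)
    module_base:  module path -> offset of the module's .debug_info in the
                  linked output (assumed; chosen by the linker)
    """
    module_path = exttypes[uid]                     # step 1: find the module
    local_offset = module_types[module_path][uid]   # step 2: find the type DIE
    return module_base[module_path] + local_offset  # value for DW_FORM_ref_addr


PCM = "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm"

exttypes = {"_ZTS3Foo": PCM}
module_types = {PCM: {"Foo": 0x80, "_ZTS3Foo": 0x80}}
module_base = {PCM: 0x1000}  # hypothetical placement in the .dSYM

ref_addr = resolve_external_type("_ZTS3Foo", exttypes, module_types, module_base)
```

With the hypothetical base of 0x1000, the temporary DW_FORM_strp reference to “_ZTS3Foo” would be rewritten to a DW_FORM_ref_addr of 0x1080.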


More information about the cfe-commits mailing list