[PATCH] Have clang list the imported modules in the debug info

David Blaikie dblaikie at gmail.com
Fri May 1 09:56:06 PDT 2015


On Fri, May 1, 2015 at 9:52 AM, Adrian Prantl <aprantl at apple.com> wrote:

>
> On May 1, 2015, at 9:23 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Thu, Apr 30, 2015 at 5:21 PM, Adrian Prantl <aprantl at apple.com> wrote:
>
>>
>> > On Apr 30, 2015, at 4:55 PM, David Blaikie <dblaikie at gmail.com> wrote:
>> >
>> >
>> >
>> > On Thu, Apr 30, 2015 at 4:31 PM, Adrian Prantl <aprantl at apple.com>
>> wrote:
>> >>
>> >> > On Mar 19, 2015, at 5:37 PM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Thu, Mar 19, 2015 at 5:24 PM, Adrian Prantl <aprantl at apple.com>
>> wrote:
>> >> >>
>> >> >> > On Mar 16, 2015, at 2:55 PM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >> On Mon, Mar 16, 2015 at 2:45 PM, Robinson, Paul <
>> Paul_Robinson at playstation.sony.com> wrote:
>> >> >> > Beyond the above (that using a new tag would mean this would go
>> from 'free' to 'not free' for GDB) having a new top level tag is pretty
>> substantial (we only have two at the moment, and with our talk of modules
>> being a "bag of dwarf" might go back to having one top level tag? (it's not
>> clear to me from DWARF4 whether DW_TAG_module is currently a top-level tag,
>> I don't think it is?))
>> >> >> >
>> >> >> >> The .debug_info section contains one or more compilation units,
>> partial units, or in DWARF 5, type units.  DW_TAG_module isn't a unit, if
>> you want it to be handled independently then it would need to be wrapped in
>> a DW_TAG_partial_unit.  You would probably then use DW_TAG_imported_unit to
>> refer to it, rather than DW_TAG_imported_module.
>> >> >> >>
>> >> >> >
>> >> >> > This makes a fair bit of sense - though the terminology's never
>> going to quite line up with modules, I suspect, and this would still
>> require modifying existing consumers (well, GDB) that can handle
>> split-dwarf today, I suspect (not sure how it'd handle partial_unit - maybe
>> that does work? - and still don't know how existing consumers would handle
>> imported_unit either - could be worth some testing, as it sounds sort of
>> right out of several less right options).
>> >> >>
>> >> >> Thanks for all the input so far!
>> >> >> To make this end of the discussion concrete, let’s sketch some DWARF
>> showing how this could look in practice.
>> >> >>
>> >> >> ELF (no imports)
>> >> >> ----------------
>> >> >>
>> >> >> On ELF or COFF a foo.c referencing types from the module Foundation
>> looks like this:
>> >> >>
>> >> >> .debug_info:
>> >> >>   DW_TAG_compile_unit
>> >> >>     DW_AT_name(“foo.c”)
>> >> >>
>> >> >> .debug_info.dwo (on ELF: group 0x1234ABCDE, comdat)
>> >> >>   DW_TAG_partial_unit
>> >> >
>> >> > For now I'd suggest we use compile_unit - that way it'll just work
>> with existing split-dwarf consumers. We can see about standardizing a
>> top-level DW_TAG_module or using DW_TAG_partial_unit here later, perhaps?
>> I'm not sure.
>> >> >
>> >> >>
>>  DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>> >> >>
>> >> >>
>> >> >> Side question: Is .debug_info.dwo the right section to put the
>> module skeleton in, or should it be a .debug_info section like normal
>> fission skeletons?
>> >> >
>> >> > Skeletons go in .debug_info, the dwo sections are just for the .dwo
>> file (or the module file, in our new case - the extension isn't actually
>> important).
>> >> >
>> >> > It might be worth you compiling an example or two of split-dwarf to
>> see how this all works hands-on.
>> >> >
>> >> >> Mach-O (no comdat, no imports)
>> >> >> ------------------------------
>> >> >>
>> >> >> Mach-O doesn’t do comdat, so with -split-dwarf=Disable (not sure if
>> that option is the best discriminator) this could look like:
>> >> >>
>> >> >> .debug_info:
>> >> >>   DW_TAG_compile_unit
>> >> >>     DW_AT_name(“foo.c”)
>> >> >>   DW_TAG_partial_unit
>> >> >>
>>  DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>> >> >>
>> >> >>
>> >> >> Mach-O (no comdat, with imports)
>> >> >> --------------------------------
>> >> >>
>> >> >> If we add the module import information to this, we get:
>> >> >>
>> >> >> .debug_info:
>> >> >>   DW_TAG_compile_unit
>> >> >>     DW_AT_name(“foo.c”)
>> >> >>     DW_TAG_imported_module
>> >> >>       DW_AT_import(DW_FORM_ref_addr 0x10)
>> >> >
>> >> > Since we went down the tangent of explaining split-dwarf many
>> emails ago, I've forgotten (& can't readily find) what we were discussing
>> about what ways the imported_module could work.
>> >> >
>> >> > The simplest representation I can think of would be to have it
>> reference, by signature, the module unit (whatever tag it uses) -
>> DW_FORM_ref_sig8, seems the simplest thing to do.
>> >> >
>> >> >>
>> >> >>   DW_TAG_partial_unit
>> >> >>
>>  DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>> >> >>
>> >> >> 0x10:
>> >> >
>> >> > This is inside the partial unit? I figured we'd just put these
>> attributes on the top level (compile_unit, or whatever it might be later) -
>> potentially conditionalized on platform, sure.
>> >> >
>> >> >>     DW_TAG_module
>> >> >>       DW_AT_name(“Foundation”)
>> >> >>       DW_AT_LLVM_sysroot(“/”)
>> >> >>       DW_AT_LLVM_include_dir(“”)
>> >> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
>> >> >>       ...
>> >> >>
>> >> >>
>> >> >> ELF (comdat, with imports)
>> >> >> --------------------------
>> >> >>
>> >> >> But now let’s go back to ELF. Since the skeleton with the partial
>> unit is comdat'd, I assume that this breaks the FORM_ref_addr used in the
>> DW_AT_import. We could reuse the module hash as a signature for the module:
>> >> >>
>> >> >> .debug_info:
>> >> >>   DW_TAG_compile_unit
>> >> >>     DW_AT_name(“foo.c”)
>> >> >>     DW_TAG_imported_module
>> >> >>       DW_AT_import(DW_FORM_ref_addr 0x1234ABCDE)
>> >> >
>> >> > Still only really need these imported_modules for lldb, right? I'd
>> consider having them off-by-default for non-darwin, but I'm not strictly
>> wedded to that notion. Wouldn't mind seeing size impact numbers of some
>> kind - if it's really fractional % increase & GDB doesn't fall over when it
>> sees them (in whatever FORM/tag/etc we decide on) then that's not the end
>> of the world.
>> >> >
>> >> > Just seems nice if the default mode is the nice, standard,
>> split-dwarf output. Doesn't need anything fancy.
>> >> >
>> >> >
>> >> >> .debug_info.dwo (group 0x1234ABCDE, comdat)
>> >> >>   DW_TAG_partial_unit
>> >> >>
>>  DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>> >> >>
>> >> >>     DW_TAG_module
>> >> >>       DW_AT_signature(“0x1234ABCDE”)
>> >> >>       DW_AT_name(“Foundation”)
>> >> >
>> >> >
>> >> > The thing you haven't covered is the actual .dwo sections
>> (.debug_info.dwo (we'll probably need a simple stub compile_unit to make
>> this correct split-dwarf) and .debug_types.dwo being important - but all
>> the supporting .dwo sections will be necessary) that go in the module file.
>> >> >
>> >> >> This is bending the definition of DW_AT_signature, but I guess it
>> could be made to work. Or we could say that for now, users have to choose
>> between the comdat optimization and having the module imports recorded in
>> Dwarf, since GDB wouldn’t know what to do with that information anyway.
>> >>
>> >> Sorry for the long delay. Here’s a more complete example that should
>> include all the suggestions made so far. For context I also included
>> external type references in the example although admittedly this is a bit
>> out of scope for this thread:
>> >>
>> >> ELF (typeunits, comdats, with imports)
>> >> --------------------------------------
>> >>
>> >> On ELF or COFF a bar.c referencing type Foo from the module FooLib
>> looks like this:
>> >>
>> >> bar.o
>> >> ~~~~~
>> >>
>> >> // To keep this example focussed/readable, I'm assuming that bar.o
>> itself was not compiled with fission.
>> >> .debug_info:
>> >>   DW_TAG_compile_unit
>> >>     DW_AT_name(“bar.c”)
>> >>     ...
>> >>
>> >>     DW_TAG_imported_module // <- This could be optional on ELF.
>> >>       DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234)
>> >>
>> >>     DW_TAG_variable
>> >>       DW_AT_name(“MyFoo”)
>> >>       DW_AT_type [DW_FORM_ref4] 0x20
>> >> 0x20:
>> >>     DW_TAG_structure_type
>> >>       DW_AT_declaration (true)
>> >>       DW_AT_signature [DW_FORM_ref_sig8] (0xF00)
>> >>
>> >>
>> >> // Split DWARF skeleton CU for the module Foo.
>> >>   DW_TAG_compile_unit
>> >>
>>  DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>> >>     DW_AT_dwo_id(“0xFEDB9876”)
>> >>     ...
>> >>
>> >> // Comdat’d partial unit containing the optional module descriptor.
>> >> .debug_info, group 0xABCD1234, comdat
>> >>   DW_TAG_partial_unit
>> >>     DW_TAG_module
>> >>       DW_AT_name(“FooLib”)
>> >>       DW_AT_LLVM_sysroot(“/”)
>> >>       DW_AT_LLVM_include_dirs(“-I/path”)
>> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
>> >>       ...
>> >>
>> >> FooLib-XYZ.pcm
>> >> ~~~~~~~~~~~~~~
>> >>
>> >> .debug_info.dwo
>> >>   DW_TAG_compile_unit
>> >>     DW_AT_dwo_id(“0xFEDB9876”)
>> >>     ...
>> >>
>> >> // Type unit for the type Foo.
>> >> .debug_types.dwo, group 0xF00, comdat
>> >>   DW_TAG_type_unit
>> >>     DW_TAG_structure_type
>> >>       DW_AT_name (“Foo”)
>> >>       ...
>> >>
>> >>
>> >> I think it is awkward to have both the skeleton compile_unit in
>> .debug_info and the partial_unit containing the TAG_module. Personally I’d
>> prefer putting the TAG_module into the skeleton CU and then just refer to
>> it via a FORM_ref_addr; but if we want to put the TAG_module into a comdat
>> section, it looks like that’s what’s necessary.
>> >
>> > It's been a while & I've probably lost all the context, but I think my
>> original theory was to have the skeleton compile_unit be comdat'd so they'd
>> deduplicate on linking (so we'd only have one reference to the module.dwo
>> in the linked binary). I don't recall there being a need for a separate
>> partial_unit - I imagine we'd just put the LLDB/LLVM extension attributes
>> on the skeleton compile_unit and expect debuggers that didn't understand
>> them to ignore them.
>> >
>> > Was there some reason this didn't work/make sense? Because you need a
>> DW_TAG_module to import with DW_TAG_imported_module?
>> Using DW_TAG_module was the best practice that was recommended on
>> dwarf-discuss.
>>
>
> Did they have any ideas on how to reference it without duplicating it in
> every CU?
>
>
> We didn’t touch the deduplication issue.
>
> Once we've got the "Bag O Dwarf" stuff (rather than the narrower type
> units) this would be easier - (I suppose we could do a partial
> solution/abuse of type units - use a type unit header (perhaps with Eric's
> merged type/compile unit work) and a DW_FORM_ref_sig8 value for the
> DW_AT_import in the DW_TAG_imported_module).
>
> Though I suppose if we're going to have DW_TAG_imported_module in every CU
> that references a module, it might not be that big of a deal to include the
> DW_TAG_module itself there too... while I don't care about this scheme
> immediately, Google's growing LLDB investment in various platforms, so I am
> vaguely concerned about getting this right & it's not immediately obvious
> to me what that right answer is.
>
>
> Maybe the best path forward is to stage this by initially putting the
> DW_TAG_module into the main CU and leave the deduplication as an
> optimization to be implemented once the bag’o dwarf is more fleshed out.
> This way we won’t do anything that would confuse consumers (assuming they
> ignore unknown tags) and the extra overhead is likely not even going to be
> noticeable, since all the string attributes inside the TAG_module can
> already be deduplicated by traditional means.
>
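As a rough illustration of the string deduplication mentioned above, here is a minimal Python sketch of a string pool in the style of .debug_str. It assumes that "traditional means" refers to out-of-line strings referenced by offset (as with DW_FORM_strp); the StringPool class and its method names are hypothetical and are not part of any LLVM API.

```python
class StringPool:
    """Toy model of a merged string section: each distinct string is stored once."""

    def __init__(self):
        self.offsets = {}        # string -> offset of its first byte in the pool
        self.data = bytearray()  # NUL-terminated strings, back to back

    def intern(self, s):
        # Append the string only the first time it is seen; later requests
        # for the same string reuse the recorded offset.
        if s not in self.offsets:
            self.offsets[s] = len(self.data)
            self.data += s.encode("utf-8") + b"\0"
        return self.offsets[s]


pool = StringPool()
# Two CUs that each carry a DW_TAG_module with the same string attributes.
first = [pool.intern(s) for s in ("FooLib", "/", "-DNDEBUG")]
second = [pool.intern(s) for s in ("FooLib", "/", "-DNDEBUG")]
print(first == second, len(pool.data))
```

Under this model a second copy of the DW_TAG_module costs only the fixed-size offsets in the DIE, not another copy of the strings.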
>
>
>> > If it turns out that's the right way to get a target for the
>> imported_module, we could put both the skeleton CU and the partial unit in
>> the same comdat and dedup them both together.
>>
>> I think this works as long as we only have one TAG_module per .pcm file
>> (because we need to refer to it via signature).
>
>
> Not quite following here - why would we have more than one module per pcm
> - a pcm is a module, right?
>
>
> Clang modules may have submodules and a compile unit could import two
> submodules that live in the same .pcm file. For example on Darwin there is
> a module Darwin.pcm that contains a submodule “C" that contains the
> submodule “stdio".
>
>
>
>> But if we don’t mind having duplicate dwo_* references in the same .o
>> file this would also work with more than one TAG_module (or submodules).
>
>
>>
>> .debug_info:
>>  DW_TAG_compile_unit
>>    DW_AT_name(“bar.c”)
>>    ...
>>
>>    DW_TAG_imported_module // <- This could be optional on ELF.
>>      DW_AT_import [DW_FORM_ref_sig8] (0xFEDB9876)
>>
>>    ...
>>
>> // Comdat’d split DWARF skeleton CU for the module Foo.
>> .debug_info, group 0xFEDB9876, comdat
>>  DW_TAG_compile_unit
>>
>>  DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>>    DW_AT_dwo_id(“0xFEDB9876”)
>>    ...
>>
>>    DW_TAG_module
>>      DW_AT_name(“FooLib”)
>>      DW_AT_LLVM_sysroot(“/”)
>>      DW_AT_LLVM_include_dirs(“-I/path”)
>>      DW_AT_LLVM_macros(“-DNDEBUG”)
>>      ...
>>
>>
>> >
>> > But this gets into complicated territory when the original binary is
>> built with fission... which will be relevant for modules on ELF with LLDB.
>> Hmm, maybe it's not too complicated - the partial_unit would end up in the
>> .dwo file (maybe we'd have to teach the .dwo file to deduplicate these too
>> - the same way it does for type units... - might require a new header to
>> include the hash, etc :/)... would be tricky to have the dwp tool resolve
>> the relocations to these things. Cross-unit references as you've got there
>> aren't something that every DWARF consumer is totally cool with, I don't
>> think?
>>
>> Ah. I thought the deduplication happens because all ELF sections sharing
>> the same group are uniqued based on the group id.
>
>
> COMDAT groups deduplicate for a normal non-fission build, but fission
> linking doesn't require the .dwo file to use/contain COMDATs as it uses a
> DWARF-aware tool (so you don't bother putting the type units in COMDAT
> groups, for example - the fission linker knows how to parse debug_types,
> find the type unit headers and their hashes and deduplicates them that way).
>
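The signature-based deduplication described above could be sketched roughly as follows in Python. This is only a toy model: the TypeUnit class, the dedup_type_units function, and the 0xBA5 signature are made up for illustration, and a real fission linker parses unit headers out of .debug_types.dwo rather than taking ready-made objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeUnit:
    signature: int  # 64-bit type signature from the type unit header
    payload: bytes  # stand-in for the unit's encoded DIEs


def dedup_type_units(dwo_inputs):
    """Keep the first type unit seen for each signature across all inputs."""
    seen = {}
    for units in dwo_inputs:
        for unit in units:
            # No COMDAT groups are involved: the signature alone is the key.
            seen.setdefault(unit.signature, unit)
    return list(seen.values())


# Two hypothetical .dwo inputs that both contain the type unit for Foo.
bar_dwo = [TypeUnit(0xF00, b"struct Foo"), TypeUnit(0xBA5, b"struct Bar")]
baz_dwo = [TypeUnit(0xF00, b"struct Foo")]
merged = dedup_type_units([bar_dwo, baz_dwo])
print([hex(u.signature) for u in merged])
```

The point of the sketch is that the deduplication key lives in the unit header itself, which is why the .dwo file does not need section groups for this to work.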
>
> Ok that makes sense.
>
> -- adrian
>
>
>
>> It certainly would be nice if we could avoid introducing a new
>> .debug_info header...
>
>
>> >
>> > Sort of inclined to have the imported module stuff just for LLDB, but
>> I've lost some of the context for that in the ensuing weeks.
>>
>> -- adrian
>>
>> >
>> >>
>> >>
>> >>
>> >>
>> >> MachO (no typeunits, no comdats, with imports)
>> >> ----------------------------------------------
>> >>
>> >> Since we don’t have comdat sections in Mach-O and we don’t have the
>> tool support for type units, the way that external types can be referenced
>> necessarily needs to be a bit different. The design that Greg and I came up
>> with for Mach-O relies on llvm-dsymutil to fix up the DWARF for
>> non-module-aware consumers. Just as ELF DWARF consumers need not be able to
>> tell the difference between module debugging and split DWARF, on Mach-O the
>> .dSYM bundle generated by llvm-dsymutil looks like traditional DWARF.
>> >>
>> >> There are three differences in the DWARF output that make this
>> possible:
>> >>   - Refer to external types by UID rather than by type signature.
>> >>     (This doubles as the key that allows a debugger to import the
>> type
>> >>      directly from the AST and protects us against hash collisions)
>> >>   - Add an index to the .o file that maps UID -> module file.
>> >>     (Fast lookup + UIDs for C and ObjC are only unique within a module)
>> >>   - Add an entry for each type’s UID to the types accelerator table.
>> >>     (Fast lookup)
>> >>
>> >> bar.o
>> >> ~~~~~
>> >>
>> >> .debug_info:
>> >>   DW_TAG_compile_unit
>> >>     DW_AT_name(“bar.c”)
>> >>     DW_TAG_imported_module
>> >>       DW_AT_import(DW_FORM_ref_addr 0x40)
>> >>
>> >>     DW_TAG_variable
>> >>       DW_AT_name(“MyFoo”)
>> >>       DW_AT_type [DW_FORM_strp] (“_ZTS3Foo”)  // We could use a custom
>> FORM here
>> >>
>> >>   // Skeleton unit.
>> >>   DW_TAG_compile_unit
>> >>
>>  DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>> >>     DW_AT_dwo_id(“0xFEDB9876”)
>> >>     ...
>> >> 0x40:
>> >>     DW_TAG_module
>> >>       DW_AT_name(“FooLib”)
>> >>       DW_AT_LLVM_sysroot(“/”)
>> >>       DW_AT_LLVM_include_dirs(“-I/path”)
>> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
>> >>
>> >> // This index uses the usual accelerator table format.
>> >> .apple_exttypes:
>> >> { “_ZTS3Foo” => debug_str offset of
>> ”/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm” }
>> >>
>> >> FooLib-XYZ.pcm
>> >> ~~~~~~~~~~~~~~
>> >>
>> >> .debug_info
>> >>   DW_TAG_compile_unit
>> >>     DW_AT_dwo_id(“0xFEDB9876”)
>> >>
>> >> 0x80:
>> >>   DW_TAG_structure_type
>> >>     DW_AT_name (“Foo”)
>> >>     DW_AT_signature
>> >>     ...
>> >>
>> >> // In addition to the entry for “Foo”, there is also an entry for the
>> type’s UID “_ZTS3Foo” pointing to the type definition DIE.
>> >> .apple_types
>> >> { “Foo” => 0x80 }
>> >> { “_ZTS3Foo” => 0x80 }
>> >>
>> >>
>> >>
>> >> When the debug info linker (llvm-dsymutil) is run, it first pulls in
>> the .debug_info section from the clang module and fixes up all the
>> DW_FORM_strp external type references by turning them into a
>> DW_FORM_ref_addr that references the type in the DW_TAG_compile_unit pulled
>> in from the module. To find the correct type DIE it looks up the UID in the
>> .apple_exttypes index, finds the module, looks up the UID in the regular
>> .apple_types accelerator table and replaces the temporary DW_FORM_strp with
>> a DW_FORM_ref_addr (which incidentally takes up the same amount of space in
>> the DIE).
>> >>
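The two-step lookup described above might look roughly like this in Python. The dictionaries are hypothetical in-memory stand-ins for the .apple_exttypes and .apple_types tables, the resolve_external_type function is invented for illustration, and the 0x1000 base offset is an assumed placement of the module's .debug_info in the linked output; none of this is llvm-dsymutil's actual implementation.

```python
MODULE = "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm"

# .apple_exttypes in bar.o: type UID -> module file that defines the type.
apple_exttypes = {"_ZTS3Foo": MODULE}

# .apple_types in each module: name or UID -> DIE offset inside that module.
apple_types = {MODULE: {"Foo": 0x80, "_ZTS3Foo": 0x80}}


def resolve_external_type(uid, module_base):
    """Turn a UID string reference into a (form, absolute offset) pair."""
    module = apple_exttypes[uid]            # step 1: which module has the type?
    die_offset = apple_types[module][uid]   # step 2: where is the DIE in it?
    # Rebase onto wherever the module's .debug_info landed in the output.
    return ("DW_FORM_ref_addr", module_base[module] + die_offset)


# Assumption: the module's unit was pulled in at offset 0x1000.
module_base = {MODULE: 0x1000}
form, target = resolve_external_type("_ZTS3Foo", module_base)
print(form, hex(target))
```

Because the string offset and the resulting address reference occupy the same number of bytes in the DIE, a rewrite like this can happen in place without shifting any other offsets.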
>> >>
>> >> Thoughts?
>> >> --
>> >> adrian
>> >>
>> >
>>
>
>
>
