[PATCH] Have clang list the imported modules in the debug info

Adrian Prantl aprantl at apple.com
Fri May 1 09:52:05 PDT 2015


> On May 1, 2015, at 9:23 AM, David Blaikie <dblaikie at gmail.com> wrote:
> 
> 
> 
> On Thu, Apr 30, 2015 at 5:21 PM, Adrian Prantl <aprantl at apple.com> wrote:
> 
> > On Apr 30, 2015, at 4:55 PM, David Blaikie <dblaikie at gmail.com> wrote:
> >
> >
> >
> > On Thu, Apr 30, 2015 at 4:31 PM, Adrian Prantl <aprantl at apple.com> wrote:
> >>
> >> > On Mar 19, 2015, at 5:37 PM, David Blaikie <dblaikie at gmail.com> wrote:
> >> >
> >> >
> >> >
> >> > On Thu, Mar 19, 2015 at 5:24 PM, Adrian Prantl <aprantl at apple.com> wrote:
> >> >>
> >> >> > On Mar 16, 2015, at 2:55 PM, David Blaikie <dblaikie at gmail.com> wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> >> On Mon, Mar 16, 2015 at 2:45 PM, Robinson, Paul <Paul_Robinson at playstation.sony.com> wrote:
> >> >> > Beyond the above (that using a new tag would mean this would go from 'free' to 'not free' for GDB) having a new top level tag is pretty substantial (we only have two at the moment, and with our talk of modules being a "bag of dwarf" might go back to having one top level tag?). It's not clear to me from DWARF4 whether DW_TAG_module is currently a top-level tag; I don't think it is.
> >> >> >
> >> >> >> The .debug_info section contains one or more compilation units, partial units, or in DWARF 5, type units.  DW_TAG_module isn't a unit, if you want it to be handled independently then it would need to be wrapped in a DW_TAG_partial_unit.  You would probably then use DW_TAG_imported_unit to refer to it, rather than DW_TAG_imported_module.
> >> >> >>
> >> >> >
> >> >> > This makes a fair bit of sense - though the terminology's never going to quite line up with modules, I suspect, and this would still require modifying existing consumers (well, GDB) that can handle split-dwarf today, I suspect (not sure how it'd handle partial_unit - maybe that does work? - and still don't know how existing consumers would handle imported_unit either - could be worth some testing, as it sounds sort of right out of several less right options).
> >> >>
> >> >> Thanks for all the input so far!
> >> >> To make this end of the discussion concrete, let’s sketch some DWARF showing how this could look in practice.
> >> >>
> >> >> ELF (no imports)
> >> >> ----------------
> >> >>
> >> >> On ELF or COFF a foo.c referencing types from the module Foundation looks like this:
> >> >>
> >> >> .debug_info:
> >> >>   DW_TAG_compile_unit
> >> >>     DW_AT_name(“foo.c”)
> >> >>
> >> >> .debug_info.dwo (on ELF: group 0x1234ABCDE, comdat)
> >> >>   DW_TAG_partial_unit
> >> >
> >> > For now I'd suggest we use compile_unit - that way it'll just work with existing split-dwarf consumers. We can see about standardizing a top-level DW_TAG_module or using DW_TAG_partial_unit here later, perhaps? I'm not sure.
> >> >
> >> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
> >> >>
> >> >>
> >> >> Side question: Is .debug_info.dwo the right section to put the module skeleton in, or should it be a .debug_info section like normal fission skeletons?
> >> >
> >> > Skeletons go in .debug_info, the dwo sections are just for the .dwo file (or the module file, in our new case - the extension isn't actually important).
> >> >
> >> > It might be worth you compiling an example or two of split-dwarf to see how this all works hands-on.
> >> >
> >> >> Mach-O (no comdat, no imports)
> >> >> ------------------------------
> >> >>
> >> >> Mach-O doesn’t do comdat, so with -split-dwarf=Disable (not sure if that option is the best discriminator) this could look like:
> >> >>
> >> >> .debug_info:
> >> >>   DW_TAG_compile_unit
> >> >>     DW_AT_name(“foo.c”)
> >> >>   DW_TAG_partial_unit
> >> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
> >> >>
> >> >>
> >> >> Mach-O (no comdat, with imports)
> >> >> ------------------------------
> >> >>
> >> >> If we add the module import information to this, we get:
> >> >>
> >> >> .debug_info:
> >> >>   DW_TAG_compile_unit
> >> >>     DW_AT_name(“foo.c”)
> >> >>     DW_TAG_imported_module
> >> >>       DW_AT_import(DW_FORM_ref_addr 0x10)
> >> >
> >> > Since we went down the tangent of explaining split-dwarf many emails ago, I've forgotten (& can't readily find) what we were discussing about the ways the imported_module could work.
> >> >
> >> > The simplest representation I can think of would be to have it reference, by signature, the module unit (whatever tag it uses) - DW_FORM_ref_sig8, seems the simplest thing to do.
> >> >
> >> >>
> >> >>   DW_TAG_partial_unit
> >> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
> >> >>
> >> >> 0x10:
> >> >
> >> > This is inside the partial unit? I figured we'd just put these attributes on the top level (compile_unit, or whatever it might be later) - potentially conditionalized on platform, sure.
> >> >
> >> >>     DW_TAG_module
> >> >>       DW_AT_name(“Foundation”)
> >> >>       DW_AT_LLVM_sysroot(“/“)
> >> >>       DW_AT_LLVM_include_dir(“”)
> >> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
> >> >>       ...
> >> >>
> >> >>
> >> >> ELF (comdat, with imports)
> >> >> --------------------------
> >> >>
> >> >> But now let’s go back to ELF. Since the skeleton with the partial unit is comdat'd, I assume that this breaks the FORM_ref_addr used in the DW_AT_import. We could reuse the module hash as a signature for the module:
> >> >>
> >> >> .debug_info:
> >> >>   DW_TAG_compile_unit
> >> >>     DW_AT_name(“foo.c”)
> >> >>     DW_TAG_imported_module
> >> >>       DW_AT_import(DW_FORM_ref_addr 0x1234ABCDE)
> >> >
> >> > Still only really need these imported_modules for lldb, right? I'd consider having them off-by-default for non-darwin, but I'm not strictly wedded to that notion. Wouldn't mind seeing size impact numbers of some kind - if it's really fractional % increase & GDB doesn't fall over when it sees them (in whatever FORM/tag/etc we decide on) then that's not the end of the world.
> >> >
> >> > Just seems nice if the default mode is the nice, standard, split-dwarf output. Doesn't need anything fancy.
> >> >
> >> >
> >> >> .debug_info.dwo (group 0x1234ABCDE, comdat)
> >> >>   DW_TAG_partial_unit
> >> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
> >> >>     DW_AT_dwo_id(“0x1234ABCDE”)
> >> >>
> >> >>     DW_TAG_module
> >> >>       DW_AT_signature(“0x1234ABCDE”)
> >> >>       DW_AT_name(“Foundation”)
> >> >
> >> >
> >> > The thing you haven't covered is the actual .dwo sections (.debug_info.dwo (we'll probably need a simple stub compile_unit to make this correct split-dwarf) and .debug_types.dwo being important - but all the supporting .dwo sections will be necessary) that go in the module file.
> >> >
> >> >> This is bending the definition of DW_AT_signature, but I guess it could be made to work. Or we could say that for now, users have to choose between the comdat optimization and having the module imports recorded in Dwarf, since GDB wouldn’t know what to do with that information anyway.
> >>
> >> Sorry for the long delay. Here’s a more complete example that should include all the suggestions made so far. For context I also included external type references in the example although admittedly this is a bit out of scope for this thread:
> >>
> >> ELF (typeunits, comdats, with imports)
> >> --------------------------------------
> >>
> >> On ELF or COFF a bar.c referencing type Foo from the module FooLib looks like this:
> >>
> >> bar.o
> >> ~~~~~
> >>
> >> // To keep this example focussed/readable, I'm assuming that bar.o itself was not compiled with fission.
> >> .debug_info:
> >>   DW_TAG_compile_unit
> >>     DW_AT_name(“bar.c”)
> >>     ...
> >>
> >>     DW_TAG_imported_module // <- This could be optional on ELF.
> >>       DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234)
> >>
> >>     DW_TAG_variable
> >>       DW_AT_name(“MyFoo”)
> >>       DW_AT_type [DW_FORM_ref4] 0x20
> >> 0x20:
> >>     DW_TAG_structure_type
> >>       DW_AT_declaration (true)
> >>       DW_AT_signature [DW_FORM_ref_sig8] (0xF00)
> >>
> >>
> >> // Split DWARF skeleton CU for the module Foo.
> >>   DW_TAG_compile_unit
> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
> >>     DW_AT_dwo_id(“0xFEDB9876”)
> >>     ...
> >>
> >> // Comdat’d partial unit containing the optional module descriptor.
> >> .debug_info, group 0xABCD1234, comdat
> >>   DW_TAG_partial_unit
> >>     DW_TAG_module
> >>       DW_AT_name(“FooLib”)
> >>       DW_AT_LLVM_sysroot(“/“)
> >>       DW_AT_LLVM_include_dirs(“-I/path”)
> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
> >>       ...
> >>
> >> FooLib-XYZ.pcm
> >> ~~~~~~~~~~~~~~
> >>
> >> .debug_info.dwo
> >>   DW_TAG_compile_unit
> >>     DW_AT_dwo_id(“0xFEDB9876”)
> >>     ...
> >>
> >> // Type unit for the type Foo.
> >> .debug_types.dwo, group 0xF00, comdat
> >>   DW_TAG_type_unit
> >>     DW_TAG_structure_type
> >>       DW_AT_name (“Foo”)
> >>       ...
> >>
> >>
> >> I think it is awkward to have both the skeleton compile_unit in .debug_info and the partial_unit containing the TAG_module. Personally I’d prefer putting the TAG_module into the skeleton CU and then just referring to it via a FORM_ref_addr; but if we want to put the TAG_module into a comdat section, it looks like that’s what’s necessary.
> >
> > It's been a while & I've probably lost all the context, but I think my original theory was to have the skeleton compile_unit be comdat'd so they'd deduplicate on linking (so we'd only have one reference to the module.dwo in the linked binary). I don't recall there being a need for a separate partial_unit - I imagine we'd just put the LLDB/LLVM extension attributes on the skeleton compile_unit and expect debuggers that didn't understand them, to ignore them.
> >
> > Was there some reason this didn't work/make sense? Because you need a DW_TAG_module to import with DW_TAG_imported_module?
> Using DW_TAG_module was the best practice that was recommended on dwarf-discuss.
> 
> Did they have any ideas on how to reference it without duplicating it in every CU?

We didn’t touch the deduplication issue.

> Once we've got the "Bag O Dwarf" stuff (rather than the narrower type units) this would be easier. I suppose we could do a partial solution/abuse of type units: use a type unit header (perhaps with Eric's merged type/compile unit work) and a DW_FORM_ref_sig8 value for the DW_AT_module in the DW_TAG_imported_module.
> 
> Though I suppose if we're going to have DW_TAG_imported_module in every CU that references a module, it might not be that big of a deal to include the DW_TAG_module itself there too... while I don't care about this scheme immediately, Google's growing LLDB investment in various platforms, so I am vaguely concerned about getting this right & it's not immediately obvious to me what that right answer is.

Maybe the best path forward is to stage this by initially putting the DW_TAG_module into the main CU and leaving the deduplication as an optimization to be implemented once the bag’o dwarf is more fleshed out. This way we won’t do anything that would confuse consumers (assuming they ignore unknown tags), and the extra overhead is likely not even going to be noticeable, since all the string attributes inside the TAG_module can already be deduplicated by traditional means.

>  
> > If it turns out that's the right way to get a target for the imported_module, we could put both the skeleton CU and the partial unit in the same comdat and dedup them both together.
> 
> I think this works as long as we only have one TAG_module per .pcm file (because we need to refer to it via signature).
> 
> Not quite following here - why would we have more than one module per pcm - a pcm is a module, right?

Clang modules may have submodules, and a compile unit could import two submodules that live in the same .pcm file. For example, on Darwin there is a module Darwin.pcm that contains a submodule “C” that contains the submodule “stdio”.
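
To illustrate with a toy model: the class below is made up for this sketch and is not a Clang data structure. It only shows that two imported submodules can resolve to the same .pcm file, so a signature tied to the file alone cannot tell them apart.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Module:
    """Toy stand-in for a Clang (sub)module; hypothetical, not a Clang API."""
    name: str
    parent: Optional["Module"] = None
    pcm: Optional[str] = None  # only set on the top-level module

    def full_name(self) -> str:
        # Dotted path from the top-level module down to this submodule.
        if self.parent is None:
            return self.name
        return f"{self.parent.full_name()}.{self.name}"

    def pcm_file(self) -> Optional[str]:
        # Submodules live in the .pcm file of their top-level module.
        module = self
        while module.parent is not None:
            module = module.parent
        return module.pcm


darwin = Module("Darwin", pcm="Darwin.pcm")
c = Module("C", parent=darwin)
stdio = Module("stdio", parent=c)

# A compile unit that imports both submodules references one file twice.
for imported in (c, stdio):
    print(imported.full_name(), "->", imported.pcm_file())
```

Both imports map to Darwin.pcm, which is the case where we would end up with more than one TAG_module per .pcm file.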

>  
> But if we don’t mind having duplicate dwo_* references in the same .o file this would also work with more than one TAG_module (or submodules).
> 
> 
> .debug_info:
>  DW_TAG_compile_unit
>    DW_AT_name(“bar.c”)
>    ...
> 
>    DW_TAG_imported_module // <- This could be optional on ELF.
>      DW_AT_import [DW_FORM_ref_sig8] (0xFEDB9876)
> 
>    ...
> 
> // Comdat’d split DWARF skeleton CU for the module Foo.
> .debug_info, group 0xFEDB9876, comdat
>  DW_TAG_compile_unit
>    DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>    DW_AT_dwo_id(“0xFEDB9876”)
>    ...
> 
>    DW_TAG_module
>      DW_AT_name(“FooLib”)
>      DW_AT_LLVM_sysroot(“/“)
>      DW_AT_LLVM_include_dirs(“-I/path”)
>      DW_AT_LLVM_macros(“-DNDEBUG”)
>      ...
> 
> 
> >
> > But this gets into complicated territory when the original binary is built with fission... which will be relevant for modules on ELF with LLDB. Hmm, maybe it's not too complicated - the partial_unit would end up in the .dwo file (maybe we'd have to teach the .dwo file to deduplicate these too - the same way it does for type units... - might require a new header to include the hash, etc :/)... would be tricky to have the dwp tool resolve the relocations to these things. Cross-unit references as you've got there aren't something that every DWARF consumer is totally cool with, I don't think?
> 
> Ah. I thought the deduplication happens because all ELF sections sharing the same group are uniqued based on the group id.
> 
> COMDAT groups deduplicate for a normal non-fission build, but fission linking doesn't require the .dwo file to use/contain COMDATs as it uses a DWARF-aware tool (so you don't bother putting the type units in COMDAT groups, for example - the fission linker knows how to parse debug_types, find the type unit headers and their hashes and deduplicates them that way).

Ok that makes sense.
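
For my own notes, here is how I picture the hash-based deduplication, as a toy sketch. The (signature, payload) tuples and the function name are invented for illustration; a real DWARF-aware tool would parse the actual type unit headers out of debug_types.

```python
def dedup_type_units(units):
    """Keep the first type unit seen for each signature.

    `units` is an iterable of (signature, payload) pairs, a made-up
    stand-in for parsed type unit headers and their contents.
    """
    seen = {}
    for signature, payload in units:
        # setdefault leaves an existing entry untouched, so later
        # duplicates with the same signature are dropped.
        seen.setdefault(signature, payload)
    return list(seen.items())


# Two hypothetical .dwo inputs that both carry the type unit for Foo (0xF00).
bar_dwo = [(0xF00, "struct Foo from bar.dwo")]
baz_dwo = [(0xF00, "struct Foo from baz.dwo"), (0xBA2, "struct Baz")]

merged = dedup_type_units(bar_dwo + baz_dwo)
for signature, payload in merged:
    print(hex(signature), payload)
```

No COMDAT groups are involved here; the signature in the unit header is the only key.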

-- adrian

>  
> It certainly would be nice if we could avoid introducing a new .debug_info header... 
> 
> >
> > Sort of inclined to have the imported module stuff just for LLDB, but I've lost some of the context for that in the ensuing weeks.
> 
> -- adrian
> 
> >
> >>
> >>
> >>
> >>
> >> MachO (no typeunits, no comdats, with imports)
> >> ----------------------------------------------
> >>
> >> Since we don’t have comdat sections in Mach-O and we don’t have the tool support for type units, the way that external types can be referenced necessarily needs to be a bit different. The design that Greg and I came up with for Mach-O relies on llvm-dsymutil to fix up the DWARF for non-module-aware consumers. Just as ELF DWARF consumers need not be able to tell the difference between module debugging and split DWARF, on Mach-O the .dSYM bundle generated by llvm-dsymutil looks like traditional DWARF.
> >>
> >> There are three differences in the DWARF output that make this possible:
> >>   - Refer to external types by UID rather than by type signature.
> >>     (This doubles as the key that allows a debugger to import the type
> >>      directly from the AST and protects us against hash collisions)
> >>   - Add an index to the .o file that maps UID -> module file.
> >>     (Fast lookup + UIDs for C and ObjC are only unique within a module)
> >>   - Add an entry for each type’s UID to the types accelerator table.
> >>     (Fast lookup)
> >>
> >> bar.o
> >> ~~~~~
> >>
> >> .debug_info:
> >>   DW_TAG_compile_unit
> >>     DW_AT_name(“bar.c”)
> >>     DW_TAG_imported_module
> >>       DW_AT_import(DW_FORM_ref_addr 0x40)
> >>
> >>     DW_TAG_variable
> >>       DW_AT_name(“MyFoo”)
> >>       DW_AT_type [DW_FORM_strp] (“_ZTS3Foo”)  // We could use a custom FORM here
> >>
> >>   // Skeleton unit.
> >>   DW_TAG_compile_unit
> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
> >>     DW_AT_dwo_id(“0xFEDB9876”)
> >>     ...
> >> 0x40:
> >>     DW_TAG_module
> >>       DW_AT_name(“FooLib”)
> >>       DW_AT_LLVM_sysroot(“/“)
> >>       DW_AT_LLVM_include_dirs(“-I/path”)
> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
> >>
> >> // This index uses the usual accelerator table format.
> >> .apple_exttypes:
> >> { “_ZTS3Foo” => debug_str offset of ”/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm” }
> >>
> >> FooLib-XYZ.pcm
> >> ~~~~~~~~~~~~~~
> >>
> >> .debug_info
> >>   DW_TAG_compile_unit
> >>     DW_AT_dwo_id(“0xFEDB9876”)
> >>
> >> 0x80:
> >>   DW_TAG_structure_type
> >>     DW_AT_name (“Foo”)
> >>     DW_AT_signature
> >>     ...
> >>
> >> // In addition to the entry for “Foo”, there is also an entry for the type’s UID “_ZTS3Foo” pointing to the type definition DIE.
> >> .apple_types
> >> { “Foo” => 0x80 }
> >> { “_ZTS3Foo” => 0x80 }
> >>
> >>
> >>
> >> When the debug info linker (llvm-dsymutil) is run, it first pulls in the .debug_info section from the clang module and fixes up all the DW_FORM_strp external type references by turning them into a DW_FORM_ref_addr that references the type in the DW_TAG_compile_unit pulled in from the module. To find the correct type DIE it looks up the UID in the .apple_exttypes index, finds the module, looks up the UID in the regular .apple_types accelerator table and replaces the temporary DW_FORM_strp with a DW_FORM_ref_addr (which incidentally takes up the same amount of space in the DIE).
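
A rough sketch of that two-step lookup, using plain dictionaries in place of the real accelerator tables. The table contents mirror the example above; the function name and the `module_base` parameter (the offset at which a module's .debug_info lands in the linked output) are assumptions made for this sketch and say nothing about how llvm-dsymutil is actually structured.

```python
PCM = "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm"

# .apple_exttypes in bar.o: type UID -> module file (toy stand-in).
apple_exttypes = {"_ZTS3Foo": PCM}

# .apple_types per module: name or UID -> DIE offset inside that module.
apple_types = {PCM: {"Foo": 0x80, "_ZTS3Foo": 0x80}}


def resolve_external_type(uid, module_base):
    """Return the DW_FORM_ref_addr value that replaces a DW_FORM_strp UID.

    `module_base` maps a module file to the (hypothetical) offset at which
    its .debug_info was placed in the linked output.
    """
    module = apple_exttypes[uid]            # step 1: UID -> module file
    die_offset = apple_types[module][uid]   # step 2: UID -> type DIE
    return module_base[module] + die_offset


# Assume the module's .debug_info was pulled in at offset 0x1000.
print(hex(resolve_external_type("_ZTS3Foo", {PCM: 0x1000})))
```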
> >>
> >>
> >> Thoughts?
> >> --
> >> adrian
> >>
> >
> 

