[PATCH] Have clang list the imported modules in the debug info
Adrian Prantl
aprantl at apple.com
Thu Apr 30 16:31:29 PDT 2015
> On Mar 19, 2015, at 5:37 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Thu, Mar 19, 2015 at 5:24 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>
>> > On Mar 16, 2015, at 2:55 PM, David Blaikie <dblaikie at gmail.com> wrote:
>> >
>> >
>> >
>> >> On Mon, Mar 16, 2015 at 2:45 PM, Robinson, Paul <Paul_Robinson at playstation.sony.com> wrote:
>> > Beyond the above (that using a new tag would mean this would go from 'free' to 'not free' for GDB) having a new top level tag is pretty substantial (we only have two at the moment, and with our talk of modules being a "bag of dwarf" might go back to having one top level tag? (it's not clear to me from DWARF4 whether DW_TAG_module is currently a top-level tag, I don't think it is?)
>> >
>> >> The .debug_info section contains one or more compilation units, partial units, or in DWARF 5, type units. DW_TAG_module isn't a unit, if you want it to be handled independently then it would need to be wrapped in a DW_TAG_partial_unit. You would probably then use DW_TAG_imported_unit to refer to it, rather than DW_TAG_imported_module.
>> >>
>> >
>> > This makes a fair bit of sense - though the terminology's never going to quite line up with modules, I suspect, and this would still require modifying existing consumers (well, GDB) that can handle split-dwarf today, I suspect (not sure how it'd handle partial_unit - maybe that does work? - and still don't know how existing consumers would handle imported_unit either - could be worth some testing, as it sounds sort of right out of several less right options).
>>
>> Thanks for all the input so far!
>> To make this end of the discussion concrete, let’s sketch what the DWARF could look like in practice.
>>
>> ELF (no imports)
>> ----------------
>>
>> On ELF or COFF, a foo.c referencing types from the module Foundation would look like this:
>>
>> .debug_info:
>> DW_TAG_compile_unit
>> DW_AT_name(“foo.c”)
>>
>> .debug_info.dwo (on ELF: group 0x1234ABCDE, comdat)
>> DW_TAG_partial_unit
>
> For now I'd suggest we use compile_unit - that way it'll just work with existing split-dwarf consumers. We can see about standardizing a top-level DW_TAG_module or using DW_TAG_partial_unit here later, perhaps? I'm not sure.
>
>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> DW_AT_dwo_id(“0x1234ABCDE”)
>>
>>
>> Side question: Is .debug_info.dwo the right section to put the module skeleton in, or should it be a .debug_info section like normal fission skeletons?
>
> Skeletons go in .debug_info, the dwo sections are just for the .dwo file (or the module file, in our new case - the extension isn't actually important).
>
> It might be worth you compiling an example or two of split-dwarf to see how this all works hands-on.
>
>> Mach-O (no comdat, no imports)
>> ------------------------------
>>
>> Mach-O doesn’t do comdat, so with -split-dwarf=Disable (not sure if that option is the best discriminator) this could look like:
>>
>> .debug_info:
>> DW_TAG_compile_unit
>> DW_AT_name(“foo.c”)
>> DW_TAG_partial_unit
>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> DW_AT_dwo_id(“0x1234ABCDE”)
>>
>>
>> Mach-O (no comdat, with imports)
>> ------------------------------
>>
>> If we add the module import information to this, we get:
>>
>> .debug_info:
>> DW_TAG_compile_unit
>> DW_AT_name(“foo.c”)
>> DW_TAG_imported_module
>> DW_AT_import(DW_FORM_ref_addr 0x10)
>
> Since we went down the tangent of explaining split-dwarf many emails ago, I've forgotten (and can't readily find) what we were discussing about the ways the imported_module could work.
>
> The simplest representation I can think of would be to have it reference, by signature, the module unit (whatever tag it uses) - DW_FORM_ref_sig8, seems the simplest thing to do.
>
>>
>> DW_TAG_partial_unit
>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> DW_AT_dwo_id(“0x1234ABCDE”)
>>
>> 0x10:
>
> This is inside the partial unit? I figured we'd just put these attributes on the top level (compile_unit, or whatever it might be later) - potentially conditionalized on platform, sure.
>
>> DW_TAG_module
>> DW_AT_name(“Foundation”)
>> DW_AT_LLVM_sysroot(“/”)
>> DW_AT_LLVM_include_dir(“”)
>> DW_AT_LLVM_macros(“-DNDEBUG”)
>> ...
>>
>>
>> ELF (comdat, with imports)
>> --------------------------
>>
>> But now let’s go back to ELF. Since the skeleton with the partial unit is comdat'd, I assume that this breaks the FORM_ref_addr used in the DW_AT_import. We could reuse the module hash as a signature for the module:
>>
>> .debug_info:
>> DW_TAG_compile_unit
>> DW_AT_name(“foo.c”)
>> DW_TAG_imported_module
>> DW_AT_import(DW_FORM_ref_addr 0x1234ABCDE)
>
> Still only really need these imported_modules for lldb, right? I'd consider having them off-by-default for non-darwin, but I'm not strictly wedded to that notion. Wouldn't mind seeing size impact numbers of some kind - if it's really fractional % increase & GDB doesn't fall over when it sees them (in whatever FORM/tag/etc we decide on) then that's not the end of the world.
>
> Just seems nice if the default mode is the nice, standard, split-dwarf output. Doesn't need anything fancy.
>
>
>> .debug_info.dwo (group 0x1234ABCDE, comdat)
>> DW_TAG_partial_unit
>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> DW_AT_dwo_id(“0x1234ABCDE”)
>>
>> DW_TAG_module
>> DW_AT_signature(“0x1234ABCDE”)
>> DW_AT_name(“Foundation”)
>
>
> The thing you haven't covered is the actual .dwo sections (.debug_info.dwo (we'll probably need a simple stub compile_unit to make this correct split-dwarf) and .debug_types.dwo being important - but all the supporting .dwo sections will be necessary) that go in the module file.
>
>> This is bending the definition of DW_AT_signature, but I guess it could be made to work. Or we could say that for now, users have to choose between the comdat optimization and having the module imports recorded in Dwarf, since GDB wouldn’t know what to do with that information anyway.

Sorry for the long delay. Here’s a more complete example that should incorporate all the suggestions made so far. For context, I also included external type references in the example, although admittedly this is a bit out of scope for this thread:

ELF (type units, comdats, with imports)
---------------------------------------

On ELF or COFF, a bar.c referencing type Foo from the module FooLib would look like this:

bar.o
~~~~~
// To keep this example focused and readable, I'm assuming that bar.o itself was not compiled with fission.
.debug_info:
DW_TAG_compile_unit
DW_AT_name(“bar.c”)
...
DW_TAG_imported_module // <- This could be optional on ELF.
DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234)
DW_TAG_variable
DW_AT_name(“MyFoo”)
DW_AT_type [DW_FORM_ref4] 0x20
0x20:
DW_TAG_structure_type
DW_AT_declaration (true)
DW_AT_signature [DW_FORM_ref_sig8] (0xF00)
// Split DWARF skeleton CU for the module FooLib.
DW_TAG_compile_unit
DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
DW_AT_dwo_id(“0xFEDB9876”)
...
// Comdat’d partial unit containing the optional module descriptor.
.debug_info, group 0xABCD1234, comdat
DW_TAG_partial_unit
DW_TAG_module
DW_AT_name(“FooLib”)
DW_AT_LLVM_sysroot(“/”)
DW_AT_LLVM_include_dirs(“-I/path”)
DW_AT_LLVM_macros(“-DNDEBUG”)
...

FooLib-XYZ.pcm
~~~~~~~~~~~~~~
.debug_info.dwo
DW_TAG_compile_unit
DW_AT_dwo_id(“0xFEDB9876”)
...
// Type unit for the type Foo.
.debug_types.dwo, group 0xF00, comdat
DW_TAG_type_unit
DW_TAG_structure_type
DW_AT_name (“Foo”)
...

I find it awkward to have both the skeleton compile_unit in .debug_info and the partial_unit containing the TAG_module. Personally, I’d prefer putting the TAG_module into the skeleton CU and then just referring to it via a FORM_ref_addr; but if we want to put the TAG_module into a comdat section, this arrangement seems to be necessary.

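To make the comdat constraint concrete, here is a minimal C++17 sketch, assuming a toy model in which each object file contributes whole units tagged with a comdat group key (all type and function names are hypothetical, not LLVM APIs). A linker keeps only one copy per group key. A signature-based reference (DW_FORM_ref_sig8 style) therefore always finds the surviving copy, while an offset-based reference (DW_FORM_ref_addr style) into the referring object file's own copy is only valid if that particular copy was kept.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy model: one comdat'd unit contributed by one object file.
struct ComdatUnit {
  unsigned Object;   // index of the .o file this copy came from
  uint64_t GroupKey; // comdat group key, e.g. 0xABCD1234 or 0xF00
  std::string Unit;  // stand-in for the unit's contents
};

// Result of linking: the single surviving copy for each group key.
struct LinkedComdats {
  std::map<uint64_t, ComdatUnit> Kept;
};

LinkedComdats linkComdats(const std::vector<ComdatUnit> &Inputs) {
  LinkedComdats Out;
  for (const ComdatUnit &U : Inputs)
    Out.Kept.emplace(U.GroupKey, U); // no-op if the key is already present,
                                     // so the first copy wins
  return Out;
}

// Signature-based reference: resolved by key, so it finds whichever copy
// survived the link.
std::optional<std::string> resolveBySignature(const LinkedComdats &L,
                                              uint64_t Sig) {
  auto It = L.Kept.find(Sig);
  if (It == L.Kept.end())
    return std::nullopt;
  return It->second.Unit;
}

// Offset-based reference: points at the copy inside the referring object
// file, which is only meaningful if that particular copy was kept.
bool refAddrStillValid(const LinkedComdats &L, uint64_t GroupKey,
                       unsigned FromObject) {
  auto It = L.Kept.find(GroupKey);
  return It != L.Kept.end() && It->second.Object == FromObject;
}
```

In the example above, every object file that imports FooLib carries its own copy of group 0xABCD1234, and only one of them survives the link.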
Mach-O (no type units, no comdats, with imports)
------------------------------------------------

Since we don’t have comdat sections in Mach-O and we don’t have the tool support for type units, external types necessarily need to be referenced a bit differently. The design that Greg and I came up with for Mach-O relies on llvm-dsymutil to fix up the DWARF for non-module-aware consumers. Just as ELF DWARF consumers need not be able to tell the difference between module debugging and split DWARF, on Mach-O the .dSYM bundle generated by llvm-dsymutil looks like traditional DWARF.

There are three differences in the DWARF output that make this possible:
- Refer to external types by UID rather than by type signature.
  (This doubles as the key that allows a debugger to look up and import the
  type directly from the AST, and it protects us against hash collisions.)
- Add an index to the .o file that maps UID -> module file.
  (For fast lookup, and because UIDs for C and ObjC are only unique within
  a module.)
- Add an entry for each type’s UID to the types accelerator table.
  (For fast lookup.)


bar.o
~~~~~
.debug_info:
DW_TAG_compile_unit
DW_AT_name(“bar.c”)
DW_TAG_imported_module
DW_AT_import(DW_FORM_ref_addr 0x40)
DW_TAG_variable
DW_AT_name(“MyFoo”)
DW_AT_type [DW_FORM_strp] (“_ZTS3Foo”) // We could use a custom FORM here
// Skeleton unit.
DW_TAG_compile_unit
DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
DW_AT_dwo_id(“0xFEDB9876”)
...
0x40:
DW_TAG_module
DW_AT_name(“FooLib”)
DW_AT_LLVM_sysroot(“/”)
DW_AT_LLVM_include_dirs(“-I/path”)
DW_AT_LLVM_macros(“-DNDEBUG”)
// This index uses the usual accelerator table format.
.apple_exttypes:
{ “_ZTS3Foo” => debug_str offset of “/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm” }
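
The .apple_* accelerator tables hash each key with the DJB hash (h = h * 33 + c, starting from 5381) and place the entry in bucket hash % bucket_count. Below is a simplified C++17 sketch of an .apple_exttypes-style index built the same way. It keeps entries in in-memory buckets instead of the real on-disk header, hash array and offset arrays, and the class name and example offsets are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// DJB hash, as used by the Apple accelerator tables: h = h * 33 + c.
uint32_t djbHash(std::string_view S) {
  uint32_t H = 5381;
  for (unsigned char C : S)
    H = (H << 5) + H + C;
  return H;
}

// Hypothetical in-memory model of a UID -> module-path index.
// Each bucket holds (hash, key, value) entries.
class ExtTypesIndex {
public:
  explicit ExtTypesIndex(uint32_t NumBuckets) : Buckets(NumBuckets) {
    assert(NumBuckets > 0 && "need at least one bucket");
  }

  void insert(const std::string &UID, uint32_t StrOffset) {
    uint32_t H = djbHash(UID);
    Buckets[H % Buckets.size()].push_back({H, UID, StrOffset});
  }

  // Returns the .debug_str offset of the module path, if the UID is known.
  std::optional<uint32_t> lookup(std::string_view UID) const {
    uint32_t H = djbHash(UID);
    for (const Entry &E : Buckets[H % Buckets.size()])
      if (E.Hash == H && std::string_view(E.Key) == UID) // full key check
        return E.StrOffset;
    return std::nullopt;
  }

private:
  struct Entry {
    uint32_t Hash;
    std::string Key;    // the type UID, e.g. "_ZTS3Foo"
    uint32_t StrOffset; // .debug_str offset of the module file path
  };
  std::vector<std::vector<Entry>> Buckets;
};
```

Comparing the full key after a hash match is what lets two UIDs that happen to share a 32-bit hash coexist in the table.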

FooLib-XYZ.pcm
~~~~~~~~~~~~~~
.debug_info
DW_TAG_compile_unit
DW_AT_dwo_id(“0xFEDB9876”)
0x80:
DW_TAG_structure_type
DW_AT_name (“Foo”)
DW_AT_signature
...
// In addition to the entry for “Foo”, there is also an entry for the type’s UID “_ZTS3Foo” pointing to the type definition DIE.
.apple_types
{ “Foo” => 0x80 }
{ “_ZTS3Foo” => 0x80 }

When the debug info linker (llvm-dsymutil) runs, it first pulls in the .debug_info section from the clang module and then fixes up all the DW_FORM_strp external type references by turning them into DW_FORM_ref_addr references to the type in the DW_TAG_compile_unit pulled in from the module. To find the correct type DIE, it looks up the UID in the .apple_exttypes index to find the module, looks up the UID in that module's regular .apple_types accelerator table, and replaces the temporary DW_FORM_strp with a DW_FORM_ref_addr (which incidentally takes up the same amount of space in the DIE).

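The fixup pass described above might be sketched as follows. This is a minimal C++17 model, assuming the module's .debug_info has already been copied into the output at a known base offset. The struct and function names are hypothetical, not llvm-dsymutil's actual code. A DW_FORM_strp attribute is modelled by the string it points to rather than by a .debug_str offset. The two DW_FORM_* values are the ones defined by the DWARF standard.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

constexpr uint16_t DW_FORM_strp = 0x0e;
constexpr uint16_t DW_FORM_ref_addr = 0x10;

// Hypothetical attribute model: for DW_FORM_strp, UID holds the referenced
// string; for DW_FORM_ref_addr, Value holds the .debug_info offset.
struct Attribute {
  uint16_t Form;
  std::string UID;
  uint64_t Value;
};

// A module's .debug_info after it has been copied into the linked output.
struct LinkedModule {
  uint64_t DebugInfoBase;               // where the module's section landed
  std::map<std::string, uint64_t> Types; // .apple_types: name/UID -> DIE offset
};

// Rewrites resolvable DW_FORM_strp type references into DW_FORM_ref_addr.
// Returns how many references could not be resolved (left untouched).
unsigned fixupExternalTypeRefs(
    std::vector<Attribute> &Attrs,
    const std::map<std::string, std::string> &ExtTypes, // UID -> module path
    const std::map<std::string, LinkedModule> &Modules) {
  unsigned Unresolved = 0;
  for (Attribute &A : Attrs) {
    if (A.Form != DW_FORM_strp)
      continue;
    auto ModPath = ExtTypes.find(A.UID);       // step 1: UID -> module
    if (ModPath == ExtTypes.end()) { ++Unresolved; continue; }
    auto Mod = Modules.find(ModPath->second);
    if (Mod == Modules.end()) { ++Unresolved; continue; }
    auto Die = Mod->second.Types.find(A.UID);  // step 2: UID -> type DIE
    if (Die == Mod->second.Types.end()) { ++Unresolved; continue; }
    A.Form = DW_FORM_ref_addr;                 // step 3: rewrite the form
    A.Value = Mod->second.DebugInfoBase + Die->second;
  }
  return Unresolved;
}
```

Unresolved references are left as DW_FORM_strp and counted, so the caller can decide how to report them.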
Thoughts?
--
adrian
More information about the cfe-commits mailing list