[PATCH] Have clang list the imported modules in the debug info

Tue May 5 09:59:37 PDT 2015

On Mon, May 4, 2015 at 4:46 PM, Adrian Prantl <aprantl at apple.com> wrote:

>
> On May 4, 2015, at 1:31 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
> ...
>
> >>> So you're going to need to implement fission (to at least some degree)
>> support in LLDB, then? (to support the case where you haven't linked debug
>> info with llvm-dsymutil, but you've hit one of these lookup problems where
>> you need to cross possibly-conflicting modules)
>> >>
>> >> Yes. Specifically, it won’t support type units, and it will look up
>> types by name rather than by signature. (cf. the second part of
>> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html
>> )
>> >
>> > How are you going to reference the types in the module's fission CU
>> without type units/signatures? Are you going to emit type declarations into
>> the normal CU and rely on the debugger to know that these declarations can
>> be resolved by looking elsewhere? (just without the benefit of constraining
>> that search to just looking for a matching TU?)
>>
>> If you look at the example in
>> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html,
>> there will be an external type index (using the usual accelerator table
>> format) that maps an external type’s UID to a pcm. In the pcm there is an
>> extra accelerator table entry that maps UID to DIE offset.
>>
>
> I mean I guess that's up to you, but seems like a relatively large
> workaround compared to supporting type units... (I mean certainly seems
> like strictly less work to do the workaround than implementing type units
> in LLDB, but a relatively large amount of work to do/throw away eventually
> once LLDB supports type units)
>
>
> It’s not primarily meant to be a workaround so LLDB doesn’t need to
> implement type units; the UIDs double as the key (decl context + name) to
> import types directly from the AST. The other advantage is that we won’t
> have to worry about MD5 hash collisions, but that’s more of a theoretical
> advantage.
>

If there's a concern about collisions, we should fix that regardless
(because we'll have the same problem with normal type units or modularized
type units).

If we've already got (& have to use in the case of DWARF fallback for
modules) to support a hash or other numeric identifier, it might be good to
use that for everything rather than having two mechanisms?

>
>
>
>>
>> >>
>> >>
>> >>>
>> >>> OK, so I think it's probably reasonable for now to just add
>> DW_TAG_modules to the CU for each referenced module (or does it have to be
>> each referenced submodule? (can two submodules within a single module be
>> contradictory/conflicting?)). Since we don't have any good way to reference
>> the module is a foreign unit while deduplicating that unit... there's not
>> much point having the imported_module - but if you think it adds anything,
>> I'm open to ideas.
>> >> It could help keeping things simpler.
>> >> Emitting it doesn’t add much semantic value because module imports
>> always occur at the top level, but it will make the transition to the
>> deduplicated TAG_modules easier — It could be easier to teach consumers
>> once about imported_module({ref to TAG_module}) rather than having them
>> also recognize top-level TAG_modules as an intermediate step. It’s also
>> slightly easier to implement in LLVM because the imported_module allows us
>> to anchor the TAG_module in the CU, but that’s not a very strong argument.
>> >
>> > Agreed on all counts (not a strong argument, but convenient enough,
>> etc, etc).
>> >
>> > I'm still not entirely sure what the right answer is here, though,
>> which is why I'm hesitant to bake anything in too strongly.
>> >
>> > To come back to one of the outstanding questions: Do you need submodule
>> import information, or just module level (if modules cannot have internal
>> conflicts and you can't avoid cross-module conflicts just by lack of
>> visibility (I have no idea if either of those things are true) then you may
>> just need per-module not per-submodule info)?
>>
>> At the moment I do not think that it makes sense for two submodules to
>> conflict, but there is nothing in the clang documentation that explicitly
>> forbids this. With this in mind, I think it is reasonable to not support
>> submodules (at least initially) and always emit an import for the parent
>> module.
>> Thats what I wanted to write ... but I as I’m browsing through our
>> documentation,
>> http://clang.llvm.org/docs/Modules.html#conflict-declarations explicitly
>> gives an example of two conflicting submodules, so maybe this is not a
>> reasonable simplification after all. On the other hand, a quick grep over
>> all system module maps on OS X doesn’t show a single conflict declaration.
>>
>> I still believe we do not need to support submodules right from the
>> start, but we should have a story for getting there if we need to.
>>
>
> Given the simple example that demonstrates the possibility, it seems fair
> to have a story for what that looks like, yes - even if a first
> pass/prototype doesn't support it.
>
>
> Sure.
>
>
>
>>
>> >
>> > Also, does each submodule need different special attributes/flags? If
>> the special codegen attributes you want are at the module level, it'd
>> probably be best to keep those on the Skeleton CU for the module (that will
>> be comdat folded, etc, on ELF - and they could be DWARF-aware deduplicated
>> by llvm-dsymutil) so they're not duplicated. The DW_TAG_module would then
>> just have a DW_AT_signature attribute or something similarly small/trivial
>> to point to the skeleton CU.
>>
>> The attributes are derived from cc1 command line arguments. Not two
>> submodules imported by one CU can have different attributes. All submodules
>> in a pcm also share their attributes. Putting them into the skeleton CU
>> appears to be the most efficient place to put them, though perhaps not the
>> most logical one.
>>
>
> Why not the most logical? It'd be nice if it were a DW_TAG_module instead
> of a DW_TAG_compile_unit - but given the limited vocabulary we have in
> DWARF top level tags, it seems as good as we can have.
>
>
> I tend to view the module configuration (include path, isysroot,
> configuration macros) to be a part of the module and not a part of the
> skeleton that points to the split debug info for that module.
>

Sure, it's a workaround for the lack of Bag O' DWARF for now, one way or
another. Either way the debugger's going to have to jump the

> A module is uniquely identified by name + configuration. That’s why I feel
> it should be part of the tag that also holds the name.
>
>
>
>> I would prefer to stick the attributes on the (top-level) DW_TAG_module
>> and later deduplicate the attributes together with the DW_TAG_module.
>> Sticking them on the skeleton won’t save any space in the .o files and
>> would save 3*4-8=4 bytes (3x FORM_strp for include, macro, and isysroot -
>> 1x FORM_ref_sig_8) per CU and imported module.
>
>
> Seems nicer not to duplicate them, especially since not everyone will be
> using a debug-aware linker like llvm-dsymutil (LLDB on Windows or Linux
> won't have that convenience). Eventually we can use Bag O' DWARF for the
> skeleton CU, make it a DW_TAG_module (with more DWARF changes to allow that
> as a top-level tag, if desired/useful - I'm not sure it adds a lot) and
> have the imported_module reference it that way. (DW_TAG_imported_module,
> DW_AT_import, DW_FORM_ref_sig8)
>
>
> I'm not /hugely/ invested in this, but we do have people caring about LLDB
> on Linux and Windows, so avoiding tying the LLDB story to MachO and
> dsymutil, etc, seems valuable.
>
>
> I think that this would be an unnecessary intermediate step that we
> eventually want to migrate away from anyway. We already identified that the
> good solution for deduplication is going to be a skeleton TAG_module, so my
> view is that it is not worth the trouble adding a temporary indirection
> (and a new attribute name)
>

New attribute name?

> to save 4 bytes in the intermediate step.
>

The debugger's going to need to resolve the skeleton anyway (in the case of
non-AST based debugging) so I'm not sure how much it's an extra step.

> I don’t actually think there is anything about the TAG_module design tying
> this to either MachO or dsymutil, but let me know if you feel otherwise.
>

Sorry, what I was getting at was that with the Mach0/dsymutil/lldb story
you probably don't need to consult the skeleton debug info (actually I
forgot, you do - in the case where you need a name from a module that might
be incompatible (it's not referenced directly in this CU)) - pre-dsymutil,
you'd use the ASTs directly, post-dsymutil I expect you'll have inlined all
the debug info so there are no skeletons, etc, if I'm understanding your
design correctly.

- David

>
> -- adrian
>
>
>
>> >
>
> > If you need submodule import lists, then each DW_AT_module representing
>> a submodule would have a name (anything else?) and the signature refering
>> to its module skeleton CU.
>>
>> What I’m envisioning is
>>
>> .debug_info:
>>   DW_TAG_compile_unit
>>     ...
>>     DW_TAG_imported_module
>>      // import FooSubA
>>      DW_AT_import [DW_FORM_ref4] (0x60)
>>
>>     DW_TAG_module
>>       DW_AT_name(“FooLib”)
>>       DW_AT_LLVM_sysroot(“/“)
>>       DW_AT_LLVM_include_dirs(“-I/path”)
>>       DW_AT_LLVM_macros(“-DNDEBUG”)
>> 0x60:
>>       DW_TAG_module
>>         DW_AT_name(“FooSubA”)
>>         // need not be emitted if not referenced.
>>         DW_TAG_module
>>           DW_AT_name(“FooSubASubA”)
>>
>>       // need not be emitted if not referenced.
>>       DW_TAG_module
>>         DW_AT_name(“FooSubB”)
>>
>>
>>
>> -- adrian
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150505/ae301896/attachment.html>