[PATCH] Have clang list the imported modules in the debug info

Eric Christopher echristo at gmail.com
Tue May 5 15:17:59 PDT 2015


On Tue, May 5, 2015 at 3:14 PM David Blaikie <dblaikie at gmail.com> wrote:

> On Tue, May 5, 2015 at 3:01 PM, Adrian Prantl <aprantl at apple.com> wrote:
>
>>
>> > On May 5, 2015, at 9:59 AM, David Blaikie <dblaikie at gmail.com> wrote:
>> >
>> >
>> >
>> > On Mon, May 4, 2015 at 4:46 PM, Adrian Prantl <aprantl at apple.com>
>> wrote:
>> >
>> >>> On May 4, 2015, at 1:31 PM, David Blaikie <dblaikie at gmail.com> wrote:
>> >>>
>> >> ...
>> >>>> >>> So you're going to need to implement fission (to at least some
>> degree) support in LLDB, then? (to support the case where you haven't
>> linked debug info with llvm-dsymutil, but you've hit one of these lookup
>> problems where you need to cross possibly-conflicting modules)
>> >>>> >>
>> >>>> >> Yes. Specifically, it won’t support type units, and it will look
>> up types by name rather than by signature. (cf. the second part of
>> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html
>> )
>> >>>> >
>> >>>> > How are you going to reference the types in the module's fission
>> CU without type units/signatures? Are you going to emit type declarations
>> into the normal CU and rely on the debugger to know that these declarations
>> can be resolved by looking elsewhere? (just without the benefit of
>> constraining that search to just looking for a matching TU?)
>> >>>>
>> >>>> If you look at the example in
>> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html,
>> there will be an external type index (using the usual accelerator table
>> format) that maps an external type’s UID to a pcm. In the pcm there is an
>> extra accelerator table entry that maps UID to DIE offset.
>> >>>
>> >>> I mean I guess that's up to you, but seems like a relatively large
>> workaround compared to supporting type units... (I mean certainly seems
>> like strictly less work to do the workaround than implementing type units
>> in LLDB, but a relatively large amount of work to do/throw away eventually
>> once LLDB supports type units)
>> >>
>> >> It’s not primarily meant to be a workaround so LLDB doesn’t need to
>> implement type units; the UIDs double as the key (decl context + name) to
>> import types directly from the AST. The other advantage is that we won’t
>> have to worry about MD5 hash collisions, but that’s more of a theoretical
>> advantage.
>> >
>> > If there's a concern about collisions, we should fix that regardless
>> (because we'll have the same problem with normal type units or modularized
>> type units).
>>
>> My understanding is that split DWARF is disambiguating hash collisions by
>> adding the full decl context and name to both the definition and the
>> forward declaration:
>>
>
> No, that's not a requirement. It's just an aspect of our implementation so
> we can put entities in the type when needed (such as for putting member
> function declarations in there to refererence from out of line member
> function definitions)
>
> In  the simplest case, you can reference a type unit directly:
>
> DW_TAG_variable
>   DW_AT_type [DW_FORM_ref_sig8] ....
>
> Without any DW_TAG_structure_type/namespaces/etc. Once we have Bag O'
> DWARF, we'd move to that representation almost entirely. (the signature
> we're using for this is the mangled name of the type so it includes the
> decl context in the hash anyway)
>
>
>>
>> so if a module defines:
>> namespace A { struct B { struct C {} } }
>> we’ll end up with
>>
>> DW_TAG_variable
>>   DW_AT_name “myC”
>>   DW_AT_type [DW_FORM_ref4] 0x20
>>
>> DW_TAG_namespace
>>   DW_AT_name “A”
>>   DW_TAG_structure_type
>>     DW_AT_name “B”
>> 0x20:
>>     DW_TAG_structure_type
>>       DW_AT_name “C”
>>       DW_AT_signature 0x1234ABCD
>>       DW_AT_declaration
>>
>>
>> and in the DWO there will be:
>>
>>
>> DW_TAG_type_unit 0x1234ABCD
>>   DW_TAG_namespace
>>     DW_AT_name “A”
>>     DW_TAG_structure_type
>>       DW_AT_name “B”
>>
>>       DW_TAG_structure_type
>>         DW_AT_name “C”
>>         ...
>>
>> ----------
>>
>> (for completeness, the UID type reference would look like this:
>>  DW_TAG_variable
>>    DW_AT_name “myC”
>>    DW_AT_type [DW_FORM_strp] “_ZTN1AS1BS1C” )
>>
>>
>>
>> So I think that hash collisions are actually handled for normal fission
>> and we don’t need to worry about them. The bag o’dwarf will still contain
>> the full decl context in the DWO, so it will be fine there, too.
>>
>> >
>> > If we've already got (& have to use in the case of DWARF fallback for
>> modules) to support a hash or other numeric identifier, it might be good to
>> use that for everything rather than having two mechanisms?
>>
>> Generally, one way to do it is of course better than maintaining two :-)
>> But using the UIDs in the .o file allows us to elide the the forward
>> declarations from the .o file
>
>
> It doesn't, though - because we need the forward declarations to put
> members into (such as member function definitions - this'll come up even
> for modules quite commonly with any inline member function, for example -
> its definition will appear in the .o file and the definition
> DW_TAG_subprogram will need to DW_AT_specification pointing to the
> declaration of the member function - which is a DW_TAG_subprogram inside
> the DW_TAG_class that is a declaration with a signature that refers to the
> type unit).
>
> Only once we've got Bag O' DWARF do we remove the need for these
> declarations.
>
> Granted we can get rid of them in /some/ cases today - when there's no
> need to reference members in the type (or nest other members - like a
> nested class, for example - though we could possibly workaround even that
> with Bag O DWARF by having the type in the type unit have a declaration of
> the nested type - then reference that from the nested type definition in
> another type unit or .o file, etc).
>
> I don't think there's anything you can leverage here that's different from
> what type units need. If you're going to do anything different from the way
> type units work today, I'd really like to discuss that further and
> understand what's different here that benefits from/allows for that
> difference that doesn't apply to type units themselves.
>
>
>> (granted the UIDs carry the same information, so it’s not actually as
>> small as it looks like) so all that dsymutil needs to do is replace the
>> string references with FORM_ref_addrs, no further stripping needed.
>>
>>
>> >
>> >>
>> >>>
>> >>>
>> >>> >>
>> >>> >>
>> >>> >>>
>> >>> >>> OK, so I think it's probably reasonable for now to just add
>> DW_TAG_modules to the CU for each referenced module (or does it have to be
>> each referenced submodule? (can two submodules within a single module be
>> contradictory/conflicting?)). Since we don't have any good way to reference
>> the module is a foreign unit while deduplicating that unit... there's not
>> much point having the imported_module - but if you think it adds anything,
>> I'm open to ideas.
>> >>> >> It could help keeping things simpler.
>> >>> >> Emitting it doesn’t add much semantic value because module imports
>> always occur at the top level, but it will make the transition to the
>> deduplicated TAG_modules easier — It could be easier to teach consumers
>> once about imported_module({ref to TAG_module}) rather than having them
>> also recognize top-level TAG_modules as an intermediate step. It’s also
>> slightly easier to implement in LLVM because the imported_module allows us
>> to anchor the TAG_module in the CU, but that’s not a very strong argument.
>> >>> >
>> >>> > Agreed on all counts (not a strong argument, but convenient enough,
>> etc, etc).
>> >>> >
>> >>> > I'm still not entirely sure what the right answer is here, though,
>> which is why I'm hesitant to bake anything in too strongly.
>> >>> >
>> >>> > To come back to one of the outstanding questions: Do you need
>> submodule import information, or just module level (if modules cannot have
>> internal conflicts and you can't avoid cross-module conflicts just by lack
>> of visibility (I have no idea if either of those things are true) then you
>> may just need per-module not per-submodule info)?
>> >>>
>> >>> At the moment I do not think that it makes sense for two submodules
>> to conflict, but there is nothing in the clang documentation that
>> explicitly forbids this. With this in mind, I think it is reasonable to not
>> support submodules (at least initially) and always emit an import for the
>> parent module.
>> >>> Thats what I wanted to write ... but I as I’m browsing through our
>> documentation,
>> http://clang.llvm.org/docs/Modules.html#conflict-declarations explicitly
>> gives an example of two conflicting submodules, so maybe this is not a
>> reasonable simplification after all. On the other hand, a quick grep over
>> all system module maps on OS X doesn’t show a single conflict declaration.
>> >>>
>> >>> I still believe we do not need to support submodules right from the
>> start, but we should have a story for getting there if we need to.
>> >>>
>> >>> Given the simple example that demonstrates the possibility, it seems
>> fair to have a story for what that looks like, yes - even if a first
>> pass/prototype doesn't support it.
>> >>
>> >> Sure.
>> >>>
>> >>>
>> >>> >
>> >>> > Also, does each submodule need different special attributes/flags?
>> If the special codegen attributes you want are at the module level, it'd
>> probably be best to keep those on the Skeleton CU for the module (that will
>> be comdat folded, etc, on ELF - and they could be DWARF-aware deduplicated
>> by llvm-dsymutil) so they're not duplicated. The DW_TAG_module would then
>> just have a DW_AT_signature attribute or something similarly small/trivial
>> to point to the skeleton CU.
>> >>>
>> >>> The attributes are derived from cc1 command line arguments. Not two
>> submodules imported by one CU can have different attributes. All submodules
>> in a pcm also share their attributes. Putting them into the skeleton CU
>> appears to be the most efficient place to put them, though perhaps not the
>> most logical one.
>> >>>
>> >>> Why not the most logical? It'd be nice if it were a DW_TAG_module
>> instead of a DW_TAG_compile_unit - but given the limited vocabulary we have
>> in DWARF top level tags, it seems as good as we can have.
>> >>
>> >> I tend to view the module configuration (include path, isysroot,
>> configuration macros) to be a part of the module and not a part of the
>> skeleton that points to the split debug info for that module.
>> >
>> > Sure, it's a workaround for the lack of Bag O' DWARF for now, one way
>> or another. Either way the debugger's going to have to jump the
>>
>> [there’s something missing towards the end]
>> No it’s not a workaround. Having the full module configuration is what
>> allows LLDB to rebuild the module if the module cache has been cleared.
>> >
>> >> A module is uniquely identified by name + configuration. That’s why I
>> feel it should be part of the tag that also holds the name.
>> >>
>> >>>
>> >>> I would prefer to stick the attributes on the (top-level)
>> DW_TAG_module and later deduplicate the attributes together with the
>> DW_TAG_module. Sticking them on the skeleton won’t save any space in the .o
>> files and would save 3*4-8=4 bytes (3x FORM_strp for include, macro, and
>> isysroot - 1x FORM_ref_sig_8) per CU and imported module.
>> >>>
>> >>> Seems nicer not to duplicate them, especially since not everyone will
>> be using a debug-aware linker like llvm-dsymutil (LLDB on Windows or Linux
>> won't have that convenience). Eventually we can use Bag O' DWARF for the
>> skeleton CU, make it a DW_TAG_module (with more DWARF changes to allow that
>> as a top-level tag, if desired/useful - I'm not sure it adds a lot) and
>> have the imported_module reference it that way. (DW_TAG_imported_module,
>> DW_AT_import, DW_FORM_ref_sig8)
>> >>>
>> >>> I'm not /hugely/ invested in this, but we do have people caring about
>> LLDB on Linux and Windows, so avoiding tying the LLDB story to MachO and
>> dsymutil, etc, seems valuable.
>> >>
>> >> I think that this would be an unnecessary intermediate step that we
>> eventually want to migrate away from anyway. We already identified that the
>> good solution for deduplication is going to be a skeleton TAG_module, so my
>> view is that it is not worth the trouble adding a temporary indirection
>> (and a new attribute name)
>> >
>> > New attribute name?
>>
>> What attribute would we put into the TAG_module to refer to the skeleton
>> CU?
>>
>
> I imagine we'd just match by name in that instance.
>
> (& eventually use a sig8, etc)
>
>
>>
>> >
>> >> to save 4 bytes in the intermediate step.
>> >
>> > The debugger's going to need to resolve the skeleton anyway (in the
>> case of non-AST based debugging) so I'm not sure how much it's an extra
>> step.
>>
>> My “intermediate step” was referring to the non-dedup’ed TAG_module that
>> is not (yet) part of the dedup’ed skeleton CU (but will be in the next
>> iteration). I just didn’t think that it is worth to optimize a
>> representation that will be changed soon anyway.
>>
>
> It doesn't seem to cost much, but I'll give up on this.
>

I'm not so willing to give up so quickly on this. I don't necessarily
believe "changed soon" for a lot of things w.r.t. dwarf. :)

-eric


>
>
>>
>> >
>> >> I don’t actually think there is anything about the TAG_module design
>> tying this to either MachO or dsymutil, but let me know if you feel
>> otherwise.
>> >
>> > Sorry, what I was getting at was that with the Mach0/dsymutil/lldb
>> story you probably don't need to consult the skeleton debug info (actually
>> I forgot, you do - in the case where you need a name from a module that
>> might be incompatible (it's not referenced directly in this CU)) -
>> pre-dsymutil, you'd use the ASTs directly, post-dsymutil I expect you'll
>> have inlined all the debug info so there are no skeletons, etc, if I'm
>> understanding your design correctly.
>>
>> Right, we need all of it :-)
>>
>> -- adrian
>>
>> >
>> > - David
>> >
>> >>
>> >> -- adrian
>> >>>
>> >>> >
>> >>> > If you need submodule import lists, then each DW_AT_module
>> representing a submodule would have a name (anything else?) and the
>> signature refering to its module skeleton CU.
>> >>>
>> >>> What I’m envisioning is
>> >>>
>> >>> .debug_info:
>> >>>   DW_TAG_compile_unit
>> >>>     ...
>> >>>     DW_TAG_imported_module
>> >>>      // import FooSubA
>> >>>      DW_AT_import [DW_FORM_ref4] (0x60)
>> >>>
>> >>>     DW_TAG_module
>> >>>       DW_AT_name(“FooLib”)
>> >>>       DW_AT_LLVM_sysroot(“/“)
>> >>>       DW_AT_LLVM_include_dirs(“-I/path”)
>> >>>       DW_AT_LLVM_macros(“-DNDEBUG”)
>> >>> 0x60:
>> >>>       DW_TAG_module
>> >>>         DW_AT_name(“FooSubA”)
>> >>>         // need not be emitted if not referenced.
>> >>>         DW_TAG_module
>> >>>           DW_AT_name(“FooSubASubA”)
>> >>>
>> >>>       // need not be emitted if not referenced.
>> >>>       DW_TAG_module
>> >>>         DW_AT_name(“FooSubB”)
>> >>>
>> >>>
>> >>>
>> >>> -- adrian
>> >>
>> >>
>> >
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150505/82d418c1/attachment.html>


More information about the cfe-commits mailing list