[llvm] r191792 - Debug Info: remove duplication of DIEs when a DIE is part of the type system

Wed Oct 2 10:42:35 PDT 2013

On Tue, Oct 1, 2013 at 7:05 PM, Manman Ren <manman.ren at gmail.com> wrote:

>
> I am now a little confused about the mapping issue.
> The patch replaces every getDIE of a Type, a SP, and a Static Member with
> DwarfDebug.getType|SP|StaticMemberDie.
>
> getDIE was used before creating a DIE, and it was also used before
> generating a reference to the DIE via ref4 or ref_addr.
>
> For the usage before creating a DIE, if we still use the CU mapping, we
> will create multiple DIEs for the same MDNode across CUs, if we replace it
> with the DwarfDebug mapping, we will not create multiple DIEs.
>
> For the usage before generating a reference, it does not make sense to use
> CU mapping if we only have a single DIE for the MDNode (i.e using
> DwarfDebug mapping before creating a DIE); it does not make sense to use
> DwarfDebug mapping if we already
> have multiple DIEs for the MDNode.
>
> So it sounds to me, the usage of getDIE before creating a DIE determines
> whether we have multiple DIEs for a given MDNode and other usages of getDIE
> should either use CU mapping or DwarfDebug mapping, but not both for a
> given MDNode.
>

OK, I think I'm understanding you here. Just throwing around thoughts,
nothing terribly deeply analyzed yet, bu it might be easier to understand
if we simply had CompileUnit::getDIE have a list of tags for which it would
delegate to the cross-CU mapping. Then it'd be simply one mapping in
DwarfDebug used from one place (getDIE) and would be hidden from all
callers, keeping the rest of the code fairly consistent/simple... just a
thought, at least.

But specifically you're just interested in sharing types between CUs, yes?
The sharing of Subprograms is just really for sharing them when they're
members of a type, yes? (I suppose subprograms could appear in multiple CUs
if they were inline in a header and used in multiple CUs)

Would it perhaps be easier to remove this patch and just use type units
that I'm working on implementing at the moment?

While I have discussed with Eric the idea that DIEs could be referenced
cross-CU, I'm a bit hesitant to put this sort of thing in without further
thought/consideration, and it seems like type units are a more general
purpose solution to this and other issues anyway. (though they don't
immediately address the inline non-member function example I mentioned
above, those would still be duplicated across CUs - so if that's really
important we might have to do something like this anyway)

>
> Thanks,
> Manman
>
>
> On Tue, Oct 1, 2013 at 5:58 PM, Manman Ren <manman.ren at gmail.com> wrote:
>
>>
>>
>>
>> On Tue, Oct 1, 2013 at 4:10 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>>
>>>  The idea is that we're going to have references across-CUs? Do you
>>>>> have any idea if debuggers tolerate/handle this kind of referencing?
>>>>>
>>>> Yes, the DIE will be referenced across-CUs. We have been using ref_addr
>>>> for some cases,
>>>>
>>>
>>> In which cases do we already do this?
>>>
>> In DwarfDebug::updateSubprogramScopeDIE, when we add abstract_origin of a
>> subprogram.
>> I agree that we are now using ref_addr broadly for LTO builds.
>>
>>
>>>
>>>
>>>>  and lldb is fine with it.
>>>>
>>>
>>> lldb isn't the only DWARF consumer.
>>>
>>  Other debuggers should work with ref_addr.
>>
>>
>>>
>>> I tried to move the mapping for MDNodes related to the type system, from
>>>> CompileUnit to DwarfDebug, and I didn't include namespace in this commit.
>>>> Theoretically, for everything that can be shared across CUs, we should
>>>> add it to the DwarfDebug map to remove duplication.
>>>>
>>>
>>> Then it would be nice to do this as a single, generic solution over all
>>> things that can be referenced rather than piecemeal with special cases for
>>> each kind of thing, I think.
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> So a strawman alternative proposal would be: add all the MDNode/DIE
>>>>> mappings to both the local CompileUnit and 'global' DwarfDebug maps and
>>>>> have different lookup functions depending on the context - using the
>>>>> DwarfDebug mapping when we want to reference anything for any reason, and
>>>>> the CompileUnit mapping when we want to add children (eg: adding new things
>>>>> to a namespace, etc) or attributes.
>>>>>
>>>> I don't quite get why we need two mappings for one MDNode.
>>>>
>>>
>>> Because, say we have a namespace node - we need a CU-specific mapping so
>>> that when we need to add entities to the namespace (another type, global
>>> variable, etc) we can lookup the CU-local instance of the namespace. But
>>> when we only need to refer to the namespace (eg: for a namespace alias) we
>>> can lookup a cross-CU mapping.
>>>
>> If we already have multiple DIEs for the same MDNode (that is why we keep
>> a CU-specific mapping, right?), when referring to the namespace DIE, can we
>> just use CU-specific mapping?
>> I am not familiar with how a namespace alias works, so don't quite know
>> why we need a cross-CU mapping when referring to the namespace.
>>
>>
>>> This is exactly what you've already implemented - the mappings you added
>>> to DwarfDebug are the second mapping - the first mapping already exists
>>> within the CompileUnit and is still used for the cases where we want to add
>>> members (at least I hope we are, because we need to do that in some cases).
>>>
>> Oops, I moved the mapping from CompileUnit to DwarfDebug. Are you
>> referring to "the namespace alias" by "some cases"?
>>
>>
>>>
>>>
>>>> I tried to categorize MDNodes to two groups, one group belonging to
>>>> CompileUnit (Variables etc that will be referenced in a single CU), the
>>>> other group belonging to DwarfDebug (type system that can be shared across
>>>> CUs).
>>>>
>>>
>>> I don't believe that will be sufficient, but I could be wrong.
>>> Namespaces seem like a pretty solid example where a thing would need to be
>>> mapped in both a CU-specific and cross-CU way.
>>>
>>> For types, currently there are cases where we insert member DIEs into
>>> type DIEs lazily (related to the vtable debug info optimization, wherein we
>>> emit only a declaration of a type when that type is dynamic (has a vtable)
>>> and the vtable isn't emitted in this translation unit).
>>>
>> Do we have an existing testing case for this? I can try to extend it to
>> multiple CUs to check what will happen.
>>
>>
>>>  It might not be wrong to have us insert those member DIEs in an
>>> existing instance of the type in another compile_unit - but I'm hesitant to
>>> assume that's good and right.
>>>
>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>> But I'm not sure how effective this would be - like I said, might
>>>>>>> need some thought/discussion to understand your solution & consider
>>>>>>> alternatives.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Sometimes, when we try to add an attribute to a DIE, the DIE is not
>>>>>>>> yet added
>>>>>>>> to its owner yet, so we don't know whether we should use ref_addr
>>>>>>>> or ref4.
>>>>>>>> We create a worklist that will be processed during finalization to
>>>>>>>> add
>>>>>>>> attributes with the correct form (ref_addr or ref4).
>>>>>>>>
>>>>>>>
>>>>>>> Do you have an example of this as a test case? (I really wish we had
>>>>>>> an easy way to see test coverage - so I could just tell at a glance whether
>>>>>>> the worklist code is being exercised by your test cases... oh well)
>>>>>>>
>>>>>> Yes, we don't always immediately add a newly-created DIE to a parent.
>>>>>> So it is possible to have DIEs without an owner when calling addDIEEntry.
>>>>>> One interesting example is:
>>>>>> getOrCreateTypeDIE --> addToContextOwner --> getOrCreateContextDIE
>>>>>> (the original type DIE can be referenced during the construction of the
>>>>>> context, and when that happens, the type DIE does not have an owner yet)
>>>>>>
>>>>>
>>>>> But you know that the reference to that type DIE will be from within
>>>>> the same CU as the type itself - since it's a child. So you know the
>>>>> reference will be a ref4, no?
>>>>>
>>>> For this case, yes, it is going to be ref4. But I am not sure whether
>>>> we can say the same for all cases when a DIE is not added to an owner.
>>>> It also means we need to pass in an extra flag telling addDIEEntry that
>>>> we are sure it is a ref4 and it is not an easy task to track the flag
>>>> across the chain of
>>>> function calls: getOrCreateTypeDIE --> addToContextOwner -->
>>>> getOrCreateContextDIE --> ... --> addDIEEntry.
>>>>
>>>
>>>  Why would you need an extra flag? If we know that any time we come back
>>> around to the same entity that isn't already inserted (ie: the condition
>>> you already know because you use it to decide to add things to a work list)
>>> it must be a local reference, so you answer the question immediately rather
>>> than using a work list.
>>>
>> How can we tell that we are coming back around to the same entity? Maybe
>> I misunderstood something here.
>> When we see a DIE without an owner and add things to a work list, I don't
>> think there is a guarantee that the DIE and the Entry belong to the same CU.
>>
>> Thanks,
>> Manman
>>
>>
>>>
>>> I could be wrong, but I want to have a good reason to do it some other
>>> way not just "maybe sometimes this might not happen".
>>>
>>> - David
>>>
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20131002/99e8f569/attachment.html>