[llvm] r191792 - Debug Info: remove duplication of DIEs when a DIE is part of the type system

Wed Oct 2 16:05:48 PDT 2013

On Wed, Oct 2, 2013 at 10:42 AM, David Blaikie <dblaikie at gmail.com> wrote:

>
>
>
> On Tue, Oct 1, 2013 at 7:05 PM, Manman Ren <manman.ren at gmail.com> wrote:
>
>>
>> I am now a little confused about the mapping issue.
>> The patch replaces every getDIE of a Type, a SP, and a Static Member with
>> DwarfDebug.getType|SP|StaticMemberDie.
>>
>> getDIE was used before creating a DIE, and it was also used before
>> generating a reference to the DIE via ref4 or ref_addr.
>>
>> For the usage before creating a DIE, if we still use the CU mapping, we
>> will create multiple DIEs for the same MDNode across CUs, if we replace it
>> with the DwarfDebug mapping, we will not create multiple DIEs.
>>
>> For the usage before generating a reference, it does not make sense to
>> use CU mapping if we only have a single DIE for the MDNode (i.e using
>> DwarfDebug mapping before creating a DIE); it does not make sense to use
>> DwarfDebug mapping if we already
>> have multiple DIEs for the MDNode.
>>
>> So it sounds to me, the usage of getDIE before creating a DIE determines
>> whether we have multiple DIEs for a given MDNode and other usages of getDIE
>> should either use CU mapping or DwarfDebug mapping, but not both for a
>> given MDNode.
>>
>
> OK, I think I'm understanding you here. Just throwing around thoughts,
> nothing terribly deeply analyzed yet, bu it might be easier to understand
> if we simply had CompileUnit::getDIE have a list of tags for which it would
> delegate to the cross-CU mapping. Then it'd be simply one mapping in
> DwarfDebug used from one place (getDIE) and would be hidden from all
> callers, keeping the rest of the code fairly consistent/simple... just a
> thought, at least.
>

Sounds reasonable, I will work on that.

>
> But specifically you're just interested in sharing types between CUs, yes?
> The sharing of Subprograms is just really for sharing them when they're
> members of a type, yes?
>
Yes.

> (I suppose subprograms could appear in multiple CUs if they were inline in
> a header and used in multiple CUs)
>
> Would it perhaps be easier to remove this patch and just use type units
> that I'm working on implementing at the moment?
>

This patch does two things:
1> move the map from CompileUnit to DwarfDebug
2> correctly use ref_addr if necessary (by using a worklist)

There are some functionality overlapping between this patch and the type
units, but there are certainly differences as well.

Both need to have a map in DwarfDebug that maps type MDNodes to DIE(s), in
your proposed patch,
DenseMap<const MDNode *, std::pair<uint64_t, SmallVectorImpl<DIE*>* > >
TypeUnits
  --> I am not quite sure why we need the pair, can you add some comments
in your proposed patch? :)
In my patch, we have a DenseMap<const MDNode *, DIE*> MDTypeNodeToDieMap

With type units, the across-CU references will be replaced with references
to type units.
With this patch, we use ref_addr.

For LTO's purpose, we don't need to calculate the hash since all we need is
to make sure we only generate a single DIE for a given type MDNode.
Calculating the hash costs time :)

I would like to revisit this (maybe revert this) later on when type units
are working.

+    /// Similar to getCompileUnit, returns null when DIE is not added to an
>> +    /// owner yet.
>> +    DIE *checkCompileUnit();
>>
>
> Why is this not just "getParent() == NULL"? Why do you need a walk for the
> DIE?
>
Yeah, that should work.
<-- It is possible that a DIE is added to a parent but the parent is not
yet added to an owner yet, so we still have to walk along the parent chain.

Thanks,
Manman

>
> While I have discussed with Eric the idea that DIEs could be referenced
> cross-CU, I'm a bit hesitant to put this sort of thing in without further
> thought/consideration, and it seems like type units are a more general
> purpose solution to this and other issues anyway. (though they don't
> immediately address the inline non-member function example I mentioned
> above, those would still be duplicated across CUs - so if that's really
> important we might have to do something like this anyway)
>
>
>>
>> Thanks,
>> Manman
>>
>>
>> On Tue, Oct 1, 2013 at 5:58 PM, Manman Ren <manman.ren at gmail.com> wrote:
>>
>>>
>>>
>>>
>>> On Tue, Oct 1, 2013 at 4:10 PM, David Blaikie <dblaikie at gmail.com>wrote:
>>>
>>>>
>>>>  The idea is that we're going to have references across-CUs? Do you
>>>>>> have any idea if debuggers tolerate/handle this kind of referencing?
>>>>>>
>>>>> Yes, the DIE will be referenced across-CUs. We have been using
>>>>> ref_addr for some cases,
>>>>>
>>>>
>>>> In which cases do we already do this?
>>>>
>>> In DwarfDebug::updateSubprogramScopeDIE, when we add abstract_origin of
>>> a subprogram.
>>> I agree that we are now using ref_addr broadly for LTO builds.
>>>
>>>
>>>>
>>>>
>>>>>  and lldb is fine with it.
>>>>>
>>>>
>>>> lldb isn't the only DWARF consumer.
>>>>
>>>  Other debuggers should work with ref_addr.
>>>
>>>
>>>>
>>>> I tried to move the mapping for MDNodes related to the type system,
>>>>> from CompileUnit to DwarfDebug, and I didn't include namespace in this
>>>>> commit.
>>>>> Theoretically, for everything that can be shared across CUs, we should
>>>>> add it to the DwarfDebug map to remove duplication.
>>>>>
>>>>
>>>> Then it would be nice to do this as a single, generic solution over all
>>>> things that can be referenced rather than piecemeal with special cases for
>>>> each kind of thing, I think.
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> So a strawman alternative proposal would be: add all the MDNode/DIE
>>>>>> mappings to both the local CompileUnit and 'global' DwarfDebug maps and
>>>>>> have different lookup functions depending on the context - using the
>>>>>> DwarfDebug mapping when we want to reference anything for any reason, and
>>>>>> the CompileUnit mapping when we want to add children (eg: adding new things
>>>>>> to a namespace, etc) or attributes.
>>>>>>
>>>>> I don't quite get why we need two mappings for one MDNode.
>>>>>
>>>>
>>>> Because, say we have a namespace node - we need a CU-specific mapping
>>>> so that when we need to add entities to the namespace (another type, global
>>>> variable, etc) we can lookup the CU-local instance of the namespace. But
>>>> when we only need to refer to the namespace (eg: for a namespace alias) we
>>>> can lookup a cross-CU mapping.
>>>>
>>> If we already have multiple DIEs for the same MDNode (that is why we
>>> keep a CU-specific mapping, right?), when referring to the namespace DIE,
>>> can we just use CU-specific mapping?
>>> I am not familiar with how a namespace alias works, so don't quite know
>>> why we need a cross-CU mapping when referring to the namespace.
>>>
>>>
>>>> This is exactly what you've already implemented - the mappings you
>>>> added to DwarfDebug are the second mapping - the first mapping already
>>>> exists within the CompileUnit and is still used for the cases where we want
>>>> to add members (at least I hope we are, because we need to do that in some
>>>> cases).
>>>>
>>> Oops, I moved the mapping from CompileUnit to DwarfDebug. Are you
>>> referring to "the namespace alias" by "some cases"?
>>>
>>>
>>>>
>>>>
>>>>> I tried to categorize MDNodes to two groups, one group belonging to
>>>>> CompileUnit (Variables etc that will be referenced in a single CU), the
>>>>> other group belonging to DwarfDebug (type system that can be shared across
>>>>> CUs).
>>>>>
>>>>
>>>> I don't believe that will be sufficient, but I could be wrong.
>>>> Namespaces seem like a pretty solid example where a thing would need to be
>>>> mapped in both a CU-specific and cross-CU way.
>>>>
>>>> For types, currently there are cases where we insert member DIEs into
>>>> type DIEs lazily (related to the vtable debug info optimization, wherein we
>>>> emit only a declaration of a type when that type is dynamic (has a vtable)
>>>> and the vtable isn't emitted in this translation unit).
>>>>
>>> Do we have an existing testing case for this? I can try to extend it to
>>> multiple CUs to check what will happen.
>>>
>>>
>>>>  It might not be wrong to have us insert those member DIEs in an
>>>> existing instance of the type in another compile_unit - but I'm hesitant to
>>>> assume that's good and right.
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> But I'm not sure how effective this would be - like I said, might
>>>>>>>> need some thought/discussion to understand your solution & consider
>>>>>>>> alternatives.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sometimes, when we try to add an attribute to a DIE, the DIE is
>>>>>>>>> not yet added
>>>>>>>>> to its owner yet, so we don't know whether we should use ref_addr
>>>>>>>>> or ref4.
>>>>>>>>> We create a worklist that will be processed during finalization to
>>>>>>>>> add
>>>>>>>>> attributes with the correct form (ref_addr or ref4).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Do you have an example of this as a test case? (I really wish we
>>>>>>>> had an easy way to see test coverage - so I could just tell at a glance
>>>>>>>> whether the worklist code is being exercised by your test cases... oh well)
>>>>>>>>
>>>>>>> Yes, we don't always immediately add a newly-created DIE to a
>>>>>>> parent. So it is possible to have DIEs without an owner when calling
>>>>>>> addDIEEntry.
>>>>>>> One interesting example is:
>>>>>>> getOrCreateTypeDIE --> addToContextOwner --> getOrCreateContextDIE
>>>>>>> (the original type DIE can be referenced during the construction of the
>>>>>>> context, and when that happens, the type DIE does not have an owner yet)
>>>>>>>
>>>>>>
>>>>>> But you know that the reference to that type DIE will be from within
>>>>>> the same CU as the type itself - since it's a child. So you know the
>>>>>> reference will be a ref4, no?
>>>>>>
>>>>> For this case, yes, it is going to be ref4. But I am not sure whether
>>>>> we can say the same for all cases when a DIE is not added to an owner.
>>>>> It also means we need to pass in an extra flag telling addDIEEntry
>>>>> that we are sure it is a ref4 and it is not an easy task to track the flag
>>>>> across the chain of
>>>>> function calls: getOrCreateTypeDIE --> addToContextOwner -->
>>>>> getOrCreateContextDIE --> ... --> addDIEEntry.
>>>>>
>>>>
>>>>  Why would you need an extra flag? If we know that any time we come
>>>> back around to the same entity that isn't already inserted (ie: the
>>>> condition you already know because you use it to decide to add things to a
>>>> work list) it must be a local reference, so you answer the question
>>>> immediately rather than using a work list.
>>>>
>>> How can we tell that we are coming back around to the same entity? Maybe
>>> I misunderstood something here.
>>> When we see a DIE without an owner and add things to a work list, I
>>> don't think there is a guarantee that the DIE and the Entry belong to the
>>> same CU.
>>>
>>> Thanks,
>>> Manman
>>>
>>>
>>>>
>>>> I could be wrong, but I want to have a good reason to do it some other
>>>> way not just "maybe sometimes this might not happen".
>>>>
>>>> - David
>>>>
>>>>
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20131002/d1a234fb/attachment.html>