[llvm] r191792 - Debug Info: remove duplication of DIEs when a DIE is part of the type system

Tue Oct 1 19:05:20 PDT 2013

I am now a little confused about the mapping issue.
The patch replaces every getDIE of a Type, a SP, and a Static Member with
DwarfDebug.getType|SP|StaticMemberDie.

getDIE was used before creating a DIE, and it was also used before
generating a reference to the DIE via ref4 or ref_addr.

For the usage before creating a DIE, if we still use the CU mapping, we
will create multiple DIEs for the same MDNode across CUs, if we replace it
with the DwarfDebug mapping, we will not create multiple DIEs.

For the usage before generating a reference, it does not make sense to use
CU mapping if we only have a single DIE for the MDNode (i.e using
DwarfDebug mapping before creating a DIE); it does not make sense to use
DwarfDebug mapping if we already
have multiple DIEs for the MDNode.

So it sounds to me, the usage of getDIE before creating a DIE determines
whether we have multiple DIEs for a given MDNode and other usages of getDIE
should either use CU mapping or DwarfDebug mapping, but not both for a
given MDNode.

Thanks,
Manman

On Tue, Oct 1, 2013 at 5:58 PM, Manman Ren <manman.ren at gmail.com> wrote:

>
>
>
> On Tue, Oct 1, 2013 at 4:10 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>  The idea is that we're going to have references across-CUs? Do you have
>>>> any idea if debuggers tolerate/handle this kind of referencing?
>>>>
>>> Yes, the DIE will be referenced across-CUs. We have been using ref_addr
>>> for some cases,
>>>
>>
>> In which cases do we already do this?
>>
> In DwarfDebug::updateSubprogramScopeDIE, when we add abstract_origin of a
> subprogram.
> I agree that we are now using ref_addr broadly for LTO builds.
>
>
>>
>>
>>>  and lldb is fine with it.
>>>
>>
>> lldb isn't the only DWARF consumer.
>>
> Other debuggers should work with ref_addr.
>
>
>>
>> I tried to move the mapping for MDNodes related to the type system, from
>>> CompileUnit to DwarfDebug, and I didn't include namespace in this commit.
>>> Theoretically, for everything that can be shared across CUs, we should
>>> add it to the DwarfDebug map to remove duplication.
>>>
>>
>> Then it would be nice to do this as a single, generic solution over all
>> things that can be referenced rather than piecemeal with special cases for
>> each kind of thing, I think.
>>
>>
>>>
>>>
>>>>
>>>> So a strawman alternative proposal would be: add all the MDNode/DIE
>>>> mappings to both the local CompileUnit and 'global' DwarfDebug maps and
>>>> have different lookup functions depending on the context - using the
>>>> DwarfDebug mapping when we want to reference anything for any reason, and
>>>> the CompileUnit mapping when we want to add children (eg: adding new things
>>>> to a namespace, etc) or attributes.
>>>>
>>> I don't quite get why we need two mappings for one MDNode.
>>>
>>
>> Because, say we have a namespace node - we need a CU-specific mapping so
>> that when we need to add entities to the namespace (another type, global
>> variable, etc) we can lookup the CU-local instance of the namespace. But
>> when we only need to refer to the namespace (eg: for a namespace alias) we
>> can lookup a cross-CU mapping.
>>
> If we already have multiple DIEs for the same MDNode (that is why we keep
> a CU-specific mapping, right?), when referring to the namespace DIE, can we
> just use CU-specific mapping?
> I am not familiar with how a namespace alias works, so don't quite know
> why we need a cross-CU mapping when referring to the namespace.
>
>
>> This is exactly what you've already implemented - the mappings you added
>> to DwarfDebug are the second mapping - the first mapping already exists
>> within the CompileUnit and is still used for the cases where we want to add
>> members (at least I hope we are, because we need to do that in some cases).
>>
> Oops, I moved the mapping from CompileUnit to DwarfDebug. Are you
> referring to "the namespace alias" by "some cases"?
>
>
>>
>>
>>> I tried to categorize MDNodes to two groups, one group belonging to
>>> CompileUnit (Variables etc that will be referenced in a single CU), the
>>> other group belonging to DwarfDebug (type system that can be shared across
>>> CUs).
>>>
>>
>> I don't believe that will be sufficient, but I could be wrong. Namespaces
>> seem like a pretty solid example where a thing would need to be mapped in
>> both a CU-specific and cross-CU way.
>>
>> For types, currently there are cases where we insert member DIEs into
>> type DIEs lazily (related to the vtable debug info optimization, wherein we
>> emit only a declaration of a type when that type is dynamic (has a vtable)
>> and the vtable isn't emitted in this translation unit).
>>
> Do we have an existing testing case for this? I can try to extend it to
> multiple CUs to check what will happen.
>
>
>>  It might not be wrong to have us insert those member DIEs in an existing
>> instance of the type in another compile_unit - but I'm hesitant to assume
>> that's good and right.
>>
>>
>>>
>>>
>>>>
>>>>
>>>>>
>>>>>> But I'm not sure how effective this would be - like I said, might
>>>>>> need some thought/discussion to understand your solution & consider
>>>>>> alternatives.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Sometimes, when we try to add an attribute to a DIE, the DIE is not
>>>>>>> yet added
>>>>>>> to its owner yet, so we don't know whether we should use ref_addr or
>>>>>>> ref4.
>>>>>>> We create a worklist that will be processed during finalization to
>>>>>>> add
>>>>>>> attributes with the correct form (ref_addr or ref4).
>>>>>>>
>>>>>>
>>>>>> Do you have an example of this as a test case? (I really wish we had
>>>>>> an easy way to see test coverage - so I could just tell at a glance whether
>>>>>> the worklist code is being exercised by your test cases... oh well)
>>>>>>
>>>>> Yes, we don't always immediately add a newly-created DIE to a parent.
>>>>> So it is possible to have DIEs without an owner when calling addDIEEntry.
>>>>> One interesting example is:
>>>>> getOrCreateTypeDIE --> addToContextOwner --> getOrCreateContextDIE
>>>>> (the original type DIE can be referenced during the construction of the
>>>>> context, and when that happens, the type DIE does not have an owner yet)
>>>>>
>>>>
>>>> But you know that the reference to that type DIE will be from within
>>>> the same CU as the type itself - since it's a child. So you know the
>>>> reference will be a ref4, no?
>>>>
>>> For this case, yes, it is going to be ref4. But I am not sure whether we
>>> can say the same for all cases when a DIE is not added to an owner.
>>> It also means we need to pass in an extra flag telling addDIEEntry that
>>> we are sure it is a ref4 and it is not an easy task to track the flag
>>> across the chain of
>>> function calls: getOrCreateTypeDIE --> addToContextOwner -->
>>> getOrCreateContextDIE --> ... --> addDIEEntry.
>>>
>>
>>  Why would you need an extra flag? If we know that any time we come back
>> around to the same entity that isn't already inserted (ie: the condition
>> you already know because you use it to decide to add things to a work list)
>> it must be a local reference, so you answer the question immediately rather
>> than using a work list.
>>
> How can we tell that we are coming back around to the same entity? Maybe I
> misunderstood something here.
> When we see a DIE without an owner and add things to a work list, I don't
> think there is a guarantee that the DIE and the Entry belong to the same CU.
>
> Thanks,
> Manman
>
>
>>
>> I could be wrong, but I want to have a good reason to do it some other
>> way not just "maybe sometimes this might not happen".
>>
>> - David
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20131001/bb482872/attachment.html>