[LLVMdev] Proposal: type uniquing of debug info for LTO

David Blaikie dblaikie at gmail.com
Fri Jun 21 16:01:32 PDT 2013


On Fri, Jun 21, 2013 at 3:58 PM, Manman Ren <mren at apple.com> wrote:
>
> On Jun 21, 2013, at 3:25 PM, David Blaikie wrote:
>
>> On Fri, Jun 21, 2013 at 3:16 PM, Manman Ren <mren at apple.com> wrote:
>>>
>>> On Jun 21, 2013, at 1:12 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>> On Fri, Jun 21, 2013 at 12:33 PM, Manman Ren <mren at apple.com> wrote:
>>>
>>>
>>> On Jun 21, 2013, at 11:56 AM, Eric Christopher wrote:
>>>
>>> On Fri, Jun 21, 2013 at 11:50 AM, Manman Ren <mren at apple.com> wrote:
>>>
>>>
>>> On Jun 21, 2013, at 11:35 AM, Eric Christopher wrote:
>>>
>>> On Thu, Jun 20, 2013 at 10:52 PM, Manman Ren <mren at apple.com> wrote:
>>>
>>>
>>> A summary of options for issue #3:
>>> 3> To actually access the MDNode referenced via the hash value, we need to
>>> perform a lookup from the hash value to find the corresponding MDNode.
>>> The questions are where to store this map and how to keep it up-to-date when
>>> a MDNode is replaced.
>>> ---------------------
>>> Option a) a map in DwarfDebug, AsmWriter and DIBuilder, modify access
>>> functions in DI classes to pass in the map.
>>> May need to modify printing functions in AsmWriter.
>>>
>>>
>>> You may not need the map in DIBuilder if you piggy back on the
>>> existing map in CGDebugInfo.cpp. The AsmWriter stuff is unfortunate
>>> for just debug dumps of the IR. A way around this might be nice, but
>>> I've not looked into it.
>>>
>>>
>>> Calling dump() on a MDNode inside gdb will not give us the derived type
>>> information, since we don't have a handle to the map.
>>>
>>>
>>> The only case where dump currently navigates a type link that I know
>>> is printing derived types, where we print "[from foo]". By just
>>> printing the string identifier for the type (without having to
>>> navigate the link/use the map) should be about as useful here.
>>>
>>>
>>>
>>> ---------------------
>>> Option b) I am going to expand David's description with more details,
>>> correct me if I am wrong.
>>>
>>>
>>> I'll let David continue this part of the thread with you.
>>>
>>>
>>> I personally prefer option b over a since it is much cleaner.
>>>
>>>
>>> Option b is more general and should probably work. I suggested a
>>> originally because I know it would work and the impact is limited to
>>> debug info. Also it doesn't involve touching the generic metadata
>>> interface or code and could be done quickly and incrementally. b may
>>> also require further changes to the debug info infrastructure to deal
>>> with lookup, etc so may end up being as much churn as a. b is more
>>> useful if we think other users of metadata may want/need similar
>>> functionality in the future.
>>>
>>> I'm fine with b as a choice, but it's going to involve more planning
>>> and code review and involve people outside of you, me and Dave :)
>>>
>>>
>>> Got it. David, are you okay with the details I added about option b?
>>> Let's start a new thread with title "Proposal: extend Metadata (MDNode) to
>>> support MDHash" or something like that.
>>>
>>>
>>> I'd like to hear a rationale for wanting b first. I don't know (and
>>> can think of) any other users of this functionality right now.
>>>
>>>
>>> I will try my best :]
>>> MDNode has a list of operands, one MDNode's full content can be thought of
>>> as a graph, when there is a reference to another MDNode, we have
>>> an edge. The MDHash can be used to give a unique id (or string) to the full
>>> content of the graph.
>>>
>>> The hash value applies to any kind of MDNode with complicated MDNode
>>> references.
>>>
>>> If you think this is not general enough to be in MDNode,
>>>
>>>
>>> Eric's concern/question is whether it's a feature that would be useful
>>> to any other code, I believe. (I'm less concerned about this so long
>>> as the feature is appropriately general, even if it doesn't have any
>>> other consumers at the moment - I don't think it'd drastically
>>> complicate the Metadata handling code (which I think is currently
>>> fairly simple, but I admit to not knowing it in any great detail))
>>>
>>>
>>> Yes, I tried to explain why MDHash can help with MDNodes that form a
>>> complicated graph.
>>>
>>>
>>>
>>> my earlier proposal of extending MDNode to DINode is going the other
>>> direction.
>>>
>>>
>>> As discussed, this isn't far enough in the other direction. If we're
>>> going to touch core LLVM Metadata handling code at all, it needs to be
>>> a complete/cohesive feature, not just tendrils of a debug info
>>> feature. Hence (a) or (b) not (c).
>>>
>>> Basically, MDNode will have an opcode like the opcode for Instruction, to
>>> support TBAA, DI, and other types.
>>> Inside streamers (bc ll reader), we can create the specialized class
>>> according to the opcode.
>>> That requires a lot of changes too :(
>>>
>>>
>>> We can add a weight to MDHash (other names are fine with me), to support
>>> multiple MDNodes with the same hash, but different weight.
>>>
>>>
>>> Wait, what? This sounds like a bad idea.
>>>
>>> Why?
>>> The weight is needed to differentiate a forward declaration vs. a
>>> definition, they both name the same class, but they are different.
>>>
>>>
>>> This would probably tip me over into Eric's perspective that it would
>>> be too esoteric a feature to add to a generic/core piece of
>>> infrastructure.
>>>
>>>
>>> If we allow the map to map one hash value (or string) to multiple MDNodes,
>>> without adding the weight, it should still work.
>>> But is that general enough?
>>
>> Then there would be no transparent access through the nodes - so I
>> don't know what the access API would look like anymore & it would seem
>> rather awkward as a general API.
>
> The access function will have two modes:
> 1> it does not matter which one is returned
> (This works for verification purposes and printing purposes.)
> 2> return all the MDNodes having the hash, DwarfDebug will figure out which one has priority
>
> It is not clean. I actually prefer adding a weight, it kind of makes sense when we are dealing with multiple MDNodes with the same hash.
> With weight, we can always pick one with higher weight (or priority).
>
> If we veto option b, we will only have option a, which has its own issues:
> 1> it is not really local to debug info, we have to modify AsmWriter at least.

I'm pretty sure I've kept saying we don't need to do this. Printing
asm can just print the identifier for the type node - it doesn't need
to navigate across that link.

> 2> we have to keep (and construct) one map at a few places.
> 3> we have to modify the following access function in DI classes
> getContext, getTypeDerivedFrom, getClassType, getContainingType, getType
> to take an extra map
> and of course the call sites of all these functions.
>
> Compare against putting the map in LLVMContext:
> 1> quoting from David
> "General principle of keeping things in the least scope necessary.
> Reducing scope helps code
> maintenance by making the influence of certain data is only accessed
> within a certain scope."
>
> Another issue is on how to submit the patches:
> I prefer to submit small patches and have a transition period where we use -gtype-uniquing to test the work in progress.

Incremental work doesn't require a flag. You can simply migrate one
field at a time. Is there a reason that wouldn't work?

> The advantages are
> 1> People can see our progress.
> 2> We can incrementally add more testing cases while testing the feature under the flag.
> 3> If the patch causes problem, we don't have to revert the big patch and submit another big patch that has the fix.
>
> Thanks,
> Manman
>
>>
>> The more you point out the reasonable complications in using such a
>> generic API for your task, the less it seems like it would be
>> sufficient/a good fit - the more I'd be inclined to go with Eric's
>> choice of just keeping the mapping locally (in DwarfDebug, etc) & not
>> impacting metadata APIs at all.
>



More information about the llvm-dev mailing list