[LLVMdev] Proposal: type uniquing of debug info for LTO

Manman Ren mren at apple.com
Fri Jun 21 16:30:50 PDT 2013


On Jun 21, 2013, at 4:01 PM, David Blaikie <dblaikie at gmail.com> wrote:

> On Fri, Jun 21, 2013 at 3:58 PM, Manman Ren <mren at apple.com> wrote:
>> 
>> On Jun 21, 2013, at 3:25 PM, David Blaikie wrote:
>> 
>>> On Fri, Jun 21, 2013 at 3:16 PM, Manman Ren <mren at apple.com> wrote:
>>>> 
>>>> On Jun 21, 2013, at 1:12 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>> 
>>>> On Fri, Jun 21, 2013 at 12:33 PM, Manman Ren <mren at apple.com> wrote:
>>>> 
>>>> 
>>>> On Jun 21, 2013, at 11:56 AM, Eric Christopher wrote:
>>>> 
>>>> On Fri, Jun 21, 2013 at 11:50 AM, Manman Ren <mren at apple.com> wrote:
>>>> 
>>>> 
>>>> On Jun 21, 2013, at 11:35 AM, Eric Christopher wrote:
>>>> 
>>>> On Thu, Jun 20, 2013 at 10:52 PM, Manman Ren <mren at apple.com> wrote:
>>>> 
>>>> 
>>>> A summary of options for issue #3:
>>>> 3> To actually access the MDNode referenced via the hash value, we need to
>>>> perform a lookup from the hash value to find the corresponding MDNode.
>>>> The questions are where to store this map and how to keep it up-to-date when
>>>> a MDNode is replaced.
>>>> ---------------------
>>>> Option a) a map in DwarfDebug, AsmWriter and DIBuilder, modify access
>>>> functions in DI classes to pass in the map.
>>>> May need to modify printing functions in AsmWriter.
>>>> 
>>>> 
>>>> You may not need the map in DIBuilder if you piggy back on the
>>>> existing map in CGDebugInfo.cpp. The AsmWriter stuff is unfortunate
>>>> for just debug dumps of the IR. A way around this might be nice, but
>>>> I've not looked into it.
>>>> 
>>>> 
>>>> Calling dump() on a MDNode inside gdb will not give us the derived type
>>>> information, since we don't have a handle to the map.
>>>> 
>>>> 
>>>> The only case where dump currently navigates a type link that I know
>>>> is printing derived types, where we print "[from foo]". By just
>>>> printing the string identifier for the type (without having to
>>>> navigate the link/use the map) should be about as useful here.
>>>> 
>>>> 
>>>> 
>>>> ---------------------
>>>> Option b) I am going to expand David's description with more details,
>>>> correct me if I am wrong.
>>>> 
>>>> 
>>>> I'll let David continue this part of the thread with you.
>>>> 
>>>> 
>>>> I personally prefer option b over a since it is much cleaner.
>>>> 
>>>> 
>>>> Option b is more general and should probably work. I suggested a
>>>> originally because I know it would work and the impact is limited to
>>>> debug info. Also it doesn't involve touching the generic metadata
>>>> interface or code and could be done quickly and incrementally. b may
>>>> also require further changes to the debug info infrastructure to deal
>>>> with lookup, etc so may end up being as much churn as a. b is more
>>>> useful if we think other users of metadata may want/need similar
>>>> functionality in the future.
>>>> 
>>>> I'm fine with b as a choice, but it's going to involve more planning
>>>> and code review and involve people outside of you, me and Dave :)
>>>> 
>>>> 
>>>> Got it. David, are you okay with the details I added about option b?
>>>> Let's start a new thread with title "Proposal: extend Metadata (MDNode) to
>>>> support MDHash" or something like that.
>>>> 
>>>> 
>>>> I'd like to hear a rationale for wanting b first. I don't know (and
>>>> can think of) any other users of this functionality right now.
>>>> 
>>>> 
>>>> I will try my best :]
>>>> MDNode has a list of operands, one MDNode's full content can be thought of
>>>> as a graph, when there is a reference to another MDNode, we have
>>>> an edge. The MDHash can be used to give a unique id (or string) to the full
>>>> content of the graph.
>>>> 
>>>> The hash value applies to any kind of MDNode with complicated MDNode
>>>> references.
>>>> 
>>>> If you think this is not general enough to be in MDNode,
>>>> 
>>>> 
>>>> Eric's concern/question is whether it's a feature that would be useful
>>>> to any other code, I believe. (I'm less concerned about this so long
>>>> as the feature is appropriately general, even if it doesn't have any
>>>> other consumers at the moment - I don't think it'd drastically
>>>> complicate the Metadata handling code (which I think is currently
>>>> fairly simple, but I admit to not knowing it in any great detail))
>>>> 
>>>> 
>>>> Yes, I tried to explain why MDHash can help with MDNodes that form a
>>>> complicated graph.
>>>> 
>>>> 
>>>> 
>>>> my earlier proposal of extending MDNode to DINode is going the other
>>>> direction.
>>>> 
>>>> 
>>>> As discussed, this isn't far enough in the other direction. If we're
>>>> going to touch core LLVM Metadata handling code at all, it needs to be
>>>> a complete/cohesive feature, not just tendrils of a debug info
>>>> feature. Hence (a) or (b) not (c).
>>>> 
>>>> Basically, MDNode will have an opcode like the opcode for Instruction, to
>>>> support TBAA, DI, and other types.
>>>> Inside streamers (bc ll reader), we can create the specialized class
>>>> according to the opcode.
>>>> That requires a lot of changes too :(
>>>> 
>>>> 
>>>> We can add a weight to MDHash (other names are fine with me), to support
>>>> multiple MDNodes with the same hash, but different weight.
>>>> 
>>>> 
>>>> Wait, what? This sounds like a bad idea.
>>>> 
>>>> Why?
>>>> The weight is needed to differentiate a forward declaration vs. a
>>>> definition, they both name the same class, but they are different.
>>>> 
>>>> 
>>>> This would probably tip me over into Eric's perspective that it would
>>>> be too esoteric a feature to add to a generic/core piece of
>>>> infrastructure.
>>>> 
>>>> 
>>>> If we allow the map to map one hash value (or string) to multiple MDNodes,
>>>> without adding the weight, it should still work.
>>>> But is that general enough?
>>> 
>>> Then there would be no transparent access through the nodes - so I
>>> don't know what the access API would look like anymore & it would seem
>>> rather awkward as a general API.
>> 
>> The access function will have two modes:
>> 1> it does not matter which one is returned
>> (This works for verification purposes and printing purposes.)
>> 2> return all the MDNodes having the hash, DwarfDebug will figure out which one has priority
>> 
>> It is not clean. I actually prefer adding a weight, it kind of makes sense when we are dealing with multiple MDNodes with the same hash.
>> With weight, we can always pick one with higher weight (or priority).
>> 
>> If we veto option b, we will only have option a, which has its own issues:
>> 1> it is not really local to debug info, we have to modify AsmWriter at least.
> 
> I'm pretty sure I've kept saying we don't need to do this. Printing
> asm can just print the identifier for the type node - it doesn't need
> to navigate across that link.

Okay, if we are fine with no access to the link in a gdb | lldb session.

> 
>> 2> we have to keep (and construct) one map at a few places.
>> 3> we have to modify the following access function in DI classes
>> getContext, getTypeDerivedFrom, getClassType, getContainingType, getType
>> to take an extra map
>> and of course the call sites of all these functions.
>> 
>> Compare against putting the map in LLVMContext:
>> 1> quoting from David
>> "General principle of keeping things in the least scope necessary.
>> Reducing scope helps code
>> maintenance by making the influence of certain data is only accessed
>> within a certain scope."
>> 
>> Another issue is on how to submit the patches:
>> I prefer to submit small patches and have a transition period where we use -gtype-uniquing to test the work in progress.
> 
> Incremental work doesn't require a flag. You can simply migrate one
> field at a time. Is there a reason that wouldn't work?

The reason I want a flag is to avoid the need to update the existing testing cases while this is a work in progress.
I believe migrating one field means updating the existing testing cases?

Thanks,
Manman

> 
>> The advantages are
>> 1> People can see our progress.
>> 2> We can incrementally add more testing cases while testing the feature under the flag.
>> 3> If the patch causes problem, we don't have to revert the big patch and submit another big patch that has the fix.
>> 
>> Thanks,
>> Manman
>> 
>>> 
>>> The more you point out the reasonable complications in using such a
>>> generic API for your task, the less it seems like it would be
>>> sufficient/a good fit - the more I'd be inclined to go with Eric's
>>> choice of just keeping the mapping locally (in DwarfDebug, etc) & not
>>> impacting metadata APIs at all.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130621/5effac04/attachment.html>


More information about the llvm-dev mailing list