[LLVMdev] Proposal: type uniquing of debug info for LTO

Fri Jun 21 15:25:53 PDT 2013

On Fri, Jun 21, 2013 at 3:16 PM, Manman Ren <mren at apple.com> wrote:
>
> On Jun 21, 2013, at 1:12 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
> On Fri, Jun 21, 2013 at 12:33 PM, Manman Ren <mren at apple.com> wrote:
>
>
> On Jun 21, 2013, at 11:56 AM, Eric Christopher wrote:
>
> On Fri, Jun 21, 2013 at 11:50 AM, Manman Ren <mren at apple.com> wrote:
>
>
> On Jun 21, 2013, at 11:35 AM, Eric Christopher wrote:
>
> On Thu, Jun 20, 2013 at 10:52 PM, Manman Ren <mren at apple.com> wrote:
>
>
> A summary of options for issue #3:
> 3> To actually access the MDNode referenced via the hash value, we need to
> perform a lookup from the hash value to find the corresponding MDNode.
> The questions are where to store this map and how to keep it up-to-date when
> a MDNode is replaced.
> ---------------------
> Option a) a map in DwarfDebug, AsmWriter and DIBuilder, modify access
> functions in DI classes to pass in the map.
> May need to modify printing functions in AsmWriter.
>
>
> You may not need the map in DIBuilder if you piggy back on the
> existing map in CGDebugInfo.cpp. The AsmWriter stuff is unfortunate
> for just debug dumps of the IR. A way around this might be nice, but
> I've not looked into it.
>
>
> Calling dump() on a MDNode inside gdb will not give us the derived type
> information, since we don't have a handle to the map.
>
>
> The only case where dump currently navigates a type link that I know
> is printing derived types, where we print "[from foo]". By just
> printing the string identifier for the type (without having to
> navigate the link/use the map) should be about as useful here.
>
>
>
> ---------------------
> Option b) I am going to expand David's description with more details,
> correct me if I am wrong.
>
>
> I'll let David continue this part of the thread with you.
>
>
> I personally prefer option b over a since it is much cleaner.
>
>
> Option b is more general and should probably work. I suggested a
> originally because I know it would work and the impact is limited to
> debug info. Also it doesn't involve touching the generic metadata
> interface or code and could be done quickly and incrementally. b may
> also require further changes to the debug info infrastructure to deal
> with lookup, etc so may end up being as much churn as a. b is more
> useful if we think other users of metadata may want/need similar
> functionality in the future.
>
> I'm fine with b as a choice, but it's going to involve more planning
> and code review and involve people outside of you, me and Dave :)
>
>
> Got it. David, are you okay with the details I added about option b?
> Let's start a new thread with title "Proposal: extend Metadata (MDNode) to
> support MDHash" or something like that.
>
>
> I'd like to hear a rationale for wanting b first. I don't know (and
> can think of) any other users of this functionality right now.
>
>
> I will try my best :]
> MDNode has a list of operands, one MDNode's full content can be thought of
> as a graph, when there is a reference to another MDNode, we have
> an edge. The MDHash can be used to give a unique id (or string) to the full
> content of the graph.
>
> The hash value applies to any kind of MDNode with complicated MDNode
> references.
>
> If you think this is not general enough to be in MDNode,
>
>
> Eric's concern/question is whether it's a feature that would be useful
> to any other code, I believe. (I'm less concerned about this so long
> as the feature is appropriately general, even if it doesn't have any
> other consumers at the moment - I don't think it'd drastically
> complicate the Metadata handling code (which I think is currently
> fairly simple, but I admit to not knowing it in any great detail))
>
>
> Yes, I tried to explain why MDHash can help with MDNodes that form a
> complicated graph.
>
>
>
> my earlier proposal of extending MDNode to DINode is going the other
> direction.
>
>
> As discussed, this isn't far enough in the other direction. If we're
> going to touch core LLVM Metadata handling code at all, it needs to be
> a complete/cohesive feature, not just tendrils of a debug info
> feature. Hence (a) or (b) not (c).
>
> Basically, MDNode will have an opcode like the opcode for Instruction, to
> support TBAA, DI, and other types.
> Inside streamers (bc ll reader), we can create the specialized class
> according to the opcode.
> That requires a lot of changes too :(
>
>
> We can add a weight to MDHash (other names are fine with me), to support
> multiple MDNodes with the same hash, but different weight.
>
>
> Wait, what? This sounds like a bad idea.
>
> Why?
> The weight is needed to differentiate a forward declaration vs. a
> definition, they both name the same class, but they are different.
>
>
> This would probably tip me over into Eric's perspective that it would
> be too esoteric a feature to add to a generic/core piece of
> infrastructure.
>
>
> If we allow the map to map one hash value (or string) to multiple MDNodes,
> without adding the weight, it should still work.
> But is that general enough?

Then there would be no transparent access through the nodes - so I
don't know what the access API would look like anymore & it would seem
rather awkward as a general API.

The more you point out the reasonable complications in using such a
generic API for your task, the less it seems like it would be
sufficient/a good fit - the more I'd be inclined to go with Eric's
choice of just keeping the mapping locally (in DwarfDebug, etc) & not
impacting metadata APIs at all.