[Type Uniquing Patch] Modify functions in DebugInfo.h to take an extra map argument.

Thu Jul 18 22:10:00 PDT 2013

On Thu, Jul 18, 2013 at 10:05 PM, Manman Ren <mren at apple.com> wrote:
>
> On Jul 18, 2013, at 9:58 PM, David Blaikie wrote:
>
>
> On Jul 18, 2013 6:31 PM, "Eric Christopher" <echristo at gmail.com> wrote:
>>
>> > Somehow I can't put inline comments.
>>
>> Beta software? ;)
>>
>> From David:
>>
>> >
>> > Rather than having foo.getThing(map) we'd have something like
>> > lookup(foo.getThing()) - we'd need a return type from getThing that
>> > represent as the type identifier (the thing we keep calling a 'hash' but
>> > it
>> > isn't), we could just use stringref or something more specific.
>> >
>>
>> From you:
>> > Is lookup here a global function like "llvm::isSubprogramContext" or a
>> > member function of the DIMap?
>> > If the former, then it should be lookup(StringRef TypeName, DIMap &),
>> > right?
>> > We also need to modify foo.getThing() to return a StringRef instead of a
>> > DIType, right?
>> > In terms of lines of change, two options should be similar.
>> >
>>
>> I was thinking the former yes.
>
> I was picturing just a helper member function of dwarfdebug so it could be
> written in that context as lookup(foo.getThing()), but short of that, if we
> actually have a dimap type, it probably makes sense as a member of that. If
> we don't then it should be fine as a free function.
>
>> It should probably return a StringRef,
>> or an DITypeHandle, or something. Leave the actual choice there to
>> you.
>
> A typedef might be healthy here at least, so we can change it later of we
> need to.
>
>>
>> > About the Verify functions, the patch modifies from Verify() to
>> > Verify(DIMap
>> > &). Is this interface change okay?
>>
>> Again, I was hoping not to change the signature of these functions if
>> possible.
>
> Agreed. Let's drop anything from verify that needs type resolution, unless
> there's a good reason not to. We should move verification to a single pass
> at some point which could use resolution there,
>
>>
>> > What about isUnsignedDIType, getOriginalTypeSize, isBlockByrefVariable,
>> > and
>> > isSubprogramContext? Should they take an additional DIMap?
>> > -    bool isUnsignedDIType();
>> > +    bool isUnsignedDIType(const DITypeHashMap &Map);
>> >
>> > Are the changes okay for
>> > -  bool isArtificial()                const {
>> > +  bool isArtificial(const DITypeHashMap &Map) const {
>> > and other member functions in DbgVariable?
>> >
>>
>> From David:
>>
>> > As we discussed before, we shouldn't need conditional lookup (so we
>> > shouldn't be in a state where 'getThing' returns something that might be
>> > a
>> > type identifier or might be the type itself). We should just migrate one
>> > field at a time (or a bunch or all of them, but not switching the
>> > consuming
>> > code before the production code, otherwise were just adding a bunch of
>> > dead/untested code)
>> >
>>
>> From you:
>>
>> > There are some cases where a thing can be either a StringRef or a
>> > DIDescriptor such as getContext() or getElement() of a DIArray.
>> > What type should getElement() or getContext() return? A Value*?
>> >
>> > Downside for what is implemented in the patch:
>> > What if someone just wants the StringRef? (the patch added
>> > getNameDerivedFrom, Value* getType)
>> >
>> > Downside for the suggested lookup:
>> > Verify takes a DIMap, getThing does not
>> > getThing returns StringRef, getContext, getElement returns a Value
>> > It requires updating the testing cases since the thing must be a
>> > StringRef.
>> >
>> > The first two items are not really downsides, I am just worried about
>> > the
>> > last item, see below.
>> >
>>
>> I think your plan above there will work. It'll come down to which one
>> is cleaner ultimately. Our feeling was that it'd be cleaner not
>> passing the map everywhere and doing the lookup separately to keep the
>> concerns separate of "field" and "underlying value".
>>
>> The below is thinking out loud:
>>
>> As far as using Value * here that seems reasonable... or add a new
>> class of DIType that is a "reference to" type that just contains the
>> string and can be passed around like everything else, then lookup just
>> needs to happen when we want to change/emit it. How's that sound?
>
> Gut reaction: value* is top general and deriving from ditype is incorrect,
> since the whole api would be invalid as the type couldnt be implicitly
> resolved on demand (adding an extra state to this special ditype, which
> seems unfortunate). I think a simple type with a 'resolve(map)' function
> (could be written as a non member, in which case the type would need members
> to differentiate and return the ditype or string/ditypehandle) that returns
> the ditype one way or another, internally if use a pointerunion of string
> value or mdnode, perhaps? (Yeah, just a value pointer would do, with dyn
> cast, I suppose, but at least it'd be hidden away in that type)
>
>>
>> > There are additional code changes required to migrate a single field:
>> > 1> update the backend to support one MDNode (DIE) used in two different
>> > CUs.
>> > 2> modify DIBuilder to use StringRef instead of DIType
>> > 3> to generate the unique String name in clang.
>>
>> #3 is definitely a requirement for this. The "name" of a type
>> absolutely must be in the IR in place of any DIType fields. I'm not
>> quite sure what you mean by #1 and #2.
>
> 1 and 2 make sense to me, I think, I'll try to explain them to you in
> person, as I understand them.
>
> But these changes can be made incrementally - the backend and frontend need
> to change the schema atomically (including updating test cases), but once
> per field not for all the fields together (yes, this means thrashing the
> test cases as I did in the past, by that's OK). But 2 can then be done
> separately for each of those changes - adding the merging support.
>
>
> Attaching patches for 1, 2 and 3.
> 1 is needed at least for performance, to avoid creating multiple DIEs for a
> single MDNode that is shared across CUs.

But this is the /feature/ you're implementing, right? That's always
optional. We want to commit the patches that enable you to implement
that feature first, separately, if at all possible. Is there a reason
that's not possible? Does switching to a string lookup somehow degrade
performance we already have? If not, then (1) is incremental -
separate from changing any field to use the string lookup approach.

> We used to have a map in DwarfCompileUnit which maps from a MDNode to a DIE.
> I am not sure whether it is required for correctness.
>
> 2 can be incremental if we are going to update a batch of fields at a time.
>
> I will get back to you on other issues soon.
>
> Thanks
> Manman
>
>
>
>>
>> >      Any suggestion on how to generate a unique name for internal types
>> > defined within a lexical scope?
>> >      int a(int i) {
>> >         if (i) {
>> >            struct A {
>> >               …
>> >            };
>> >         }
>> >      }
>>
>> Yep. Walk up the scope appending the name of enclosing
>> functions/namespace/etc to the name.
>
> We should have mangling support for a such names (as they can be used for
> template instantiations), plus adding the cu filename for anything with
> internal linkage.
>
>>
>> >
>> > The existing testing cases need major cleanup, they are not in
>> > verifiable
>> > states. As I mentioned earlier, there are 110 testing cases that failed
>> > to
>> > verify.
>> > It is hard to correctly update the testing cases since we don't have the
>> > source code to regenerate the debug info from.
>>
>> Yep. This is why we started putting in the C code if we could.
>
> Yes. We can't leave the test cases to rot. We must update them as the schema
> changes or we lose a lot of test coverage because it no longer tests what
> users are using.
>
> If/when we implement a full verifier we will find lots of
> outdated/incomplete/broken tests and we will have to update or delete them.
> I suggest we do that incrementally, verifying new parts and updating tests
> cases go pass each time. Yes, this will be a lot of work.
>
>>
>> > The only way I can think of is to go back to the state before David's
>> > series
>> > of changes (assuming that state is correct, which I am not sure about),
>> > then
>> > re-apply the changes manually or using script.
>> > I think David submitted > 5 patches to change the format of the DI
>> > MDNodes.
>> >
>> > Before cleaning up the testing cases, I don't think it is a good idea to
>> > perform another series of update to the testing cases.
>
> We can't let the test cases rot. If we need to add stronger verification
> now, then we need to update the existing tests to pass such verification.
>
>> >
>>
>> That's fair, but it seems like we don't need a lot of update. Just
>> about any format change is going to involve a lot of changing of debug
>> info test cases. It's unfortunate but true.
>>
>> > If we want to go with the lookup option, we need to combine all above
>> > (additional code changes and testing cases update) in a single patch.
>> > I am not against the lookup option, but does not like submitting a big
>> > patch: 80K code change + 229k added testing cases + xxx existing testing
>> > cases.
>
> Yeah, I don't see why this is true. Why can't we move one type field over at
> a time?
>
> And if we need to add stronger verification (not sure why that need is
> happening now, but I'm pk if it is, we should do it as soon as we can) then,
> as I said, we should do so incrementally - verifying one feature, updating
> testsm etc.
>
>> >
>>
>> Gah. You sure now after your patch of earlier?
>>
>> > Let me know your thoughts on how to move this forward.
>> >
>>
>> Hope this helps.
>>
>> -eric
>>
>> > Thanks,
>> > Manman
>> >
>> > On Jul 15, 2013, at 10:13 PM, David Blaikie wrote:
>> >
>> >
>> > On Jul 15, 2013 10:06 PM, "Eric Christopher" <echristo at gmail.com> wrote:
>> >>
>> >> As a note, pretty sick right now. I'll get to this shortly. As a quick
>> >> look I think we don't want to pass in the map, but rather look it up
>> >> in the uses?
>> >
>> > Manman- just to expand on this, since Eric and I have tossed the idea
>> > around
>> > a bit...
>> >
>> > Rather than having foo.getThing(map) we'd have something like
>> > lookup(foo.getThing()) - we'd need a return type from getThing that
>> > represent as the type identifier (the thing we keep calling a 'hash' but
>> > it
>> > isn't), we could just use stringref or something more specific.
>> >
>> > Does that make sense? What do you think (we didn't really have a strong
>> > feeling, but I understand/can agree wit Eric erring this way)
>> >
>> > As we discussed before, we shouldn't need conditional lookup (so we
>> > shouldn't be in a state where 'getThing' returns something that might be
>> > a
>> > type identifier or might be the type itself). We should just migrate one
>> > field at a time (or a bunch or all of them, but not switching the
>> > consuming
>> > code before the production code, otherwise were just adding a bunch of
>> > dead/untested code)
>> >
>> >>
>> >> -eric
>> >>
>> >> On Fri, Jul 12, 2013 at 3:13 PM, Manman Ren <mren at apple.com> wrote:
>> >> >
>> >> > Thanks,
>> >> > Manman
>> >> >
>> >> > On Jul 12, 2013, at 2:44 PM, Eric Christopher wrote:
>> >> >
>> >> >> Getting to it shortly, week has been roughly crazy.
>> >> >>
>> >> >> -eric
>> >> >>
>> >> >> On Thu, Jul 11, 2013 at 9:24 PM, Manman Ren <mren at apple.com> wrote:
>> >> >>>
>> >> >>> Private ping :)
>> >> >>>
>> >> >>> Did you get a chance to look at the patch?
>> >> >>>
>> >> >>> Things on my list:
>> >> >>> 1> refactor a helper function to query module flag
>> >> >>> 2> usage of getDICompositeType on dragonegg
>> >> >>> 3> try to use assert in r185847
>> >> >>>
>> >> >>> Thanks,
>> >> >>> Manman
>> >> >>>
>> >> >>> On Jul 9, 2013, at 9:48 AM, Manman Ren wrote:
>> >> >>>
>> >> >>>>
>> >> >>>> Modify access functions that can return a DIType to take an extra
>> >> >>>> map
>> >> >>>> argument.
>> >> >>>> Add helper function
>> >> >>>> +  void collectDITypeHashMap(const NamedMDNode *CU_Nodes,
>> >> >>>> DITypeHashMap &Map);
>> >> >>>> to update DITypeHashMap by going through RetainedTypes of each CU.
>> >> >>>> Testing cases are added to verify llvm-link can correctly unique
>> >> >>>> types if the input .ll files use string instead of DIType.
>> >> >>>>
>> >> >>>> To make sure we can dump MDNode comments even without a
>> >> >>>> DITypeHashMap, isDebugInfo is used instead of
>> >> >>>> Verify in WriteMDNodeComment. Neither is accurate for checking
>> >> >>>> whether MDNode is indeed debug info.
>> >> >>>>
>> >> >>>> Thanks,
>> >> >>>> Manman
>> >> >>>>
>> >> >>>> <type_unique_di.patch><type_unique_test.patch>
>> >> >>>
>> >> >
>> >
>> >
>
>
>