[Type Uniquing Patch] Modify functions in DebugInfo.h to take an extra map argument.

Thu Jul 18 22:47:43 PDT 2013

On Jul 18, 2013, at 10:10 PM, David Blaikie wrote:

> On Thu, Jul 18, 2013 at 10:05 PM, Manman Ren <mren at apple.com> wrote:
>> 
>> On Jul 18, 2013, at 9:58 PM, David Blaikie wrote:
>> 
>> 
>> On Jul 18, 2013 6:31 PM, "Eric Christopher" <echristo at gmail.com> wrote:
>>> 
>>>> Somehow I can't put inline comments.
>>> 
>>> Beta software? ;)
>>> 
>>> From David:
>>> 
>>>> 
>>>> Rather than having foo.getThing(map) we'd have something like
>>>> lookup(foo.getThing()) - we'd need a return type from getThing that
>>>> represent as the type identifier (the thing we keep calling a 'hash' but
>>>> it
>>>> isn't), we could just use stringref or something more specific.
>>>> 
>>> 
>>> From you:
>>>> Is lookup here a global function like "llvm::isSubprogramContext" or a
>>>> member function of the DIMap?
>>>> If the former, then it should be lookup(StringRef TypeName, DIMap &),
>>>> right?
>>>> We also need to modify foo.getThing() to return a StringRef instead of a
>>>> DIType, right?
>>>> In terms of lines of change, two options should be similar.
>>>> 
>>> 
>>> I was thinking the former yes.
>> 
>> I was picturing just a helper member function of dwarfdebug so it could be
>> written in that context as lookup(foo.getThing()), but short of that, if we
>> actually have a dimap type, it probably makes sense as a member of that. If
>> we don't then it should be fine as a free function.
>> 
>>> It should probably return a StringRef,
>>> or an DITypeHandle, or something. Leave the actual choice there to
>>> you.
>> 
>> A typedef might be healthy here at least, so we can change it later of we
>> need to.
>> 
>>> 
>>>> About the Verify functions, the patch modifies from Verify() to
>>>> Verify(DIMap
>>>> &). Is this interface change okay?
>>> 
>>> Again, I was hoping not to change the signature of these functions if
>>> possible.
>> 
>> Agreed. Let's drop anything from verify that needs type resolution, unless
>> there's a good reason not to. We should move verification to a single pass
>> at some point which could use resolution there,
>> 
>>> 
>>>> What about isUnsignedDIType, getOriginalTypeSize, isBlockByrefVariable,
>>>> and
>>>> isSubprogramContext? Should they take an additional DIMap?
>>>> -    bool isUnsignedDIType();
>>>> +    bool isUnsignedDIType(const DITypeHashMap &Map);
>>>> 
>>>> Are the changes okay for
>>>> -  bool isArtificial()                const {
>>>> +  bool isArtificial(const DITypeHashMap &Map) const {
>>>> and other member functions in DbgVariable?
>>>> 
>>> 
>>> From David:
>>> 
>>>> As we discussed before, we shouldn't need conditional lookup (so we
>>>> shouldn't be in a state where 'getThing' returns something that might be
>>>> a
>>>> type identifier or might be the type itself). We should just migrate one
>>>> field at a time (or a bunch or all of them, but not switching the
>>>> consuming
>>>> code before the production code, otherwise were just adding a bunch of
>>>> dead/untested code)
>>>> 
>>> 
>>> From you:
>>> 
>>>> There are some cases where a thing can be either a StringRef or a
>>>> DIDescriptor such as getContext() or getElement() of a DIArray.
>>>> What type should getElement() or getContext() return? A Value*?
>>>> 
>>>> Downside for what is implemented in the patch:
>>>> What if someone just wants the StringRef? (the patch added
>>>> getNameDerivedFrom, Value* getType)
>>>> 
>>>> Downside for the suggested lookup:
>>>> Verify takes a DIMap, getThing does not
>>>> getThing returns StringRef, getContext, getElement returns a Value
>>>> It requires updating the testing cases since the thing must be a
>>>> StringRef.
>>>> 
>>>> The first two items are not really downsides, I am just worried about
>>>> the
>>>> last item, see below.
>>>> 
>>> 
>>> I think your plan above there will work. It'll come down to which one
>>> is cleaner ultimately. Our feeling was that it'd be cleaner not
>>> passing the map everywhere and doing the lookup separately to keep the
>>> concerns separate of "field" and "underlying value".
>>> 
>>> The below is thinking out loud:
>>> 
>>> As far as using Value * here that seems reasonable... or add a new
>>> class of DIType that is a "reference to" type that just contains the
>>> string and can be passed around like everything else, then lookup just
>>> needs to happen when we want to change/emit it. How's that sound?
>> 
>> Gut reaction: value* is top general and deriving from ditype is incorrect,
>> since the whole api would be invalid as the type couldnt be implicitly
>> resolved on demand (adding an extra state to this special ditype, which
>> seems unfortunate). I think a simple type with a 'resolve(map)' function
>> (could be written as a non member, in which case the type would need members
>> to differentiate and return the ditype or string/ditypehandle) that returns
>> the ditype one way or another, internally if use a pointerunion of string
>> value or mdnode, perhaps? (Yeah, just a value pointer would do, with dyn
>> cast, I suppose, but at least it'd be hidden away in that type)
>> 
>>> 
>>>> There are additional code changes required to migrate a single field:
>>>> 1> update the backend to support one MDNode (DIE) used in two different
>>>> CUs.
>>>> 2> modify DIBuilder to use StringRef instead of DIType
>>>> 3> to generate the unique String name in clang.
>>> 
>>> #3 is definitely a requirement for this. The "name" of a type
>>> absolutely must be in the IR in place of any DIType fields. I'm not
>>> quite sure what you mean by #1 and #2.
>> 
>> 1 and 2 make sense to me, I think, I'll try to explain them to you in
>> person, as I understand them.
>> 
>> But these changes can be made incrementally - the backend and frontend need
>> to change the schema atomically (including updating test cases), but once
>> per field not for all the fields together (yes, this means thrashing the
>> test cases as I did in the past, by that's OK). But 2 can then be done
>> separately for each of those changes - adding the merging support.
>> 
>> 
>> Attaching patches for 1, 2 and 3.
>> 1 is needed at least for performance, to avoid creating multiple DIEs for a
>> single MDNode that is shared across CUs.
> 
> But this is the /feature/ you're implementing, right? That's always
> optional. We want to commit the patches that enable you to implement
> that feature first, separately, if at all possible. Is there a reason
> that's not possible? Does switching to a string lookup somehow degrade
> performance we already have? If not, then (1) is incremental -
> separate from changing any field to use the string lookup approach.

I should rephrase "1 is needed at least to improve performance" :)

As I mentioned in my earlier reply, I am not sure whether 1 is needed for correctness.
I am going to find out soon.

Once we replace MDNode with String, we may have a single MDNode shared by two CUs when
two CUs refer to the same class (same unique name).
That may cause issue in the backend.

Manman

> 
> 
>> We used to have a map in DwarfCompileUnit which maps from a MDNode to a DIE.
>> I am not sure whether it is required for correctness.
>> 
>> 2 can be incremental if we are going to update a batch of fields at a time.
>> 
>> I will get back to you on other issues soon.
>> 
>> Thanks
>> Manman
>> 
>> 
>> 
>>> 
>>>>     Any suggestion on how to generate a unique name for internal types
>>>> defined within a lexical scope?
>>>>     int a(int i) {
>>>>        if (i) {
>>>>           struct A {
>>>>              …
>>>>           };
>>>>        }
>>>>     }
>>> 
>>> Yep. Walk up the scope appending the name of enclosing
>>> functions/namespace/etc to the name.
>> 
>> We should have mangling support for a such names (as they can be used for
>> template instantiations), plus adding the cu filename for anything with
>> internal linkage.
>> 
>>> 
>>>> 
>>>> The existing testing cases need major cleanup, they are not in
>>>> verifiable
>>>> states. As I mentioned earlier, there are 110 testing cases that failed
>>>> to
>>>> verify.
>>>> It is hard to correctly update the testing cases since we don't have the
>>>> source code to regenerate the debug info from.
>>> 
>>> Yep. This is why we started putting in the C code if we could.
>> 
>> Yes. We can't leave the test cases to rot. We must update them as the schema
>> changes or we lose a lot of test coverage because it no longer tests what
>> users are using.
>> 
>> If/when we implement a full verifier we will find lots of
>> outdated/incomplete/broken tests and we will have to update or delete them.
>> I suggest we do that incrementally, verifying new parts and updating tests
>> cases go pass each time. Yes, this will be a lot of work.
>> 
>>> 
>>>> The only way I can think of is to go back to the state before David's
>>>> series
>>>> of changes (assuming that state is correct, which I am not sure about),
>>>> then
>>>> re-apply the changes manually or using script.
>>>> I think David submitted > 5 patches to change the format of the DI
>>>> MDNodes.
>>>> 
>>>> Before cleaning up the testing cases, I don't think it is a good idea to
>>>> perform another series of update to the testing cases.
>> 
>> We can't let the test cases rot. If we need to add stronger verification
>> now, then we need to update the existing tests to pass such verification.
>> 
>>>> 
>>> 
>>> That's fair, but it seems like we don't need a lot of update. Just
>>> about any format change is going to involve a lot of changing of debug
>>> info test cases. It's unfortunate but true.
>>> 
>>>> If we want to go with the lookup option, we need to combine all above
>>>> (additional code changes and testing cases update) in a single patch.
>>>> I am not against the lookup option, but does not like submitting a big
>>>> patch: 80K code change + 229k added testing cases + xxx existing testing
>>>> cases.
>> 
>> Yeah, I don't see why this is true. Why can't we move one type field over at
>> a time?
>> 
>> And if we need to add stronger verification (not sure why that need is
>> happening now, but I'm pk if it is, we should do it as soon as we can) then,
>> as I said, we should do so incrementally - verifying one feature, updating
>> testsm etc.
>> 
>>>> 
>>> 
>>> Gah. You sure now after your patch of earlier?
>>> 
>>>> Let me know your thoughts on how to move this forward.
>>>> 
>>> 
>>> Hope this helps.
>>> 
>>> -eric
>>> 
>>>> Thanks,
>>>> Manman
>>>> 
>>>> On Jul 15, 2013, at 10:13 PM, David Blaikie wrote:
>>>> 
>>>> 
>>>> On Jul 15, 2013 10:06 PM, "Eric Christopher" <echristo at gmail.com> wrote:
>>>>> 
>>>>> As a note, pretty sick right now. I'll get to this shortly. As a quick
>>>>> look I think we don't want to pass in the map, but rather look it up
>>>>> in the uses?
>>>> 
>>>> Manman- just to expand on this, since Eric and I have tossed the idea
>>>> around
>>>> a bit...
>>>> 
>>>> Rather than having foo.getThing(map) we'd have something like
>>>> lookup(foo.getThing()) - we'd need a return type from getThing that
>>>> represent as the type identifier (the thing we keep calling a 'hash' but
>>>> it
>>>> isn't), we could just use stringref or something more specific.
>>>> 
>>>> Does that make sense? What do you think (we didn't really have a strong
>>>> feeling, but I understand/can agree wit Eric erring this way)
>>>> 
>>>> As we discussed before, we shouldn't need conditional lookup (so we
>>>> shouldn't be in a state where 'getThing' returns something that might be
>>>> a
>>>> type identifier or might be the type itself). We should just migrate one
>>>> field at a time (or a bunch or all of them, but not switching the
>>>> consuming
>>>> code before the production code, otherwise were just adding a bunch of
>>>> dead/untested code)
>>>> 
>>>>> 
>>>>> -eric
>>>>> 
>>>>> On Fri, Jul 12, 2013 at 3:13 PM, Manman Ren <mren at apple.com> wrote:
>>>>>> 
>>>>>> Thanks,
>>>>>> Manman
>>>>>> 
>>>>>> On Jul 12, 2013, at 2:44 PM, Eric Christopher wrote:
>>>>>> 
>>>>>>> Getting to it shortly, week has been roughly crazy.
>>>>>>> 
>>>>>>> -eric
>>>>>>> 
>>>>>>> On Thu, Jul 11, 2013 at 9:24 PM, Manman Ren <mren at apple.com> wrote:
>>>>>>>> 
>>>>>>>> Private ping :)
>>>>>>>> 
>>>>>>>> Did you get a chance to look at the patch?
>>>>>>>> 
>>>>>>>> Things on my list:
>>>>>>>> 1> refactor a helper function to query module flag
>>>>>>>> 2> usage of getDICompositeType on dragonegg
>>>>>>>> 3> try to use assert in r185847
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Manman
>>>>>>>> 
>>>>>>>> On Jul 9, 2013, at 9:48 AM, Manman Ren wrote:
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Modify access functions that can return a DIType to take an extra
>>>>>>>>> map
>>>>>>>>> argument.
>>>>>>>>> Add helper function
>>>>>>>>> +  void collectDITypeHashMap(const NamedMDNode *CU_Nodes,
>>>>>>>>> DITypeHashMap &Map);
>>>>>>>>> to update DITypeHashMap by going through RetainedTypes of each CU.
>>>>>>>>> Testing cases are added to verify llvm-link can correctly unique
>>>>>>>>> types if the input .ll files use string instead of DIType.
>>>>>>>>> 
>>>>>>>>> To make sure we can dump MDNode comments even without a
>>>>>>>>> DITypeHashMap, isDebugInfo is used instead of
>>>>>>>>> Verify in WriteMDNodeComment. Neither is accurate for checking
>>>>>>>>> whether MDNode is indeed debug info.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Manman
>>>>>>>>> 
>>>>>>>>> <type_unique_di.patch><type_unique_test.patch>
>>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 
>>