[PATCH] First set of patches for type uniquing

Mon Aug 26 12:58:16 PDT 2013

Generally we include test changes in patches that require them, rather
than splitting them out. I assume in this instance you split the
patches for ease of reviewing, but intend to commit them together (so
as not to break the buildbots)

It looks like we're constraining the DICompositeType schema a little
further than it was before - as you can see from your patch, some
DICompositeTypes had 13 fields (no template argument field) and some
have 14. Now they all have 15. I think this is OK, but just want to
highlight it as a change in schema. (personally I don't like the vague
optionality of trailing fields like that - if we're going to try to
reduce the size of these records (there is a lot of redundancy in them
- some type descriptions don't need nearly so many fields) we should
sit down & do that deliberately rather than with little hacks, so
making things more consistent is good as it'll make it easier to
decide where to try to remove such redundancy)

While you're here, you could change those lines that use
"Constant::getNullValue" to just use NULL as we do in other places (I
assume the "getNullValue" is how we end up with i32 0 as 'null' values
sometimes? when we really want null values anyway, and it's shorter to
write) such as in createSubroutineType, createEnumerationType,
createArrayType - just on the lines you're already touching (because
you have to add a ',') anyway.

But otherwise this looks correct - Please commit. (I haven't tested
this, so leaving it up to you to provide assurances that your test
changes are necessary & sufficient - I glanced at a few, but didn't
rigorously investigate them all).

On Mon, Aug 26, 2013 at 12:50 PM, Manman Ren <manman.ren at gmail.com> wrote:
>
> Eric, David and I chatted on IRC and agreed to submit smaller patches.
> We will add an identifier field to DICompositeType. For DICompositeTypes
> with a non-null identifier field, its reference can
> be replaced with the identifier to remove cycles and to help type uniquing.
>
> As a first step, I am going to add a field to DICompositeType, the attached
> patches implement that:
> DICompositeType will have an identifier field at position 14. For now, the
> field is set to null by DIBuilder.
> Update verifier to check that DICompositeType has 15 fields and the last
> field is null or a MDString.
> Update testing cases to include an extra field for DICompositeType.
> The identifier field will be used by type uniquing so a front end can
> generate a DICompositeType with a unique identifier.
>
> 3 patches are attached, two of them update testing cases.
>
> Thanks,
> Manman
>
>
>
> On Fri, Aug 23, 2013 at 10:49 AM, Eric Christopher <echristo at gmail.com>
> wrote:
>>
>> >>
>> >> I'm still looking at the patch but one comment that will affect a lot
>> >> of the text of the patch is that I think the unique name should be
>> >> passed into the type creation functions, not created inside the
>> >> backend. This means that each front end can use the definition of
>> >> "unique" that most appeals to them.
>> >
>> >
>> >
>> > I know dragonegg calls those type creation functions, and there are
>> > maybe
>> > other out-of-tree usages.
>> > This patch provides default value for the unique name (an empty string)
>> > to
>> > the type creation functions, so
>> > it is not going to break dragonegg or out-of-tree projects. Inside the
>> > type
>> > creation functions, if the passed-in
>> > unique name is empty, DIBuilder will generate a unique name. I am not
>> > sure
>> > if it is necessary for dragonegg
>> > to have its own implementation of generating the unique name, and how
>> > hard
>> > it is.
>>
>> It shouldn't be too hard, the question is what should go into the
>> field. I don't know how comfortable I am with emitting a "unique" name
>> out of the backend if one isn't provided. I think the best method here
>> would be this:
>>
>> The type field can either be an MDString or an MDNode. The MDNode will
>> be the referenced type as is currently used. The MDString would be the
>> unique name for the type that will be used to disambiguate as we've
>> discussed (also to remove cycles and references so we can unique).
>> This will avoid the need to change clients except for those that wish
>> to take advantage of this.
>>
>> Here's a very incremental way of doing the patch I think:
>>
>> 1) Add a field to DIComposite type which is initially just a direct
>> reference back to itself.
>> 2) Ensure that these end up in the retain types list.
>> 3) Then build the map based on that key, which is just a node
>> reference at the moment.
>>  - Each field can thus be done incrementally and mapped one at a time.
>> 4) Then start passing in strings one at a time.
>>
>> This will make it easier to review etc.
>>
>> Then we can get to the declaration deduplication and
>> declaration/defined type merging.
>>
>> What do you think?
>>
>> >> Also I'm not sure why you've got an actual hashing algorithm in there.
>> >> Can you explain?
>> >
>> >
>> > Which hashing algorithm are you referring to?
>> > I don't recall a hashing algorithm in the patch.
>> >
>>
>> +    static unsigned getHashValue(const Pair& PairVal) {
>> +      uint64_t Key = (uint64_t)hash_value(PairVal.first) << 32
>> +                     | (uint64_t)PairVal.second;
>> +      Key += ~(Key << 32);
>> +      Key ^= (Key >> 22);
>> +      Key += ~(Key << 13);
>> +      Key ^= (Key >> 8);
>> +      Key += (Key << 3);
>> +      Key ^= (Key >> 15);
>> +      Key += ~(Key << 27);
>> +      Key ^= (Key >> 31);
>> +      return (unsigned)Key;
>> +    }
>>
>> Related to the many pairs that we talk about below.
>>
>> >> Couple other things I noticed:
>> >>
>> >> +  // If the consumer of DIBuilder does not provide a unique name,
>> >> generate one
>> >> +  // here.
>> >> +  MDString *UniqueStr = UniqueName.empty() ?
>> >> +
>> >> getUniqueTypeName(dwarf::DW_TAG_structure_type,
>> >> Name) :
>> >> +                        MDString::get(VMContext, UniqueName);
>> >>
>> >> I don't think we should do this. Have an example where we'd want to?
>> >
>> >
>> > The patch has a default value of an empty string for the type creation
>> > function, so if the caller of the type
>> > creation function does not provide a name, DIBuilder will come up with
>> > one.
>>
>> Yeah, not such a fan. Either we should do this in such a way that we
>> don't require outside clients to update, or decide that we don't care.
>>
>> >> +        Map.insert(std::make_pair(
>> >> +                   std::make_pair(Ty.getUniqueName(),
>> >> Ty.isForwardDecl()),
>> >> +                   Ty));
>> >>
>> >> A pair of a pair is a bit icky here as well.
>> >
>> > Any suggestion here?
>> >
>>
>> I'm not sure what each field is being used for in the map so I can't
>> provide suggestions. Perhaps you can explain what's going on? (The
>> StringRef and the DIType are obvious, the bool not so much.)
>>
>> -eric
>>
>> > Thanks,
>> > Manman
>> >
>> >>
>> >>
>> >> This should be enough to start discussion/revision of the patch.
>> >>
>> >> -eric
>> >>
>> >> On Wed, Jul 31, 2013 at 4:37 PM, Manman Ren <manman.ren at gmail.com>
>> >> wrote:
>> >> >
>> >> > After cleaning up usage of the verify function and all the testing
>> >> > cases,
>> >> > the first set of patches are ready for review :)
>> >> >
>> >> > 3 patches are included: one for clang (tu_clang_07312013.patch), one
>> >> > for
>> >> > code change of llvm (tu_07312013.patch), and one for updating all the
>> >> > testing cases (tu_test_07312013.patch).
>> >> >
>> >> > The following are implemented:
>> >> > 1> DIType will have a unique name at field 9.
>> >> >      For DIBasicType, DICompositeType, DIDerivedType, the fields
>> >> > after
>> >> > will
>> >> > be shifted by 1.
>> >> > 2> I am going to update usage of DIType in phases:
>> >> >      At the first batch, getContainingType of DICompositeType and
>> >> > DISubprogram will be replaced with a MDString, as well as
>> >> > getClassType
>> >> > of
>> >> > DIDerivedType.
>> >> > 3> The verifier is updated according to 1 and 2.
>> >> > 4> DwarfDebug has a TypeMap.
>> >> >       It also has a helper function lookupType to find the
>> >> > corresponding
>> >> > DIType given a type name.
>> >> > 5> how to generate the unique name?
>> >> >       The clang frontend has two helper functions:
>> >> > getUniqueTagTypeName
>> >> > and
>> >> > getUniqueTypeName
>> >> >           The first one handles TagType and it uses ODR (One
>> >> > Definition
>> >> > Rule) for external c++ types. It is currently called for struct,
>> >> > class,
>> >> > union and enum.
>> >> >           The second one can be used for general types, it uses
>> >> > mangled
>> >> > name
>> >> > appended with CU's directory and file name to provide uniqueness
>> >> > across
>> >> > CUs.
>> >> > It is currently used to generate unique name for subroutine type. For
>> >> > other
>> >> > types, DIBuilder will provide a unique name. Clang can handle more
>> >> > types
>> >> > in
>> >> > follow-on patches if it can provide a better name than DIBuilder.
>> >> >       DIBuilder also has a helper function getUniqueTypeName, this is
>> >> > called
>> >> > when consumer of DIBuilder does not provide a unique type name. This
>> >> > is
>> >> > needed for dragonegg, before it provides a unique name (not sure
>> >> > whether
>> >> > it
>> >> > is needed).
>> >> > 6> All DITypes created by DIBuilder will be added to RetainTypes.
>> >> >
>> >> > Comments and feedback are welcome.
>> >> >
>> >> > Manman
>> >> >
>> >
>> >
>
>