[PATCH] First set of patches for type uniquing

Mon Aug 26 16:00:48 PDT 2013

On Mon, Aug 26, 2013 at 3:46 PM, Manman Ren <manman.ren at gmail.com> wrote:
> In r189282 and r189283.
>
> I will work on clang front-end to generate the unique identifier for
> DICompositeType.

For each field that uses a DIType that may refer to a DICompositeType,
we'll see two patches - one for the backend support (to do the
necessary indirect access through a map lookup) and frontend support
to emit a name. The backend patch can be committed prior to the
frontend patch (though the frontend patch would be a nice way to
generate the backend test to ensure it's what we intend to
support/works end-to-end).

It'd be good to see exactly one (a small one, where possible) use of
this first so we can checkin the map creation & lookup plumbing with a
single simple field to motivate/demonstrate/test it, then
incrementally commit support for each field after that.

>
> Thanks,
> Manman
>
>
> On Mon, Aug 26, 2013 at 1:40 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> On Mon, Aug 26, 2013 at 1:20 PM, Manman Ren <manman.ren at gmail.com> wrote:
>> > Hi David,
>> >
>> > Thanks for the fast reviewing.
>>
>> Sure thing - thanks for making it easy to review quickly!
>>
>> >
>> >
>> > On Mon, Aug 26, 2013 at 12:58 PM, David Blaikie <dblaikie at gmail.com>
>> > wrote:
>> >>
>> >> Generally we include test changes in patches that require them, rather
>> >> than splitting them out. I assume in this instance you split the
>> >> patches for ease of reviewing, but intend to commit them together (so
>> >> as not to break the buildbots)
>> >
>> > Yes, I separated to make reviewing easier. The three patches will be
>> > submitted together.
>> >>
>> >>
>> >> It looks like we're constraining the DICompositeType schema a little
>> >> further than it was before - as you can see from your patch, some
>> >> DICompositeTypes had 13 fields (no template argument field) and some
>> >> have 14. Now they all have 15. I think this is OK, but just want to
>> >> highlight it as a change in schema.
>> >
>> > Yes, all DICompositeTypes will have 15 fields, for types where template
>> > argument field was omitted, I
>> > added an extra null. I will expand my commit message to emphasize the
>> > schema
>> > change.
>> >
>> >>
>> >> (personally I don't like the vague
>> >> optionality of trailing fields like that - if we're going to try to
>> >> reduce the size of these records (there is a lot of redundancy in them
>> >> - some type descriptions don't need nearly so many fields) we should
>> >> sit down & do that deliberately rather than with little hacks, so
>> >> making things more consistent is good as it'll make it easier to
>> >> decide where to try to remove such redundancy)
>> >>
>> >> While you're here, you could change those lines that use
>> >> "Constant::getNullValue" to just use NULL as we do in other places (I
>> >> assume the "getNullValue" is how we end up with i32 0 as 'null' values
>> >> sometimes?
>> >
>> > Yes, that is how we get "i32 0".
>> >>
>> >> when we really want null values anyway, and it's shorter to
>> >> write) such as in createSubroutineType, createEnumerationType,
>> >> createArrayType - just on the lines you're already touching (because
>> >> you have to add a ',') anyway.
>> >
>> > I will make the change.
>> >
>> > Thanks,
>> > Manman
>> >>
>> >>
>> >> But otherwise this looks correct - Please commit. (I haven't tested
>> >> this, so leaving it up to you to provide assurances that your test
>> >> changes are necessary & sufficient - I glanced at a few, but didn't
>> >> rigorously investigate them all).
>> >>
>> >> On Mon, Aug 26, 2013 at 12:50 PM, Manman Ren <manman.ren at gmail.com>
>> >> wrote:
>> >> >
>> >> > Eric, David and I chatted on IRC and agreed to submit smaller
>> >> > patches.
>> >> > We will add an identifier field to DICompositeType. For
>> >> > DICompositeTypes
>> >> > with a non-null identifier field, its reference can
>> >> > be replaced with the identifier to remove cycles and to help type
>> >> > uniquing.
>> >> >
>> >> > As a first step, I am going to add a field to DICompositeType, the
>> >> > attached
>> >> > patches implement that:
>> >> > DICompositeType will have an identifier field at position 14. For
>> >> > now,
>> >> > the
>> >> > field is set to null by DIBuilder.
>> >> > Update verifier to check that DICompositeType has 15 fields and the
>> >> > last
>> >> > field is null or a MDString.
>> >> > Update testing cases to include an extra field for DICompositeType.
>> >> > The identifier field will be used by type uniquing so a front end can
>> >> > generate a DICompositeType with a unique identifier.
>> >> >
>> >> > 3 patches are attached, two of them update testing cases.
>> >> >
>> >> > Thanks,
>> >> > Manman
>> >> >
>> >> >
>> >> >
>> >> > On Fri, Aug 23, 2013 at 10:49 AM, Eric Christopher
>> >> > <echristo at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> >>
>> >> >> >> I'm still looking at the patch but one comment that will affect a
>> >> >> >> lot
>> >> >> >> of the text of the patch is that I think the unique name should
>> >> >> >> be
>> >> >> >> passed into the type creation functions, not created inside the
>> >> >> >> backend. This means that each front end can use the definition of
>> >> >> >> "unique" that most appeals to them.
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > I know dragonegg calls those type creation functions, and there
>> >> >> > are
>> >> >> > maybe
>> >> >> > other out-of-tree usages.
>> >> >> > This patch provides default value for the unique name (an empty
>> >> >> > string)
>> >> >> > to
>> >> >> > the type creation functions, so
>> >> >> > it is not going to break dragonegg or out-of-tree projects. Inside
>> >> >> > the
>> >> >> > type
>> >> >> > creation functions, if the passed-in
>> >> >> > unique name is empty, DIBuilder will generate a unique name. I am
>> >> >> > not
>> >> >> > sure
>> >> >> > if it is necessary for dragonegg
>> >> >> > to have its own implementation of generating the unique name, and
>> >> >> > how
>> >> >> > hard
>> >> >> > it is.
>> >> >>
>> >> >> It shouldn't be too hard, the question is what should go into the
>> >> >> field. I don't know how comfortable I am with emitting a "unique"
>> >> >> name
>> >> >> out of the backend if one isn't provided. I think the best method
>> >> >> here
>> >> >> would be this:
>> >> >>
>> >> >> The type field can either be an MDString or an MDNode. The MDNode
>> >> >> will
>> >> >> be the referenced type as is currently used. The MDString would be
>> >> >> the
>> >> >> unique name for the type that will be used to disambiguate as we've
>> >> >> discussed (also to remove cycles and references so we can unique).
>> >> >> This will avoid the need to change clients except for those that
>> >> >> wish
>> >> >> to take advantage of this.
>> >> >>
>> >> >> Here's a very incremental way of doing the patch I think:
>> >> >>
>> >> >> 1) Add a field to DIComposite type which is initially just a direct
>> >> >> reference back to itself.
>> >> >> 2) Ensure that these end up in the retain types list.
>> >> >> 3) Then build the map based on that key, which is just a node
>> >> >> reference at the moment.
>> >> >>  - Each field can thus be done incrementally and mapped one at a
>> >> >> time.
>> >> >> 4) Then start passing in strings one at a time.
>> >> >>
>> >> >> This will make it easier to review etc.
>> >> >>
>> >> >> Then we can get to the declaration deduplication and
>> >> >> declaration/defined type merging.
>> >> >>
>> >> >> What do you think?
>> >> >>
>> >> >> >> Also I'm not sure why you've got an actual hashing algorithm in
>> >> >> >> there.
>> >> >> >> Can you explain?
>> >> >> >
>> >> >> >
>> >> >> > Which hashing algorithm are you referring to?
>> >> >> > I don't recall a hashing algorithm in the patch.
>> >> >> >
>> >> >>
>> >> >> +    static unsigned getHashValue(const Pair& PairVal) {
>> >> >> +      uint64_t Key = (uint64_t)hash_value(PairVal.first) << 32
>> >> >> +                     | (uint64_t)PairVal.second;
>> >> >> +      Key += ~(Key << 32);
>> >> >> +      Key ^= (Key >> 22);
>> >> >> +      Key += ~(Key << 13);
>> >> >> +      Key ^= (Key >> 8);
>> >> >> +      Key += (Key << 3);
>> >> >> +      Key ^= (Key >> 15);
>> >> >> +      Key += ~(Key << 27);
>> >> >> +      Key ^= (Key >> 31);
>> >> >> +      return (unsigned)Key;
>> >> >> +    }
>> >> >>
>> >> >> Related to the many pairs that we talk about below.
>> >> >>
>> >> >> >> Couple other things I noticed:
>> >> >> >>
>> >> >> >> +  // If the consumer of DIBuilder does not provide a unique
>> >> >> >> name,
>> >> >> >> generate one
>> >> >> >> +  // here.
>> >> >> >> +  MDString *UniqueStr = UniqueName.empty() ?
>> >> >> >> +
>> >> >> >> getUniqueTypeName(dwarf::DW_TAG_structure_type,
>> >> >> >> Name) :
>> >> >> >> +                        MDString::get(VMContext, UniqueName);
>> >> >> >>
>> >> >> >> I don't think we should do this. Have an example where we'd want
>> >> >> >> to?
>> >> >> >
>> >> >> >
>> >> >> > The patch has a default value of an empty string for the type
>> >> >> > creation
>> >> >> > function, so if the caller of the type
>> >> >> > creation function does not provide a name, DIBuilder will come up
>> >> >> > with
>> >> >> > one.
>> >> >>
>> >> >> Yeah, not such a fan. Either we should do this in such a way that we
>> >> >> don't require outside clients to update, or decide that we don't
>> >> >> care.
>> >> >>
>> >> >> >> +        Map.insert(std::make_pair(
>> >> >> >> +                   std::make_pair(Ty.getUniqueName(),
>> >> >> >> Ty.isForwardDecl()),
>> >> >> >> +                   Ty));
>> >> >> >>
>> >> >> >> A pair of a pair is a bit icky here as well.
>> >> >> >
>> >> >> > Any suggestion here?
>> >> >> >
>> >> >>
>> >> >> I'm not sure what each field is being used for in the map so I can't
>> >> >> provide suggestions. Perhaps you can explain what's going on? (The
>> >> >> StringRef and the DIType are obvious, the bool not so much.)
>> >> >>
>> >> >> -eric
>> >> >>
>> >> >> > Thanks,
>> >> >> > Manman
>> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >> This should be enough to start discussion/revision of the patch.
>> >> >> >>
>> >> >> >> -eric
>> >> >> >>
>> >> >> >> On Wed, Jul 31, 2013 at 4:37 PM, Manman Ren
>> >> >> >> <manman.ren at gmail.com>
>> >> >> >> wrote:
>> >> >> >> >
>> >> >> >> > After cleaning up usage of the verify function and all the
>> >> >> >> > testing
>> >> >> >> > cases,
>> >> >> >> > the first set of patches are ready for review :)
>> >> >> >> >
>> >> >> >> > 3 patches are included: one for clang
>> >> >> >> > (tu_clang_07312013.patch),
>> >> >> >> > one
>> >> >> >> > for
>> >> >> >> > code change of llvm (tu_07312013.patch), and one for updating
>> >> >> >> > all
>> >> >> >> > the
>> >> >> >> > testing cases (tu_test_07312013.patch).
>> >> >> >> >
>> >> >> >> > The following are implemented:
>> >> >> >> > 1> DIType will have a unique name at field 9.
>> >> >> >> >      For DIBasicType, DICompositeType, DIDerivedType, the
>> >> >> >> > fields
>> >> >> >> > after
>> >> >> >> > will
>> >> >> >> > be shifted by 1.
>> >> >> >> > 2> I am going to update usage of DIType in phases:
>> >> >> >> >      At the first batch, getContainingType of DICompositeType
>> >> >> >> > and
>> >> >> >> > DISubprogram will be replaced with a MDString, as well as
>> >> >> >> > getClassType
>> >> >> >> > of
>> >> >> >> > DIDerivedType.
>> >> >> >> > 3> The verifier is updated according to 1 and 2.
>> >> >> >> > 4> DwarfDebug has a TypeMap.
>> >> >> >> >       It also has a helper function lookupType to find the
>> >> >> >> > corresponding
>> >> >> >> > DIType given a type name.
>> >> >> >> > 5> how to generate the unique name?
>> >> >> >> >       The clang frontend has two helper functions:
>> >> >> >> > getUniqueTagTypeName
>> >> >> >> > and
>> >> >> >> > getUniqueTypeName
>> >> >> >> >           The first one handles TagType and it uses ODR (One
>> >> >> >> > Definition
>> >> >> >> > Rule) for external c++ types. It is currently called for
>> >> >> >> > struct,
>> >> >> >> > class,
>> >> >> >> > union and enum.
>> >> >> >> >           The second one can be used for general types, it uses
>> >> >> >> > mangled
>> >> >> >> > name
>> >> >> >> > appended with CU's directory and file name to provide
>> >> >> >> > uniqueness
>> >> >> >> > across
>> >> >> >> > CUs.
>> >> >> >> > It is currently used to generate unique name for subroutine
>> >> >> >> > type.
>> >> >> >> > For
>> >> >> >> > other
>> >> >> >> > types, DIBuilder will provide a unique name. Clang can handle
>> >> >> >> > more
>> >> >> >> > types
>> >> >> >> > in
>> >> >> >> > follow-on patches if it can provide a better name than
>> >> >> >> > DIBuilder.
>> >> >> >> >       DIBuilder also has a helper function getUniqueTypeName,
>> >> >> >> > this
>> >> >> >> > is
>> >> >> >> > called
>> >> >> >> > when consumer of DIBuilder does not provide a unique type name.
>> >> >> >> > This
>> >> >> >> > is
>> >> >> >> > needed for dragonegg, before it provides a unique name (not
>> >> >> >> > sure
>> >> >> >> > whether
>> >> >> >> > it
>> >> >> >> > is needed).
>> >> >> >> > 6> All DITypes created by DIBuilder will be added to
>> >> >> >> > RetainTypes.
>> >> >> >> >
>> >> >> >> > Comments and feedback are welcome.
>> >> >> >> >
>> >> >> >> > Manman
>> >> >> >> >
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >
>> >
>
>