[lldb-dev] [RFC][PATCH] Keep un-canonicalized template types in the debug information

Thu Sep 18 13:08:55 PDT 2014

On Thu, Sep 18, 2014 at 1:05 PM, David Blaikie <dblaikie at gmail.com> wrote:

>
>
> On Thu, Sep 18, 2014 at 1:00 PM, Robinson, Paul <
> Paul_Robinson at playstation.sony.com> wrote:
>
>> >From: David Blaikie [mailto:dblaikie at gmail.com]
>> >On Wed, Sep 17, 2014 at 11:54 AM, Robinson, Paul <
>> Paul_Robinson at playstation.sony.com> wrote:
>> >> From: David Blaikie [mailto:dblaikie at gmail.com]
>> >> > On Wed, Sep 17, 2014 at 7:45 AM, Robinson, Paul <
>> Paul_Robinson at playstation.sony.com> wrote:
>> >> > > From: David Blaikie [mailto:dblaikie at gmail.com]
>> >> > > On Sat, Sep 13, 2014 at 6:54 PM, Robinson, Paul <
>> Paul_Robinson at playstation.sony.com> wrote:
>> >> > > > > > On 09 Sep 2014, at 00:01, jingham at apple.com wrote:
>> >> > > > > > >
>> >> > > > > > > From the debugger's standpoint, the functional concern is
>> that if you do
>> >> > > > > > something more real, like:
>> >> > > > > > >
>> >> > > > > > > typedef int A;
>> >> > > > > > > template <typename T>
>> >> > > > > > > struct S
>> >> > > > > > > {
>> >> > > > > > >  T my_t;
>> >> > > > > > > };
>> >> > > > > > >
>> >> > > > > > > I want to make sure that the type of my_t is given as "A"
>> not as "int".
>> >> > > > > > The reason for that is that it is not uncommon to have data
>> formatters
>> >> > > > > > that trigger off the typedef name.  This happens when you
>> use some common
>> >> > > > > > underlying type like "int" but the value has some special
>> meaning when it
>> >> > > > > > is formally an "A", and you want to use the data formatters
>> to give it an
>> >> > > > > > appropriate presentation. Since the data formatters work by
>> matching type
>> >> > > > > > name, starting from the most specific on down, it is
>> important that the
>> >> > > > > > typedef name be preserved.
>> >> > > > > > >
>> >> > > > > > > However, it would be really odd to see:
>> >> > > > > > >
>> >> > > > > > > (lldb) expr -T -- my_s
>> >> > > > > > > (S<int>) $1 = {
>> >> > > > > > >  (A) my_t = 5
>> >> > > > > > > }
>> >> > > > > > >
>> >> > > > > > > instead of:
>> >> > > > > > >
>> >> > > > > > > (lldb) expr -T -- my_s
>> >> > > > > > > (S<A>) $1 = {
>> >> > > > > > >  (A) my_t = 5
>> >> > > > > > > }
>> >> > > > > > >
>> >> > > > > > > so I am in favor of presenting the template parameter type
>> with the most
>> >> > > > > > specific name it was given in the overall template type name.
>> >> > > > > >
>> >> > > > > > OK, we get this wrong today. I’ll try to look into it.
>> >> > > > > >
>> >> > > > > > What’s your take on the debug info representation for the
>> templated class
>> >> > > > > > type? The tentative patch introduces a typedef that declares
>> S<A> as a
>> >> > > > > > typedef for S<int>. The typedef doesn’t exist in the code,
>> thus I find it
>> >> > > > > > a bit of a lie to the debugger. I was more in favour of
>> something like :
>> >> > > > > >
>> >> > > > > > DW_TAG_variable
>> >> > > > > > DW_AT_type: -> DW_TAG_structure_type
>> >> > > > > >                DW_AT_name: S<A>
>> >> > > > > >                DW_AT_specification: -> DW_TAG_structure_type
>> >> > > > > >                                          DW_AT_name: S<int>
>> >> > > > > >
>> >> > > > > > This way the canonical type is kept in the debug
>> information, and the
>> >> > > > > > declaration type is a real class type aliasing the canonical
>> type. But I’m
>> >> > > > > > not sure debuggers can digest this kind of aliasing.
>> >> > > > > >
>> >> > > > > > Fred
>> >> > > > >
>> >> > > > > Why introduce the extra typedef? S<A> should have a template
>> parameter
>> >> > > > > entry pointing to A which points to int.  The info should all
>> be there
>> >> > > > > without any extra stuff.  Or if you think something is
>> missing, please
>> >> > > > > provide a more complete example.
>> >> > > > My immediate concern here would be either loss of information or
>> bloat
>> >> > > > when using that with type units (either bloat because each
>> instantiation
>> >> > > > with differently spelled (but identical) parameters is treated
>> as a separate
>> >> > > > type - or loss when the types are considered the same and all
>> but one are
>> >> > > > dropped at link time)
>> >> > > You'll need to unpack that more because I'm not following the
>> concern.
>> >> > > If the typedefs are spelled differently, don't they count as
>> different types?
>> >> > > DWARF wants to describe the program as-written, and there's no
>> S<int> written
>> >> > > in the program.
>> >> > >
>> >> > > Maybe not in this TU, but possibly in another TU? Or by the user.
>> >> > >
>> >> > >  void func(S<int>);
>> >> > >  ...
>> >> > >  typedef int A;
>> >> > >  S<A> s;
>> >> > >  func(s); // calls the same function
>> >> > >
>> >> > > The user probably wants to be able to call void func with S<int>
>> or S<A>
>> >> > Sure.
>> >> >
>> >> > > (and, actually, in theory, with S<B> where B is another typedef of
>> int, but
>> >> > > that'll /really/ require DWARF consumer support and/or new DWARF
>> wording).
>> >> >
>> >> > Not DWARF wording. DWARF doesn't say when you can and can't call
>> something;
>> >> > that's a debugger feature and therefore a debugger decision.
>> >> >
>> >> What I mean is we'd need some new DWARF to help explain which types are
>> >> equivalent (or the debugger would have to do a lot of spelunking to try
>> >> to find structurally equivalent types - "S<B>" and "S<A>", go look
>> through
>> >> their DW_TAG_template_type_params, see if they are typedefs to the same
>> >> underlying type, etc... )
>> >> >
>> >> >
>> >> > > We can't emit these as completely independent types - it would be
>> verbose
>> >> > > (every instantiation with different typedefs would be a whole
>> separate type
>> >> > > in the DWARF, not deduplicated by type units, etc) and wrong
>> >> >
>> >> > Yes, "typedef int A;" creates a synonym/alias not a new type, so
>> S<A> and S<int>
>> >> > describe the same type from the C++ perspective, so you don't want
>> two complete
>> >> > descriptions with different names, because that really would be
>> describing them
>> >> > as separate types.  What wrinkles my brow is having S<int> be the
>> "real"
>> >> > description even though it isn't instantiated that way in the
>> program.  I wonder
>> >> > if it should be marked artificial... but if you do instantiate
>> S<int> in another
>> >> > TU then you don't want that.  Huh.  It also seems weird to have this:
>> >> >  DW_TAG_typedef
>> >> >    DW_AT_name "S<A>"
>> >> >    DW_AT_type -> S<int>
>> >> > but I seem to be coming around to thinking that's the most viable
>> way to have
>> >> > a single actual instantiated type, and still have the correct names
>> of things
>> >*mostly* correct; this still loses "A" as the type of the data member.
>> >
>> >For the DW_TAG_template_type_parameter, you mean? No, it wouldn't.
>> >
>> > (as a side note, if you do actually have a data member (or any other
>> mention) of
>> >the template parameter type, neither Clang nor GCC really get that
>> 'right' -
>> >"template<typename T> struct foo { T t; }; foo<int> f;" - in both Clang
>> and GCC,
>> >the type of the 't' member of foo<int> is a direct reference to the
>> "int" DIE, not
>> >to the DW_TAG_template_type_parameter for "T" -> int)
>>
>> Huh. And DWARF doesn't say you should point to the
>> template_type_parameter...
>> I thought it did, but no.  Okay, so nothing is lost, but it feels
>> desirable
>> to me, that uses of the template parameter should cite it in the DWARF as
>> well.
>> But I guess we can leave that part of the debate for another time.
>>
>> >
>> >Crud.
>> >But I haven't come up with a way to get that back without basically
>> instantiating
>> >S<A> and S<int> separately.
>> >
>> >> >
>> >> Yep - it's the only way I can think of giving this information in a
>> way that's
>> >> likely to work with existing consumers. It would probably be harmless
>> to add
>> >> DW_AT_artificial to the DW_TAG_typedef, if that's any help to any
>> debug info
>> >> consumer.
>> >
>> >Hmmm no, S<A> is not the artificial name;
>> >
>> >It's not the artificial name, but it is an artificial typedef.
>>
>> If the source only says S<A>, then the entire S<int> description is
>> artificial,
>> because *that's not what the user wrote*.  So both the typedef and the
>> class type
>> are artificial.  Gah.  Let's forget artificial here.
>>
>> >
>> >some debuggers treat DW_AT_artificial
>> >as meaning "don't show this to the user."
>> >
>> >In some sense that's what I want - we never wrote the typedef in the
>> source
>> >so I wouldn't want to see it rendered in the "list of typedefs" (or even
>> >probably in the list of types, maybe).
>> >
>> >But S<A> is the name we *do* want to
>> >show to the user.
>> >
>> >Maybe. Sometimes. But there could be many such aliases for the type. (&
>> many
>> >more that were never written in the source code, but are still valid in
>> the
>> >source language (every other typedef of int, every other way to name the
>> int
>> >type (decltype, etc)))
>>
>> But you *lose* cases where the typedef is the *same* *everywhere*.  And in
>> many cases that typedef is a valuable thing, not the trivial rename we've
>> been bandying about.  This is a more real example:
>>
>> typedef int int4 __attribute__((ext_vector_type(4)));
>> template<typename T> struct TypeTraits {};
>> template<>
>> struct TypeTraits<int4> {
>>   static unsigned MysteryNumber;
>> };
>> unsigned TypeTraits<int4>::MysteryNumber = 3U;
>>
>> Displaying "TypeTraits<int __attribute__((ext_vector_type(4)))>" is much
>> worse than "TypeTraits<int4>" (and not just because it's shorter).
>> More to the point, having the debugger *complain* when the user says
>> something like "ptype TypeTraits<int4>" is a problem.
>>
>> Reducing debug-info size is a worthy goal, but don't degrade the debugging
>> experience to get there.
>>
>
> I'm not sure which part of what I've said seemed like a suggestion to
> degrade the debugging experience to minimize debug info size (the
> proposition that we should use a typedef or other alias on top of the
> canonical type? It wouldn't cause "ptype TypeTraits<int4>" to complain -
> indeed for GDB ptyping a typedef gives /exactly/ the same output as if you
> ptype the underlying type - it doesn't even mention that there's a typedef
> involved:
>
> typedef fooA foo<int>;
>

(keyboard shortcuts are hard - accidentally sent before I finished)
(gdb) ptype fooA
type = struct foo<int> [with T = int] {
    <no data fields>
}

But in any case, I think what I'm saying boils down to:

Short of changing debug info consumers, I think the only thing we can do is
DW_TAG_typedef. That'll work for existing consumers.

Anything else will need possibly new DWARF wording, or at least an
agreement between a variety of debug info consumers and producers that some
new cliche/use of existing DWARF be used to describe these situations.

I could be wrong - if someone wants to try prototyping the
DW_TAG_structure_type proposal Fred had and see if existing debuggers work
with that, sure.

I'm not opposed to someone coming up with a standardizable more descriptive
form than DW_TAG_typedef, but that conversation probably needs to happen
with the DWARF Committee more than the LLVM community.

- David

>
>
>
>
>> --paulr
>>
>> >
>> >
>> >> That said, I'm not opposed to proposing something to DWARF to define
>> some more
>> >> 'proper' way to describe this.
>> >
>> >Yah.  I've been thinking about the DW_AT_specification idea too, which
>> would be
>> >something like this:
>> >    DW_TAG_class_type
>> >      DW_AT_name "S<A>"
>> >      DW_AT_specification -> S<int>
>> >
>> >      DW_TAG_template_type_parameter
>> >        DW_AT_name "T"
>> >        DW_AT_type -> A
>> >
>> >The problem with this is you don't know where T is used in the template,
>> so
>> >you *still* don't know when to use A as the type of "field". Also it's
>> kind
>> >of an abuse of DW_AT_specification.  If we can't get A as the type of
>> "field"
>> >then the typedef is more straightforward and understandable.
>> >
>> >It's still a lot of DWARF to emit for every way the user has named the
>> template
>> >& I'm not sure how much value it provides - are there use cases you have
>> in mind
>> >that would benefit from the increased fidelity of knowing which template
>> argument
>> >corresponds to the way the user wrote the type.
>> >
>> > (& what would we emit if the user named the type in some other more
>> exotic way:
>> >int func(); template<typename T> struct S { }; ... S<decltype(func())>
>> s; )
>> >
>> >
>> >Maybe I'll pop a note to the DWARF committee for a broader set of
>> opinions.
>> >
>> >>
>> >> One other open question is then, when, if ever, to reference the
>> DW_TAG_typedef
>> >> rather than the underlying type? Do we just reference it whenever the
>> user
>> >> writes using that name?
>> >>
>> >>  void f(S<A>);
>> >>  ...
>> >>  void f(S<B>) { ... }
>> >>
>> >> etc... (this would be just as possible/we could maybe treat it the
>> same as if
>> >> the user wrote "void f(A); ... void f(B) { ... }")
>> >
>> >That's what I would do, and I think is more conformant to the DWARF spec.
>> >--paulr
>> >
>> >>
>> >> > (because DWARF is all about the name "as it appears in the source
>> program.")
>> >> >
>> >> > > (the debugger wouldn't know these are actually the same type so
>> wouldn't
>> >> > > allow function calls, etc).
>> >> > >
>> >> > > - David
>> >> > >
>> >> > > >
>> >> > > >
>> >> > > > > Jim
>> >> > > > >
>> >> > > >> On Sep 8, 2014, at 12:38 PM, Frédéric Riss <friss at apple.com>
>> wrote:
>> >> > > >>
>> >> > > >>
>> >> > > >>> On 08 Sep 2014, at 19:31, Greg Clayton <gclayton at apple.com>
>> wrote:
>> >> > > >>>
>> >> > > >>> This means you will see "S<A>" as the type for your variables
>> in the
>> >> > > debugger when you view variables or children of
>> structs/unions/classes. I
>> >> > > think this is not what the user would want to see. I would rather
>> see
>> >> > > "S<int>" as the type for my variable than see "S<A>”.
>> >> > > >>
>> >> > > >> I find it more accurate for the debugger to report what has
>> actually
>> >> > > been put in the code. Moreover when a typedef is used, it’s
>> usually to
>> >> > > make things more readable not to hide information, thus I guess it
>> would
>> >> > > usually be as informative while being more compact. The debugger
>> needs to
>> >> > > have a way to describe the real type behind the abbreviated name
>> though,
>> >> > > we must not have less information compared to what we have today.
>> >> > > >>
>> >> > > >> Another point: this allows the debugger to know what S<A>
>> actually is.
>> >> > > Without it, the debugger only knows the canonical type. This means
>> that
>> >> > > currently you can’t copy/paste a piece of code that references
>> that kind
>> >> > > of template names and have it parse correctly. I /think/ that
>> having this
>> >> > > information in the debug info will allow more of this to work.
>> >> > > >>
>> >> > > >> But we can agree to disagree :-) It would be great to have more
>> people
>> >> > > chime and give their opinion.
>> >> > > >>
>> >> > > >> Fred
>> >> > > >>
>> >> > > >>>> On Sep 5, 2014, at 4:00 PM, Adrian Prantl <aprantl at apple.com>
>> wrote:
>> >> > > >>>>
>> >> > > >>>>
>> >> > > >>>>> On Sep 5, 2014, at 3:49 PM, Eric Christopher <
>> echristo at gmail.com>
>> >> > > wrote:
>> >> > > >>>>>
>> >> > > >>>>>
>> >> > > >>>>>
>> >> > > >>>>>
>> >> > > >>>>> On Fri, Sep 5, 2014 at 3:43 PM, Duncan P. N. Exon Smith
>> >> > > <dexonsmith at apple.com> wrote:
>> >> > > >>>>>
>> >> > > >>>>>> On 2014 Sep 5, at 16:01, Frédéric Riss <friss at apple.com>
>> wrote:
>> >> > > >>>>>>
>> >> > > >>>>>> I couldn’t even find a subject expressing exactly what this
>> patch
>> >> > > is about… First of all, it’s meant to start a discussion, and I’m
>> not
>> >> > > proposing it for inclusion right now.
>> >> > > >>>>>>
>> >> > > >>>>>> The issue I’m trying to address is that template types are
>> always
>> >> > > canonicalized when emitted in the debug information (this is the
>> desugar()
>> >> > > call in UnwrapTypeForDebugInformation).
>> >> > > >>>>>>
>> >> > > >>>>>> This means that if the developer writes:
>> >> > > >>>>>>
>> >> > > >>>>>> typedef int A;
>> >> > > >>>>>> template <typename T>
>> >> > > >>>>>> struct S {};
>> >> > > >>>>>>
>> >> > > >>>>>> S<A> var;
>> >> > > >>>>>>
>> >> > > >>>>>> The variable var will have type S<int> and not S<A>. In
>> this simple
>> >> > > example, it’s not that much of an issue, but for heavily templated
>> code,
>> >> > > the full expansion might be really different from the original
>> >> > > declaration.
>> >> > > >>>>>>
>> >> > > >>>>>> The attached patch makes us emit an intermediate typedef
>> for the
>> >> > > variable’s type:
>> >> > > >>>>>>
>> >> > > >>>>>> 0x0000002a:   DW_TAG_variable [2]
>> >> > > >>>>>>             DW_AT_name [DW_FORM_strp]       (
>> >> > > .debug_str[0x00000032] = “var")
>> >> > > >>>>>>             DW_AT_type [DW_FORM_ref4]       (cu + 0x0040 =>
>> >> > > {0x00000040})
>> >> > > >>>>>>             DW_AT_external [DW_FORM_flag]   (0x01)
>> >> > > >>>>>>             DW_AT_decl_file [DW_FORM_data1] (0x01)
>> >> > > >>>>>>             DW_AT_decl_line [DW_FORM_data1] (8)
>> >> > > >>>>>>             DW_AT_location [DW_FORM_block1] (<0x09> 03 70
>> 6c 00 00
>> >> > > 00 00 00 00 )
>> >> > > >>>>>>
>> >> > > >>>>>> 0x00000040:   DW_TAG_typedef [3]
>> >> > > >>>>>>             DW_AT_type [DW_FORM_ref4]       (cu + 0x004b =>
>> >> > > {0x0000004b})
>> >> > > >>>>>>             DW_AT_name [DW_FORM_strp]       (
>> >> > >.debug_str[0x00000035] = “S<A>")
>> >> > > >>>>>>             DW_AT_decl_file [DW_FORM_data1] (0x01)
>> >> > > >>>>>>             DW_AT_decl_line [DW_FORM_data1] (6)
>> >> > > >>>>>>
>> >> > > >>>>>> 0x0000004b:   DW_TAG_structure_type [4] *
>> >> > > >>>>>>             DW_AT_name [DW_FORM_strp]       (
>> >> > >.debug_str[0x0000003e] = “S<int>")
>> >> > > >>>>>>             DW_AT_byte_size [DW_FORM_data1] (0x01)
>> >> > > >>>>>>             DW_AT_decl_file [DW_FORM_data1] (0x01)
>> >> > > >>>>>>             DW_AT_decl_line [DW_FORM_data1] (6)
>> >> > > >>>>>>
>> >> > > >>>>>>
>> >> > > >>>>>> Which basically is what I want, although I don’t like that
>> it
>> >> > > introduces a typedef where there is none in the code. I’d prefer
>> that to
>> >> > > be:
>> >> > > >>>>>>
>> >> > > >>>>>> DW_TAG_variable
>> >> > > >>>>>> DW_AT_type: -> DW_TAG_structure_type
>> >> > > >>>>>>                DW_AT_name: S<A>
>> >> > > >>>>>>                DW_AT_specification: -> DW_TAG_structure_type
>> >> > > >>>>>>                                          DW_AT_name: S<int>
>> >> > > >>>>>>                                          …
>> >> > > >>>>>>
>> >> > > >>>>>> The patch also has the nice property of omitting the
>> defaulted
>> >> > > template arguments in the first level typedef. For example you get
>> >> > > vector<A> instead of vector<int, std::__1::allocator<int> >.
>> >> > > >>>>>
>> >> > > >>>>> If you specify `vector<int>` in C++ do you get that instead
>> of
>> >> > > >>>>> `vector<int, std::__1::allocator<int>>`?
>> >> > > >>>>>
>> >> > > >>>>> Yeah, I mentioned this as possibly causing problems with
>> debuggers
>> >> > > or other consumers, but I don't have any proof past "ooooo scary!”.
>> >> > > >>>>
>> >> > > >>>> Well, [+lldb-dev], could this confuse debuggers? :-)
>> >> > > >>>>
>> >> > > >>>> -- adrian
>> >> > > >>>>>
>> >> > > >>>>> That said, I like the idea personally :)
>> >> > > >>>>>
>> >> > > >>>>> -eric
>> >> > > >>>>>
>> >> > > >>>>>
>> >> > > >>>>>> Now there is one thing I really don’t like about the patch.
>> In
>> >> > > order not to emit typedefs for types that don’t need it, I use
>> string
>> >> > > comparison between the desugared and the original type. I haven’t
>> >> > > quantified anything, but doing the construction of the type name
>> for every
>> >> > > template type and then comparing it to decide to use it or not
>> seems like
>> >> > > a big waste.
>> >> > > >>>>>
>> >> > > >>>>> Maybe someone on cfe-dev knows a better way.
>> >> > > >>>>>
>> >> > > >>>>>>
>> >> > > >>>>>> Thoughts?
>> >> > > >>>>>>
>> >> > > >>>>>> <template-arg-typedefs.diff>
>> >> > > >>>>>>
>> >> > > >>>>>> Fred
>> >> > > >>>>>>
>> >> > > >>>>>>
>> >> > > >>>>>>
>> >> > > >>>>>>
>> >> > > >>>>>>
>> >> > > >>>>>> _______________________________________________
>> >> > > >>>>>> llvm-commits mailing list
>> >> > > >>>>>> llvm-commits at cs.uiuc.edu
>> >> > > >>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> >> > > >>>>>
>> >> > > >>>>>
>> >> > > >>>>
>> >> > > >>>> _______________________________________________
>> >> > > >>>> lldb-dev mailing list
>> >> > > >>>> lldb-dev at cs.uiuc.edu
>> >> > > >>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>> >> > > >>>
>> >> > > >>
>> >> > > >>
>> >> > > >> _______________________________________________
>> >> > > >> lldb-dev mailing list
>> >> > > >> lldb-dev at cs.uiuc.edu
>> >> > > >> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>> >> > > >
>> >> > >
>> >> > >
>> >> > > _______________________________________________
>> >> > > llvm-commits mailing list
>> >> > > llvm-commits at cs.uiuc.edu
>> >> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> >> >
>> >> > _______________________________________________
>> >> > llvm-commits mailing list
>> >> > llvm-commits at cs.uiuc.edu
>> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140918/35a87059/attachment.html>