[lldb-dev] [RFC][PATCH] Keep un-canonicalized template types in the debug information

Wed Sep 17 07:45:41 PDT 2014

> From: David Blaikie [mailto:dblaikie at gmail.com] 
> On Sat, Sep 13, 2014 at 6:54 PM, Robinson, Paul <Paul_Robinson at playstation.sony.com> wrote:
> > > > On 09 Sep 2014, at 00:01, jingham at apple.com wrote:
> > > > >
> > > > > From the debugger's standpoint, the functional concern is that if you do
> > > > something more real, like:
> > > > >
> > > > > typedef int A;
> > > > > template <typename T>
> > > > > struct S
> > > > > {
> > > > >  T my_t;
> > > > > };
> > > > >
> > > > > I want to make sure that the type of my_t is given as "A" not as "int".
> > > > The reason for that is that it is not uncommon to have data formatters
> > > > that trigger off the typedef name.  This happens when you use some common
> > > > underlying type like "int" but the value has some special meaning when it
> > > > is formally an "A", and you want to use the data formatters to give it an
> > > > appropriate presentation. Since the data formatters work by matching type
> > > > name, starting from the most specific on down, it is important that the
> > > > typedef name be preserved.
> > > > >
> > > > > However, it would be really odd to see:
> > > > >
> > > > > (lldb) expr -T -- my_s
> > > > > (S<int>) $1 = {
> > > > >  (A) my_t = 5
> > > > > }
> > > > >
> > > > > instead of:
> > > > >
> > > > > (lldb) expr -T -- my_s
> > > > > (S<A>) $1 = {
> > > > >  (A) my_t = 5
> > > > > }
> > > > >
> > > > > so I am in favor of presenting the template parameter type with the most
> > > > specific name it was given in the overall template type name.
> > > >
> > > > OK, we get this wrong today. I’ll try to look into it.
> > > >
> > > > What’s your take on the debug info representation for the templated class
> > > > type? The tentative patch introduces a typedef that declares S<A> as a
> > > > typedef for S<int>. The typedef doesn’t exist in the code, thus I find it
> > > > a bit of a lie to the debugger. I was more in favour of something like :
> > > >
> > > > DW_TAG_variable
> > > > DW_AT_type: -> DW_TAG_structure_type
> > > >                DW_AT_name: S<A>
> > > >                DW_AT_specification: -> DW_TAG_structure_type
> > > >                                          DW_AT_name: S<int>
> > > >
> > > > This way the canonical type is kept in the debug information, and the
> > > > declaration type is a real class type aliasing the canonical type. But I’m
> > > > not sure debuggers can digest this kind of aliasing.
> > > >
> > > > Fred
> > >
> > > Why introduce the extra typedef? S<A> should have a template parameter
> > > entry pointing to A which points to int.  The info should all be there
> > > without any extra stuff.  Or if you think something is missing, please
> > > provide a more complete example.
> > My immediate concern here would be either loss of information or bloat
> > when using that with type units (either bloat because each instantiation
> > with differently spelled (but identical) parameters is treated as a separate
> > type - or loss when the types are considered the same and all but one are
> > dropped at link time)
> You'll need to unpack that more because I'm not following the concern.
> If the typedefs are spelled differently, don't they count as different types?
> DWARF wants to describe the program as-written, and there's no S<int> written
> in the program.
>
> Maybe not in this TU, but possibly in another TU? Or by the user.
>
>  void func(S<int>);
>  ...
>  typedef int A;
>  S<A> s;
>  func(s); // calls the same function
>
> The user probably wants to be able to call void func with S<int> or S<A> 

Sure.

> (and, actually, in theory, with S<B> where B is another typedef of int, but 
> that'll /really/ require DWARF consumer support and/or new DWARF wording). 

Not DWARF wording. DWARF doesn't say when you can and can't call something;
that's a debugger feature and therefore a debugger decision.

> We can't emit these as completely independent types - it would be verbose 
> (every instantiation with different typedefs would be a whole separate type 
> in the DWARF, not deduplicated by type units, etc) and wrong

Yes, "typedef int A;" creates a synonym/alias not a new type, so S<A> and S<int>
describe the same type from the C++ perspective, so you don't want two complete
descriptions with different names, because that really would be describing them
as separate types.  What wrinkles my brow is having S<int> be the "real" 
description even though it isn't instantiated that way in the program.  I wonder
if it should be marked artificial... but if you do instantiate S<int> in another
TU then you don't want that.  Huh.  It also seems weird to have this:
  DW_TAG_typedef
    DW_AT_name "S<A>"
    DW_AT_type -> S<int>
but I seem to be coming around to thinking that's the most viable way to have
a single actual instantiated type, and still have the correct names of things
(because DWARF is all about the name "as it appears in the source program.")

> (the debugger wouldn't know these are actually the same type so wouldn't 
> allow function calls, etc).
>
> - David
>
> >
> >
> > > Jim
> > >
> > >> On Sep 8, 2014, at 12:38 PM, Frédéric Riss <friss at apple.com> wrote:
> > >>
> > >>
> > >>> On 08 Sep 2014, at 19:31, Greg Clayton <gclayton at apple.com> wrote:
> > >>>
> > >>> This means you will see "S<A>" as the type for your variables in the
> > debugger when you view variables or children of structs/unions/classes. I
> > think this is not what the user would want to see. I would rather see
> > "S<int>" as the type for my variable than see "S<A>”.
> > >>
> > >> I find it more accurate for the debugger to report what has actually
> > been put in the code. Moreover when a typedef is used, it’s usually to
> > make things more readable not to hide information, thus I guess it would
> > usually be as informative while being more compact. The debugger needs to
> > have a way to describe the real type behind the abbreviated name though,
> > we must not have less information compared to what we have today.
> > >>
> > >> Another point: this allows the debugger to know what S<A> actually is.
> > Without it, the debugger only knows the canonical type. This means that
> > currently you can’t copy/paste a piece of code that references that kind
> > of template names and have it parse correctly. I /think/ that having this
> > information in the debug info will allow more of this to work.
> > >>
> > >> But we can agree to disagree :-) It would be great to have more people
> > chime and give their opinion.
> > >>
> > >> Fred
> > >>
> > >>>> On Sep 5, 2014, at 4:00 PM, Adrian Prantl <aprantl at apple.com> wrote:
> > >>>>
> > >>>>
> > >>>>> On Sep 5, 2014, at 3:49 PM, Eric Christopher <echristo at gmail.com>
> > wrote:
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Fri, Sep 5, 2014 at 3:43 PM, Duncan P. N. Exon Smith
> > <dexonsmith at apple.com> wrote:
> > >>>>>
> > >>>>>> On 2014 Sep 5, at 16:01, Frédéric Riss <friss at apple.com> wrote:
> > >>>>>>
> > >>>>>> I couldn’t even find a subject expressing exactly what this patch
> > is about… First of all, it’s meant to start a discussion, and I’m not
> > proposing it for inclusion right now.
> > >>>>>>
> > >>>>>> The issue I’m trying to address is that template types are always
> > canonicalized when emitted in the debug information (this is the desugar()
> > call in UnwrapTypeForDebugInformation).
> > >>>>>>
> > >>>>>> This means that if the developer writes:
> > >>>>>>
> > >>>>>> typedef int A;
> > >>>>>> template <typename T>
> > >>>>>> struct S {};
> > >>>>>>
> > >>>>>> S<A> var;
> > >>>>>>
> > >>>>>> The variable var will have type S<int> and not S<A>. In this simple
> > example, it’s not that much of an issue, but for heavily templated code,
> > the full expansion might be really different from the original
> > declaration.
> > >>>>>>
> > >>>>>> The attached patch makes us emit an intermediate typedef for the
> > variable’s type:
> > >>>>>>
> > >>>>>> 0x0000002a:   DW_TAG_variable [2]
> > >>>>>>             DW_AT_name [DW_FORM_strp]       (
> > .debug_str[0x00000032] = “var")
> > >>>>>>             DW_AT_type [DW_FORM_ref4]       (cu + 0x0040 =>
> > {0x00000040})
> > >>>>>>             DW_AT_external [DW_FORM_flag]   (0x01)
> > >>>>>>             DW_AT_decl_file [DW_FORM_data1] (0x01)
> > >>>>>>             DW_AT_decl_line [DW_FORM_data1] (8)
> > >>>>>>             DW_AT_location [DW_FORM_block1] (<0x09> 03 70 6c 00 00
> > 00 00 00 00 )
> > >>>>>>
> > >>>>>> 0x00000040:   DW_TAG_typedef [3]
> > >>>>>>             DW_AT_type [DW_FORM_ref4]       (cu + 0x004b =>
> > {0x0000004b})
> > >>>>>>             DW_AT_name [DW_FORM_strp]       (
> > .debug_str[0x00000035] = “S<A>")
> > >>>>>>             DW_AT_decl_file [DW_FORM_data1] (0x01)
> > >>>>>>             DW_AT_decl_line [DW_FORM_data1] (6)
> > >>>>>>
> > >>>>>> 0x0000004b:   DW_TAG_structure_type [4] *
> > >>>>>>             DW_AT_name [DW_FORM_strp]       (
> > .debug_str[0x0000003e] = “S<int>")
> > >>>>>>             DW_AT_byte_size [DW_FORM_data1] (0x01)
> > >>>>>>             DW_AT_decl_file [DW_FORM_data1] (0x01)
> > >>>>>>             DW_AT_decl_line [DW_FORM_data1] (6)
> > >>>>>>
> > >>>>>>
> > >>>>>> Which basically is what I want, although I don’t like that it
> > introduces a typedef where there is none in the code. I’d prefer that to
> > be:
> > >>>>>>
> > >>>>>> DW_TAG_variable
> > >>>>>> DW_AT_type: -> DW_TAG_structure_type
> > >>>>>>                DW_AT_name: S<A>
> > >>>>>>                DW_AT_specification: -> DW_TAG_structure_type
> > >>>>>>                                          DW_AT_name: S<int>
> > >>>>>>                                          …
> > >>>>>>
> > >>>>>> The patch also has the nice property of omitting the defaulted
> > template arguments in the first level typedef. For example you get
> > vector<A> instead of vector<int, std::__1::allocator<int> >.
> > >>>>>
> > >>>>> If you specify `vector<int>` in C++ do you get that instead of
> > >>>>> `vector<int, std::__1::allocator<int>>`?
> > >>>>>
> > >>>>> Yeah, I mentioned this as possibly causing problems with debuggers
> > or other consumers, but I don't have any proof past "ooooo scary!”.
> > >>>>
> > >>>> Well, [+lldb-dev], could this confuse debuggers? :-)
> > >>>>
> > >>>> -- adrian
> > >>>>>
> > >>>>> That said, I like the idea personally :)
> > >>>>>
> > >>>>> -eric
> > >>>>>
> > >>>>>
> > >>>>>> Now there is one thing I really don’t like about the patch. In
> > order not to emit typedefs for types that don’t need it, I use string
> > comparison between the desugared and the original type. I haven’t
> > quantified anything, but doing the construction of the type name for every
> > template type and then comparing it to decide to use it or not seems like
> > a big waste.
> > >>>>>
> > >>>>> Maybe someone on cfe-dev knows a better way.
> > >>>>>
> > >>>>>>
> > >>>>>> Thoughts?
> > >>>>>>
> > >>>>>> <template-arg-typedefs.diff>
> > >>>>>>
> > >>>>>> Fred
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> _______________________________________________
> > >>>>>> llvm-commits mailing list
> > >>>>>> llvm-commits at cs.uiuc.edu
> > >>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > >>>>>
> > >>>>>
> > >>>>
> > >>>> _______________________________________________
> > >>>> lldb-dev mailing list
> > >>>> lldb-dev at cs.uiuc.edu
> > >>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
> > >>>
> > >>
> > >>
> > >> _______________________________________________
> > >> lldb-dev mailing list
> > >> lldb-dev at cs.uiuc.edu
> > >> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
> > >
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits