[PATCH] D143967: [DebugInfo][BPF] Add 'btf:type_tag' annotation in DWARF

Wed Mar 29 11:25:15 PDT 2023

dfaust added a comment.

In D143967#4220233 <https://reviews.llvm.org/D143967#4220233>, @jemarch wrote:

>> eddyz87 added a comment.
>>
>> In D143967#4200331 <https://reviews.llvm.org/D143967#4200331> https://reviews.llvm.org/D143967#4200331, @dfaust wrote:
>>
>>> In D143967#4197288 <https://reviews.llvm.org/D143967#4197288> https://reviews.llvm.org/D143967#4197288, @eddyz87 wrote:
>>>
>>>> ...
>>>> I think there are two sides to this:
>>>>
>>>> - conceptual: is it ok to allow annotations for CVR modifiers?
>>>
>>> Maybe someone more an expert in DWARF could chime in whether this is problematic?
>>> As far as I know it would be "ok" in the sense that it should not
>>> break DWARF or cause issues for DWARF consumers which do not know
>>> about the tags.
>>>
>>> My initial reaction was that placing the annotations on CVR
>>> modifiers makes less sense conceptually than placing them on the
>>> types proper.
>>
>> I agree conceptually.
>
> That was my initial reaction too.  But see below.
>
>>> But, I suppose it is a question of what precisely the annotated type
>>> is. e.g. in `volatile int __attribute__((btf_type_tag("__b"))) b;`
>>> is it the case that:
>>>
>>> - type tag "__b" applies to the type `volatile int` (i.e. cvr quals  "bind more closely" than the tag), or
>>> - type tag "__b" and `volatile` both apply to `int` (i.e. cvr quals and type tags are "equal")
>>>
>>> For all other cases the new "btf:type_tag" format places the
>>> annotation as a child of exactly the type that is annotated, so I
>>> think the answer determines where the annotation logically belongs
>>> in DWARF:
>>>
>>> - If the tag applies to `volatile int` then it should be a child of the volatile DIE.
>>> - If the tag applies to `int` it should be a child of the integer type DIE.
>
> I think that is the main point.  DWARF composes types by chaining DIEs
> via DW_AT_type attributes, like in:
>
>   const -> volatile -> pointer-to -> int
>
> Conceptually, there are four different types in that chain, not just
> one:
>
> 1. const volatile pointer-to int
> 2. volatile pointer-to int
> 3. pointer-to int
> 4. int
>
> Ideally speaking, we would have liked to implement the annotations just
> like the qualifiers, so we could have something like:
>
>   const -> volatile -> annotated -> pointer-to -> int
>
> That is what would suite us best conceptually speaking.  But since that
> would break DWARF consumers, we decided to go with a parent-child
> relationship instead, having:
>
>   const -> volatile -> pointer-to -> int
>                            |
>                        annotated
>
> Which would be a different situation than, say:
>
>   const -> volatile -> pointer-to -> int
>     |
>   annotated
>
> So, I think it only makes sense to have the child annotation node in the
> node that starts the type to which the annotation refers.  Both
> qualifiers and type DIEs are types on that sense.

Ok, I understand your point. 
For example if we have:

  volatile int __tag1 b;

The tag applies to the type (volatile int) and we have 3 types total:

1. __tag1 volatile int
2. volatile int
3. int

And in DWARF the tag must be child of the qualifier:

  var(b) -> volatile -> int
               |
              __tag1

Doing it the other way would be incorrect - if we instead did this:

  var(b) -> volatile -> int
                          |
                       __tag1

We encode a different chain of types:

  var(b) -> volatile -> __tag1 -> int

Which reflects a different set of types than the ones we actually have:

1. volatile __tag1 int
2. __tag1 int
3. int

So I guess it is clear, conceptually the tags do belong on the qualifiers.

> And now that I think about this.  In this conceptual situation:
>
>   const -> volatile -> annotated (foo) -> annotated (bar) -> pointer-to -> int
>
> I guess we are encoding it with several DW_TAG_LLVM_annotation children:
>
>   const -> volatile -> pointer-to -> int
>                            |
>                  +---------+---------+
>                  |                   |
>             anotated (foo)    annotated (bar)
>
> Basically accumulating/reducing what are conceptually two different
> types into one.  Since typedefs are explicitly encoded in the chain as
> their own node, this situation will only happen when there are several
> tags specified consecutively in the source code (this is another reason
> why putting the annotation DIEs as children of the qualifiers is
> necessary, because otherwise we would be accumulating/reducing
> annotation across these nodes and we could end with situations where
> annotations get lost.)
>
> When accumulating/reducign tags like above:
>
> Does the ordering matter here?  Does it matter?  Should we require
> keeping the order among the annotation siblings as part of the
> DW_TAG_LLVM_annotation specification?  Even if we don't, we probably
> want to behave the same in both compilers.

A quick check suggests order does not matter for DWARF for normal qualifiers:

const volatile int x;
volatile const int y;

Both compilers emit something like this ('x' and 'y' share type):

  0x0000001e:   DW_TAG_variable
                  DW_AT_name	("x")
                  DW_AT_type	(0x00000029 "const volatile int")

  0x00000029:   DW_TAG_const_type
                  DW_AT_type	(0x0000002e "volatile int")

  0x0000002e:   DW_TAG_volatile_type
                  DW_AT_type	(0x00000033 "int")

  0x00000033:   DW_TAG_base_type
                  DW_AT_name	("int")

  0x00000037:   DW_TAG_variable
                  DW_AT_name	("y")
                  DW_AT_type	(0x00000029 "const volatile int")

Interestingly, GCC and LLVM do not behave exactly the same in that in 
GCC the DWARF types for both variables are

  x, y   -> volatile -> const -> int

while in LLVM (shown above) it is

  x, y -> const -> volatile -> int

So I'm not sure we necessarily need both compilers to be exactly the same.

> In case two identical annotations are applied to the same type, are we
> deduplicating them?  Should we actually require that?  Even if we don't,
> we probably want to behave the same in both compilers.

FWIW, if you write something like:

  int __attribute__((btf_type_tag("__c"))) __attribute__((btf_type_tag("__c"))) c;

GCC de-duplicates attributes with the same value, so the tag "__c" will only appear
once. I don't know if this would be easy to change if we wanted to.

> (Note that since DW_TAG_typedef is also a type in the type chain, we
>  shouldn't worry about

Looks like this got cutoff.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143967/new/

https://reviews.llvm.org/D143967