[PATCH] D41697: [DebugInfo][Metadata] Add support for a DIExpression as 'count' field of DISubrange.

Fri Jan 5 09:56:56 PST 2018

aprantl added a comment.

In https://reviews.llvm.org/D41697#968450, @sdesmalen wrote:

> Hi @aprantl, I understand the confusion now, sorry for not making this more clear earlier on! Let me clarify the use-cases by answering your questions.
>
> > If you use DIExpressions for the pseudo-register, why do you still need to reference other metadata nodes in DISubranges, is that orthogonal?
>
> That is indeed orthogonal. I basically see two separate reasons to extend DISubrange:
>
> 1. Make 'count' more flexible to allow specifying with a DIExpression.
> 2. Make 'count' more flexible to be able to reference another metadata node.
>
>   Reason 1 is to express the type of an SVE vector. We can generate (in Clang) a DISubrange that expresses the number of elements in the vector (i.e. 'count') as a function of Reg46, defined as the number of 64-bit 'granules' in a scalable vector. So for a 128-bit vector Reg46 will be '2', for a 256-bit vector '4', and so on. For a type '<n x 2 x i64>' (meaning "a scalable vector containing 'n x 2' i64 elements"), the resulting DWARF expression would be (DW_OP_reg46). For type '<n x 4 x i32>' the DWARF expression would yield twice the number of elements, so (DW_OP_reg46, DW_OP_constu, 2, DW_OP_mul).
>
>   Reason 2 is to express the type/size of a variable length array by referencing a size expression that may reside in memory or in a register.
>
>   As you pointed out, there is also a third case that ties together a Metadata reference and a DIExpression (e.g. DIExpression(!1, !2, DW_OP_plus)), which would be required for more complicated cases. I have not tried to address something like that in this patch-series since it is not required to implement the C99 VLA support, but we confirmed that there are cases where this is useful/required.
>
> > Were you thinking of generating the DW_OP_breg46 right in the frontend? That would be certainly doable, but it is a departure from how we currently deal with DIExpressions, where DW_OP_reg operands are only generated in the backend.
>
> Not sure if there is a better way, but we indeed have a downstream implementation to do this in Clang, since this is the place where the type Metadata for vectors/arrays is created.
>
> > But will these expressions consist of a single DW_OP_reg46 or do you need to generate more complex expressions?
>
> These expressions will be slightly more involved than just DW_OP_reg46, as shown above.

Thanks, that helped!
Let's discuss reason 1 first. The two approaches to expressing Reg46 that I see are:

1. Hard-code a DW_OP_breg46, DW_OP_constu, 2, DW_OP_mul, DW_OP_stack_value in the frontend. Note that you can't combine DW_OP_regX with other operators; you have to use a combination of DW_OP_bregX and DW_OP_stack_value instead (cf. DWARF5 section 2.6.1ff). You will have to extend the backend (DwarfExpression.cpp) and the Verifier to accept a DW_OP_(b)reg inside a DIExpression.
2. Use an approach similar to llvm.dbg.value to bind the register and the DIExpression. But since Reg46 isn't actually used in the program, you would have to, e.g., create an intrinsic that returns Reg46, which is probably a lot more work for no clear benefit.

So I think your plan to generate the DW_OP_(b)reg in the frontend is good.
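
To make the arithmetic in option 1 concrete, here is a minimal sketch of how a consumer might evaluate such a count expression. It is a toy stack machine in Python, not LLVM or debugger code: the `evaluate` and `granules` helpers, the tuple encoding of operators, and the zero offset passed to DW_OP_breg46 are all assumptions made for illustration.

```python
# Toy evaluator for the few DWARF operators discussed above.
# Hypothetical model: the tuple encoding and helper names are illustrative only.

def evaluate(expr, regs):
    """Evaluate a list of (opcode, *operands) tuples; return the top of stack."""
    stack = []
    for op, *args in expr:
        if op == "DW_OP_breg":            # push register value + signed offset
            regnum, offset = args
            stack.append(regs[regnum] + offset)
        elif op == "DW_OP_constu":        # push an unsigned constant
            stack.append(args[0])
        elif op == "DW_OP_mul":           # pop two entries, push their product
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs)
        elif op == "DW_OP_stack_value":   # the stack top is the value itself
            break
        else:
            raise ValueError(f"unsupported operator: {op}")
    return stack[-1]


def granules(vector_bits):
    """Number of 64-bit granules in a scalable vector of the given width."""
    return vector_bits // 64


# <n x 2 x i64>: one element per granule (offset 0 is an assumption).
COUNT_I64 = [("DW_OP_breg", 46, 0), ("DW_OP_stack_value",)]
# <n x 4 x i32>: two elements per granule.
COUNT_I32 = [("DW_OP_breg", 46, 0), ("DW_OP_constu", 2), ("DW_OP_mul",),
             ("DW_OP_stack_value",)]

regs_128 = {46: granules(128)}   # Reg46 == 2 for a 128-bit vector
regs_256 = {46: granules(256)}   # Reg46 == 4 for a 256-bit vector
```

With a 128-bit vector this yields a count of 2 for the i64 type and 4 for the i32 type; with a 256-bit vector the counts are 4 and 8.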

Now for reason 2. I also agree that this is an interesting use-case to support, and other language frontends, such as Fortran, will benefit from this, too.
The interesting question here is again how to best bind the location to the rest of the DIExpression. For regular variables we use the dbg.value intrinsic to do this:

  %ptr = ...  ; in our example, a pointer to the variable
  call @llvm.dbg.value(metadata %ptr, metadata !DIVariable(name: "x", ...), metadata !DIExpression(DW_OP_deref))

This way metadata never points back to IR, and DIExpressions can be shared by multiple intrinsics.

As you proposed, we can create a pseudo-variable to inject a location into a DIRange:

  %length = load i32 %array...
  call @llvm.dbg.value(metadata %length, metadata !1, metadata !DIExpression())
  ...
  !1 = !DIVariable(name: "$count", ...)
  !2 = !DIRange(count: metadata !1)

A couple of questions now pop up:

- It looks like we are dropping the DIExpression because we link the DIVariable to the DIRange, not the dbg.value. Note that many LLVM optimizations augment DIExpressions.
- I think this is a problem you brought up earlier, too: What happens when optimizations cause multiple dbg.values to appear throughout the function? We only have one array type, so which dbg.value holds the "right" location?

Okay, I dug through the DWARF specification some more:
DW_AT_count/lower_bound/upper_bound can have a "reference" as value, which may point to another DIE. As usual, the spec is vague on how exactly this should be used, but it looks like it could point to a DW_TAG_variable. If that is the case, then there is no reason for us to allow a combination of DIVariable and DIExpression in DIRange, because if you need both, the DIExpression should apply to the variable and be bound via llvm.dbg.value.

This doesn't solve the problem of how to combine two variables in one expression, but to me it looks like this question can be resolved independently by providing a mechanism for doing so in llvm.dbg.value.
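
The reference form of DW_AT_count described above can be pictured with a small model. This is only a sketch under stated assumptions: the `DIE` class, the `resolve_count` helper, and the string-keyed "locations" are hypothetical stand-ins, not DWARF encodings or any real debugger API.

```python
from dataclasses import dataclass, field


@dataclass
class DIE:
    """Hypothetical, heavily simplified debugging information entry."""
    tag: str
    attrs: dict = field(default_factory=dict)


def resolve_count(subrange, read_location):
    """Return the element count of a subrange DIE.

    DW_AT_count is modelled either as a constant or as a reference to a
    DW_TAG_variable DIE whose location is read via read_location.
    """
    count = subrange.attrs["DW_AT_count"]
    if isinstance(count, int):
        return count                      # constant form
    if isinstance(count, DIE) and count.tag == "DW_TAG_variable":
        # Reference form: any expression belongs to the variable's own
        # location, not to the subrange.
        return read_location(count.attrs["DW_AT_location"])
    raise TypeError("unsupported DW_AT_count form")


# Pseudo-variable carrying the runtime length (the location key is made up).
count_var = DIE("DW_TAG_variable",
                {"DW_AT_name": "$count", "DW_AT_location": "reg5"})
vla_range = DIE("DW_TAG_subrange_type", {"DW_AT_count": count_var})
fixed_range = DIE("DW_TAG_subrange_type", {"DW_AT_count": 16})

# Stand-in for target state: maps a location key to the value stored there.
fake_registers = {"reg5": 42}
```

In this model the subrange never carries an expression of its own; resolving `vla_range` just follows the reference and reads the variable's location, which is the division of labour argued for above.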

So long story short, I think your proposal is good.

https://reviews.llvm.org/D41697