[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

Fri Sep 4 08:59:23 PDT 2020

> On Sep 4, 2020, at 3:00 AM, Tozer, Stephen <stephen.tozer at sony.com> wrote:
> 
> > Yeah, because that decision can only be made much later in LLVM in AsmPrinter/DwarfExpression.cpp.
> > In DWARF, DW_OP_reg(x) is a register l-value, all others can either be l-values or r-values depending on whether there is a DW_OP_stack_value/DW_OP_implicit* at the end. 
> 
> Yes, it might not be clear but that's what I'm trying to say. Out of the non-empty DWARF locations, register and memory locations are l-values, implicit locations are r-values. You can technically use DW_OP_breg in an l-value, but not for register locations. This is why when we have a DBG_VALUE that has a single register location operand with an otherwise empty DIExpression, we need some indicator to determine whether we want to produce the register location [DW_OP_reg] or the memory location [DW_OP_breg] (currently this indicator is the indirectness flag). 
> 
> > I think it would be confusing to talk about registers at the LLVM IR / DIExpression level. "SSA-Values"?
> 
> I think terminology is a bit difficult here because this work concerns both the llvm.dbg.value intrinsic and the DBG_VALUE instruction, which operate on different kinds of arguments. I think "location operands" is probably the best description for them, since they are operands to a DIExpression which is used to compute the variable location.
> 
> > I don't think that's correct, because a DW_OP_stack_value is an rvalue. But maybe I misunderstood what you were trying to say.
> > We should start by defining what DW_OP_stack_value really means in LLVM debug info metadata. I believe it should just mean "r-value".
> 
> Having given it some more thought, I've changed my mind - I agree that we shouldn't use DW_OP_stack_value in this case, because it would be changing its meaning which is to explicitly declare the expression to be an implicit location/r-value. My current line of thinking is that it would be better to introduce a new operator, named DW_OP_LLVM_direct or something similar, which has the meaning "the variable's exact value is produced by the preceding expression", and would replace DW_OP_stack_value as it is currently used within LLVM.

Can you elaborate on what "direct" means? I'm having trouble understanding what the opposite (a non-exact value) would be.

> 
> To summarise the logic behind using this operator: LLVM debug info does not need to explicitly care about r-values or l-values before DWARF emission,

I don't think that statement is correct. Based on the semantics, LLVM IR knows that a dbg.declare is an l-value: the debugger can write to it and the value will be changed when continuing the program execution. It can also decide that a "working copy" of the value, described by a dbg.value, is a legit read-only representation of the variable that can't be written to because, e.g., the value exists in more than one place at once.

At the moment we don't make the lvalue/rvalue distinction in LLVM at all. We make an educated guess in AsmPrinter. But that's wrong and something we should strive to fix during this redesign.
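
For reference, the DWARF-level rule quoted at the top of this mail (DW_OP_reg is a register l-value; anything else is an l-value or an r-value depending on a trailing DW_OP_stack_value/DW_OP_implicit*) can be written down as a small Python sketch. This is only an illustration of that rule, not what AsmPrinter does today. The function name classify_location and the simplification of modelling an expression as a list of opcode names with operands omitted are assumptions made for the example.

```python
def classify_location(ops):
    """Classify a non-empty DWARF location description.

    `ops` is a list of opcode names with operands omitted (a simplification
    for this sketch), e.g. ["DW_OP_breg7", "DW_OP_stack_value"].
    Returns a (kind, value_category) tuple.
    """
    if not ops:
        raise ValueError("empty location descriptions are not handled here")

    # A lone DW_OP_reg<N> / DW_OP_regx names the register itself.
    if len(ops) == 1 and ops[0].startswith("DW_OP_reg"):
        return ("register", "l-value")

    # A trailing DW_OP_stack_value or DW_OP_implicit_* makes the location implicit.
    last = ops[-1]
    if last == "DW_OP_stack_value" or last.startswith("DW_OP_implicit"):
        return ("implicit", "r-value")

    # Everything else computes an address, i.e. a memory location.
    return ("memory", "l-value")
```

Note that this only looks at the emitted expression; it cannot recover whether the variable was *meant* to be writable, which is the information I'm arguing is missing.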

> only whether we're describing a variable's memory location, a variable's exact value, or some other implicit location (such as implicit_pointer). Whether an expression is an r-value or l-value can be trivially determined at the end of the pipeline (addMachineRegExpression already does this).

As stated above, I don't think we can trivially determine this, because (at least for dbg.values) this info was already lost in LLVM IR. Unless we say the dbg.declare / dbg.value distinction is what determines lvalues vs. rvalues.

> 
> For an expression ending with DW_OP_LLVM_direct: if the preceding expression is only a single register then we emit a register location, if the preceding expression ends with DW_OP_deref then we can remove the deref and emit a memory location, and otherwise we emit the expression with DW_OP_stack_value. In expression syntax it would behave like an implicit operator, in that it can only appear at the end of an expression and is incompatible with any implicit operators, including DW_OP_stack_value. 
> 
> The alternative I see for this is using a flag or a new DIExpression operator that explicitly declares a single register DBG_VALUE to be a register location, while it would otherwise be treated as a memory location, and use stack_value for all other cases. The main reason I prefer the "direct" operator is that LLVM doesn't need to know whether a DIExpression results in an l-value location or an r-value location; it only needs to know how to compute the variable's location and then determine whether that computation resolves to an l-value or r-value at the end. Maintaining two separate representations for stack value locations and register locations when we don't need to is an unnecessary burden, especially when it may be possible for a given dbg.value/DBG_VALUE to switch back and forth between them.
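
As a rough sketch of the lowering rule proposed in the quoted text above, the three cases could look like this in Python. DW_OP_LLVM_direct is only the name suggested in this thread, not an existing operator; the function name lower_direct, the token-list model, and the "reg:<name>" spelling for a register location operand are all assumptions made for illustration.

```python
DIRECT = "DW_OP_LLVM_direct"  # proposed name from this thread, hypothetical

def lower_direct(expr):
    """Lower an expression ending in DW_OP_LLVM_direct.

    `expr` is a list of string tokens; "reg:<name>" stands for a register
    location operand (hypothetical spelling). Returns (kind, tokens).
    """
    if not expr or expr[-1] != DIRECT:
        raise ValueError("expression must end with " + DIRECT)
    body = expr[:-1]
    if not body:
        raise ValueError("nothing precedes " + DIRECT)

    # Case 1: only a single register -> register location.
    if len(body) == 1 and body[0].startswith("reg:"):
        return ("register", body)

    # Case 2: trailing DW_OP_deref -> drop it and emit a memory location.
    if body[-1] == "DW_OP_deref":
        return ("memory", body[:-1])

    # Case 3: anything else -> implicit location via DW_OP_stack_value.
    return ("implicit", body + ["DW_OP_stack_value"])
```

For example, lower_direct(["reg:rbp", "DW_OP_plus_uconst", "8", "DW_OP_deref", DIRECT]) yields a memory location with the deref stripped, while the same expression without the deref yields an implicit location ending in DW_OP_stack_value.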

I do think that your insight that we need one (or more?) additional discriminator of some kind is correct — we just need to find the right semantics for it.

thanks,
adrian