[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

Tue Sep 15 10:56:40 PDT 2020

> That sounds to me to be the same concept that I am calling r-value vs. l-value. Do you agree, or is there some subtlety that I am missing?

I've been assuming that the l-value vs r-value distinction is analogous to C++: an l-value can be written to by the debugger, an r-value cannot. A memory location and a register location are both l-values, while implicit locations (stack value/implicit pointer) are r-values. Directness on the other hand, I've been using to mean "the variable's value is equal to the value computed by the DIExpression". Applied to a DWARF expression this description would only refer to a stack value. LLVM's DIExpressions are different however, because we don't have DW_OP_reg or DW_OP_breg: we simply refer to the register and use context to determine which it should be. The best example of this is from this example[0], expanded on here with the location type:

$noreg, (plus_uconst, 8),                -> DW_OP_breg7 RSP+8)                    Memory    l-value  Indirect
     0, (plus_uconst, 8),                -> DW_OP_breg7 RSP+8)                    Memory    l-value  Indirect
$noreg, (plus_uconst, 8, stackval),      -> DW_OP_breg7 RSP+8, stackval           Stack     r-value  Direct
     0, (plus_uconst, 8, stackval),      -> DW_OP_breg7 RSP+8, stackval           Stack     r-value  Direct
$noreg, (plus_uconst, 8, deref),         -> DW_OP_breg7 RSP+8                     Memory    l-value  Indirect
     0, (plus_uconst, 8, deref),         -> DW_OP_breg7 RSP+8, deref              Memory    l-value  Indirect
$noreg, (plus_uconst, 8, deref, stackval)-> DW_OP_breg7 RSP+8, deref, stackval    Stack     r-value  Direct
     0, (plus_uconst, 8, deref, stackval)-> DW_OP_breg7 RSP+8, deref, stackval    Stack     r-value  Direct
$noreg, (),                              -> DW_OP_reg7 RSP                        Register  l-value  Direct
     0, (),                              -> DW_OP_breg7 RSP+0                     Memory    l-value  Indirect
$noreg, (deref),                         -> DW_OP_breg7 RSP+0                     Memory    l-value  Indirect
     0, (deref),                         -> DW_OP_breg7 RSP+0, DW_OP_deref        Memory    l-value  Indirect

The point of using DW_OP_LLVM_direct is that it supplants the current directness flag, without the redundancy or unintuitive behaviour that the flag does. I believe the only reason that the indirectness flag is necessary right now is to allow register locations to be emitted, by delineating the register location "$rsp, $noreg, ()" -> DW_OP_reg7 RSP" from the memory location "$rsp, 0, () -> DW_OP_breg7 RSP+0". Outside of this case the existing representation is sufficient for all other locations, and ideally the indirectness flag would have no effect (although unfortunately it does). The justification for DW_OP_LLVM_direct rests on the idea that we will generally choose to produce "DW_OP_reg7 RSP" instead of "DW_OP_breg7 RSP, DW_OP_stack_value". It doesn't prevent us from using DW_OP_stack_value instead if we have an exception, but I don't believe there are any. Using DW_OP_LLVM_direct instead of directness and stackval for the table above, we get this:

(plus_uconst, 8),                     -> DW_OP_breg7 RSP+8                     Memory    l-value  Indirect
(plus_uconst, 8, LLVM_direct),        -> DW_OP_breg7 RSP+8, stackval           Stack     r-value  Direct
(plus_uconst, 8, deref),              -> DW_OP_breg7 RSP+8, deref              Memory    l-value  Indirect
(plus_uconst, 8, deref, LLVM_direct), -> DW_OP_breg7 RSP+8, deref, stackval    Stack     r-value  Direct
(),                                   -> DW_OP_breg7 RSP+0                     Memory    l-value  Inirect
(LLVM_direct),                        -> DW_OP_reg7 RSP                        Register  l-value  Direct
(deref),                              -> DW_OP_breg7 RSP+0, deref              Memory    l-value  Indirect
(deref, LLVM_direct),                 -> DW_OP_breg7 RSP+0, deref, stackval    Stack     r-value  Direct

Two of the examples in this table should be excluded from actual use: the two rows that end with "deref, LLVM_direct" shouldn't be produced within LLVM, because we can cancel the two operators out to give a memory location rather than producing a "deref, stackval" expression. This can be done in LLVM itself through the DIExpression interface, so that we don't hold DIExpressions in an incorrect intermediate state. I'm currently operating under the belief that if a dbg.value can be an l-value, it always should be; if not, then we can use DW_OP_stack_value instead in all cases where we require a given dbg.value to be an r-value.

To give an example of why having this could be more useful than just applying stack value, consider a hypothetical "DIExpression optimizer" pass applied to the following code:

    // `int a` is live...
    int b = a + 5;
    int c = b - 5;

If both b and c are optimized out and salvaged, then we end up with the following dbg.values:

    @llvm.dbg.value(i32 %a, !"b", !DIExpression(DW_OP_plus_uconst, 5, DW_OP_LLVM_direct))
    @llvm.dbg.value(i32 %a, !"c", !DIExpression(DW_OP_plus_uconst, 5, DW_OP_constu, 5, DW_OP_minus, DW_OP_LLVM_direct))
    ; DIExpressions optimized...
    @llvm.dbg.value(i32 %a, !"b", !DIExpression(DW_OP_plus_uconst, 5, DW_OP_LLVM_direct))
    @llvm.dbg.value(i32 %a, !"c", !DIExpression(DW_OP_LLVM_direct))

In this admittedly strange case, we start with b and c as l-values (before they are optimized out), they then become r-values due to optimization, and finally c is a valid l-value again. If we instead applied DW_OP_stack_value when we salvage, then c would not be recovered as an l-value. If we had DW_OP_implicit_ptr instead of DW_OP_LLVM_direct, then the result would be an r-value either way; likewise if we were referencing a memory location, the result would be an l-value regardless of how we modified it.

I suspect there may be disagreements over whether c should share an l-value location with a, since this means that a user could write to either c or a, and that doing so would assign to both of them. My personal belief is that even if it seems confusing, we shouldn't arbitrarily restrict write-access to variables on criteria that will not always be clear to a debug user; whether or not to apply such a restriction should be left to the debugger, rather than being baked into the information we produce for it. Personal opinions aside, the other reason I'm taking this approach right now is that it matches LLVM's existing behaviour. If we have the source code:

    int a = ...
    int b = a;

We will produce the IR:

    call void @llvm.dbg.value(metadata i32 %a, metadata !13, metadata !DIExpression()), !dbg !15
    call void @llvm.dbg.value(metadata i32 %a, metadata !14, metadata !DIExpression()), !dbg !15

This IR will in turn produce register locations for both variables at the same location. Based on that, I believe that the current expected behaviour is that if two source variables map to the same actual location then they should share a DWARF location.

> Here is an example of why I think that an optimization pass must have the ability to downgrade an l-value to an r-value:
> ...
> The optimization eliminated the store of constant 0 to %mem and replaced all loads of %mem in the subsequent block with a constant "i32 0". This means that we need mark the first dbg.value (that would otherwise look like an l-value) as an r-value, because writing a new value to %mem there would not affect the code that is now hardcoded to use a constant 0 value for x.

The behaviour seen by the user in this case is:

; BEFORE
call %llvm.dbg.declare(%mem, !DILocalVariable("x"))
store i32* %mem, i32 %foo
; "x" is set to %foo, and can be written to
...
store i32* %mem, i32 0
; "x" is set to 0, and can be written to
...
store i32* %mem, i32 %foo
; "x" is set to %foo, and can be written to
...

; AFTER
call %llvm.dbg.value(%mem, !DILocalVariable("x"), !DIExpression(DW_OP_deref))
store i32* %mem, i32 %foo
; "x" is set to %foo, and can be written to
...
call %llvm.dbg.value(%mem, !DILocalVariable("x"), !DIExpression(DW_OP_constu 0, DW_OP_stack_value))
; "x" is set to %foo, and is read-only
...
call %llvm.dbg.value(%mem, !DILocalVariable("x"), !DIExpression(DW_OP_deref))
; "x" is set to its value prior to 0, and can be written to
...

I don't think there's any issue with a variable being an r-value at some points in its live range and an l-value at others; in this case I think it's correct that the first dbg.value should be an l-value. Any write to x will not affect the code after the eliminated store, but even without optimizations x would be set to 0 (overriding any debugger assignment) at that point anyway. I do agree that the code produced may be slightly confusing to a user; this code likely maps to something along the lines of:

    int x = foo;
    ...
    x = 0;
    ...
    x = foo;

If the user breaks just after the first "x = foo" and assigns `x = 5` in the debugger, they will see the correct result for all subsequent uses of x until the next assignment to x. When they step over "x = 0", x has the value 0 (as expected) and it becomes read-only. Finally after stepping over the next "x = foo" they will see that `x == 5`, which might not make a lot of sense when the user is expecting it to be assigned a different value. Even so, this information is a correct representation of the program state.

The alternative of making the first dbg.value an r-value is restrictive - if the initial assignment to x is at the top of a large function that the user is debugging, and the other two assignments occur at the very end, it would likely be frustrating to the user that they have no write-access to x throughout the function. Because of this I don't believe that it would be right to make the first dbg.value an r-value; I think again that choosing to apply these restrictions should be left to the debugger.

[0] https://bugs.llvm.org/show_bug.cgi?id=41675#c8
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200915/c0ca235b/attachment-0001.html>