[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

Mon Oct 5 13:31:03 PDT 2020

> On Sep 16, 2020, at 9:55 AM, Tozer, Stephen <stephen.tozer at sony.com> wrote:
> 
> > That makes sense, and I think for "direct" values in your definition it is true that all direct values are r-values.
> > Why do we need DW_OP_LLVM_direct when we already have DW_OP_stack_value? Can you give an example of something that is definitely not a stack value, but direct?
> 
> The difference in definition is the intention: DW_OP_LLVM_direct means "we'd like this to be an l-value if possible", DW_OP_stack_value means "this should never be an l-value". Because of this, an expression ending with DW_OP_LLVM_direct can be emitted as an l-value in any case where the value of the preceding expression is equal to an l-value. So for example:
> 
>   DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_LLVM_direct) => DW_OP_reg7 RSP
>   DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_deref, DW_OP_LLVM_direct) => DW_OP_breg7 RSP+0
>   DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_plus_uconst, 4, DW_OP_LLVM_direct) => DW_OP_breg7 RSP+4, DW_OP_stack_value
> 
> Your point about the semantics of variable assignments in the debugger is useful; it clears up my misunderstandings. I believe that even with that in mind, LLVM_direct (or whatever name it takes) would be appropriate. If we recognize that a variable must be read-only to preserve those semantics, then we can use DW_OP_stack_value to ensure that it is always an r-value. If we don't have any reason to make a variable read-only other than that we can't *currently* find an l-value location for it, then we would use DW_OP_LLVM_direct. Right now we use DW_OP_stack_value whenever we make a complex expression, but that doesn't need to be the case.

Great! It sounds like we reached mutual understanding :-)
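
To make the three lowerings quoted above concrete, here is a minimal sketch of the decision they imply. It is only an illustration: `Op`, `Elem`, and `lowerDirect` are hypothetical names invented for this example, not LLVM APIs, and the model covers only the three expression shapes from the quote.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of the ops that precede DW_OP_LLVM_direct.
enum class Op { Deref, PlusUconst };

struct Elem {
  Op op;
  unsigned long long arg; // only meaningful for PlusUconst
};

// Lowers "ops..., DW_OP_LLVM_direct" applied to the value in register `reg`.
// Returns the DWARF text from the quoted examples, or "" for unmodeled shapes.
std::string lowerDirect(unsigned reg, const std::string &name,
                        const std::vector<Elem> &ops) {
  const std::string r = std::to_string(reg) + " " + name;
  if (ops.empty())
    return "DW_OP_reg" + r; // the value is the register: register location
  if (ops.size() == 1 && ops[0].op == Op::Deref)
    return "DW_OP_breg" + r + "+0"; // the value lives in memory at [reg]
  if (ops.size() == 1 && ops[0].op == Op::PlusUconst)
    // A computed value with no storage of its own: fall back to an r-value.
    return "DW_OP_breg" + r + "+" + std::to_string(ops[0].arg) +
           ", DW_OP_stack_value";
  return ""; // anything else is outside this sketch
}
```

For instance, `lowerDirect(7, "RSP", {})` yields the register location from the first quoted line, while a single `PlusUconst` of 4 yields the stack-value form from the third.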

> 
> The code below is an example program where we may eventually be able to generate a valid l-value for the variable "a" in foo(), but can't without an alternative to DW_OP_stack_value. At the end of the example, "a" is an r-value, but doesn't need to be: there is a single register that holds its exact value, and an assignment to that register would have the same semantics as an equivalent assignment to "a" in the source. The optimizations taking place in this code are analogous to what would happen if we had written "a = bar() + 4 - 4;": because we don't figure out that "a = bar()" in a single pass, we pre-emptively assume that "a" must be an r-value.
> 
> To be able to emit an l-value we would first need the ability to optimize/simplify DIExpressions so that the expression becomes just (DW_OP_stack_value) - this wouldn't be particularly difficult to implement for simple arithmetic. Even with this improvement, the definition of DW_OP_stack_value explicitly forbids the expression from being a register location. If we instead used DW_OP_LLVM_direct, then we would be free to emit the register location (DW_OP_reg0 RAX).
> 
>   // Compile with clang -O2 -g
>   int baz();
>   int bar2(int arg) {
>     return arg * 4;
>   }
>   int bar() {
>     return bar2(1);
>   }
>   int foo() {
>     int a = baz() + bar() - 4;
>     return a * 2;
>   }
> 
> ; Eventually becomes the IR...
>   %call = call i32 @_Z3bazv(), !dbg !25
>   %call1 = call i32 @_Z3barv(), !dbg !26
>   %add = add nsw i32 %call, %call1, !dbg !27
>   %sub = sub nsw i32 %add, 4, !dbg !28
>   call void @llvm.dbg.value(metadata i32 %sub, metadata !24, metadata !DIExpression()), !dbg !29
>   %mul = mul nsw i32 %sub, 2, !dbg !30
>   ret i32 %mul, !dbg !31
> 
> ; Combine redundant instructions, "a" is salvaged...
>   %call = call i32 @_Z3bazv(), !dbg !25
>   %call1 = call i32 @_Z3barv(), !dbg !26
>   %add = add nsw i32 %call, %call1, !dbg !27
>   call void @llvm.dbg.value(metadata i32 %add, metadata !24, metadata !DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !28
>   %sub = shl i32 %add, 1, !dbg !29
>   %mul = add i32 %sub, -8, !dbg !29
>   ret i32 %mul, !dbg !30
>   
> ; bar() is found to always return 4
>   %call = call i32 @_Z3bazv(), !dbg !14
>   %add = add nsw i32 %call, 4, !dbg !15
>   call void @llvm.dbg.value(metadata i32 %add, metadata !13, metadata !DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !16
>   %sub = shl i32 %add, 1, !dbg !17
>   %mul = add i32 %sub, -8, !dbg !17
>   ret i32 %mul, !dbg !18
> 
> ; %add is unused, optimize out and salvage...
>   %call = call i32 @_Z3bazv(), !dbg !24
>   call void @llvm.dbg.value(metadata i32 %call, metadata !23, metadata !DIExpression(DW_OP_plus_uconst, 4, DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !25
>   %add = shl i32 %call, 1, !dbg !26
>   ret i32 %add, !dbg !27
> 
>   ; Final DWARF location for "a":
>   DW_AT_location        (0x00000000:
>       [0x0000000000000029, 0x000000000000002b): DW_OP_breg0 RAX+4, DW_OP_constu 0xffffffff, DW_OP_and, DW_OP_lit4, DW_OP_minus, DW_OP_stack_value)

So in this example, if we had DW_OP_LLVM_direct, we would salvage "a" as

DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_LLVM_direct) ?

which would mean: "this is an l-value with some additional DWARF operations. The backend should either emit this as a DW_OP_stack_value or, if the DWARF expression turns out to be a no-op, drop the entire DIExpression and emit this as a register or memory location."
I can see how that could potentially be useful. I'm not sure how often we could practically make use of a situation like this, but I understand your motivation.
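
A rough sketch of the "turns out to be a no-op" check described here, applied to the salvaged expression from the quoted example. Everything in it is an assumption made for illustration: `Opcode`, `ExprOp`, `foldOffset`, and `lowerFolded` are hypothetical names rather than LLVM APIs, only constant add/subtract patterns are recognized, the register is hard-coded to RAX, and the 32-bit masking seen in the final DWARF is ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of the DIExpression ops preceding DW_OP_LLVM_direct.
enum class Opcode { PlusUconst, Constu, Minus };

struct ExprOp {
  Opcode code;
  std::uint64_t arg; // unused for Minus
};

// Folds "plus_uconst N" and "constu N, minus" into one net offset.
// Returns false if the expression contains anything else.
bool foldOffset(const std::vector<ExprOp> &ops, std::int64_t &net) {
  net = 0;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    if (ops[i].code == Opcode::PlusUconst) {
      net += static_cast<std::int64_t>(ops[i].arg);
    } else if (ops[i].code == Opcode::Constu && i + 1 < ops.size() &&
               ops[i + 1].code == Opcode::Minus) {
      net -= static_cast<std::int64_t>(ops[i].arg);
      ++i; // consume the paired Minus
    } else {
      return false;
    }
  }
  return true;
}

// With a trailing DW_OP_LLVM_direct: a no-op expression becomes a register
// location; anything else falls back to a stack value.
std::string lowerFolded(const std::vector<ExprOp> &ops) {
  std::int64_t net = 0;
  if (!foldOffset(ops, net))
    return "unhandled";
  if (net == 0)
    return "DW_OP_reg0 RAX";
  return "DW_OP_breg0 RAX" + std::string(net > 0 ? "+" : "") +
         std::to_string(net) + ", DW_OP_stack_value";
}
```

Under these assumptions, the final expression from the example (DW_OP_plus_uconst 4, DW_OP_constu 4, DW_OP_minus) folds to a net offset of zero and is emitted as a plain register location, whereas the intermediate (DW_OP_constu 4, DW_OP_minus) stays a stack value.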

If we had DW_OP_LLVM_direct, what would be the semantics of

DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_LLVM_direct)

versus

DIExpression(DW_OP_constu, 4, DW_OP_minus) ?

thanks,
adrian