[PATCH] D85555: [InstCombine] Remove dbg.values describing contents of dead allocas

Wed Sep 30 15:27:25 PDT 2020

vsk added a comment.

In D85555#2213416 <https://reviews.llvm.org/D85555#2213416>, @jmorse wrote:

> Everything to do with leaked pointers seems to have a bunch of compromises -- I think this patch alters some of them, and I don't have a strong opinion as to whether it's good or not. With `dbg.value`s placed at call sites:
>
> 1. The variable might refer to a stack location containing a stale value, if a store was DSE'd,
> 2. The stack location might be optimised out (which is what this patch addresses),
> 3. But if the callee stores to the stack location, the variable location accurately reflects this.
>
> By fixing 2, we lose some of 3: In the test case you add, if you add a:
>
>   store i32 1, i32* %addr, align 4
>
> To the 'use' function, then t2 doesn't reflect this after -instcombine -inline -instcombine:
>
>   define void @t2(i32 %x) !dbg !17 {
>     call void @llvm.dbg.value(metadata i32 %x, metadata !18, metadata !DIExpression()), !dbg !19
>     ret void
>   }
>
> Specifically, the "old" value %x would be presented as the variable value on the 'ret 'instruction, wheras at -O0 the variable location on the stack would contain the '1' stored by 'use'. Without this patch, the variable is reported as optimized-out.
>
> I don't really see this as being a bug:  we just don't cope well with leaked pointers, and stale values can appear as a result.
>
> [0] https://bugs.llvm.org/show_bug.cgi?id=34136#c25

Yes, I think that's correct. Without replacing LowerDbgDeclare I think we're forced into making compromises. And to replace LowerDbgDeclare, we might need a way to model the effect of any memory write on local variables (including memcpy's and stores that happen in a callee -- things that LowerDbgDeclare doesn't handle today).

Imo, this "dbg.value(<dead alloca>)-before-no-op-call" situation pops up enough that I think we'll want to do better than falling back to <optimized out>.

================
Comment at: llvm/lib/Transforms/InstCombine/InstructionCombining.cpp:2655
+    //     dbg.value(i32* %a, "arg0", DW_OP_deref)
+    //     call void @trivially_inlinable_no_op(i32* %a)
+    //     ret void
----------------
dblaikie wrote:
> vsk wrote:
> > vsk wrote:
> > > dblaikie wrote:
> > > > vsk wrote:
> > > > > dblaikie wrote:
> > > > > > What if there was a later store to this alloca? Wouldn't removing the dbg.value with deref then let the %0 location persist, passing through the store (without reflecting that store in the dbg.value) - changing the values described in the DWARF, possibly incorrectly?
> > > > > If there was a store to the alloca after the call, InstCombine must have inserted a dbg.value after the store during LowerDbgDeclare. Removing the dbg.value(..., DW_OP_deref) before the call would extend the range of the initial %0 location until the dbg.value for the store.
> > > > Rather than what InstCombine would've done - I'm interested in what the IR semantics/contract are.
> > > > 
> > > > I guess dbg.value only would be used to represent the load from the alloca, not the value in the alloca itself? So a future store to the alloca is unrelated to any dbg.value derived from (what must be loads) from the alloca?
> > > dbg.value can be used to indicate the value inside of an alloca starting at a specific position in the instruction stream, and continuing until the the dbg.value is superseded (e.g. by a dbg.value intrinsic describing another load/store from/to the alloca, or by the register holding the loaded value being clobbered).
> > > 
> > > When a variable backed by an alloca stops being described by a dbg.declare, and transitions to being described by a series of dbg.value intrinsics, a store to the alloca would be unrelated to any dbg.value derived from the alloca. A problem with this approach is that unless specific effort is made to insert a dbg.value after each store-like instruction which modifies the alloca, the debug info becomes less precise.
> > > 
> > > There's a FIXME describing this problem above the comment describing ShouldLowerDbgDeclare. I don't think we can get away from LowerDbgDeclare in general, but we could push towards using it only when we know an alloca is to be deleted (i.e. do it lazily, instead of eagerly like we do today). I'm not sure whether this patch would still be necessary if LowerDbgDeclare were lazy, but will look into this.
> > I looked into deferring LowerDbgDeclare to the points where llvm actually needs to erase an alloca, but found that doing so might not be practical. The problem is that llvm can delete stores into an alloca without deleting the alloca itself. Without lowering dbg.declares eagerly, we may lose track of updates to a variable that don't get stored back into an alloca. So I believe some version of this patch is necessary.
> I think I follow now - though I think it'd still be beneficial to describe this in terms of the IR contract rather than with reference to other passes in the pipeline. "Given this IR, doing X would produce this erroneous result (the IR would now describe the value of a variable as having some value it never had in the input IR, for instance)".
> 
> In any case, I'm likely not the best person to review the code change here, not being as familiar with debug variable location tracking - happy to bow out/leave it up to others even if they disagree with me on this point (though hope they take it under consideration). The test point I made elsewhere I still feel pretty strongly about.
I don't feel so strongly about keeping the full-pipeline test. I noticed that Jeremy was able to come up with a nice example based on it, so perhaps we can leave it around as a comment? It should be fairly compact (hopefully not too distracting from the more narrow instcombine-only test) and wouldn't actually run this way.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85555/new/

https://reviews.llvm.org/D85555