[llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value

Tue Sep 5 13:00:12 PDT 2017

Debug info today handles two cases reasonably well:
1. At -O0, dbg.declare does a good job describing variables that live at
some known stack offset.
2. With optimizations, variables promoted to SSA can be described with
dbg.value.

This leaves behind a large hole in our optimized debug info: variables that
cannot be promoted, typically because they are address-taken. This is
https://llvm.org/pr34136, and this RFC is mostly about addressing that.

The status today is that instcombine removes all dbg.declares and
heuristically inserts dbg.values where it can identify the value of the
variable in question. This prevents us from having misleading debug info,
but it throws away information about the variable’s location in memory.

Part of the reason that instcombine discards dbg.declares is that we can’t
mix and match dbg.value with dbg.declare. If the backend sees a
dbg.declare, it accepts that information as more reliable and discards all
DBG_VALUE instructions associated with that variable. So, we need something
we can mix. We need a way to say that the variable lives in memory *at this
program point*, but might live somewhere else later on. I propose that
we introduce DW_OP_LLVM_memory for this purpose, and then we transition
from dbg.declare to dbg.value+DW_OP_LLVM_memory.

Initially I believed that DW_OP_deref was the way to say this with existing
DWARF expression opcodes, but I implemented that in
https://reviews.llvm.org/D37311 and learned more about how DWARF
expressions work. When a debugger begins evaluating a DWARF expression, it
assumes that the resulting value will be a pointer to the variable in
memory. For a debugger, this makes sense, because debug builds put things
in memory and even after optimization many variables must be spilled. Only
the special DW_OP_regN and DW_OP_stack_value expression opcodes change the
location of the value from memory to register or stack value.
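
As a rough sketch of that evaluation model, here is a toy evaluator in C. The
opcode names and numbering (TOY_BREG and friends) are made up for
illustration and are not the real DWARF encodings, and none of this is LLVM
or debugger code. The point is only that the result is treated as an address
unless a register or stack-value opcode says otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy opcodes (hypothetical; NOT the real DWARF encodings). */
typedef enum {
    TOY_BREG,        /* push regs[reg] + arg */
    TOY_PLUS_UCONST, /* add arg to the top of the stack */
    TOY_REG,         /* result: the variable lives in register `reg` */
    TOY_STACK_VALUE  /* result: the top of the stack IS the value */
} ToyOp;

typedef struct { ToyOp op; unsigned reg; uint64_t arg; } ToyInsn;

typedef enum { LOC_INVALID, LOC_MEMORY, LOC_REGISTER, LOC_VALUE } ToyLocKind;
typedef struct { ToyLocKind kind; uint64_t payload; } ToyLoc;

enum { TOY_NREGS = 16, TOY_STACK_MAX = 8 };

/* Evaluate expr[0..n). `regs` must point at TOY_NREGS register values. */
static ToyLoc toy_eval(const ToyInsn *expr, size_t n, const uint64_t *regs)
{
    const ToyLoc invalid = { LOC_INVALID, 0 };
    uint64_t stack[TOY_STACK_MAX];
    size_t sp = 0;

    assert(expr != NULL && regs != NULL);
    for (size_t i = 0; i < n; ++i) {
        const ToyInsn *in = &expr[i];
        switch (in->op) {
        case TOY_BREG:
            if (sp == TOY_STACK_MAX || in->reg >= TOY_NREGS)
                return invalid;
            stack[sp++] = regs[in->reg] + in->arg;
            break;
        case TOY_PLUS_UCONST:
            if (sp == 0)
                return invalid;
            stack[sp - 1] += in->arg;
            break;
        case TOY_REG:
            /* Switches the location kind from memory to register. */
            return (ToyLoc){ LOC_REGISTER, in->reg };
        case TOY_STACK_VALUE:
            /* Switches the location kind from memory to a plain value. */
            if (sp == 0)
                return invalid;
            return (ToyLoc){ LOC_VALUE, stack[sp - 1] };
        }
    }
    /* Default: whatever is on top of the stack is the variable's address. */
    if (sp == 0)
        return invalid;
    return (ToyLoc){ LOC_MEMORY, stack[sp - 1] };
}
```

With register 7 holding a stack pointer, the single-op expression
`{ TOY_BREG, 7, 12 }` evaluates to a memory location at that pointer plus 12,
while the same expression followed by `{ TOY_STACK_VALUE, 0, 0 }` evaluates to
the pointer value itself.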

LLVM SSA values obviously do not have an address that we can take and they
don’t live in registers, so neither the default memory location model nor
DW_OP_regN make sense for LLVM’s dbg.value. We could hypothetically
repurpose DW_OP_stack_value to indicate that the SSA value passed to
llvm.dbg.value *is* the variable’s value, and if the expression lacks
DW_OP_stack_value, it must be the address of the value. However, that is
backwards incompatible and it seems like quite a stretch.

DW_OP_LLVM_memory would be very similar to DW_OP_stack_value, though. It
would only be valid at the end of a DIExpression. The backend will always
remove it because the debugger will assume the variable lives in memory
unless it is told otherwise.
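
To make those two rules concrete, here is a minimal sketch in C. Everything
in it is hypothetical: DW_OP_LLVM_memory is only proposed here and has no
encoding, so the TOY_EL_* constants and the ToyElem type are stand-ins, and
real DIExpressions are not stored this way. The first function checks the
"only valid at the end" rule, and the second models the backend dropping the
marker while remembering that the operand was an address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical expression elements; the values are arbitrary. */
enum { TOY_EL_PLUS_UCONST = 1, TOY_EL_DEREF = 2, TOY_EL_LLVM_MEMORY = 3 };

typedef struct { int op; unsigned long arg; } ToyElem;

/* The marker may only appear as the last element of the expression. */
static bool toy_marker_position_ok(const ToyElem *e, size_t n)
{
    for (size_t i = 0; i + 1 < n; ++i)
        if (e[i].op == TOY_EL_LLVM_MEMORY)
            return false;
    return true;
}

/* Lowering step: drop a trailing marker and report whether the dbg.value
 * operand was the variable's address. Returns the new element count. */
static size_t toy_strip_marker(const ToyElem *e, size_t n, bool *in_memory)
{
    assert(in_memory != NULL);
    *in_memory = n > 0 && e[n - 1].op == TOY_EL_LLVM_MEMORY;
    return *in_memory ? n - 1 : n;
}
```

When `*in_memory` comes back false, the operand is the variable's value
itself, which is the case the backend has to flag for the debugger.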

For the original problem of improving optimized debug info while avoiding
inaccurate information in the presence of dead store elimination, consider
this C example:
  int x = 42;  // Can DSE
  dostuff(x); // Can propagate 42
  x = computation();  // Post-dominates `x = 42` store
  escape(&x);

We should be able to do this:
  int x; // eliminate `x = 42` store
  dbg.value(!x, 42, !DIExpression()) // mark x as the constant 42 in debug info
  dostuff(42); // propagate 42
  dbg.value(!x, &x, !DIExpression(DW_OP_LLVM_memory)) // x is in memory again
  x = computation();
  escape(&x);

Passes that delete stores would be responsible for checking if the store
destination is part of an alloca with associated dbg.value instructions.
They would emit a new dbg.value instruction for that variable with the
stored value, and clone the dbg.value instruction that puts the variable
back in memory before the killing store. If the store is dead because
variable lifetime is ending, the second dbg.value is unnecessary.
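
A toy model of that rewrite, in C, assuming a straight-line instruction list
and a constant stored value. The ToyInst type, the K_* kinds, and
toy_delete_dead_store are all invented for this sketch and do not correspond
to any LLVM API. The caller is assumed to have already found the dead store
and the store that kills it.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    K_STORE,      /* store `value` to the variable's alloca */
    K_DBG_CONST,  /* dbg.value: the variable is the constant `value` */
    K_DBG_MEMORY, /* dbg.value + memory marker: the variable is in memory */
    K_OTHER       /* anything else */
} ToyKind;

typedef struct { ToyKind kind; int value; } ToyInst;

/* Copy in[0..n) to out, deleting the dead store at index `dead`, whose
 * killing store is at index `killer`. `out` must have room for n + 1
 * entries. Returns the number of entries written. */
static size_t toy_delete_dead_store(const ToyInst *in, size_t n,
                                    size_t dead, size_t killer, ToyInst *out)
{
    size_t w = 0;

    assert(in != NULL && out != NULL);
    assert(dead < killer && killer < n);
    assert(in[dead].kind == K_STORE && in[killer].kind == K_STORE);

    for (size_t i = 0; i < n; ++i) {
        if (i == dead) {
            /* Replace the deleted store with a description of its value. */
            out[w++] = (ToyInst){ K_DBG_CONST, in[i].value };
            continue;
        }
        if (i == killer) {
            /* The variable is back in memory from the killing store on. */
            out[w++] = (ToyInst){ K_DBG_MEMORY, 0 };
        }
        out[w++] = in[i];
    }
    return w;
}
```

Applied to the `x = 42` example above, the deleted store becomes a constant
description of x and the memory description lands immediately before
`x = computation()`.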

This will also allow us to fix debug info for px in this example:
  void __attribute__((optnone, noinline)) usevar(int *x) {}
  int main(int argc, char **argv) {
    int x = 42;
    int *px = &x;
    usevar(&x);
    if (argc) usevar(px);
  }

Today, we emit a location for px like `DW_OP_breg7 RSP+12`, which gives it
the incorrect value 42. This is because our DBG_VALUE instruction for px’s
location uses a frame index, which we assume is in memory. This is not the
case: px is not in memory; its value is a pointer to a stack object.

Please reply if you have any thoughts on this proposal. Adrian and I hashed
this out over Bugzilla, IRC, and in person, so it shouldn’t be too
surprising. Let me know if you want to be CC’d on the patches.