[PATCH] D58238: [DebugInfo] MachineSink: Insert undef DBG_VALUEs when sinking instructions, try to forward copies

Fri Feb 15 04:25:36 PST 2019

jmorse added a comment.

Hi,

In D58238#1399186 <https://reviews.llvm.org/D58238#1399186>, @bjope wrote:

> I have some doubt about this. Mainly the part about inserting undefs. Although, some of my doubts are partially based on the big-black-hole regarding how debug info is supposed to work for heavily optimized code (and more specifically how this is supposed to work in LLVM based on what we got today).

No problemo -- my debuginfo knowledge is pretty fresh anyway, most of this is based on educated guesses of how it's supposed to work.

My understanding is that the difficulty comes from the re-ordering of DBG_VALUEs, because it re-orders how variable assignments appear. This can lead to debuggers displaying states of the program (i.e. a set of assignments to variables) that did not exist in the original program. If we take the example and put in DBG_VALUE instructions for the 'z' variable, and then sink the addition as you describe, we get:

  %1 = LOAD ...
  DBG_VALUE %1, %noreg, "x", ...
  %2 = AND ...
  DBG_VALUE %2, %noreg, "z", ...
  %4 = OR %2, ...
  DBG_VALUE %4, %noreg, "z", ...
  %3 = ADD %1, 3
  DBG_VALUE %3, %noreg, "x", ...

If a debugger landed after the OR instruction (before the sunk ADD is executed) then it would observe the value of "y" from line 4, but the value of "x" from line 1, which was overwritten in the original program. This is a state that didn't originally exist -- and it has the potential to mislead developers, for example if iterating over pairs of something and the two values displayed aren't actually paired.

For the exact example you give where there *isn't* any re-ordering, then we're probably needlessly reducing the range where we have variable locations, however that's an exception to a general rule IMHO (and this patch is implementing the general rule).

One response to reordering would be "Yeah, that's going to happen when your code is optimised", which is totally true. IMHO having variables read "optimized out" is slightly superior because it doesn't represent a nonexistant state of the program. That might force developers to dig deeper to find out what's going on (assuming they can step the debugger), but on the other hand they're guaranteed that when there is a variable location, it's accurate. Otherwise developers might not fully trust the debuginfo they're getting.

> Maybe it is a philosophical question. Is the compiler scheduling/reordering source instructions (in the debugger it will appear as source statements are executed in a random order), or are we scheduling/reordering machine instructions (and in the debugger it still should appear as if we execute variable assignments in the order of the source code). VLIW-scheduling, tail merging, etc, of course make things extra complicated, since we basically will be all over the place all the time.

To me it's the latter, my mental model (maybe wrong) is that while the compiler is optimising code, for debuginfo it has a state-machine of variable locations represented by dbg.values that it needs to try and preserve the order of, marking locations undef if they don't exist any more. The logical consequence is that if we had a function where the compiler managed to completely reverse the order of computation, no variable would have a location, which would be sound, but not useful.

> Does it matter that we actually sink into a later BB here? Is this fixing some problem where it from a debugging perspective would appear wrong if the variable does not get the new value before the end of the BB?

The sinking is irrelevant to the question I think, in that it's a general question of "should re-ordered variable assignments be visible?".

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58238/new/

https://reviews.llvm.org/D58238