[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation

Mon Jun 22 05:57:39 PDT 2020

Hi Adrian,

> quite a lot of work

Large amounts of credit should go to llvm-reduce, which was fundamental to
getting any of this done.

> I've skimmed your implementation and it looks nice and well-documented.

The thing that worries me is the over-complicated lattices -- I didn't
anticipate the problem, and there's a risk that it's more complex than it
needs to be. As it sounds like getting this reviewed and landed is feasible,
I'll write about that (and a worked example) in the review.

> I was wondering about potential downsides of tracking values. I noticed that
> a surprising amount of developers like to modify variables in the debugger
> [...]

This is really interesting, and as you say it's something that is already
happening today. Consider this:

  #include <stdbool.h>
  extern void ext(int);
  extern bool extb(int, int);
  void foo(int bar, int baz) {
    ext(bar);
    bool cont = true;
    while (cont) {
      cont = extb(bar, baz);
    }
  }

Compiled with a recent master at -O2 -g, with main() and the other functions
in another file, we get this disassembly and location list for 'bar':

   0x0000000000400483 <+3>:     mov    %esi,%ebx
   0x0000000000400485 <+5>:     mov    %edi,%ebp
=> 0x0000000000400487 <+7>:     callq  0x4004d0 <ext>

DW_AT_location    (0x00000000:
  [0x0000000000400480, 0x0000000000400487): DW_OP_reg5 RDI
  [0x0000000000400487, 0x00000000004004a3): DW_OP_reg6 RBP)

Stepping through the function, we stop on the call to 'ext' and can set the
value of 'bar'. However, there are two locations holding it at that point, and
LiveDebugValues picked the long-term register $ebp rather than $edi. The write
lands in $ebp, so it has no effect on the call to 'ext', which still reads its
argument from $edi.
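
A toy model of that write may make the failure easier to see. Everything in it
is hypothetical (the names, the two-register file, the "set" helper) and it
has no relation to real debugger code; it only replays the location list above:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-register machine: only the registers 'bar' lives in. */
enum reg { REG_RDI, REG_RBP, REG_COUNT };

struct loc_entry {
    uint64_t lo;    /* first address covered */
    uint64_t hi;    /* one past the last address covered */
    enum reg where; /* register the list names for that range */
};

/* The location list for 'bar' quoted above. */
static const struct loc_entry bar_locs[] = {
    { 0x400480, 0x400487, REG_RDI },
    { 0x400487, 0x4004a3, REG_RBP },
};

/* Return the register the list names at 'pc', or -1 if no entry covers it. */
int lookup_location(uint64_t pc)
{
    for (size_t i = 0; i < sizeof bar_locs / sizeof bar_locs[0]; i++)
        if (pc >= bar_locs[i].lo && pc < bar_locs[i].hi)
            return (int)bar_locs[i].where;
    return -1;
}

/* Model of "set bar = value": write to whichever register the list names.
 * Returns the register written, or -1 if 'bar' has no location at 'pc'. */
int debugger_set_bar(int64_t regs[REG_COUNT], uint64_t pc, int64_t value)
{
    int r = lookup_location(pc);
    if (r < 0)
        return -1;
    regs[r] = value;
    return r;
}
```

Stopped at 0x400487, the helper writes REG_RBP and leaves REG_RDI untouched,
which is the register the call is about to read.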

AFAIUI, this is never a problem at -O0 because there's only ever one location
for a variable. With optimisations, and without a way to describe multiple
locations for a variable in DWARF, I don't think it's generally solvable.
Plus, if you had two variables that have been CSE'd into being the same value,
setting one will modify the other. These are all consequences of optimisations
changing the structure of the program.

The best guarantee that I imagine we could give is that for any given block,
the variable location in that block is the location that instructions read
from. That way, if you modify the variable in a block, it's very likely that
the next few statements will read that modified variable. It's eminently
do-able with the new implementation, as location selection is the last thing
that happens and is done on a per-block basis, although it wouldn't be free
(in terms of performance cost). I think the "value is only read from one
location" idea is true of SelectionDAG, but it might fall apart with other
optimisations.
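
One way such a per-block choice could look, as a sketch only: among the
locations that hold the variable's value in a block, prefer the one that the
block's instructions read most often. The struct, the fixed NUM_LOCS and the
tie-breaking rule are all assumptions for illustration, not what the pass does:

```c
#include <stddef.h>

#define NUM_LOCS 4 /* hypothetical: a handful of registers / stack slots */

struct block_info {
    unsigned char holds_value[NUM_LOCS]; /* 1 if the location holds the value */
    unsigned reads[NUM_LOCS];            /* reads of each location in the block */
};

/* Return the index of the value-holding location with the most reads in the
 * block (lowest index wins ties), or -1 if no location holds the value. */
int pick_location(const struct block_info *bb)
{
    int best = -1;
    for (int i = 0; i < NUM_LOCS; i++) {
        if (!bb->holds_value[i])
            continue; /* reads of a location without the value don't count */
        if (best < 0 || bb->reads[i] > bb->reads[best])
            best = i;
    }
    return best;
}
```

If index 0 stood for $edi and index 1 for $ebp, an entry block where both hold
'bar' but only $edi is read would pick $edi, so a write there would reach the
call to 'ext'.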

> When tracking values, do we keep track of the original DBG_VALUE's !dbg
> location to know when we need to stop propagating? [...] insn4 has a stale
> value for the variable "v"

I don't believe there's any relationship between DBG_VALUE locations and !dbg
source locations right now. This is actually one of my pet peeves, that
variable locations and the line program don't necessarily line up into
something coherent. It's definitely something that leads to misleading program
states being presented today; there are at least two bugs.llvm.org reports that
are caused by such problems. [Can't find them right now as bugzilla is
throwing errors]. Defining an order on !dbg locations, and building DBG_VALUEs
into that order, would be really useful for ensuring correctness.

> If we had such a facility: would it be possible to integrate this into the
> new pass implementation?

In the simple case, easily: in your example, there would just be an extra
layer of mapping from source location to DBG_VALUE; and each DBG_VALUE would
have a well defined value number. We could even have insn4 set the variable
location to %y's value if it's still available somewhere.

If variable locations are defined by source location, however, this undermines
how LiveDebugValues determines variable locations when control flow merges:
there wouldn't be a simple "variable location in predecessor
block" any more. And the merged "live in" location wouldn't necessarily mean
anything to the source locations in the block.

Stepping further back, we'd also lose one of the freebies that the current
design gives us: when variable values merge at a PHI node, but there's no
actual PHI instruction in the IR (because there's no subsequent IR use), we
don't create a "dbg.phi(...)" instruction, we just ignore it and let
LiveDebugValues patch it up later, if a location can be recovered. If every
source location needed to be connected to a variable location record, we would
need to represent such debug-only PHIs much earlier in compilation
(or drop them).

On the other hand, what you're describing (plus the instruction referencing
work) is something that doesn't require debug instructions in the IR or MIR,
which is highly desirable. We could just store a set of
{instruction references, variable / fragment / expr, source locations} and
build a location list out of that. It's worth pursuing, and value
tracking would definitely enable such designs.
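
To make "build a location list out of that" slightly more concrete, here is a
much-simplified sketch. It assumes each record has already been resolved to
"from this address onwards, the variable is in this location", and it ignores
fragments, expressions and source locations entirely. The record and range
types are invented for the example:

```c
#include <stddef.h>
#include <stdint.h>

struct record { uint64_t pc; int loc; };     /* from 'pc' on, value is in 'loc' */
struct range  { uint64_t lo, hi; int loc; }; /* half-open range [lo, hi) */

/* Turn records (sorted by ascending pc) into location-list ranges, merging
 * adjacent records that name the same location. The last record extends to
 * 'func_end'. Writes at most 'max' ranges to 'out' and returns the count. */
size_t build_loclist(const struct record *recs, size_t n, uint64_t func_end,
                     struct range *out, size_t max)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t hi = (i + 1 < n) ? recs[i + 1].pc : func_end;
        if (hi <= recs[i].pc)
            continue; /* empty range: superseded before it covers anything */
        if (count > 0 && out[count - 1].loc == recs[i].loc &&
            out[count - 1].hi == recs[i].pc) {
            out[count - 1].hi = hi; /* same location, contiguous: extend */
            continue;
        }
        if (count == max)
            break; /* output buffer full */
        out[count].lo = recs[i].pc;
        out[count].hi = hi;
        out[count].loc = recs[i].loc;
        count++;
    }
    return count;
}
```

Fed records at 0x400480 (location 5) and 0x400487 (location 6) with a function
end of 0x4004a3, this yields the same two ranges as the list for 'bar' earlier
in this mail.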

--
Thanks,
Jeremy