[llvm-dev] [RFC] DebugInfo: A different way of specifying variable locations post-isel

Wed Feb 26 03:23:27 PST 2020

Hi Adrian,

On Tue, Feb 25, 2020 at 4:46 PM Adrian Prantl <aprantl at apple.com> wrote:
> Makes sense so far.

Great to hear.

> > A solution:
> >
> > [To be clear, I haven't tried to implement this idea yet as I wanted feedback,]
> >
> > I'd like to suggest that we can represent variable locations in the
> > codegen backend / MIR with three things:
> > * The instruction that defines the value of the variable,
> > * The operand of that instruction into which the value is written,
> > * The position in the instruction stream where the assignment of this
> > value to the variable occurs
>
> What about constants and memory locations?

Good question, and something I've not done too much thinking about. As
far as I'm aware, memory references today are all register based, with
the DIExpression expressing any memory operations. That should
translate to the proposed model naturally: memory locations would be
instruction references with a suitable DIExpression qualifying the
value.

Constants are trickier; it's probably easiest to keep DBG_VALUE
instructions to describe constants. This shouldn't be limiting at all,
as constant-valued locations aren't tied to specific program locations
in the same way register locations are.
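
As a rough illustration of that split, here is a small Python sketch of
what the data model might look like. It is only a toy: InstrRef, Const,
DebugLoc and the tuple standing in for a DIExpression are names made up
for this example, not anything that exists in LLVM.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class InstrRef:
    """Hypothetical machine location: a defining instruction and one of its operands."""
    instr_id: int  # assumed numbering that identifies the defining instruction
    operand: int   # operand of that instruction into which the value is written


@dataclass(frozen=True)
class Const:
    """Constant location, which a plain DBG_VALUE would keep describing."""
    value: int


@dataclass(frozen=True)
class DebugLoc:
    variable: str
    location: Union[InstrRef, Const]
    expression: Tuple[str, ...] = ()  # stand-in for a DIExpression


# Value lives in whatever operand 0 of instruction 7 wrote.
in_reg = DebugLoc("a", InstrRef(7, 0))
# Memory location: the same instruction reference, qualified by the expression.
in_mem = DebugLoc("p", InstrRef(7, 0), ("deref",))
# Constant: no defining instruction is involved at all.
const = DebugLoc("b", Const(23))
```

The only point of the sketch is that the register and memory cases
share one kind of reference and differ in the expression, while the
constant case never names an instruction.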

> > That's effectively modifying a machine location from being a {v,}reg,
> > into being a "defining instruction" and operand. This is closer to the
> > LLVM-IR form of a machine location, where the SSA Value and its
> > computation are synonymous. Exactly how this is represented in-memory
> > and in-printed-MIR I haven't thought a lot about; probably by
> > attaching metadata to instructions and having DBG_VALUE use a metadata
> > operand rather than referring to a vreg. Specifying machine locations
> > like this would have the following benefits:
> > * Both DBG_VALUEs and defining instructions are independent and can
> > be moved within the function without loss of information, and without
> > needing to consider so much context,
>
> What is the difference between attaching the DBG_VALUE to the instruction and moving the DBG_VALUE together with the preceding non-debug instruction?

The latter will change the program location at which the variable
assignment takes place, while the former does not. Whenever DBG_VALUEs
are moved, we have to consider how the move changes the lifetime of
the variable location, but under the proposed model we wouldn't have
to do this at all: DBG_INSTR_REFs would never be forced to move.
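
A toy Python model of that difference, under the assumption that
instructions carry some stable number a debug entry can name (the
tuple encoding and the sink/block_of helpers are invented for this
sketch and do not correspond to real MIR or LLVM APIs):

```python
# Toy model: a function is a dict of block name -> list of entries.
# ("def", n) is a real instruction numbered n (hypothetical numbering).
# ("dbg_ref", var, n) is a DBG_INSTR_REF-like entry naming instruction n.

def block_of(blocks, entry):
    """Return the name of the block that contains the given entry."""
    for name, body in blocks.items():
        if entry in body:
            return name
    raise KeyError(entry)


def sink(blocks, instr, dst):
    """Move a real instruction to the start of block dst; debug entries are untouched."""
    blocks[block_of(blocks, instr)].remove(instr)
    blocks[dst].insert(0, instr)


blocks = {
    "entry": [("def", 1), ("dbg_ref", "v", 1), ("def", 2)],
    "end": [("def", 3)],
}
dbg = ("dbg_ref", "v", 1)

sink(blocks, ("def", 1), "end")

print(block_of(blocks, dbg))         # entry
print(block_of(blocks, ("def", 1)))  # end
```

The defining instruction has moved, but the debug entry still sits at
the original assignment point and still names instruction 1. Whether
that value is actually available at that point is left for a later
pass to decide.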

I think the reproducer in PR44117 illustrates the generalised problem.
The computation of "floogie" is sunk from the entry block to the end
block, and we currently choose to sink the DBG_VALUE for "badgers" with
it, which is incorrect. The end block is not dominated by either
assignment to "badgers", so the variable should be reported "optimised
out" there; however, sinking a DBG_VALUE into that block alters
variable lifetimes by making (part of) the block dominated by one
assignment. Identifying this problem when the movement happens means
running a dominance query to find which instructions are dominated by
which DBG_VALUEs, which is expensive; it might even require dataflow
knowledge if loops are present (I'm unsure if this is true).

Referring to the instruction with a DBG_INSTR_REF effectively defers
this analysis until LiveDebugValues, and avoids creating additional
artefacts (DBG_VALUE $noreg instructions) along the way.

(I know "assignment" isn't agreed nomenclature, but for all intents and
purposes I believe that is how DBG_VALUEs are interpreted: the variable
location at an instruction is defined by the most recent DBG_VALUE
that dominates it).

> What do you do with code like this:
>
> int a = x;
> int b = 23;
> ...
> b = a;
>
> mov rax, %x
> DBG_VALUE rax, "a"
> DBG_VALUE 23, "b"
> ...
> DBG_VALUE rax, "b"
>
> where the "defining instruction" is far away from the DBG_VALUE?

Essentially, leave the third DBG_VALUE unresolved and referring to the
'mov' until we reach LiveDebugValues; then follow the value the mov
writes to rax through any moves and spills (as LiveDebugValues does
today). If we can guarantee a location for the value produced by the
mov at the third DBG_VALUE, then we have a register location for that
DBG_VALUE. If we can't, it's interpreted as a DBG_VALUE $noreg,
because the value it's referring to has been optimised out at the
program location where the DBG_VALUE lies.
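
To sketch what that resolution step could look like, here is a
deliberately simplified Python version. It assumes a single
straight-line block and tracks only register copies and clobbers; the
tuple encoding and the resolve function are invented for this example,
and the real LiveDebugValues is a dataflow analysis over the whole CFG
that also deals with spills.

```python
def resolve(stream):
    """Resolve each dbg_ref to a register name, or None (standing in for $noreg)."""
    holds = {}  # register -> number of the instruction whose value it currently holds
    out = []
    for op in stream:
        kind = op[0]
        if kind == "def":        # ("def", instr_id, reg): instruction writes reg
            _, instr_id, reg = op
            holds[reg] = instr_id
        elif kind == "copy":     # ("copy", dst, src): dst now holds src's value
            _, dst, src = op
            if src in holds:
                holds[dst] = holds[src]
            else:
                holds.pop(dst, None)
        elif kind == "clobber":  # ("clobber", reg): reg is overwritten by something else
            holds.pop(op[1], None)
        elif kind == "dbg_ref":  # ("dbg_ref", var, instr_id): variable assignment point
            _, var, instr_id = op
            regs = sorted(r for r, v in holds.items() if v == instr_id)
            out.append((var, regs[0] if regs else None))
    return out


stream = [
    ("def", 1, "rax"),
    ("dbg_ref", "a", 1),
    ("copy", "rbx", "rax"),
    ("clobber", "rax"),
    ("dbg_ref", "b", 1),   # value survives in rbx
    ("clobber", "rbx"),
    ("dbg_ref", "b", 1),   # value is gone everywhere
]
print(resolve(stream))
# [('a', 'rax'), ('b', 'rbx'), ('b', None)]
```

The last entry is the "optimised out" case: nothing still holds the
value the 'def' produced, so the reference resolves to no location.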

Thanks for the feedback!

--
Thanks,
Jeremy