[llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value

Wed Sep 6 17:01:38 PDT 2017

On Wed, Sep 6, 2017 at 2:01 PM Reid Kleckner <rnk at google.com> wrote:

> On Wed, Sep 6, 2017 at 10:01 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>> On Tue, Sep 5, 2017 at 1:00 PM Reid Kleckner via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> LLVM SSA values obviously do not have an address that we can take and
>>> they don’t live in registers, so neither the default memory location model
>>> nor DW_OP_regN make sense for LLVM’s dbg.value. We could hypothetically
>>> repurpose DW_OP_stack_value to indicate that the SSA value passed to
>>> llvm.dbg.value *is* the variable’s value, and if the expression lacks
>>> DW_OP_stack_value, it must be the address of the value. However, that is
>>> backwards incompatible and it seems like quite a stretch.
>>>
>>
>> Seems like a stretch in what sense? The backwards incompatibility is
>> certainly something to consider (though we went through that with
>> DW_OP_bit_piece too), but this seems like the design I'd go to first so I'd
>> like to better understand why it's not the path forward if there's some
>> more detail about that aspect of the design choice here.
>>
>> I guess you described this already, but talking it through for
>> myself/maybe others will find this useful:
>>
>> So since we don't have DW_OP_regN for LLVM registers, we could sort of
>> assume the implicit first value on the stack is a pseudo-OP_regN of the
>> LLVM SSA register.
>>
>
> Yep, that's how we use DIExpressions in both IR and MIR: The LHS of the
> dbg.value and DBG_VALUE instructions is a register-like value that gets
> pushed onto the expression stack. The DWARF asmprinter does some expression
> folding to clean things up, but that's the model.
>
>
>> To support that, all existing uses would need no changes to match the
>> DWARF model of registers being implicitly direct values.
>>
>> Code that wanted to describe the register as containing the memory
>> address of the interesting thing would use DW_OP_stack_value to say "this
>> location description that is a register is really an address you should
>> follow to find the value, not a direct value itself"?
>>
>> But code that wanted to describe a variable as being 3 bytes ahead of a
>> pointer in an LLVM SSA register would only have "plus 3" in the expression
>> stack, since then it's no longer a direct value but is treated as a pointer
>> to the value. I guess this is where the ambiguity would come in - currently
>> how does "plus 3" get interpreted when seen in LLVM IR, I guess that's
>> meant to describe reg value + 3 as being the immediate value of the
>> variable? (so it's implicitly OP_stack_value? & OP_stack_value is added
>> somewhere in the DWARF backend?)
>>
>
> Our model today is inconsistent.
>

Inconsistent between what and what? LLVM and DWARF? Yeah, I guess there's
some mismatch between the semantics, though I'm still having trouble
wrapping my head around it.

> In LLVM IR today, the SSA value of the dbg.value *is* the interesting
> value, it is not the address, and we typically use empty DIExpressions. If
> the value is ultimately register allocated and the DIExpression is empty,
> we will emit a DW_OP_regN location expression. If the value is spilled, we
> usually don't need to append DW_OP_stack_value because the location is now
> a memory location, which can be described by DW_OP_[f]breg.
>
> Today, passes that want to add "plus 3" to a DIExpression go out of their
> way to add DW_OP_stack_value to the DIExpression because the backend won't
> do it for us, even though dbg.value normally describes the value, not an
> address.
>
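To make the three cases above concrete, here is a toy classifier for final DWARF location expressions. It is only a sketch: the tuple encoding and the `classify` helper are hypothetical and are not LLVM or DWARF APIs. It relies on the DWARF rule that a lone DW_OP_regN names a register location, an expression ending in DW_OP_stack_value is an implicit value, and anything else computes a memory address.

```python
def classify(expr):
    """Classify a DWARF location expression.

    `expr` is a list of tuples (op_name, *operands); this encoding is a
    hypothetical stand-in for real DWARF bytes.
    """
    if not expr:
        return "empty"  # no location available
    if len(expr) == 1 and expr[0][0] == "DW_OP_reg":
        return "register"  # the value lives in the register itself
    if expr[-1][0] == "DW_OP_stack_value":
        return "implicit value"  # top of stack *is* the value
    return "memory"  # top of stack is the variable's address


# Value register-allocated, empty DIExpression.
in_register = [("DW_OP_reg", 5)]
# Value spilled to a frame slot.
spilled = [("DW_OP_fbreg", -16)]
# "register plus 3" is the variable's value, so DW_OP_stack_value is required.
plus_three = [("DW_OP_breg", 5, 3), ("DW_OP_stack_value",)]
```

Without the trailing DW_OP_stack_value, `plus_three` would be read as "the variable lives in memory at register 5 plus 3", which is why passes append it explicitly.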
> To explore the alternative DW_OP_stack_value model, here's how I'd go
> about it:
> 1. Replace llvm.dbg.value with a new intrinsic, llvm.dbg.loc, to make the
> semantic change clear. It can express either an address or a value,
> depending on the DIExpression.
> 2. Auto-upgrade llvm.dbg.value to llvm.dbg.loc. Append DW_OP_stack_value
> to the DIExpression argument of the intrinsic.
> 3. Auto-upgrade llvm.dbg.declare to llvm.dbg.loc, leave the DIExpression
> alone. The LHS of llvm.dbg.declare is already the address of the variable.
> 4. Eliminate the second operand of DBG_VALUE MachineInstrs. Indirect
> DBG_VALUEs are now expressed with a DIExpression that lacks
> DW_OP_stack_value at the end.
> 5. Teach our DWARF expression emitter to combine the new expressions as
> necessary. In particular, we can elide DW_OP_stack_value for DBG_VALUEs
> with physical register operands. They just use DW_OP_regN, which is
> implicitly a value location.
> 6. Teach all passes that spill virtual registers used by DBG_VALUE to
> remove DW_OP_stack_value from the DIExpression, or add DW_OP_deref as
> appropriate.
>
> This should be equivalent to DW_OP_LLVM_memory, and more in line with
> DWARF location expression semantics, but it has a large migration cost.
>
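Steps 2, 3, and 6 of the list above can be sketched as list rewrites. This is a toy model, not LLVM code: `auto_upgrade` and `spill` are hypothetical helpers, expressions are flat Python lists, and llvm.dbg.loc is the name proposed in the thread. The guard against appending a second DW_OP_stack_value and the exact spill rule are assumptions worked out from the model described here, not something the thread spells out.

```python
STACK_VALUE = "DW_OP_stack_value"
DEREF = "DW_OP_deref"


def auto_upgrade(intrinsic, expr):
    """Rewrite an old intrinsic plus DIExpression into the proposed llvm.dbg.loc."""
    expr = list(expr)
    if intrinsic == "llvm.dbg.value":
        # Step 2: the operand is the value, so mark it as such.
        # Assumption: do not duplicate a DW_OP_stack_value a pass already added.
        if not expr or expr[-1] != STACK_VALUE:
            expr.append(STACK_VALUE)
        return "llvm.dbg.loc", expr
    if intrinsic == "llvm.dbg.declare":
        # Step 3: the operand is already the variable's address.
        return "llvm.dbg.loc", expr
    raise ValueError(f"unexpected intrinsic: {intrinsic}")


def spill(expr):
    """Step 6: the operand changes from a register to the address of its spill slot."""
    expr = list(expr)
    if expr == [STACK_VALUE]:
        # The value now sits in memory at the slot: a plain memory location.
        return []
    # Otherwise reload the old register contents first, then continue as before.
    return [DEREF] + expr
```

For example, a value described as "operand plus 3" keeps its DW_OP_stack_value after a spill and gains a leading DW_OP_deref, while a bare value simply becomes a memory location.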
> ---
>
> I think part of the reason I wanted to move in the DW_OP_LLVM_memory
> direction is that I originally wanted to add a memory offset operand to it.
> Our actual use cases for complex DWARF expressions typically come from
> things like safestack, ASan, and blocks. What these all have in common is
> that they gather up a number of variables and move them off into a struct
> in heap memory. This is very similar to what happens when we spill a
> virtual register: instead of describing a register, we modify the
> expression to load the value from some FP register with an offset. I think
> the right representation for these transforms is basically a "chain of
> loads".
>

Don't think I've got any mental model of what you mean by this phrase
('chain of loads') - could you provide an example or the like?

> I was imagining that DW_OP_LLVM_memory with an offset would be that load
> chain link.
>
> The idea behind this representation is that it should make it easy for
> spilling transforms to prepend a load chain onto the expression, rather
> than them having to individually discover if DW_OP_deref is needed, or call
> some common helper like DIExpression::prepend. It should always be valid to
> push on a new load with an offset.
>
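One possible reading of the "chain of loads" idea, as a toy model: a location is a base value plus a list of offsets, and each link means "add the offset, then load a pointer from that address". The thread does not pin down the exact semantics, so the `resolve` and `prepend_load` helpers, the dictionary standing in for memory, and all the addresses below are hypothetical.

```python
def resolve(base, chain, memory):
    """Follow every link in the chain: add the offset, then load the pointer."""
    addr = base
    for offset in chain:
        addr = memory[addr + offset]
    return addr


def prepend_load(chain, offset):
    """A spill adds one more load at the front; existing links are untouched."""
    return [offset] + list(chain)


# Fake memory: a frame slot at 992 holds a struct pointer (5000), and the
# struct field at offset 16 holds the address 7000.
memory = {992: 5000, 5016: 7000}
frame_pointer = 1000

# Before the spill the struct pointer is in a register (value 5000).
before = resolve(5000, [16], memory)

# After the spill the pointer lives at frame_pointer - 8, so one link is prepended.
after = resolve(frame_pointer, prepend_load([16], -8), memory)
```

In this model the spilling pass never inspects the existing links, which is the property the paragraph above is after.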

When would that not be valid today/without LLVM_memory? Sorry, again - it's
all a bit fuzzy in my head.

There'd be some canonicalization opportunities, but not seeing the
correctness issues with being able to prepend onto the location list -
seems like that might be true with LLVM_memory too... maybe?

> It also has the advantage that it will be easier to translate to CodeView
> than arbitrary DWARF expressions, which we are currently canonicalizing
> into a load chain and then attempting to emit.
>

*nod* my worry is ending up with 3 different representations - DWARF,
CodeView, and the increasingly divergent IRDWARF (especially since it's
"sort of like DWARF" which makes the few divergences more costly/difficult).

> Does that make sense? I'm starting to feel like I should either pursue the
> more ambitious load chain design,
>

What would that look like?

> or consistently apply DW_OP_stack_value to llvm.dbg.loc (alternative names
> welcome).
>

Would have to think some more - maybe there's a way to avoid the rename?
But yeah, don't have a problem with llvm.dbg.loc - as you say/implied, it'd
match the new semantics better.

But really, your original proposal's probably OK/close enough to go ahead.
I don't feel that strongly.

- Dave