<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Wed, Sep 6, 2017 at 2:01 PM Reid Kleckner <<a href="mailto:rnk@google.com">rnk@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Sep 6, 2017 at 10:01 AM, David Blaikie <span dir="ltr"><<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>></span> wrote:<br></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><span class="m_-1110557353203050538gmail-"><div class="gmail_quote"><div dir="ltr">On Tue, Sep 5, 2017 at 1:00 PM Reid Kleckner via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>LLVM SSA values obviously do not have an address that we can take and they don’t live in registers, so neither the default memory location model nor DW_OP_regN makes sense for LLVM’s dbg.value. We could hypothetically repurpose DW_OP_stack_value to indicate that the SSA value passed to llvm.dbg.value *is* the variable’s value, and if the expression lacks DW_OP_stack_value, it must be the address of the value. However, that is backwards incompatible and it seems like quite a stretch.<br></div></div></blockquote><div><br></div></div></span></div></blockquote></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div class="gmail_quote">Seems like a stretch in what sense? 
The backwards incompatibility is certainly something to consider (though we went through that with DW_OP_bit_piece too), but this seems like the design I'd go to first so I'd like to better understand why it's not the path forward if there's some more detail about that aspect of the design choice here.<br><br>I guess you described this already, but talking it through for myself/maybe others will find this useful:<br><br>So since we don't have DW_OP_regN for LLVM registers, we could sort of assume the implicit first value on the stack is a pseudo-OP_regN of the LLVM SSA register.<br></div></div></div></blockquote></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"></div></blockquote><div><br></div><div>Yep, that's how we use DIExpressions in both IR and MIR: The LHS of the dbg.value and DBG_VALUE instructions is a register-like value that gets pushed onto the expression stack. 
The DWARF asmprinter does some expression folding to clean things up, but that's the model.</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div class="gmail_quote">To support that, all existing uses would need no changes to match the DWARF model of registers being implicitly direct values.<br><br>Code that wanted to describe the register as containing the memory address of the interesting thing would use DW_OP_stack_value to say "this location description that is a register is really an address you should follow to find the value, not a direct value itself"?<br><br>But code that wanted to describe a variable as being 3 bytes ahead of a pointer in an LLVM SSA register would only have "plus 3" in the expression stack, since then it's no longer a direct value but is treated as a pointer to the value. I guess this is where the ambiguity would come in - currently how does "plus 3" get interpreted when seen in LLVM IR, I guess that's meant to describe reg value + 3 as being the immediate value of the variable? (so it's implicitly OP_stack_value? & OP_stack_value is added somewhere in the DWARF backend?)<br></div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Our model today is inconsistent.</div></div></div></div></blockquote><div><br>Inconsistent between what and what? LLVM and DWARF? 
Yeah, I guess there's some mismatch between the semantics, though I'm still having trouble wrapping my head around it.<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> In LLVM IR today, the SSA value of the dbg.value *is* the interesting value, it is not the address, and we typically use empty DIExpressions. If the value is ultimately register allocated and the DIExpression is empty, we will emit a DW_OP_regN location expression. If the value is spilled, we usually don't need to append DW_OP_stack_value because the location is now a memory location, which can be described by DW_OP_[f]breg.</div><div><br></div><div>Today, passes that want to add "plus 3" to a DIExpression go out of their way to add DW_OP_stack_value to the DIExpression because the backend won't do it for us, even though dbg.value normally describes the value, not an address.</div><div><br></div><div>To explore the alternative DW_OP_stack_value model, here's how I'd go about it:</div><div>1. Replace llvm.dbg.value with a new intrinsic, llvm.dbg.loc, to make the semantic change clear. It can express either an address or a value, depending on the DIExpression.</div><div>2. Auto-upgrade llvm.dbg.value to llvm.dbg.loc. Append DW_OP_stack_value to the DIExpression argument of the intrinsic.</div><div>3. Auto-upgrade llvm.dbg.declare to llvm.dbg.loc, leave the DIExpression alone. The LHS of llvm.dbg.declare is already the address of the variable.</div><div>4. Eliminate the second operand of DBG_VALUE MachineInstrs. Indirect DBG_VALUEs are now expressed with a DIExpression that lacks DW_OP_stack_value at the end.</div><div>5. Teach our DWARF expression emitter to combine the new expressions as necessary. In particular, we can elide DW_OP_stack_value for DBG_VALUEs with physical register operands. They just use DW_OP_regN, which is implicitly a value location.</div><div>6. 
Teach all passes that spill virtual registers used by DBG_VALUE to remove DW_OP_stack_value from the DIExpression, or add DW_OP_deref as appropriate.</div><div><br></div><div>This should be equivalent to DW_OP_LLVM_memory, and more in line with DWARF location expression semantics, but it has a large migration cost.</div><div><br></div><div>---</div><div><br></div><div>I think part of the reason I wanted to move in the DW_OP_LLVM_memory direction is that I originally wanted to add a memory offset operand to it. Our actual use cases for complex DWARF expressions typically come from things like safestack, ASan, and blocks. What these all have in common is that they gather up a number of variables and move them off into a struct in heap memory. This is very similar to what happens when we spill a virtual register: instead of describing a register, we modify the expression to load the value from some FP register with an offset. I think the right representation for these transforms is basically a "chain of loads".</div></div></div></div></blockquote><div><br>Don't think I've got any mental model of what you mean by this phrase ('chain of loads') - could you provide an example or the like?<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> I was imagining that DW_OP_LLVM_memory with an offset would be that load chain link.</div><div><br></div><div>The idea behind this representation is that it should make it easy for spilling transforms to prepend a load chain onto the expression, rather than them having to individually discover if DW_OP_deref is needed, or call some common helper like DIExpression::prepend. It should always be valid to push on a new load with an offset.</div></div></div></div></blockquote><div><br>When would that not be valid today/without LLVM_memory? 
Sorry, again - it's all a bit fuzzy in my head.<br><br>There'd be some canonicalization opportunities, but not seeing the correctness issues with being able to prepend onto the location list -  seems like that might be true with LLVM_memory too... maybe?<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>It also has the advantage that it will be easier to translate to CodeView than arbitrary DWARF expressions, which we are currently canonicalizing into a load chain and then attempting to emit.<br></div></div></div></div></blockquote><div><br>*nod* my worry is ending up with 3 different representations - DWARF, CodeView, and the increasingly divergent IRDWARF (especially since it's "sort of like DWARF" which makes the few divergences more costly/difficult).<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div>Does that make sense? I'm starting to feel like I should either pursue the more ambitious load chain design, </div></div></div></div></blockquote><div><br>What would that look like?<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>or consistently apply DW_OP_stack_value to llvm.dbg.loc (alternative names welcome).<br></div></div></div></div></blockquote><div><br>Would have to think some more - maybe there's a way to avoid the rename? But yeah, don't have a problem with llvm.dbg.loc - as you say/implied, it'd match the new semantics better.<br><br>But really, your original proposal's probably OK/close enough to go ahead. I don't feel that strongly.<br><br>- Dave<br> </div></div></div>