<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Fri, Nov 18, 2016 at 3:49 PM Adrian Prantl <<a href="mailto:aprantl@apple.com">aprantl@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br class="gmail_msg">

> On Nov 18, 2016, at 1:33 PM, David Blaikie <<a href="mailto:dblaikie@gmail.com" class="gmail_msg" target="_blank">dblaikie@gmail.com</a>> wrote:<br class="gmail_msg">

<br class="gmail_msg">

<snip><br class="gmail_msg"></blockquote><div><br></div><div>Thanks - this was getting a bit long.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br class="gmail_msg">

>><br class="gmail_msg">

>>>><br class="gmail_msg">

>>>> I don't quite understand this concern. Couldn't any transformation simply ignore the LLVM_OP_piece at the end of the DIExpression since it has no effect?<br class="gmail_msg">

>>><br class="gmail_msg">

>>> That's roughly my concern - if we have to teach things to ignore it, and otherwise special case it (when producing DWARF later)), why build it into the DIExpression byte stream in general rather than pulling it out to a separate metadata operand, etc? It doesn't seem to fit there.<br class="gmail_msg">

>><br class="gmail_msg">

>> I think you might be overstating how much work it is to "teach" a transformation to ignore it. All (current) transformations that rewrite DIExpressions (with the exception of the ones creating DW_OP_pieces) work by concatenating a new expression with the existing expression, e.g., by putting a DW_OP_deref at the beginning of the expression.<br class="gmail_msg">

><br class="gmail_msg">

> Fair point - inserting things into the stack just after the implied raw element's basically free/orthogonal to the rest of the expression.<br class="gmail_msg">

><br class="gmail_msg">

>> The DIExpression -> DWARF Expression compiler has to special-case many operations already because it wants to find the most efficient encoding for the expression. This involves things like combining the location with the first operation (i.e., merging [reg], DW_OP_deref -> DW_OP_breg) and rewriting DW_OP_constu into, e.g, DW_OP_const1u. So I don't think it complicates the expression compiler at all; it needs to handle it anyway, and it needs to handle it at the end of the expression.<br class="gmail_msg">

><br class="gmail_msg">

> It seems the handling needs to be pretty different - since it'll be wanting to look across all the DIExpressions for the fragments and finding the holes to create those, etc.<br class="gmail_msg">

><br class="gmail_msg">

> It just seems not really part of the expression - at best silently ignored in some cases. But perhaps I'm really not appreciating the current emission code.<br class="gmail_msg">

><br class="gmail_msg">

> In fact from an API perspective it looks like it's embedded as a fairly first-level concept (looking at DebugLocEntry::finalize and it calling DIExpression::getBitPiece{Offset,Size}) - sort of supports the idea that this doesn't fit into a generic piece of an expression, but a fairly special case thing.<br class="gmail_msg">

><br class="gmail_msg">

<br class="gmail_msg">

Okay, I thought about this :-)<br class="gmail_msg">

<br class="gmail_msg">

I think we need introduce a new DIExpression node format that has<br class="gmail_msg">

the correct semantics for DW_OP_bit_piece and additional support<br class="gmail_msg">

for (at least) also storing the offset into the source<br class="gmail_msg">

variable. Below are two ideas for how we could do this.<br class="gmail_msg"></blockquote><div><br></div><div>*nod*</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Regardless of which implementation we choose we will also need to<br class="gmail_msg">

introduce a new bitcode record for DIExpressions with the<br class="gmail_msg">

corrected semantics in order to safely upgrade existing bitcode.<br class="gmail_msg"></blockquote><div><br></div><div>Ah, good point.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">First let's revisit what DIExpression looks like:<br class="gmail_msg">

<br class="gmail_msg">

// Slightly abbreviated in-memory representation.<br class="gmail_msg">

class DIExpression {<br class="gmail_msg">

  std::vector<uint64_t> Elements;<br class="gmail_msg">

}<br class="gmail_msg">

<br class="gmail_msg">

IR Syntax for the last 2 bytes of a 6-byte variable:<br class="gmail_msg">

!DIExpression(..., DW_OP_bit_piece, 16, 32)<br class="gmail_msg"></blockquote><div><br></div><div>Thanks for refreshing/describing the current state - I immediately realized one benefit of encoding it as a custom op as you were suggesting: it would be free when not in use, rather than showing up as mandatory fields. (though the bitcode format is light enough that a couple of zeros would be very few bits, etc, I suppose)<br><br>anyway...</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Proposal A<br class="gmail_msg">

==========<br class="gmail_msg">

<br class="gmail_msg">

We could add two extra fields to DIExpression to explicitly store<br class="gmail_msg">

the size of the fragment this expression describes, together with<br class="gmail_msg">

offset of the fragment inside the source variable. Use of<br class="gmail_msg">

DW_OP_(bit_)piece is explicitly disallowed in this variant.<br class="gmail_msg">

<br class="gmail_msg">

// In-memory.<br class="gmail_msg">

class DIExpression {<br class="gmail_msg">

  std::vector<uint64_t> Elements;<br class="gmail_msg">

  uint64_t FragmentBitOffset = 0;<br class="gmail_msg">

  optional<uint64_t> FragmentBitSize;<br class="gmail_msg">

}<br class="gmail_msg">

<br class="gmail_msg">

IR Syntax for the last 2 bytes of a 6-byte variable:<br class="gmail_msg">

!DIExpression(..., fragmentBitSize: 16, fragmentBitOffset: 32)<br class="gmail_msg"></blockquote><div><br></div><div>Technically DW_OP_bit_piece could still be necessary/useful in this format - for specifying if the piece is not at the start of the origin. (eg: the high nibble or byte in a word-sized register). And then it would conflict with the desire to put a DW_OP_bit_piece at the top.<br><br>Oh, I guess that ^ isn't true: I suppose the high word could be done without DW_OP_bit_piece using bit twiddling with other ops? (DW_OP_shr, etc) & we could optimize them together if desired. Same goes for Proposal B, I think.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Proposal B<br class="gmail_msg">

==========<br class="gmail_msg">

<br class="gmail_msg">

We could allow expressions with DW_OP_(bit_)piece and only store<br class="gmail_msg">

the fragment offset. As shown in the second IR example, this<br class="gmail_msg">

allows us to express things that would otherwise need masking and<br class="gmail_msg">

shifting:<br class="gmail_msg"></blockquote><div><br></div><div>Right, as you were saying here ^</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">// In-memory.<br class="gmail_msg">

class DIExpression {<br class="gmail_msg">

  std::vector<uint64_t> Elements;<br class="gmail_msg">

  uint64_t FragmentBitOffset = 0;<br class="gmail_msg">

}<br class="gmail_msg">

<br class="gmail_msg">

IR Syntax for the last 2 bytes of a 6-byte variable:<br class="gmail_msg">

!DIExpression(..., DW_OP_piece, 2, fragmentBitOffset: 32)<br class="gmail_msg">

<br class="gmail_msg">

IR Syntax for bits 8-11 of a variable stored in the upper nibble of a location.<br class="gmail_msg">

!DIExpression(DW_OP_bit_piece, 4, 4, fragmentBitOffset: 8)<br class="gmail_msg">

<br class="gmail_msg">

<br class="gmail_msg">

Proposal B doesn't meet your criteria for expression<br class="gmail_msg">

transformations to not need to know about DW_OP_bit_piece, but at<br class="gmail_msg">

least from my point of view that shouldn't cause a lot of trouble<br class="gmail_msg">

in practice.<br class="gmail_msg"></blockquote><div><br></div><div>*nod* your point that most of the transformations go in at the bottom of the stack, rather than the top, makes sense.<br><br>Though at some point we might want to do some optimization of those expressions on the fly (rather than pushing more things into the bottom of the stack, never simplifying the expression until we get to dwarf generation). At that point there may be value in having consistency in expression format (that's what IR is about anyway - normalization), so having a special case when a fragment is at the top level of the expression rather than in some intermediate level may not be ideal as an IR representation.<br><br>& then we can collapse the normalized representation into the OP_(bit_)piece representation when doing code generation - along with the other optimizations of the expression you referenced previously in the thread.<br><br>But I'm not intimately familiar with this stuff, so wouldn't mind some other perspectives, or just to defer to your best judgement/as you see fit here.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br class="gmail_msg">

What do you think?<br class="gmail_msg">

-- adrian<br class="gmail_msg">

</blockquote></div></div>