[PATCH] D26769: [IR] Remove the DIExpression field from DIGlobalVariable.

Fri Nov 18 16:21:44 PST 2016

On Fri, Nov 18, 2016 at 3:49 PM Adrian Prantl <aprantl at apple.com> wrote:

>
> > On Nov 18, 2016, at 1:33 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
> <snip>
>

Thanks - this was getting a bit long.

>
> >>
> >>>>
> >>>> I don't quite understand this concern. Couldn't any transformation
> simply ignore the LLVM_OP_piece at the end of the DIExpression since it has
> no effect?
> >>>
> >>> That's roughly my concern - if we have to teach things to ignore it,
> and otherwise special case it (when producing DWARF later)), why build it
> into the DIExpression byte stream in general rather than pulling it out to
> a separate metadata operand, etc? It doesn't seem to fit there.
> >>
> >> I think you might be overstating how much work it is to "teach" a
> transformation to ignore it. All (current) transformations that rewrite
> DIExpressions (with the exception of the ones creating DW_OP_pieces) work
> by concatenating a new expression with the existing expression, e.g., by
> putting a DW_OP_deref at the beginning of the expression.
> >
> > Fair point - inserting things into the stack just after the implied raw
> element's basically free/orthogonal to the rest of the expression.
> >
> >> The DIExpression -> DWARF Expression compiler has to special-case many
> operations already because it wants to find the most efficient encoding for
> the expression. This involves things like combining the location with the
> first operation (i.e., merging [reg], DW_OP_deref -> DW_OP_breg) and
> rewriting DW_OP_constu into, e.g, DW_OP_const1u. So I don't think it
> complicates the expression compiler at all; it needs to handle it anyway,
> and it needs to handle it at the end of the expression.
> >
> > It seems the handling needs to be pretty different - since it'll be
> wanting to look across all the DIExpressions for the fragments and finding
> the holes to create those, etc.
> >
> > It just seems not really part of the expression - at best silently
> ignored in some cases. But perhaps I'm really not appreciating the current
> emission code.
> >
> > In fact from an API perspective it looks like it's embedded as a fairly
> first-level concept (looking at DebugLocEntry::finalize and it calling
> DIExpression::getBitPiece{Offset,Size}) - sort of supports the idea that
> this doesn't fit into a generic piece of an expression, but a fairly
> special case thing.
> >
>
> Okay, I thought about this :-)
>
> I think we need introduce a new DIExpression node format that has
> the correct semantics for DW_OP_bit_piece and additional support
> for (at least) also storing the offset into the source
> variable. Below are two ideas for how we could do this.
>

*nod*

> Regardless of which implementation we choose we will also need to
> introduce a new bitcode record for DIExpressions with the
> corrected semantics in order to safely upgrade existing bitcode.
>

Ah, good point.

> First let's revisit what DIExpression looks like:
>
> // Slightly abbreviated in-memory representation.
> class DIExpression {
>   std::vector<uint64_t> Elements;
> }
>
> IR Syntax for the last 2 bytes of a 6-byte variable:
> !DIExpression(..., DW_OP_bit_piece, 16, 32)
>

Thanks for refreshing/describing the current state - I immediately realized
one benefit of encoding it as a custom op as you were suggesting: it would
be free when not in use, rather than showing up as mandatory fields.
(though the bitcode format is light enough that a couple of zeros would be
very few bits, etc, I suppose)

anyway...

> Proposal A
> ==========
>
> We could add two extra fields to DIExpression to explicitly store
> the size of the fragment this expression describes, together with
> offset of the fragment inside the source variable. Use of
> DW_OP_(bit_)piece is explicitly disallowed in this variant.
>
> // In-memory.
> class DIExpression {
>   std::vector<uint64_t> Elements;
>   uint64_t FragmentBitOffset = 0;
>   optional<uint64_t> FragmentBitSize;
> }
>
> IR Syntax for the last 2 bytes of a 6-byte variable:
> !DIExpression(..., fragmentBitSize: 16, fragmentBitOffset: 32)
>

Technically DW_OP_bit_piece could still be necessary/useful in this format
- for specifying if the piece is not at the start of the origin. (eg: the
high nibble or byte in a word-sized register). And then it would conflict
with the desire to put a DW_OP_bit_piece at the top.

Oh, I guess that ^ isn't true: I suppose the high word could be done
without DW_OP_bit_piece using bit twiddling with other ops? (DW_OP_shr,
etc) & we could optimize them together if desired. Same goes for Proposal
B, I think.

> Proposal B
> ==========
>
> We could allow expressions with DW_OP_(bit_)piece and only store
> the fragment offset. As shown in the second IR example, this
> allows us to express things that would otherwise need masking and
> shifting:
>

Right, as you were saying here ^

> // In-memory.
> class DIExpression {
>   std::vector<uint64_t> Elements;
>   uint64_t FragmentBitOffset = 0;
> }
>
> IR Syntax for the last 2 bytes of a 6-byte variable:
> !DIExpression(..., DW_OP_piece, 2, fragmentBitOffset: 32)
>
> IR Syntax for bits 8-11 of a variable stored in the upper nibble of a
> location.
> !DIExpression(DW_OP_bit_piece, 4, 4, fragmentBitOffset: 8)
>
>
> Proposal B doesn't meet your criteria for expression
> transformations to not need to know about DW_OP_bit_piece, but at
> least from my point of view that shouldn't cause a lot of trouble
> in practice.
>

*nod* your point that most of the transformations go in at the bottom of
the stack, rather than the top, makes sense.

Though at some point we might want to do some optimization of those
expressions on the fly (rather than pushing more things into the bottom of
the stack, never simplifying the expression until we get to dwarf
generation). At that point there may be value in having consistency in
expression format (that's what IR is about anyway - normalization), so
having a special case when a fragment is at the top level of the expression
rather than in some intermediate level may not be ideal as an IR
representation.

& then we can collapse the normalized representation into the
OP_(bit_)piece representation when doing code generation - along with the
other optimizations of the expression you referenced previously in the
thread.

But I'm not intimately familiar with this stuff, so wouldn't mind some
other perspectives, or just to defer to your best judgement/as you see fit
here.

>
> What do you think?
> -- adrian
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20161119/95b846b2/attachment.html>