[PATCH] D26769: [IR] Remove the DIExpression field from DIGlobalVariable.

David Blaikie via llvm-commits llvm-commits at lists.llvm.org
Fri Nov 18 13:33:51 PST 2016


On Fri, Nov 18, 2016 at 1:23 PM Adrian Prantl <aprantl at apple.com> wrote:

> On Nov 18, 2016, at 12:35 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Fri, Nov 18, 2016 at 12:10 PM Adrian Prantl <aprantl at apple.com> wrote:
>
> On Nov 18, 2016, at 11:45 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Fri, Nov 18, 2016 at 11:35 AM Adrian Prantl <aprantl at apple.com> wrote:
>
> On Nov 18, 2016, at 11:09 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Fri, Nov 18, 2016 at 10:53 AM Adrian Prantl <aprantl at apple.com> wrote:
>
> On Nov 18, 2016, at 10:36 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Fri, Nov 18, 2016 at 10:23 AM Adrian Prantl <aprantl at apple.com> wrote:
>
> On Nov 18, 2016, at 10:10 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Fri, Nov 18, 2016 at 10:00 AM Adrian Prantl <aprantl at apple.com> wrote:
>
> On Nov 18, 2016, at 9:51 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
> Would it be reasonable/useful to add another level of indirection while
> we're here - that would describe which part of the variable is being
> described by this location:
>
> DIGlobalVariableExpr(value: DW_OP_const 42) ->
> DIGlobalVariableFragment(offset: 32 bits) -> DIGlobalVariable(name: foo)
>
> (just guessing/pretending there's a DW_OP_const for constant values - I
> forget how it all works, but purely for demonstration purposes)
>
> That way code manipulating DIGlobalVariableExprs wouldn't need to have
> explicit knowledge of which part of the variable is being described here.
> (I suppose that could end up with us creating a bottom-up expression tree
> in metadata, which probably isn't ideal (too heavyweight))
>
>
> The nice thing about the Dwarf expressions being a stack-based
> post-fix-notated programming language is that it you can compose functions
> by simply concatenating them. The typical kind of manipulations that we do
> (for example, adding an "add offset", "dereference" operation) can done by
> simply putting the new expression in front of the old expression. I don't
> think it can get much simpler than that :-)
>
>
> Wouldn't the piece/bit_piece need to go on top level of the expression
> tree? I don't think you can simply concatenate things on top of it...
>
>
> The expression tree of a stack-based postfix language degenerates to a
> list. The works in the language can consume a different number of stack
> arguments, but under the assumption that one well-formed DIExpression takes
> and produces exactly one value (which is true for all non-constant
> expressions we support), it is safe to concatenate the expressions.
>
>
> But what does this mean to the DWARF consumer? All the examples in the
> spec seem to show DW_OP_(bit_)piece at the top level of the expression
> (once the expression stack is evaluated, it may have several
> DW_OP_(bit_)piece remaining on the stack - and they're all read, oldest to
> newest (bottom to top of the stack) as descriptions of portions of the
> variable)
>
>
> I don't think I understand what you're saying here. Can you give me a
> concrete example maybe?
>
>
> Taking the example from the DWARF4 spec:
>
> DW_OP_reg3 DW_OP_piece 4 DW_OP_reg10 DW_OP_piece 2
>
> If we evaluate the stack, at the end of that evaluation the stack has two
> elements:
>
> DW_OP_piece(2) { DW_OP_reg10 }
> DW_OP_piece(4) { DW_OP_reg3 }
>
> And the DWARF consumer should (according to the description in the spec)
> read this as describing a variable who's bytes [0, 4) are in reg3 and bytes
> [4, 6) are in reg10.
>
>
> Yes that makes sense.
>
>
> Actually - I'm going to revisit that claim. I think, maybe, DW_OP_piece
> doesn't appear on the stack after it's evaluated. It has a side effect of
> producing part of a location description. I don't think it has any value -
> that another DW_OP could consume. (& I think it's necessary to think of it
> that way to handle this part of the spec:
>
> "If the location description is empty, the offset doesn’t matter and the
> DW_OP_bit_piece operation describes a piece consisting of the given number
> of bits whose values are undefined. "
>
>
> Yeah I see the DW_OP_(bit)piece operation as more of a side-channel
> operation that can be used to concatenate several independently evaluated
> expressions.
>
>
> So this is how you create holes. If, say, the two bytes of reg10 needed to
> be the high 2 bytes of the object, not the 3rd nibble:
>
> DW_OP_reg3 DW_OP_piece 4 DW_OP_piece 2 DW_OP_reg10 DW_OP_piece 2
>
> I believe the hole is added that way - and the stack is empty when the
> second DW_OP_piece is evaluated to create that hole.
>
>
> Yes, at least for this there exists an example.
>
>
> Ah, so there is!
>
>
> And I think with a lone DW_OP_bit_piece x 0 it is even possible to create
> a non-byte-divisible hole.
>
>
> Yep - sounds like that's how that would work
>
>
>
>
>
>
>
>
>
>
>
>
>
> Oh, also, DW_OP_bit_piece is ordered, judging by the examples in the spec:
>
> DW_OP_reg3 DW_OP_piece 4 DW_OP_reg10 DW_OP_piece 2
>  A variable whose first four bytes reside in register 3 and whose next two
> bytes reside in register 10.
>
> So how do we know which DIGlobalVariableExpr goes first? Or if there are
> holes between exprs?
>
> (am I making sense? not sure - I can try to explain differently, or
> perhaps just not understanding)
>
>
> Please let me know if I'm missing the point. I think there may be some
> confusion about how multi-piece values are represented in LLVM Metadata.
>
> One DIExpression only ever represents one DW_OP_piece (with the
> DW_OP_piece always being the very last element of the expression). To
> represent a variable with multiple pieces, we use two DIExpression. For
> example (extracted from the split-global.ll testcase):
>
> ; struct { int x,y; } Point;
> @point.x = global i32 1, align 4, !dbg !12
> @point.y = global i32 2, align 4, !dbg !13
>
> !0 = distinct !DIGlobalVariable(name: "point", scope: !1, file: !2, line:
> 1, type: !5, isLocal: false, isDefinition: true)
> !12 = !DIGlobalVariableExpression(var: !0,  expr:
> !DIExpression(DW_OP_bit_piece,  0, 32))
> !13 = !DIGlobalVariableExpression(var: !0,  expr:
> !DIExpression(DW_OP_bit_piece, 32, 32))
>
>
> I think this DWARF is wrong/doesn't make sense. DW_OP_bit_piece's two
> operands refer to the offset within the previous value on the stack (in
> this case, the previous/initial value on the stack is implied as the
> location of the respective globals, @point.x and @point.y). So !13 is
> invalid DWARF - it says "this piece starts 32 bits into a 32 bit region and
> is 32 bits long" - which is entirely outside the region.
>
>
> This is not yet DWARF, this is our internal representation in LLVM IR. A
> (non-constant)* DIExpression has an implicit first element that is the
> location of whatever is referring to the DIExpression. It could be a
> GlobalVariable, or a machine value tied to the DIExpression by means of a
> DBG_VALUE instruction.
>
> When we generate a DWARF expression from this we generate the implicit
> element first, which will turn into a DW_OP_(b)reg/addr/..., followed by
> the contents of the DIExpression
>
>
> Yep yep - the wrongness I was referring to wasn't in that implicit first
> argument, etc. Will describe more further down.
>
>
>
> *) constants are a little more special and for the sake of this discussion
> I'd like to ignore their them because they complicate our current model. I
> have a plan for generalizing our expression model to support an arbitrary
> number of locations (including zero), but let's leave that for a separate
> thread.
>
>
> "The DW_OP_bit_piece operation takes two operands. The first is an
> unsigned LEB128 number that gives the size in bits of the piece. The second
> is an unsigned LEB128 number that gives the offset in bits from the
> location defined by the preceding DWARF location description. "
>
> The DWARF expression to describe the location of that variable split in
> two would be:
>
> DW_OP_addr <addr of @point.x> DW_OP_piece 4 DW_OP_addr <addr of @point.y>
> DW_OP_piece 4
>
> Or, if you wanted to use OP_bit_piece, they would each be "DW_OP_bit_piece
> 32 0".
>
>
>
>
>
> The DWARF backend collects all the DIGlobalVariableExpressions and their
> respective GlobalVariables and translates this into a single location:
>
> DW_AT_location ([0x0000000000000000], piece 0x00000004,
> [0x0000000000000004], bit-piece 32 32)
>                             @point.x, !12, @point.y, !13.
>
>
> I believe this DWARF describes a location consisting of the range
> ([0x0000, 0x0004), [0x0008, 0x00012)) which is probably not intended.
>
>
> Oh, sorry! The human-readable output of Darwin dwarfdump is very
> misleading here: [0x123] is indeed a shorthand for DW_OP_addr 0x123. See
> also the corresponding CHECK in in split-global.ll in the review.
>
>
> That wasn't my source of confusion - I assumed that was the representation
> we were talking about.
>
> So, after evaluating this location expression, the stack would contain:
>
> DW_OP_bit_piece (32, 32) { DW_OP_addr 0x0004 }
> DW_OP_piece (4) { DW_OP_addr 0x0000 }
>
> And I believe this describes a variable who's bytes [0, 4) start at
> address 0x0000 and who's bytes [4, 8*) start at address (0x0008**). I don't
> think this is what you intended.
>
> * this is 4 bytes because the first operand to DW_OP_bit_piece is 4 bytes:
> "The first is an unsigned LEB128 number that gives the size in bits of the
> piece. "
> * this is 0x0008 because it's 0x0004 bytes + 4 bytes because the second
> operand to DW_OP_bit_piece is 4 bytes and: "The second is an unsigned
> LEB128 number that gives the offset in bits from the location defined by
> the preceding DWARF location description. "
>
>
>
>
> The example in 2.6.1.3:
>
> DW_OP_reg3 DW_OP_piece 4 DW_OP_reg10 DW_OP_piece 2
>   A variable whose first four bytes reside in register 3 and whose next
> two bytes reside in register 10.
>
> The pieces are considered to fill the variable left to right.
>
>
> Yes.
>
>
> & the wording for op_bit_piece says that the offset refers to the location
> previous in the stack - not about where that fragment goes in the resulting
> variable.
>
>
> I don't follow. The previous value on the stack is the DW_OP_addr.
>
>
> Yes - so the offset is an offset of that address. It says "look 4 bytes
> past the address" not "put this 4 bytes into the outer variable
> description".
>
> To break down the language further:
>
> "Each composition operation is immediately preceded by a simple location
> description which describes the location where part of the resultant value
> is contained." - so the "preceeding location description" is the thing on
> the stack before the DW_OP_(bit_)piece is evaluated. The DW_OP_addr in your
> examples and DW_OP_regX in the spec example.
>
> "The second is an unsigned LEB128 number that gives the offset in bits
> from the location defined by the preceding DWARF location description." -
> this is saying the offset is an offset within that preceeding location
> description, specifically:
>
> "If the location is a memory address, the DW_OP_bit_piece operation
> describes a sequence of bits relative to the location whose address is on
> the top of the DWARF stack using the bit numbering and direction
> conventions that are appropriate to the current language on the target
> system. "
>
>
> Huh. Thanks for pointing this out!
>
>
> Sure thing - glad we were able to figure it out! :)
>
>
> So far I always thought of DW_OP_(bit_)piece as interchangeable,
>
>
> Not sure I understand this statement ^ - they are pretty much
> interchangeable... but, yes, perhaps in a different way.
>
>
> with DW_OP_piece being a more efficient encoding and the bit_offset being
> the offset in the source variable rather than in the location.
>
>
> *nod* sorry for the confuison, wish I coudl've distilled that down better
> sooner.
>
>
>
>
> So what DW_OP_bit_piece(32, 32) says is "there's 32 bits of stuff, 32 bits
> from where the previous expression on the stack points  to" - or, in your
> example, it describes the range [0x0008, 0x0012).
>
> I believe you were trying to describe a variable that consisted of
> ([0x0000, 0x0004), [0x0004, 0x0008)) - is that correct?
>
> I believe the correct DWARF description for that would be:
> DW_OP_addr(0x0000) DW_OP_piece(4) DW_OP_addr(0x0004) DW_OP_piece(4)
>
>
> I think you're right. It's really unfortunate that there are no examples
> of the correct usage of DW_OP_bit_piece in Appendix D.
>
>
> Indeed - an example would be dandy (possibly even one of all the different
> ways DW_OP_bit_piece can behave depending on the value preceeding it on the
> stack (including the absence of a value))
>
>
> Looks like I will need to fix both LLVM's and LLDB's interpretation of
> DW_OP_bit_piece.
>
>
> Ah! :/ that's going to be tricky, I imagine. (how LLDB will know which way
> to interpret bit_piece, etc)
>
>
> I actually have a very narrow time window right now where this transition
> will be really easy (at least for Darwin). We're about to switch clang to
> emit DWARF 4 on Darwin (that change for this is already in trunk) and we
> could use the old interpretation if (Darwin && Dwarf < 4).
>
>
> Ah, handy! :) FreeBSD might be a bit awkward - but that they didn't notice
> this difference in description when they switched debuggers/compilers
> (unless they switched them together?), not sure it comes up enough to
> affect usability too often.
>
>
>
>
>
> With that cleared up I now also get the concern about using
> DW_OP_bit_piece in DIExpression. Yes, we should at least replace this with
> an LLVM-specific operator that has the semantics we need (which is an
> offset into the source variable). I still think it can/should be part of
> the DIExpression though. Straw man proposal: {LLVM_OP_var_piece, size,
> ofs}. I'll create a separate review for that.
>
>
> I still have the original concern that DW_OP_piece must appear at the top
> of the stack (it doesn't produce a value that can be consumed by any other
> operation) - and thus any transformation that wants to touch a DIExpression
> would need to walk through the OP_piece (or LLVM_OP_var_piece) to get to
> the value to manipulate.
>
>
> I don't quite understand this concern. Couldn't any transformation simply
> ignore the LLVM_OP_piece at the end of the DIExpression since it has no
> effect?
>
>
> That's roughly my concern - if we have to teach things to ignore it, and
> otherwise special case it (when producing DWARF later)), why build it into
> the DIExpression byte stream in general rather than pulling it out to a
> separate metadata operand, etc? It doesn't seem to fit there.
>
>
> I think you might be overstating how much work it is to "teach" a
> transformation to ignore it. All (current) transformations that rewrite
> DIExpressions (with the exception of the ones creating DW_OP_pieces) work
> by concatenating a new expression with the existing expression, e.g., by
> putting a DW_OP_deref at the beginning of the expression.
>

Fair point - inserting things into the stack just after the implied raw
element's basically free/orthogonal to the rest of the expression.


> The DIExpression -> DWARF Expression compiler has to special-case many
> operations already because it wants to find the most efficient encoding for
> the expression. This involves things like combining the location with the
> first operation (i.e., merging [reg], DW_OP_deref -> DW_OP_breg) and
> rewriting DW_OP_constu into, e.g, DW_OP_const1u. So I don't think it
> complicates the expression compiler at all; it needs to handle it anyway,
> and it needs to handle it at the end of the expression.
>

It seems the handling needs to be pretty different - since it'll be wanting
to look across all the DIExpressions for the fragments and finding the
holes to create those, etc.

It just seems not really part of the expression - at best silently ignored
in some cases. But perhaps I'm really not appreciating the current emission
code.

In fact from an API perspective it looks like it's embedded as a fairly
first-level concept (looking at DebugLocEntry::finalize and it calling
DIExpression::getBitPiece{Offset,Size}) - sort of supports the idea that
this doesn't fit into a generic piece of an expression, but a fairly
special case thing.

- David


>
> -- adrian
>
>
>
>
> It breaks the orthogonality of concerns - a transformation that moves a
> value from a register to memory doesn't need to know how that piece of data
> is being referred to (whether it's a sub-portion of a variable, etc), just
> that it moved. (perhaps that's not a good example - what's a good example
> where we want to manipulate an existing location description instead of
> rewriting it from scratch?
>
>
> This is not a good example because we need to keep locations out of the
> DIExpressions. Locations are always part of the proper IR and DIExpressions
> are metadata (and thus can't/shouldn't point back to IR). They are only
> tied together by dbg.value instrinsics or !dbg attachments that are part of
> the instruction stream.
>
> I suppose even rewriting it from scratch the OP_piece would get in the way
> - "rewrite it, but preserve the OP_piece" seems awkward when this entity is
> just meant to represent this specific location & shouldn't care about
> whetehr it's a fragment of something larger)
>
>
> At the moment we only do three transformations:
>
> 1. Split one variable (piece) up into multiple pieces. We have API in
> DIExpression to make this simple.
> 2. Put a DW_OP_deref in front of an expression.
> 3. Put a DW_OP_plus DW_OP_constu <ofs> DW_OP_deref in front of an
> expression.
>
> For these examples I don't think it's particularly awkward. In the future
> we might want to generate more complex expressions for loop induction
> variables, but that would likely be creating fresh DIExpressions rather
> than modifying existing ones.
>
> I would like to revisit this if this becomes and issue in the future. At
> this point I don't think it's necessary to create something special for
> pieces (other than perhaps renaming it to avoid semantics confusion).
>
> -- adrian
>
>
>
>
> thanks!!
>
>
> Thank you for sticking with me - I know it was a bit confusing for both of
> us!
>
> - David
>
>
>
> -- adrian
>
>
> - Dave
>
>
>
>
> "The DW_OP_bit_piece operation takes two operands. The first is an
> unsigned LEB128 number that gives the size in bits of the piece. The second
> is an unsigned LEB128 number that gives the offset in bits from the
> location defined by the preceding DWARF location description. "
>
>
> What may also contribute to confusion is that DWARF expression operands'
> operands are different from DWARF expression stack values. Operands (such
> as the two numbers following DW_OP_bit_piece) are part of the DWARF
> expression itself and not pushed onto the stack.
>
> -- adrian
>
>
>
>
> does that make sense?
>
> -- adrian
>
>
>
>
>
> --- adrian
>
>
> Perhaps keeping the bit_piece parts out of the DWARF expression bytes and
> in a separate field of the DIGlobalVariableExpr?
>
> DIGlobalVariableExpr(offset: 32, value: DW_OP_const 42)
>
> Then at least when a variable moves around (does that happen? maybe? - I
> suppose something like asan, smooshing everything into a single alloca
> would move something around without splitting it up) the expression can be
> updated relatively simply without unpacking through a DW_OP_bit_piece, etc?
>
> Maybe this doesn't matter - Ih aven't looked at much of the
> expression/location handling code.
>
> On Fri, Nov 18, 2016 at 9:32 AM Adrian Prantl <aprantl at apple.com> wrote:
>
> aprantl removed rL LLVM as the repository for this revision.
> aprantl updated this revision to Diff 78544.
> aprantl added a comment.
>
> Improved type-safety by adding a DIGlobalVarExpr pointer union.
>
>
> https://reviews.llvm.org/D26769
>
> Files:
>   include/llvm/Bitcode/LLVMBitCodes.h
>   include/llvm/IR/DIBuilder.h
>   include/llvm/IR/DebugInfo.h
>   include/llvm/IR/DebugInfoMetadata.h
>   include/llvm/IR/GlobalVariable.h
>   include/llvm/IR/Metadata.def
>   lib/Analysis/ModuleDebugInfoPrinter.cpp
>   lib/AsmParser/LLParser.cpp
>   lib/Bitcode/Reader/BitcodeReader.cpp
>   lib/Bitcode/Writer/BitcodeWriter.cpp
>   lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
>   lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
>   lib/CodeGen/AsmPrinter/DwarfCompileUnit.h
>   lib/CodeGen/AsmPrinter/DwarfDebug.cpp
>   lib/IR/AsmWriter.cpp
>   lib/IR/DIBuilder.cpp
>   lib/IR/DebugInfo.cpp
>   lib/IR/DebugInfoMetadata.cpp
>   lib/IR/LLVMContextImpl.h
>   lib/IR/Metadata.cpp
>   lib/IR/Verifier.cpp
>   lib/Transforms/IPO/StripSymbols.cpp
>   lib/Transforms/Instrumentation/AddressSanitizer.cpp
>   test/Assembler/diglobalvariable.ll
>   test/Assembler/diglobalvariableexpression.ll
>   test/Bitcode/DIGlobalVariableExpr.ll
>   test/Bitcode/DIGlobalVariableExpr.ll.bc
>   test/Bitcode/diglobalvariable-3.8.ll
>   test/Bitcode/diglobalvariable-3.8.ll.bc
>   test/DebugInfo/X86/multiple-at-const-val.ll
>   test/DebugInfo/X86/pr12831.ll
>   test/DebugInfo/X86/split-global.ll
>   test/DebugInfo/X86/stack-value-dwarf4.ll
>   test/DebugInfo/X86/unattached-global.ll
>   test/Transforms/GlobalMerge/debug-info.ll
>   unittests/IR/MetadataTest.cpp
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20161118/2d6eeaaf/attachment.html>


More information about the llvm-commits mailing list