[PATCH] D26769: [IR] Remove the DIExpression field from DIGlobalVariable.

David Blaikie via llvm-commits llvm-commits at lists.llvm.org
Fri Nov 18 11:09:41 PST 2016


On Fri, Nov 18, 2016 at 10:53 AM Adrian Prantl <aprantl at apple.com> wrote:

> On Nov 18, 2016, at 10:36 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Fri, Nov 18, 2016 at 10:23 AM Adrian Prantl <aprantl at apple.com> wrote:
>
> On Nov 18, 2016, at 10:10 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Fri, Nov 18, 2016 at 10:00 AM Adrian Prantl <aprantl at apple.com> wrote:
>
> On Nov 18, 2016, at 9:51 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
> Would it be reasonable/useful to add another level of indirection while
> we're here - that would describe which part of the variable is being
> described by this location:
>
> DIGlobalVariableExpr(value: DW_OP_const 42) ->
> DIGlobalVariableFragment(offset: 32 bits) -> DIGlobalVariable(name: foo)
>
> (just guessing/pretending there's a DW_OP_const for constant values - I
> forget how it all works, but purely for demonstration purposes)
>
> That way code manipulating DIGlobalVariableExprs wouldn't need to have
> explicit knowledge of which part of the variable is being described here.
> (I suppose that could end up with us creating a bottom-up expression tree
> in metadata, which probably isn't ideal (too heavyweight))
>
>
> The nice thing about the Dwarf expressions being a stack-based
> post-fix-notated programming language is that it you can compose functions
> by simply concatenating them. The typical kind of manipulations that we do
> (for example, adding an "add offset", "dereference" operation) can done by
> simply putting the new expression in front of the old expression. I don't
> think it can get much simpler than that :-)
>
>
> Wouldn't the piece/bit_piece need to go on top level of the expression
> tree? I don't think you can simply concatenate things on top of it...
>
>
> The expression tree of a stack-based postfix language degenerates to a
> list. The works in the language can consume a different number of stack
> arguments, but under the assumption that one well-formed DIExpression takes
> and produces exactly one value (which is true for all non-constant
> expressions we support), it is safe to concatenate the expressions.
>
>
> But what does this mean to the DWARF consumer? All the examples in the
> spec seem to show DW_OP_(bit_)piece at the top level of the expression
> (once the expression stack is evaluated, it may have several
> DW_OP_(bit_)piece remaining on the stack - and they're all read, oldest to
> newest (bottom to top of the stack) as descriptions of portions of the
> variable)
>
>
> I don't think I understand what you're saying here. Can you give me a
> concrete example maybe?
>

Taking the example from the DWARF4 spec:

DW_OP_reg3 DW_OP_piece 4 DW_OP_reg10 DW_OP_piece 2

If we evaluate the stack, at the end of that evaluation the stack has two
elements:

DW_OP_piece(2) { DW_OP_reg10 }
DW_OP_piece(4) { DW_OP_reg3 }

And the DWARF consumer should (according to the description in the spec)
read this as describing a variable who's bytes [0, 4) are in reg3 and bytes
[4, 6) are in reg10.


>
>
>
>
>
> Oh, also, DW_OP_bit_piece is ordered, judging by the examples in the spec:
>
> DW_OP_reg3 DW_OP_piece 4 DW_OP_reg10 DW_OP_piece 2
>  A variable whose first four bytes reside in register 3 and whose next two
> bytes reside in register 10.
>
> So how do we know which DIGlobalVariableExpr goes first? Or if there are
> holes between exprs?
>
> (am I making sense? not sure - I can try to explain differently, or
> perhaps just not understanding)
>
>
> Please let me know if I'm missing the point. I think there may be some
> confusion about how multi-piece values are represented in LLVM Metadata.
>
> One DIExpression only ever represents one DW_OP_piece (with the
> DW_OP_piece always being the very last element of the expression). To
> represent a variable with multiple pieces, we use two DIExpression. For
> example (extracted from the split-global.ll testcase):
>
> ; struct { int x,y; } Point;
> @point.x = global i32 1, align 4, !dbg !12
> @point.y = global i32 2, align 4, !dbg !13
>
> !0 = distinct !DIGlobalVariable(name: "point", scope: !1, file: !2, line:
> 1, type: !5, isLocal: false, isDefinition: true)
> !12 = !DIGlobalVariableExpression(var: !0,  expr:
> !DIExpression(DW_OP_bit_piece,  0, 32))
> !13 = !DIGlobalVariableExpression(var: !0,  expr:
> !DIExpression(DW_OP_bit_piece, 32, 32))
>
>
> I think this DWARF is wrong/doesn't make sense. DW_OP_bit_piece's two
> operands refer to the offset within the previous value on the stack (in
> this case, the previous/initial value on the stack is implied as the
> location of the respective globals, @point.x and @point.y). So !13 is
> invalid DWARF - it says "this piece starts 32 bits into a 32 bit region and
> is 32 bits long" - which is entirely outside the region.
>
>
> This is not yet DWARF, this is our internal representation in LLVM IR. A
> (non-constant)* DIExpression has an implicit first element that is the
> location of whatever is referring to the DIExpression. It could be a
> GlobalVariable, or a machine value tied to the DIExpression by means of a
> DBG_VALUE instruction.
>
> When we generate a DWARF expression from this we generate the implicit
> element first, which will turn into a DW_OP_(b)reg/addr/..., followed by
> the contents of the DIExpression
>

Yep yep - the wrongness I was referring to wasn't in that implicit first
argument, etc. Will describe more further down.


>
> *) constants are a little more special and for the sake of this discussion
> I'd like to ignore their them because they complicate our current model. I
> have a plan for generalizing our expression model to support an arbitrary
> number of locations (including zero), but let's leave that for a separate
> thread.
>
>
> "The DW_OP_bit_piece operation takes two operands. The first is an
> unsigned LEB128 number that gives the size in bits of the piece. The second
> is an unsigned LEB128 number that gives the offset in bits from the
> location defined by the preceding DWARF location description. "
>
> The DWARF expression to describe the location of that variable split in
> two would be:
>
> DW_OP_addr <addr of @point.x> DW_OP_piece 4 DW_OP_addr <addr of @point.y>
> DW_OP_piece 4
>
> Or, if you wanted to use OP_bit_piece, they would each be "DW_OP_bit_piece
> 32 0".
>
>
>
>
>
> The DWARF backend collects all the DIGlobalVariableExpressions and their
> respective GlobalVariables and translates this into a single location:
>
> DW_AT_location ([0x0000000000000000], piece 0x00000004,
> [0x0000000000000004], bit-piece 32 32)
>                             @point.x, !12, @point.y, !13.
>
>
> I believe this DWARF describes a location consisting of the range
> ([0x0000, 0x0004), [0x0008, 0x00012)) which is probably not intended.
>
>
> Oh, sorry! The human-readable output of Darwin dwarfdump is very
> misleading here: [0x123] is indeed a shorthand for DW_OP_addr 0x123. See
> also the corresponding CHECK in in split-global.ll in the review.
>

That wasn't my source of confusion - I assumed that was the representation
we were talking about.

So, after evaluating this location expression, the stack would contain:

DW_OP_bit_piece (32, 32) { DW_OP_addr 0x0004 }
DW_OP_piece (4) { DW_OP_addr 0x0000 }

And I believe this describes a variable who's bytes [0, 4) start at address
0x0000 and who's bytes [4, 8*) start at address (0x0008**). I don't think
this is what you intended.

* this is 4 bytes because the first operand to DW_OP_bit_piece is 4 bytes:
"The first is an unsigned LEB128 number that gives the size in bits of the
piece. "
* this is 0x0008 because it's 0x0004 bytes + 4 bytes because the second
operand to DW_OP_bit_piece is 4 bytes and: "The second is an unsigned
LEB128 number that gives the offset in bits from the location defined by
the preceding DWARF location description. "


>
>
> The example in 2.6.1.3:
>
> DW_OP_reg3 DW_OP_piece 4 DW_OP_reg10 DW_OP_piece 2
>   A variable whose first four bytes reside in register 3 and whose next
> two bytes reside in register 10.
>
> The pieces are considered to fill the variable left to right.
>
>
> Yes.
>
>
> & the wording for op_bit_piece says that the offset refers to the location
> previous in the stack - not about where that fragment goes in the resulting
> variable.
>
>
> I don't follow. The previous value on the stack is the DW_OP_addr.
>

Yes - so the offset is an offset of that address. It says "look 4 bytes
past the address" not "put this 4 bytes into the outer variable
description".

To break down the language further:

"Each composition operation is immediately preceded by a simple location
description which describes the location where part of the resultant value
is contained." - so the "preceeding location description" is the thing on
the stack before the DW_OP_(bit_)piece is evaluated. The DW_OP_addr in your
examples and DW_OP_regX in the spec example.

"The second is an unsigned LEB128 number that gives the offset in bits from
the location defined by the preceding DWARF location description." - this
is saying the offset is an offset within that preceeding location
description, specifically:

"If the location is a memory address, the DW_OP_bit_piece operation
describes a sequence of bits relative to the location whose address is on
the top of the DWARF stack using the bit numbering and direction
conventions that are appropriate to the current language on the target
system. "

So what DW_OP_bit_piece(32, 32) says is "there's 32 bits of stuff, 32 bits
from where the previous expression on the stack points  to" - or, in your
example, it describes the range [0x0008, 0x0012).

I believe you were trying to describe a variable that consisted of
([0x0000, 0x0004), [0x0004, 0x0008)) - is that correct?

I believe the correct DWARF description for that would be:
DW_OP_addr(0x0000) DW_OP_piece(4) DW_OP_addr(0x0004) DW_OP_piece(4)

- Dave


>
>
> "The DW_OP_bit_piece operation takes two operands. The first is an
> unsigned LEB128 number that gives the size in bits of the piece. The second
> is an unsigned LEB128 number that gives the offset in bits from the
> location defined by the preceding DWARF location description. "
>
>
> What may also contribute to confusion is that DWARF expression operands'
> operands are different from DWARF expression stack values. Operands (such
> as the two numbers following DW_OP_bit_piece) are part of the DWARF
> expression itself and not pushed onto the stack.
>
> -- adrian
>
>
>
>
> does that make sense?
>
> -- adrian
>
>
>
>
>
> --- adrian
>
>
> Perhaps keeping the bit_piece parts out of the DWARF expression bytes and
> in a separate field of the DIGlobalVariableExpr?
>
> DIGlobalVariableExpr(offset: 32, value: DW_OP_const 42)
>
> Then at least when a variable moves around (does that happen? maybe? - I
> suppose something like asan, smooshing everything into a single alloca
> would move something around without splitting it up) the expression can be
> updated relatively simply without unpacking through a DW_OP_bit_piece, etc?
>
> Maybe this doesn't matter - Ih aven't looked at much of the
> expression/location handling code.
>
> On Fri, Nov 18, 2016 at 9:32 AM Adrian Prantl <aprantl at apple.com> wrote:
>
> aprantl removed rL LLVM as the repository for this revision.
> aprantl updated this revision to Diff 78544.
> aprantl added a comment.
>
> Improved type-safety by adding a DIGlobalVarExpr pointer union.
>
>
> https://reviews.llvm.org/D26769
>
> Files:
>   include/llvm/Bitcode/LLVMBitCodes.h
>   include/llvm/IR/DIBuilder.h
>   include/llvm/IR/DebugInfo.h
>   include/llvm/IR/DebugInfoMetadata.h
>   include/llvm/IR/GlobalVariable.h
>   include/llvm/IR/Metadata.def
>   lib/Analysis/ModuleDebugInfoPrinter.cpp
>   lib/AsmParser/LLParser.cpp
>   lib/Bitcode/Reader/BitcodeReader.cpp
>   lib/Bitcode/Writer/BitcodeWriter.cpp
>   lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
>   lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
>   lib/CodeGen/AsmPrinter/DwarfCompileUnit.h
>   lib/CodeGen/AsmPrinter/DwarfDebug.cpp
>   lib/IR/AsmWriter.cpp
>   lib/IR/DIBuilder.cpp
>   lib/IR/DebugInfo.cpp
>   lib/IR/DebugInfoMetadata.cpp
>   lib/IR/LLVMContextImpl.h
>   lib/IR/Metadata.cpp
>   lib/IR/Verifier.cpp
>   lib/Transforms/IPO/StripSymbols.cpp
>   lib/Transforms/Instrumentation/AddressSanitizer.cpp
>   test/Assembler/diglobalvariable.ll
>   test/Assembler/diglobalvariableexpression.ll
>   test/Bitcode/DIGlobalVariableExpr.ll
>   test/Bitcode/DIGlobalVariableExpr.ll.bc
>   test/Bitcode/diglobalvariable-3.8.ll
>   test/Bitcode/diglobalvariable-3.8.ll.bc
>   test/DebugInfo/X86/multiple-at-const-val.ll
>   test/DebugInfo/X86/pr12831.ll
>   test/DebugInfo/X86/split-global.ll
>   test/DebugInfo/X86/stack-value-dwarf4.ll
>   test/DebugInfo/X86/unattached-global.ll
>   test/Transforms/GlobalMerge/debug-info.ll
>   unittests/IR/MetadataTest.cpp
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20161118/adcfe322/attachment.html>


More information about the llvm-commits mailing list