[PATCH] D56587: Introduce DW_OP_LLVM_convert

Wed Feb 13 00:32:50 PST 2019

markus marked 2 inline comments as done.
markus added inline comments.

================
Comment at: lib/CodeGen/AsmPrinter/DwarfDebug.cpp:1937
+    // easiest way to pass custom values as the expressions are inserted into a
+    // byte stream rather early (see DwarfExpression::addExpression).
+    switch (State) {
----------------
dblaikie wrote:
> probinson wrote:
> > aprantl wrote:
> > > markus wrote:
> > > > aprantl wrote:
> > > > > This is to inject the reference to the basic type die, right?
> > > > > 
> > > > > This code should probably be factored out into a relocate/fixupTypeRefs() helper function. I also assume that you need to apply the same fixup for the case of a single, non-debug_lo, inline DW_AT_location, right?
> > > > > 
> > > > > The fact that the placeholder is encoded as a LEB128 sounds really dangerous. If we ever support any branching operations, it will mess with the offsets. Can we assume that the finalized DIE ref will always be a DW_OP_ref_addr or something with a fixed size? Could we make the placeholder the same fixed size, too?
> > > > > 
> > > > > If that doesn't work, the right solution is probably to defer the emission of DwarfExpressions until here, which we could do in a separate, preparatory commit.
> > > > > This is to inject the reference to the basic type die, right?
> > > > Yes
> > > > > This code should probably be factored out into a relocate/fixupTypeRefs() helper function. I also assume that you need to apply the same fixup for the case of a single, non-debug_lo, inline DW_AT_location, right?
> > > > Sounds reasonable. Not sure I could easily find where in the code the inline expressions are inserted though. If you could point at a file and line number that would be helpful. I guess another option would be to force these expressions (the ones containing a base type reference) to always end up in .debug_loc right?
> > > > > The fact that the placeholder is encoded as a LEB128 sounds really dangerous. If we ever support any branching operations, it will mess with the offsets. Can we assume that the finalized DIE ref will always be a DW_OP_ref_addr or something with a fixed size? Could we make the placeholder the same fixed size, too?
> > > > The spec states that the finalized base type DIE offset is encoded as a ULEB128 so not much choice about that but the value we pick up here (the one inserted in `DwarfExpression::addExpression`) is just a index so we could certainly encode that in a fixed size integer. If branches were to be introduced at a later point I imagine that the branch target in the emitted dwarf would be a label (`MCSymbol`) but in the intermediate expression vector a simple offset would probably suffice.
> > > > > If that doesn't work, the right solution is probably to defer the emission of DwarfExpressions until here, which we could do in a separate, preparatory commit.
> > > > I think that would be a good thing to do but unfortunately it seems far from easy to get rid of the stuff that is in between.
> > > > 
> > > > If you could point at a file and line number that would be helpful. 
> > > 
> > > 
> > > ```
> > > git grep addBlock.*DW_AT_location lib/CodeGen/AsmPrinter
> > > lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:    addBlock(*VariableDIE, dwarf::DW_AT_location, DwarfExpr->finalize());
> > > lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:        addBlock(*VariableDie, dwarf::DW_AT_location, DwarfExpr.finalize());
> > > lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:  addBlock(*VariableDie, dwarf::DW_AT_location, DwarfExpr.finalize());
> > > lib/CodeGen/AsmPrinter/DwarfUnit.cpp:        addBlock(ParamDIE, dwarf::DW_AT_location, Loc);
> > > ```
> > > 
> > > 
> > > 
> > > > The spec states that the finalized base type DIE offset is encoded as a ULEB128 so not much choice about that but the value we pick up here (the one inserted in DwarfExpression::addExpression) is just a index so we could certainly encode that in a fixed size integer.
> > > > 
> > > 
> > > I see. I didn't think about emitting branch targets a label differences, I thought we'd just hardcode the offsets.  I guess we can defer this until it becomes an issue. Encoding the temporary die reference with a fixed size would probably still be a good idea, just to keep this code simpler.
> > Re the ULEB as how to find the base_type DIEs, the unstated assumption is that the base_type DIEs would be emitted unconditionally at the top of the CU, so everyone can just use them as needed.  If you want to emit base_types lazily... don't do that.
> > Re branches, there are already branch operators, if that's what you're talking about (DW_OP_skip and _bra).
> I think maybe you don't need to make that assumption - you can produce a label difference as a uleb (though that may get complicated).
> 
> Yeah, we use it in debug_rnglists, for instance:
> 
>   .uleb128 .Lfunc_end0-.Lfunc_begin0 #   length
> 
> & even if I change that to be a label difference between labels that bound the uleb itself (ie: where the difference would vary depending on the number of bytes required for the uleb) clang still assembles it at least... :)
> Re the ULEB as how to find the base_type DIEs, the unstated assumption is that the base_type DIEs would be emitted unconditionally at the top of the CU, so everyone can just use them as needed. If you want to emit base_types lazily... don't do that.

I played with emitting the base_type DIEs directly after the CU header but since the size of that header varies and due to the phase ordering of how the debug info is emitted I still need label differences to be able to locate the base_type DIEs in a robust manner. Right now I would not say that they are emitted lazily but rather we find out which ones we need and then emit them in table form. Still need the labels though.

> Re branches, there are already branch operators, if that's what you're talking about (DW_OP_skip and _bra).

Where do you see these branch operators being used? I can't find them.

================
Comment at: lib/CodeGen/AsmPrinter/DwarfDebug.cpp:1937
+    // easiest way to pass custom values as the expressions are inserted into a
+    // byte stream rather early (see DwarfExpression::addExpression).
+    switch (State) {
----------------
markus wrote:
> dblaikie wrote:
> > probinson wrote:
> > > aprantl wrote:
> > > > markus wrote:
> > > > > aprantl wrote:
> > > > > > This is to inject the reference to the basic type die, right?
> > > > > > 
> > > > > > This code should probably be factored out into a relocate/fixupTypeRefs() helper function. I also assume that you need to apply the same fixup for the case of a single, non-debug_lo, inline DW_AT_location, right?
> > > > > > 
> > > > > > The fact that the placeholder is encoded as a LEB128 sounds really dangerous. If we ever support any branching operations, it will mess with the offsets. Can we assume that the finalized DIE ref will always be a DW_OP_ref_addr or something with a fixed size? Could we make the placeholder the same fixed size, too?
> > > > > > 
> > > > > > If that doesn't work, the right solution is probably to defer the emission of DwarfExpressions until here, which we could do in a separate, preparatory commit.
> > > > > > This is to inject the reference to the basic type die, right?
> > > > > Yes
> > > > > > This code should probably be factored out into a relocate/fixupTypeRefs() helper function. I also assume that you need to apply the same fixup for the case of a single, non-debug_lo, inline DW_AT_location, right?
> > > > > Sounds reasonable. Not sure I could easily find where in the code the inline expressions are inserted though. If you could point at a file and line number that would be helpful. I guess another option would be to force these expressions (the ones containing a base type reference) to always end up in .debug_loc right?
> > > > > > The fact that the placeholder is encoded as a LEB128 sounds really dangerous. If we ever support any branching operations, it will mess with the offsets. Can we assume that the finalized DIE ref will always be a DW_OP_ref_addr or something with a fixed size? Could we make the placeholder the same fixed size, too?
> > > > > The spec states that the finalized base type DIE offset is encoded as a ULEB128 so not much choice about that but the value we pick up here (the one inserted in `DwarfExpression::addExpression`) is just a index so we could certainly encode that in a fixed size integer. If branches were to be introduced at a later point I imagine that the branch target in the emitted dwarf would be a label (`MCSymbol`) but in the intermediate expression vector a simple offset would probably suffice.
> > > > > > If that doesn't work, the right solution is probably to defer the emission of DwarfExpressions until here, which we could do in a separate, preparatory commit.
> > > > > I think that would be a good thing to do but unfortunately it seems far from easy to get rid of the stuff that is in between.
> > > > > 
> > > > > If you could point at a file and line number that would be helpful. 
> > > > 
> > > > 
> > > > ```
> > > > git grep addBlock.*DW_AT_location lib/CodeGen/AsmPrinter
> > > > lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:    addBlock(*VariableDIE, dwarf::DW_AT_location, DwarfExpr->finalize());
> > > > lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:        addBlock(*VariableDie, dwarf::DW_AT_location, DwarfExpr.finalize());
> > > > lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:  addBlock(*VariableDie, dwarf::DW_AT_location, DwarfExpr.finalize());
> > > > lib/CodeGen/AsmPrinter/DwarfUnit.cpp:        addBlock(ParamDIE, dwarf::DW_AT_location, Loc);
> > > > ```
> > > > 
> > > > 
> > > > 
> > > > > The spec states that the finalized base type DIE offset is encoded as a ULEB128 so not much choice about that but the value we pick up here (the one inserted in DwarfExpression::addExpression) is just a index so we could certainly encode that in a fixed size integer.
> > > > > 
> > > > 
> > > > I see. I didn't think about emitting branch targets a label differences, I thought we'd just hardcode the offsets.  I guess we can defer this until it becomes an issue. Encoding the temporary die reference with a fixed size would probably still be a good idea, just to keep this code simpler.
> > > Re the ULEB as how to find the base_type DIEs, the unstated assumption is that the base_type DIEs would be emitted unconditionally at the top of the CU, so everyone can just use them as needed.  If you want to emit base_types lazily... don't do that.
> > > Re branches, there are already branch operators, if that's what you're talking about (DW_OP_skip and _bra).
> > I think maybe you don't need to make that assumption - you can produce a label difference as a uleb (though that may get complicated).
> > 
> > Yeah, we use it in debug_rnglists, for instance:
> > 
> >   .uleb128 .Lfunc_end0-.Lfunc_begin0 #   length
> > 
> > & even if I change that to be a label difference between labels that bound the uleb itself (ie: where the difference would vary depending on the number of bytes required for the uleb) clang still assembles it at least... :)
> > Re the ULEB as how to find the base_type DIEs, the unstated assumption is that the base_type DIEs would be emitted unconditionally at the top of the CU, so everyone can just use them as needed. If you want to emit base_types lazily... don't do that.
> 
> I played with emitting the base_type DIEs directly after the CU header but since the size of that header varies and due to the phase ordering of how the debug info is emitted I still need label differences to be able to locate the base_type DIEs in a robust manner. Right now I would not say that they are emitted lazily but rather we find out which ones we need and then emit them in table form. Still need the labels though.
> 
> > Re branches, there are already branch operators, if that's what you're talking about (DW_OP_skip and _bra).
> 
> Where do you see these branch operators being used? I can't find them.
> I think maybe you don't need to make that assumption - you can produce a label difference as a uleb (though that may get complicated).

Yes, that is what I currently do. It looks like this
```
        .byte   168                     # DW_OP_convert
        .uleb128 .Lbase_type0-.Lcu_begin0
```

> & even if I change that to be a label difference between labels that bound the uleb itself (ie: where the difference would vary depending on the number of bytes required for the uleb) clang still assembles it at least... :)

Yep, I thought about that too when I realized that I need to add some prototype support for our downstream assembler. Came to the conclusion that I could treat the ULEB128s as fixed size by sign/zero extending them, so that should simplify things a lot even though it is not space efficient... Either way it is not a problem for this review how the assembler solves it :)

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56587/new/

https://reviews.llvm.org/D56587