[Lldb-commits] [PATCH] D68270: DWARFDebugLoc: Add a function to get the address range of an entry

Fri Oct 4 13:58:26 PDT 2019

dblaikie added inline comments.

================
Comment at: lib/DebugInfo/DWARF/DWARFDebugLoc.cpp:291-295
+  EntryIterator Absolute =
+      getAbsoluteLocations(
+          SectionedAddress{BaseAddr, SectionedAddress::UndefSection},
+          LookupPooledAddress)
+          .begin();
----------------
labath wrote:
> This parallel iteration is not completely nice, but I think it's worth being able to reuse the absolute range computation code. I'm open to ideas for improvement though.
Ah, I see - this is what you meant about "In particular it makes it possible to reuse this stuff in the dumping code, which would have been pretty hard with callbacks.".

I'm wondering if that might be worth revisiting somewhat. A full iterator abstraction for one user here (well, two once you include lldb - but I assume it's likely going to build its own data structure from the iteration anyway, right? (it's not going to keep the iterator around, do anything interesting like partial iterations, re-iterate/etc - such that a callback would suffice))

I could imagine two callback APIs for this - one that gets entries and locations and one that only gets locations by filtering on the entry version.

eg:

  // for non-verbose output:
  LL.forEachEntry([&](const Entry &E, Expected<DWARFLocation> L) {
    if (Verbose && actually dumping debug_loc)
      print(E) // print any LLE_*, raw parameters, etc
    if (L)
      print(*L) // print the resulting address range, section name (if verbose), 
    else
      print(error stuff)
  });

One question would be "when/where do we print the DWARF expression" - if there's an error computing the address range, we can still print the expression, so maybe that happens unconditionally at the end of the callback, using the expression in the Entry? (then, arguably, the expression doesn't need to be in the DWARFLocation - and I'd say make the DWARFLocation a sectioned range, exactly the same type as for ranges so that part of the dumping code, etc, can be maximally reused)

================
Comment at: test/CodeGen/X86/debug-loclists.ll:16
 ; CHECK-NEXT: 0x00000000:
-; CHECK-NEXT:  [0x0000000000000000, 0x0000000000000004): DW_OP_breg5 RDI+0
-; CHECK-NEXT:  [0x0000000000000004, 0x0000000000000012): DW_OP_breg3 RBX+0
-
-; There is no way to use llvm-dwarfdump atm (2018, october) to verify the DW_LLE_* codes emited,
-; because dumper is not yet implements that. Use asm code to do this check instead.
-;
-; RUN: llc -mtriple=x86_64-pc-linux -filetype=asm < %s -o - | FileCheck %s --check-prefix=ASM
-; ASM:      .section .debug_loclists,"", at progbits
-; ASM-NEXT: .long .Ldebug_loclist_table_end0-.Ldebug_loclist_table_start0 # Length
-; ASM-NEXT: .Ldebug_loclist_table_start0:
-; ASM-NEXT:  .short 5                              # Version
-; ASM-NEXT:  .byte 8                               # Address size
-; ASM-NEXT:  .byte 0                               # Segment selector size
-; ASM-NEXT:  .long 0                               # Offset entry count
-; ASM-NEXT: .Lloclists_table_base0:                
-; ASM-NEXT: .Ldebug_loc0:
-; ASM-NEXT:  .byte 4                               # DW_LLE_offset_pair
-; ASM-NEXT:  .uleb128 .Lfunc_begin0-.Lfunc_begin0  # starting offset
-; ASM-NEXT:  .uleb128 .Ltmp0-.Lfunc_begin0         # ending offset
-; ASM-NEXT:  .byte 2                               # Loc expr size
-; ASM-NEXT:  .byte 117                             # DW_OP_breg5
-; ASM-NEXT:  .byte 0                               # 0
-; ASM-NEXT:  .byte 4                               # DW_LLE_offset_pair
-; ASM-NEXT:  .uleb128 .Ltmp0-.Lfunc_begin0         # starting offset
-; ASM-NEXT:  .uleb128 .Ltmp1-.Lfunc_begin0         # ending offset
-; ASM-NEXT:  .byte 2                               # Loc expr size
-; ASM-NEXT:  .byte 115                             # DW_OP_breg3
-; ASM-NEXT:  .byte 0                               # 0
-; ASM-NEXT:  .byte 0                               # DW_LLE_end_of_list
-; ASM-NEXT: .Ldebug_loclist_table_end0:
+; CHECK-NEXT:    [DW_LLE_offset_pair ]: 0x0000000000000000, 0x0000000000000004 => [0x0000000000000000, 0x0000000000000004) DW_OP_breg5 RDI+0
+; CHECK-NEXT:    [DW_LLE_offset_pair ]: 0x0000000000000004, 0x0000000000000012 => [0x0000000000000004, 0x0000000000000012) DW_OP_breg3 RBX+0
----------------
labath wrote:
> This tries to follow the RLE format as closely as possible, but I think something like
> ```
> [DW_LLE_offset_pair,  0x0000000000000000, 0x0000000000000004] => [0x0000000000000000, 0x0000000000000004): DW_OP_breg5 RDI+0
> ```
> would make more sense (both here and for RLE).
Yep, that'd make more sense to me - are you planning to unify the codepaths for this? I think that'd be for the best.

If I were picking a printing from scratch, I might go with:

  DW_LLE_offset_pair(0x0000, 0x0004) => [0x0000, 0x0004): DW_OP_breg5 RDI+0

Making it look a bit more like a function call and function arguments. Though the () might be confusing with the range notation. 

I'm also undecided on the " => " separator. Whether a ':' might be better/fine, etc.

Totally open to ideas, but mostly I'd really love these to use loclist and ranges to use the same code as much as possible, so we can get consistency and any readability benefits, etc in both.

================
Comment at: test/DebugInfo/X86/dwarfdump-debug-loclists.test:7
 # CHECK:       DW_AT_location [DW_FORM_sec_offset]   (0x0000000c
-# CHECK-NEXT:    [0x0000000000000010, 0x0000000000000020): DW_OP_breg5 RDI+0
-# CHECK-NEXT:    [0x0000000000000530, 0x0000000000000540): DW_OP_breg6 RBP-8, DW_OP_deref
-# CHECK-NEXT:    [0x0000000000000700, 0x0000000000000710): DW_OP_breg5 RDI+0
+# CHECK-NEXT:    [DW_LLE_offset_pair  ]: 0x0000000000000000, 0x0000000000000010 => [0x0000000000000010, 0x0000000000000020) DW_OP_breg5 RDI+0
+# CHECK-NEXT:    [DW_LLE_base_address ]: 0x0000000000000500
----------------
I don't think the inline dumping should print the encoding - I'd borrow a lot from/try to unify with the ranges printing, which doesn't. I think verbose ranges print the same as non-verbose except they also add the section name/number.

================
Comment at: test/DebugInfo/X86/fission-ranges.ll:48
 ; CHECK:      [[A]]:
-; CHECK-NEXT:   Addr idx 2 (w/ length 15): DW_OP_consts +0, DW_OP_stack_value
-; CHECK-NEXT:   Addr idx 3 (w/ length 15): DW_OP_reg0 RAX
-; CHECK-NEXT:   Addr idx 4 (w/ length 18): DW_OP_breg7 RSP-8
+; CHECK-NEXT:   [DW_LLE_startx_length]: 0x00000002, 0x0000000f => <Failed to read address offset 2> DW_OP_consts +0, DW_OP_stack_value
+; CHECK-NEXT:   [DW_LLE_startx_length]: 0x00000003, 0x0000000f => <Failed to read address offset 3> DW_OP_reg0 RAX
----------------
labath wrote:
> This is somewhat annoying, because the entries printed through the loclists section will always have this error (as we don't have the DWARFUnit). I'll have to figure out a way to suppress those, while still keeping them around when printing from DWARFDie (as there a failure means a real error).
IMHO we may want to move to a model where we don't try to create/parse any content except by finding a reference from a CU (or the DWARFv5 stanfdalone line tables).

In theory, it's perfectly find to have random garbage in debug sections other than debug_info (or the standalone line table) - because the only parts that should be parsed are those referenced from debug_info.

This came up in the form of a bug in location list dumping when the binary is linked with bfd ld. It doesn't update any addresses to discarded sections, leaving them as zero (whereas gold and lld write the addend to the relocation - which generally makes sure any range pair doesn't end up as "zero zero" which marks the end of a list) which terminates a list early and leads to the following location expression to be parsed as the start of a new list... which is totally bogus.
Now, granted, the resulting debug info from bfd ld is wrong (if you had a location list spanning multiple functions (eg: a global variable had been put in a register for the duration of a function, etc) then resolving any one of those location entries to zero-zero would terminate the list early even though there might be non-dropped functions in the list after that point) - but I still think there's something to be said for it.

There's a fair counterargument too - that we might want to be able to make a best-effort to dump content that isn't complete (eg: if a section was emitted alone - or there was some hunk of unreferenced location list in the debug_loc section, it might be interesting to know what's in that hunk - might give you hints about where it /should/ have been referenced from)

Apparently binutils objdump when printing debug info only dumps those referenced pieces and prints info about "holes" when there's unreferenced chunks.

Ah, here's the bug context on that: https://bugs.llvm.org/show_bug.cgi?id=43290

But, yeah, all that aside - given the architecture of libDebugInfoDWARF/llvm-dwarfdump right now, yes, it'd be good to omit those error messages.

Also note that address indexes wouldn't be resolvable when dumping .dwo files - since the debug_addr would be in the .o file instead. So it'd be good to not print lots of error messages there either.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68270/new/

https://reviews.llvm.org/D68270