[Lldb-commits] [PATCH] D68270: DWARFDebugLoc: Add a function to get the address range of an entry

Mon Oct 7 11:27:25 PDT 2019

labath marked 3 inline comments as done.
labath added inline comments.

================
Comment at: lib/DebugInfo/DWARF/DWARFDebugLoc.cpp:291-295
+  EntryIterator Absolute =
+      getAbsoluteLocations(
+          SectionedAddress{BaseAddr, SectionedAddress::UndefSection},
+          LookupPooledAddress)
+          .begin();
----------------
dblaikie wrote:
> labath wrote:
> > This parallel iteration is not completely nice, but I think it's worth being able to reuse the absolute range computation code. I'm open to ideas for improvement though.
> Ah, I see - this is what you meant about "In particular it makes it possible to reuse this stuff in the dumping code, which would have been pretty hard with callbacks.".
> 
> I'm wondering if that might be worth revisiting somewhat. A full iterator abstraction for one user here (well, two once you include lldb - but I assume it's likely going to build its own data structure from the iteration anyway, right? (it's not going to keep the iterator around, do anything interesting like partial iterations, re-iterate/etc - such that a callback would suffice))
> 
> I could imagine two callback APIs for this - one that gets entries and locations and one that only gets locations by filtering on the entry version.
> 
> eg:
> 
>   // for non-verbose output:
>   LL.forEachEntry([&](const Entry &E, Expected<DWARFLocation> L) {
>     if (Verbose && actually dumping debug_loc)
>       print(E) // print any LLE_*, raw parameters, etc
>     if (L)
>       print(*L) // print the resulting address range, section name (if verbose), 
>     else
>       print(error stuff)
>   });
> 
> One question would be "when/where do we print the DWARF expression" - if there's an error computing the address range, we can still print the expression, so maybe that happens unconditionally at the end of the callback, using the expression in the Entry? (then, arguably, the expression doesn't need to be in the DWARFLocation - and I'd say make the DWARFLocation a sectioned range, exactly the same type as for ranges so that part of the dumping code, etc, can be maximally reused)
Actually, lldb currently does not build any data structures at all (except for storing a pointer to the right place in the debug_loc section). Then, whenever it wants to do something with the loclist, it parses it afresh. I don't know exactly why it does this, but I assume it has something to do with most locations never being used, or being used only a couple of times, and the actual parsing being fairly fast. What this means is that lldb is not really a single "user": there are four or five places where it iterates through the list, depending on what it actually wants to do with it. It also does partial iteration, where it stops as soon as it finds the entry it was interested in.
Now, all of that is possible with a callback (though I am generally trying to avoid them), but it does resurface the issue of what the value of the second argument should be for DW_LLE_base_address entries (the thing which I originally used an error type for).
Maybe this should actually be one callback API taking two callback functions, with one of them being invoked for base_address entries and the other for the rest? However, if we stick to the current approach in both LLE and RLE of making the address pool resolution function a parameter (which I'd like to keep, as it makes my job in lldb easier), then this would actually be three callbacks, which starts to get unwieldy. Though one of those callbacks could be removed with the "DWARFUnit implementing an AddrOffsetResolver interface" idea, which I really like. :)
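To make the two-callback shape concrete, here is a minimal sketch. Everything in it (Entry, Location, forEachEntry, findLocation) is a hypothetical, simplified stand-in rather than the actual LLVM types or API, and it assumes the address pool resolver is handled elsewhere, so only offset_pair and base_address entries appear. Returning false from a callback stops the walk, which covers the partial iteration lldb does.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-ins; not the real DWARFDebugLoc types.
enum class EntryKind { BaseAddress, OffsetPair, EndOfList };

struct Entry {
  EntryKind Kind;
  uint64_t Value0 = 0; // base address, or starting offset
  uint64_t Value1 = 0; // ending offset (offset_pair only)
};

struct Location {
  uint64_t Low;  // inclusive
  uint64_t High; // exclusive
};

using BaseAddressCallback = std::function<bool(uint64_t)>;
using LocationCallback = std::function<bool(const Entry &, const Location &)>;

// Walks the list in order. Either callback may return false to stop early.
// Returns true if the walk reached the end of the list.
bool forEachEntry(const std::vector<Entry> &List, uint64_t BaseAddr,
                  const BaseAddressCallback &OnBaseAddress,
                  const LocationCallback &OnLocation) {
  assert(OnBaseAddress && OnLocation && "both callbacks must be callable");
  for (const Entry &E : List) {
    switch (E.Kind) {
    case EntryKind::BaseAddress:
      BaseAddr = E.Value0; // later offset pairs are relative to this
      if (!OnBaseAddress(BaseAddr))
        return false;
      break;
    case EntryKind::OffsetPair:
      if (!OnLocation(E, Location{BaseAddr + E.Value0, BaseAddr + E.Value1}))
        return false;
      break;
    case EntryKind::EndOfList:
      return true;
    }
  }
  return true;
}

// Partial iteration: stop at the first location that covers Addr.
std::optional<Location> findLocation(const std::vector<Entry> &List,
                                     uint64_t BaseAddr, uint64_t Addr) {
  std::optional<Location> Result;
  forEachEntry(
      List, BaseAddr, [](uint64_t) { return true; },
      [&](const Entry &, const Location &L) {
        if (Addr < L.Low || Addr >= L.High)
          return true; // keep going
        Result = L;
        return false; // found it, stop the walk
      });
  return Result;
}
```

With the entries from dwarfdump-debug-loclists.test (an offset pair against base 0x10, then a base_address of 0x500), a lookup of 0x535 would stop at the [0x530, 0x540) location without visiting the rest of the list.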

================
Comment at: test/CodeGen/X86/debug-loclists.ll:16
 ; CHECK-NEXT: 0x00000000:
-; CHECK-NEXT:  [0x0000000000000000, 0x0000000000000004): DW_OP_breg5 RDI+0
-; CHECK-NEXT:  [0x0000000000000004, 0x0000000000000012): DW_OP_breg3 RBX+0
-
-; There is no way to use llvm-dwarfdump atm (2018, october) to verify the DW_LLE_* codes emited,
-; because dumper is not yet implements that. Use asm code to do this check instead.
-;
-; RUN: llc -mtriple=x86_64-pc-linux -filetype=asm < %s -o - | FileCheck %s --check-prefix=ASM
-; ASM:      .section .debug_loclists,"",@progbits
-; ASM-NEXT: .long .Ldebug_loclist_table_end0-.Ldebug_loclist_table_start0 # Length
-; ASM-NEXT: .Ldebug_loclist_table_start0:
-; ASM-NEXT:  .short 5                              # Version
-; ASM-NEXT:  .byte 8                               # Address size
-; ASM-NEXT:  .byte 0                               # Segment selector size
-; ASM-NEXT:  .long 0                               # Offset entry count
-; ASM-NEXT: .Lloclists_table_base0:                
-; ASM-NEXT: .Ldebug_loc0:
-; ASM-NEXT:  .byte 4                               # DW_LLE_offset_pair
-; ASM-NEXT:  .uleb128 .Lfunc_begin0-.Lfunc_begin0  # starting offset
-; ASM-NEXT:  .uleb128 .Ltmp0-.Lfunc_begin0         # ending offset
-; ASM-NEXT:  .byte 2                               # Loc expr size
-; ASM-NEXT:  .byte 117                             # DW_OP_breg5
-; ASM-NEXT:  .byte 0                               # 0
-; ASM-NEXT:  .byte 4                               # DW_LLE_offset_pair
-; ASM-NEXT:  .uleb128 .Ltmp0-.Lfunc_begin0         # starting offset
-; ASM-NEXT:  .uleb128 .Ltmp1-.Lfunc_begin0         # ending offset
-; ASM-NEXT:  .byte 2                               # Loc expr size
-; ASM-NEXT:  .byte 115                             # DW_OP_breg3
-; ASM-NEXT:  .byte 0                               # 0
-; ASM-NEXT:  .byte 0                               # DW_LLE_end_of_list
-; ASM-NEXT: .Ldebug_loclist_table_end0:
+; CHECK-NEXT:    [DW_LLE_offset_pair ]: 0x0000000000000000, 0x0000000000000004 => [0x0000000000000000, 0x0000000000000004) DW_OP_breg5 RDI+0
+; CHECK-NEXT:    [DW_LLE_offset_pair ]: 0x0000000000000004, 0x0000000000000012 => [0x0000000000000004, 0x0000000000000012) DW_OP_breg3 RBX+0
----------------
dblaikie wrote:
> labath wrote:
> > This tries to follow the RLE format as closely as possible, but I think something like
> > ```
> > [DW_LLE_offset_pair,  0x0000000000000000, 0x0000000000000004] => [0x0000000000000000, 0x0000000000000004): DW_OP_breg5 RDI+0
> > ```
> > would make more sense (both here and for RLE).
> Yep, that'd make more sense to me - are you planning to unify the codepaths for this? I think that'd be for the best.
> 
> If I were picking a printing from scratch, I might go with:
> 
>   DW_LLE_offset_pair(0x0000, 0x0004) => [0x0000, 0x0004): DW_OP_breg5 RDI+0
> 
> Making it look a bit more like a function call and function arguments. Though the () might be confusing with the range notation. 
> 
> I'm also undecided on the " => " separator. Whether a ':' might be better/fine, etc.
> 
> Totally open to ideas, but mostly I'd really love these to use loclist and ranges to use the same code as much as possible, so we can get consistency and any readability benefits, etc in both.
I like the function call format. I am hoping to get some code reuse, though it's still not fully clear to me how to achieve that.
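One possible direction for the reuse, purely as a sketch: a single formatting helper that both the loclist and the rangelist dumpers call, with the location expression passed as an optional suffix. The names formatHex and formatEntry are hypothetical and do not exist in llvm-dwarfdump; the 16-digit padding just mirrors the existing test output.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: fixed-width hex, matching the width used in the tests.
std::string formatHex(uint64_t V) {
  char Buf[32];
  int N = std::snprintf(Buf, sizeof(Buf), "0x%016llx",
                        static_cast<unsigned long long>(V));
  assert(N > 0 && static_cast<size_t>(N) < sizeof(Buf));
  return std::string(Buf, static_cast<size_t>(N));
}

// Hypothetical shared formatter for the "function call" style:
//   ENCODING(raw0, raw1) => [low, high): suffix
// Range lists would pass an empty Suffix; location lists pass the expression.
std::string formatEntry(const std::string &Encoding, uint64_t Raw0,
                        uint64_t Raw1, uint64_t Low, uint64_t High,
                        const std::string &Suffix) {
  std::string S = Encoding + "(" + formatHex(Raw0) + ", " + formatHex(Raw1) +
                  ") => [" + formatHex(Low) + ", " + formatHex(High) + ")";
  if (!Suffix.empty())
    S += ": " + Suffix;
  return S;
}
```

Whether the expression belongs in the same helper or gets printed by the caller is the open question; the sketch only shows that the encoding and range parts could be shared.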

================
Comment at: test/DebugInfo/X86/dwarfdump-debug-loclists.test:7
 # CHECK:       DW_AT_location [DW_FORM_sec_offset]   (0x0000000c
-# CHECK-NEXT:    [0x0000000000000010, 0x0000000000000020): DW_OP_breg5 RDI+0
-# CHECK-NEXT:    [0x0000000000000530, 0x0000000000000540): DW_OP_breg6 RBP-8, DW_OP_deref
-# CHECK-NEXT:    [0x0000000000000700, 0x0000000000000710): DW_OP_breg5 RDI+0
+# CHECK-NEXT:    [DW_LLE_offset_pair  ]: 0x0000000000000000, 0x0000000000000010 => [0x0000000000000010, 0x0000000000000020) DW_OP_breg5 RDI+0
+# CHECK-NEXT:    [DW_LLE_base_address ]: 0x0000000000000500
----------------
dblaikie wrote:
> I don't think the inline dumping should print the encoding - I'd borrow a lot from/try to unify with the ranges printing, which doesn't. I think verbose ranges print the same as non-verbose except they also add the section name/number.
Sure, I can do that, though I think that means there won't be a single place where one can see both the raw encodings and their interpretation -- section-based dumping will not show the interpretation (would you still want me to show them if they happen to be interpretable without the base address or the address pool?), and the debug_info dumping will not show the encoding. Is that bad? -- I don't know...

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68270/new/

https://reviews.llvm.org/D68270