[PATCH] D68270: DWARFDebugLoc: Add a function to get the address range of an entry

Mon Oct 7 18:32:00 PDT 2019

dblaikie added inline comments.

================
Comment at: lib/DebugInfo/DWARF/DWARFDebugLoc.cpp:291-295
+  EntryIterator Absolute =
+      getAbsoluteLocations(
+          SectionedAddress{BaseAddr, SectionedAddress::UndefSection},
+          LookupPooledAddress)
+          .begin();
----------------
labath wrote:
> dblaikie wrote:
> > labath wrote:
> > > This parallel iteration is not completely nice, but I think it's worth being able to reuse the absolute range computation code. I'm open to ideas for improvement though.
> > Ah, I see - this is what you meant about "In particular it makes it possible to reuse this stuff in the dumping code, which would have been pretty hard with callbacks.".
> > 
> > I'm wondering if that might be worth revisiting somewhat. A full iterator abstraction for one user here (well, two once you include lldb - but I assume it's likely going to build its own data structure from the iteration anyway, right? (it's not going to keep the iterator around, do anything interesting like partial iterations, re-iterate/etc - such that a callback would suffice))
> > 
> > I could imagine two callback APIs for this - one that gets entries and locations and one that only gets locations by filtering on the entry version.
> > 
> > eg:
> > 
> >   // for non-verbose output:
> >   LL.forEachEntry([&](const Entry &E, Expected<DWARFLocation> L) {
> >     if (Verbose && actually dumping debug_loc)
> >       print(E) // print any LLE_*, raw parameters, etc
> >     if (L)
> >       print(*L) // print the resulting address range, section name (if verbose), 
> >     else
> >       print(error stuff)
> >   });
> > 
> > One question would be "when/where do we print the DWARF expression" - if there's an error computing the address range, we can still print the expression, so maybe that happens unconditionally at the end of the callback, using the expression in the Entry? (then, arguably, the expression doesn't need to be in the DWARFLocation - and I'd say make the DWARFLocation a sectioned range, exactly the same type as for ranges so that part of the dumping code, etc, can be maximally reused)
> Actually, what lldb currently does is that it does not build any data structures at all (except storing the pointer to the right place in the debug_loc section. Then, whenever it wants to do something to the loclist, it parses it afresh. I don't know why it does this exactly, but I assume it has something to do with most locations never being used, or being only a couple of times, and the actual parsing being fairly fast. What this means is that lldb is not really a single "user", but there are like four or five places where it iterates through the list, depending on what does it actually want to do with it. It also does partial iteration where it stops as soon as it find the entry it was interested in.
> Now, all of that is possible with a callback (though I am generally trying to avoid them), but it does resurface the issue of what should be the value of the second argument for DW_LLE_base_address entries (the thing which I originally used a error type for).
> Maybe this should be actually one callback API, taking two callback functions, with one of them being invoked for base_address entries, and one for others? However, if we stick to the current approaches in both LLE and RLE of making the address pool resolution function a parameter (which I'd like to keep, as it makes my job in lldb easier), then this would actually be three callbacks, which starts to get unwieldy. Though one of those callbacks could be removed with the "DWARFUnit implementing a AddrOffsetResolver interface" idea, which I really like. :)
Ah, thanks for the details on LLDB's location parsing logic. That's interesting indeed!

I can appreciate an iterator-based API if that's the sort of usage we've got, though I expect it doesn't have any interest in the low-level encoding & just wants the fully processed address ranges/locations - it doesn't want base_address or end_of_list entries? & I think the dual-iteration is a fairly awkward API design, trying to iterate them in lock-step, etc. I'd rather avoid that if reasonably possible.

Either having an iterator API that gives only the fully processed data/semantic view & a completely different API if you want to access the low level primitives (LLE, etc) (this is how ranges works - there's an API that gives a collection of ranges & abstracts over v4/v5/rnglists/etc - though that's partly motivated by a strong multi-client need for that functionality for symbolizing, etc - but I think it's a good abstraction/model anyway (& one of the reasons the inline range list printing doesn't include encoding information, the API it uses is too high level to even have access to it))

> Now, all of that is possible with a callback (though I am generally trying to avoid them), but it does resurface the issue of what should be the value of the second argument for DW_LLE_base_address entries (the thing which I originally used a error type for).

Sorry, my intent in the above API was for the second argument to be Optional's "None" state when... oh, I see, I did use Expected there, rather than Optional, because there are legit error cases.

I know it's sort of awkward, but I might be inclined to use Optional<Expected<AddressRange>> there. I realize two layers of wrapping is a bit weird, but I think it'd be nicer than having an error state for what, I think, isn't erroneous.

> Maybe this should be actually one callback API, taking two callback functions, with one of them being invoked for base_address entries, and one for others? However, if we stick to the current approaches in both LLE and RLE of making the address pool resolution function a parameter (which I'd like to keep, as it makes my job in lldb easier), then this would actually be three callbacks, which starts to get unwieldy.

Don't mind three callbacks too much.

> Though one of those callbacks could be removed with the "DWARFUnit implementing a AddrOffsetResolver interface" idea, which I really like. :)

Sorry, I haven't really looked at where the address resolver callback is registered and alternative designs being discussed - but yeah, going off just the one-sentence, it seems reasonable to have the DWARFUnit own an address resolver/be the thing you consult when you want to resolve an address (just through a normal function call in DWARFUnit, perhaps - which might, internally, use a callback registered when it was constructed).

================
Comment at: test/CodeGen/X86/debug-loclists.ll:16
 ; CHECK-NEXT: 0x00000000:
-; CHECK-NEXT:  [0x0000000000000000, 0x0000000000000004): DW_OP_breg5 RDI+0
-; CHECK-NEXT:  [0x0000000000000004, 0x0000000000000012): DW_OP_breg3 RBX+0
-
-; There is no way to use llvm-dwarfdump atm (2018, october) to verify the DW_LLE_* codes emited,
-; because dumper is not yet implements that. Use asm code to do this check instead.
-;
-; RUN: llc -mtriple=x86_64-pc-linux -filetype=asm < %s -o - | FileCheck %s --check-prefix=ASM
-; ASM:      .section .debug_loclists,"", at progbits
-; ASM-NEXT: .long .Ldebug_loclist_table_end0-.Ldebug_loclist_table_start0 # Length
-; ASM-NEXT: .Ldebug_loclist_table_start0:
-; ASM-NEXT:  .short 5                              # Version
-; ASM-NEXT:  .byte 8                               # Address size
-; ASM-NEXT:  .byte 0                               # Segment selector size
-; ASM-NEXT:  .long 0                               # Offset entry count
-; ASM-NEXT: .Lloclists_table_base0:                
-; ASM-NEXT: .Ldebug_loc0:
-; ASM-NEXT:  .byte 4                               # DW_LLE_offset_pair
-; ASM-NEXT:  .uleb128 .Lfunc_begin0-.Lfunc_begin0  # starting offset
-; ASM-NEXT:  .uleb128 .Ltmp0-.Lfunc_begin0         # ending offset
-; ASM-NEXT:  .byte 2                               # Loc expr size
-; ASM-NEXT:  .byte 117                             # DW_OP_breg5
-; ASM-NEXT:  .byte 0                               # 0
-; ASM-NEXT:  .byte 4                               # DW_LLE_offset_pair
-; ASM-NEXT:  .uleb128 .Ltmp0-.Lfunc_begin0         # starting offset
-; ASM-NEXT:  .uleb128 .Ltmp1-.Lfunc_begin0         # ending offset
-; ASM-NEXT:  .byte 2                               # Loc expr size
-; ASM-NEXT:  .byte 115                             # DW_OP_breg3
-; ASM-NEXT:  .byte 0                               # 0
-; ASM-NEXT:  .byte 0                               # DW_LLE_end_of_list
-; ASM-NEXT: .Ldebug_loclist_table_end0:
+; CHECK-NEXT:    [DW_LLE_offset_pair ]: 0x0000000000000000, 0x0000000000000004 => [0x0000000000000000, 0x0000000000000004) DW_OP_breg5 RDI+0
+; CHECK-NEXT:    [DW_LLE_offset_pair ]: 0x0000000000000004, 0x0000000000000012 => [0x0000000000000004, 0x0000000000000012) DW_OP_breg3 RBX+0
----------------
labath wrote:
> dblaikie wrote:
> > labath wrote:
> > > This tries to follow the RLE format as closely as possible, but I think something like
> > > ```
> > > [DW_LLE_offset_pair,  0x0000000000000000, 0x0000000000000004] => [0x0000000000000000, 0x0000000000000004): DW_OP_breg5 RDI+0
> > > ```
> > > would make more sense (both here and for RLE).
> > Yep, that'd make more sense to me - are you planning to unify the codepaths for this? I think that'd be for the best.
> > 
> > If I were picking a printing from scratch, I might go with:
> > 
> >   DW_LLE_offset_pair(0x0000, 0x0004) => [0x0000, 0x0004): DW_OP_breg5 RDI+0
> > 
> > Making it look a bit more like a function call and function arguments. Though the () might be confusing with the range notation. 
> > 
> > I'm also undecided on the " => " separator. Whether a ':' might be better/fine, etc.
> > 
> > Totally open to ideas, but mostly I'd really love these to use loclist and ranges to use the same code as much as possible, so we can get consistency and any readability benefits, etc in both.
> I like the function call format. I hoping to get some code reuse, though it's still not fully clear to me how to achieve that..
I've posted my unification of range/loc/v4/v5 emission here: https://reviews.llvm.org/D68620 - & I'd imagine something similar in the parsing side.

================
Comment at: test/DebugInfo/X86/dwarfdump-debug-loclists.test:7
 # CHECK:       DW_AT_location [DW_FORM_sec_offset]   (0x0000000c
-# CHECK-NEXT:    [0x0000000000000010, 0x0000000000000020): DW_OP_breg5 RDI+0
-# CHECK-NEXT:    [0x0000000000000530, 0x0000000000000540): DW_OP_breg6 RBP-8, DW_OP_deref
-# CHECK-NEXT:    [0x0000000000000700, 0x0000000000000710): DW_OP_breg5 RDI+0
+# CHECK-NEXT:    [DW_LLE_offset_pair  ]: 0x0000000000000000, 0x0000000000000010 => [0x0000000000000010, 0x0000000000000020) DW_OP_breg5 RDI+0
+# CHECK-NEXT:    [DW_LLE_base_address ]: 0x0000000000000500
----------------
labath wrote:
> dblaikie wrote:
> > I don't think the inline dumping should print the encoding - I'd borrow a lot from/try to unify with the ranges printing, which doesn't. I think verbose ranges print the same as non-verbose except they also add the section name/number.
> Sure, I can do that, though I think that means there won't be a single place where one can see both the raw encodings and their interpretation -- section-based dumping will not show the interpretation (would you want me to show still show them I they happen to be interpretable without the base address or the address pool?), and the debug_info dumping will not show the encoding. Is that bad? -- I don't know...
Fair - that comes back to the issue I mentioned in a previous comment about potentially limiting dumping of non-debug_info sections based on the presence of a CU that references it (& only dumping it that way, rather than trying to parse it without a CU). DWARF isn't really designed to be parsed without the CU anyway. (could leave it in as best-effort to parse things without a referencing CU for debugging, etc).

Mostly I'm interested in unification perhaps more/primarily, than feature improvements - then we can make feature improvements to both ranges and locs without having to duplicate things.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68270/new/

https://reviews.llvm.org/D68270