[PATCH] D68270: DWARFDebugLoc: Add a function to get the address range of an entry
David Blaikie via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 9 08:44:50 PDT 2019
dblaikie added a comment.
In D68270#1700108 <https://reviews.llvm.org/D68270#1700108>, @probinson wrote:
> Do we care whether llvm-dwarfdump's output bears any similarities to the output from GNU readelf or objdump? There has been a push lately to get the LLVM "binutils" to behave more like GNU's, although AFAIK it hasn't gotten to the DWARF dumping part.
Generally I hope not to deal with that until there's a user with a need for it who wants to do the work & has a specific use-case that can help motivate which similarities are desirable and which ones don't matter (& perhaps if there's enough that they start to tradeoff usability - maybe the "compatibility mode" is a separate tool or separate flag to the existing tool).
My broader hope is probably that llvm-dwarfdump is more for interactive uses than other dumpers, so fewer people might try to build automated things on top of it & thus expect specific output (this gives us both the freedom not to match the GNU tools, and the freedom not to match previous llvm-dwarfdump behavior (which we've done a fair bit in the past - which seems to support the theory that people don't seem to be building much on top of this))
================
Comment at: lib/DebugInfo/DWARF/DWARFDebugLoc.cpp:291-295
+ EntryIterator Absolute =
+ getAbsoluteLocations(
+ SectionedAddress{BaseAddr, SectionedAddress::UndefSection},
+ LookupPooledAddress)
+ .begin();
----------------
labath wrote:
> dblaikie wrote:
> > labath wrote:
> > > dblaikie wrote:
> > > > labath wrote:
> > > > > This parallel iteration is not completely nice, but I think it's worth being able to reuse the absolute range computation code. I'm open to ideas for improvement though.
> > > > Ah, I see - this is what you meant about "In particular it makes it possible to reuse this stuff in the dumping code, which would have been pretty hard with callbacks.".
> > > >
> > > > I'm wondering if that might be worth revisiting somewhat. A full iterator abstraction for one user here (well, two once you include lldb - but I assume it's likely going to build its own data structure from the iteration anyway, right? (it's not going to keep the iterator around, do anything interesting like partial iterations, re-iterate/etc - such that a callback would suffice))
> > > >
> > > > I could imagine two callback APIs for this - one that gets entries and locations and one that only gets locations by filtering on the entry version.
> > > >
> > > > eg:
> > > >
> > > > // for non-verbose output:
> > > > LL.forEachEntry([&](const Entry &E, Expected<DWARFLocation> L) {
> > > > if (Verbose && actually dumping debug_loc)
> > > > print(E) // print any LLE_*, raw parameters, etc
> > > > if (L)
> > > > print(*L) // print the resulting address range, section name (if verbose),
> > > > else
> > > > print(error stuff)
> > > > });
> > > >
> > > > One question would be "when/where do we print the DWARF expression" - if there's an error computing the address range, we can still print the expression, so maybe that happens unconditionally at the end of the callback, using the expression in the Entry? (then, arguably, the expression doesn't need to be in the DWARFLocation - and I'd say make the DWARFLocation a sectioned range, exactly the same type as for ranges so that part of the dumping code, etc, can be maximally reused)
> > > Actually, what lldb currently does is that it does not build any data structures at all (except storing the pointer to the right place in the debug_loc section. Then, whenever it wants to do something to the loclist, it parses it afresh. I don't know why it does this exactly, but I assume it has something to do with most locations never being used, or being only a couple of times, and the actual parsing being fairly fast. What this means is that lldb is not really a single "user", but there are like four or five places where it iterates through the list, depending on what does it actually want to do with it. It also does partial iteration where it stops as soon as it find the entry it was interested in.
> > > Now, all of that is possible with a callback (though I am generally trying to avoid them), but it does resurface the issue of what should be the value of the second argument for DW_LLE_base_address entries (the thing which I originally used a error type for).
> > > Maybe this should be actually one callback API, taking two callback functions, with one of them being invoked for base_address entries, and one for others? However, if we stick to the current approaches in both LLE and RLE of making the address pool resolution function a parameter (which I'd like to keep, as it makes my job in lldb easier), then this would actually be three callbacks, which starts to get unwieldy. Though one of those callbacks could be removed with the "DWARFUnit implementing a AddrOffsetResolver interface" idea, which I really like. :)
> > Ah, thanks for the details on LLDB's location parsing logic. That's interesting indeed!
> >
> > I can appreciate an iterator-based API if that's the sort of usage we've got, though I expect it doesn't have any interest in the low-level encoding & just wants the fully processed address ranges/locations - it doesn't want base_address or end_of_list entries? & I think the dual-iteration is a fairly awkward API design, trying to iterate them in lock-step, etc. I'd rather avoid that if reasonably possible.
> >
> > Either having an iterator API that gives only the fully processed data/semantic view & a completely different API if you want to access the low level primitives (LLE, etc) (this is how ranges works - there's an API that gives a collection of ranges & abstracts over v4/v5/rnglists/etc - though that's partly motivated by a strong multi-client need for that functionality for symbolizing, etc - but I think it's a good abstraction/model anyway (& one of the reasons the inline range list printing doesn't include encoding information, the API it uses is too high level to even have access to it))
> >
> > > Now, all of that is possible with a callback (though I am generally trying to avoid them), but it does resurface the issue of what should be the value of the second argument for DW_LLE_base_address entries (the thing which I originally used a error type for).
> >
> > Sorry, my intent in the above API was for the second argument to be Optional's "None" state when... oh, I see, I did use Expected there, rather than Optional, because there are legit error cases.
> >
> > I know it's sort of awkward, but I might be inclined to use Optional<Expected<AddressRange>> there. I realize two layers of wrapping is a bit weird, but I think it'd be nicer than having an error state for what, I think, isn't erroneous.
> >
> > > Maybe this should be actually one callback API, taking two callback functions, with one of them being invoked for base_address entries, and one for others? However, if we stick to the current approaches in both LLE and RLE of making the address pool resolution function a parameter (which I'd like to keep, as it makes my job in lldb easier), then this would actually be three callbacks, which starts to get unwieldy.
> >
> > Don't mind three callbacks too much.
> >
> > > Though one of those callbacks could be removed with the "DWARFUnit implementing a AddrOffsetResolver interface" idea, which I really like. :)
> >
> > Sorry, I haven't really looked at where the address resolver callback is registered and alternative designs being discussed - but yeah, going off just the one-sentence, it seems reasonable to have the DWARFUnit own an address resolver/be the thing you consult when you want to resolve an address (just through a normal function call in DWARFUnit, perhaps - which might, internally, use a callback registered when it was constructed).
> > I know it's sort of awkward, but I might be inclined to use Optional<Expected<AddressRange>> there. I realize two layers of wrapping is a bit weird, but I think it'd be nicer than having an error state for what, I think, isn't erroneous.
> Actually, my very first attempt at this patch used an `Expected<Optional<Whatever>>`, but then I scrapped it because I didn't think you'd like it. It's not the friendliest of APIs, but I think we can go with that.
>
> > Sorry, I haven't really looked at where the address resolver callback is registered and alternative designs being discussed - but yeah, going off just the one-sentence, it seems reasonable to have the DWARFUnit own an address resolver/be the thing you consult when you want to resolve an address (just through a normal function call in DWARFUnit, perhaps - which might, internally, use a callback registered when it was constructed).
>
> I think you got that backwards. I don't want the DWARFUnit to be the source of truth for address pool resolutions, as that would make it hard to use from lldb (it's far from ready to start using the llvm version right now). What I wanted was to replace the lambda/function_ref with a single-method interface. Then both DWARFUnits could implement that interface so that passing a DWARFUnit& would "just work" (but you wouldn't be limited to DWARFUnits as anyone could implement that interface, just like anyone can write a lambda).
As for Expected<Optional<Whatever>> (or Optional<Expected<>>) - yeah, I think this is a non-obvious API (both the general problem and this specific solution). I think it's probably worth discussing this design a bit more to save you time writing/rewriting things a bit. I guess there are a few layers of failure here.
There's the possibility that the iteration itself could fail - even for debug_loc style lists (if we reached the end of the section before encountering a terminating {0,0}). That would suggest a fallible iterator idiom: http://llvm.org/docs/ProgrammersManual.html#building-fallible-iterators-and-iterator-ranges
But then, yes, when looking at the "processed"/semantic view, that could fail too in the case of an invalid address index, etc.
The generic/processed/abstracted-over-ranges-and-rnglists API for ranges produces a fully computer vector (& then returns Expected<vector> of that range) - is that reasonable? (this does mean manifesting a whole location in memory, which may not be needed so I could understand avoiding that even without fully implementing & demonstrating the vector solution is inadequate).
But I /think/ maybe the we could/should have two APIs - one generic API that abstracts over loc/loclists and only provides the fully processed view, and another that is type specific for dumping the underlying representation (only used in dumping debug_loclists).
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D68270/new/
https://reviews.llvm.org/D68270
More information about the llvm-commits
mailing list