[llvm-dev] Extracting LocList address ranges from DWO .debug_info

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Wed Feb 24 16:42:40 PST 2021


On Mon, Feb 22, 2021 at 10:50 AM Alexander Yermolovich <ayermolo at fb.com>
wrote:

> Hello David.
>
> My apologies, let me provide some context. I am helping with the BOLT binary
> optimizer (soon to be upstreamed). As part of its functionality it updates
> debug information to reflect the changes it has made to the binary: moving
> functions around, extracting cold blocks, ICF, etc.
> Right now it works with monolithic debug information, but not with
> Fission.
>
> It completely rewrites debug line and ranges/aranges, and patches the
> relevant DIE entries to point to new offsets within those sections. That
> means finding the current addresses in each DIE, mapping them to new
> addresses, and deriving the new section offsets from those. For debug
> fission it will also need to rewrite .debug_addr and update the indices
> that point to it.
>
> I looked at llvm-symbolizer and this seems a bit high level.
>

It is, but somewhere under that it has to get from the executable to the
dwo/dwp files. That part of its implementation might be reusable for your
purposes (possibly after some refactoring). I'd suggest looking there
first, if you have a chance.

That could mean refactoring llvm-symbolizer to use a codepath like the ones
you're already using, and making that codepath work with dwos in a way it
doesn't today. Or it could mean changing your code to look more like
llvm-symbolizer's implementation and following that codepath.

So llvm-symbolizer goes down through
LLVMSymbolizer::symbolizeInlinedCodeCommon ->
SymbolizableObjectFile::symbolizeInlinedCode ->
DWARFContext::getInliningInfoForAddress

It looks like this code does correctly stitch together the addr and ranges
tables in "parseDWO" (where it calls setAddrOffsetSection/setRangesSection).

But it sounds like you're trying to go from loading DWARFContext for
dwo/dwp files directly, back to the skeleton/executable - it may be better
to go forward instead of backwards? Load up the DWARFContext for the linked
executable, then walk the (possibly skeleton) units there, and
parseDWO/getDWO to walk into the split units - and those split units,
loaded that way, should have their addr table working correctly due to the
parseDWO code?
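
To make the forward direction concrete, here is a toy model of why a split
unit reached through its skeleton can resolve address indices while one
loaded standalone cannot. This is not LLVM code: ToyUnit, ToyRange,
loadSplitUnitVia, lookupAddr and resolveStartxLength are made-up names, and
the fixed 8-byte little-endian address size is an assumption for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a unit's view of .debug_addr (not DWARFUnit).
struct ToyUnit {
  const std::vector<uint8_t> *AddrSection = nullptr; // bytes of .debug_addr
  uint64_t AddrBase = 0;    // offset of this unit's address table
  bool HasAddrBase = false; // only known via the skeleton unit
  bool IsDWO = false;
};

struct ToyRange {
  uint64_t Low = 0;
  uint64_t High = 0;
};

// Models the stitching the thread attributes to parseDWO: the split unit
// inherits the linked binary's address section and the skeleton's base.
inline ToyUnit loadSplitUnitVia(const ToyUnit &Skeleton) {
  assert(!Skeleton.IsDWO && "expected a skeleton unit from the linked binary");
  ToyUnit Split;
  Split.IsDWO = true;
  Split.AddrSection = Skeleton.AddrSection;
  Split.AddrBase = Skeleton.AddrBase;
  Split.HasAddrBase = Skeleton.HasAddrBase;
  return Split;
}

// Resolves an address index to an 8-byte little-endian address. Returns
// false when the unit has no base or the index is out of range.
inline bool lookupAddr(const ToyUnit &U, uint64_t Index, uint64_t &Out) {
  const uint64_t AddrSize = 8;
  if (!U.HasAddrBase || !U.AddrSection)
    return false;
  const uint64_t Size = U.AddrSection->size();
  if (U.AddrBase > Size || Index >= (Size - U.AddrBase) / AddrSize)
    return false;
  const uint64_t Offset = U.AddrBase + Index * AddrSize;
  Out = 0;
  for (uint64_t I = 0; I < AddrSize; ++I)
    Out |= static_cast<uint64_t>(
               (*U.AddrSection)[static_cast<std::size_t>(Offset + I)])
           << (8 * I);
  return true;
}

// Turns an (index, length) pair, as in a startx_length style location list
// entry, into an absolute [Low, High) range.
inline bool resolveStartxLength(const ToyUnit &U, uint64_t Index,
                                uint64_t Length, ToyRange &Out) {
  uint64_t Start = 0;
  if (!lookupAddr(U, Index, Start))
    return false;
  Out.Low = Start;
  Out.High = Start + Length;
  return true;
}
```

In this model a ToyUnit built directly from the dwo (HasAddrBase left
false) fails every lookup, which mirrors the None you're seeing, whereas
one produced by loadSplitUnitVia resolves indices against the binary's
relocated table.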


> So usage model is closer to 1) I think.
>
> Right now there is no link, but one solution would be to add it when
> getNonSkeletonUnitDIE/parseDWO is called. This reflects the code in
> getAddrOffsetSection that tries to parse normal CUs when the current
> DWARFUnit is a DWO. I don't know what the original intent of that code was,
> but as it stands I don't think it works, because it parses the
> non-relocated skeleton CU in A.o.
> Rough idea: *
> https://reviews.llvm.org/D96826
>
> Alternatively, that whole code can be skipped entirely. *
> https://reviews.llvm.org/D96827
>
> This works because in parseDWO we set AddrOffsetSectionBase and
> AddrOffsetSection from .debug_addr in the binary. Then in
> getAddrOffsetSectionItem we have all the information to get addresses from
> indices. One weird part is that the DWARFDataExtractor is created with the
> A.o file, while AddrOffsetSection is from the A binary.
>
> getAddrOffsetSectionItem is an important low-level API. For example,
> it is also used by DWARFDie::getLowAndHighPC, along with
> DWARFDie::getLocations and DWARFUnit::findLoclistFromOffset. So making a
> fix at that level would make other, higher-level APIs work for DWO
> contents.
>
> * The diffs are the same ones as previously mentioned.
>
> Alex
>
> "
> Sorry I'm not really following all these pieces.
>
> There's two basic ways these APIs are predominantly used:
>
> 1) llvm-dwarfdump: This opens one file/context at a time, and generally
> doesn't open other files - such as dwos or o/exe for skeleton. (indeed,
> there's no reliable way to find a skeleton, given a dwo - only to find dwos
> given skeletons)
> 2) llvm-symbolizer: this opens executable files (or .o files) and from
> there can load dwo/dwp/dsym related files as needed
>
> What sort of use case do you have? I guess it can/should look something
> like (2) so can you use the LLVM debug info APIs in a similar manner to
> llvm-symbolizer to achieve your goals?
>
> "
> ------------------------------
> *From:* David Blaikie <dblaikie at gmail.com>
> *Sent:* Monday, February 15, 2021 10:09 PM
> *To:* Alexander Yermolovich <ayermolo at fb.com>; Pavel Labath <
> pavel at labath.sk>
> *Cc:* llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] Extracting LocList address ranges from DWO
> .debug_info
>
> This stuff is a bit ad-hoc at best.
>
> I believe some of these APIs have been generalized enough to be usable
> for your use-case, but it might be at a lower level - specifically I
> think the loclist infrastructure is used by lldb when parsing DWARFv5.
> But it might be used without some of the LLVM DWARF Unit abstractions
> you're using. (those abstractions are used in llvm-dwarfdump - which
> often isn't dealing with both .o and .dwo, but only dumping one of the
> files & doing what it can (or sometimes dumping one file containing
> both sets of sections, in which case it can do some address lookup,
> etc, more conveniently))
>
> On Fri, Feb 12, 2021 at 6:07 PM Alexander Yermolovich via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> > Hello
> >
> > I am wondering if this is a bug, or more likely something I am doing
> > wrong/using the wrong APIs.
> > I have binary A, and object file A.o, compiled with Clang debug fission
> > single mode. So the .dwo sections are in the object file. Although with
> > split mode it would be the same behavior.
> > Relevant parts of the code:
> > for (const auto &CU : DwCtx->compile_units()) {
> >   auto *const DwarfUnit = CU.get();
> >   if (llvm::Optional<uint64_t> DWOId = DwarfUnit->getDWOId()) {
> >     auto *CUDWO = static_cast<DWARFCompileUnit *>(
> >         DwarfUnit->getNonSkeletonUnitDIE(false).getDwarfUnit());
> >     ...
> >   }
> > }
> >
> > Later in the code I iterate over DIEs for .debug_info.dwo and call
> > DIE.getLocations(dwarf::DW_AT_location);
> >
> > Alternatively can manually extract offset and call
> > CUnit->findLoclistFromOffset(Offset);
> >
> > It fails because it tries to look up the address using a DWARFUnit in
> > NormalUnits that it extracts from A.o.
> > Under the hood, visitAbsoluteLocationList is called with
> > getAddrOffsetSectionItem passed in.
> > Since this DWARFUnit is a DWO, it invokes Context.info_section_units(),
> > which uses A.o to create DW_SECT_INFO and DW_SECT_EXT_TYPES.
> > It then calls itself, but from the newly constructed Debug DWARFUnit: the
> > skeleton CU that is in A.o.
> >
> > Because of the way it's constructed, AddrOffsetSectionBase is never set,
> > so getAddrOffsetSectionItem returns None. Eventually an error is returned
> > from the high-level API call.
> >
> > I ended up doing this to get address ranges:
> >   DWARFLocationExpressionsVector LocEVector;
> >   auto CallBack = [&](const DWARFLocationEntry &Entry) -> bool {
> >     auto StartAddress = BaseUnit->getAddrOffsetSectionItem(Entry.Value0);
> >     if (!StartAddress) {
> >       // TODO: Handle Error
> >       return false;
> >     }
> >     LocEVector.emplace_back(DWARFLocationExpression{
> >         DWARFAddressRange{StartAddress->Address,
> >                           StartAddress->Address + Entry.Value1,
> >                           Entry.SectionIndex},
> >         Entry.Loc});
> >     return true;
> >   };
> >
> >   if (Unit->getLocationTable().visitLocationList(&Offset, CallBack))
> >     ...
> >
> >
> > But back to the original API calls. Are they just not designed to work
> > with DWO CUs, or am I missing something?
> >
> > Even if AddrOffsetSectionBase were set to 0, the address section it is
> > accessing is in A.o and is not relocated. One would still need to get the
> > base address from the skeleton CU to get fully resolved address ranges,
> > or do what I did and use the index to access the binary's .debug_addr
> > section directly (with the appropriate AddrOffsetSectionBase).
> >
> > Thank You
> > Alex
>