[llvm-dev] Extracting LocList address ranges from DWO .debug_info

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Wed Mar 3 14:03:32 PST 2021


Ah, OK.

So this functionality was added in
https://github.com/llvm/llvm-project/commit/c8ae09673969e7b179fe419d780d1d0f2d2c2c19
- though the comment there is a bit confusing. A true dwo file wouldn't
have any skeleton units in it. Judging from the test case, what I was
talking about was dumping .o files that contain both dwo and non-dwo
sections. The original way dwo files were emitted was to produce a single
.o file, then run objcopy to split out the dwo parts. Clang doesn't do
that anymore (though llc still retains the functionality), instead
producing the two files separately from the start, but
-gsplit-dwarf=single now recreates that situation.

So this isn't intended to apply when navigating from a .o/linked executable
to a .dwo. Reordering the contents of this function a bit - something in
the realm of your second patch ( https://reviews.llvm.org/D96827 ) seems
like the right path.

I might be inclined for it to be more like:

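if (!AddrOffsetSectionBase) {
  // Only a DWO unit may fall back to a skeleton unit in the same file.
  if (!IsDWO)
    return None;
  auto R = Context.info_section_units();
  // The fallback is only unambiguous with exactly one normal unit.
  if (!hasSingleElement(R))
    return None;
  return (*R.begin())->getAddrOffsetSectionItem(Index);
}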

To reduce indentation and avoid duplicate tests.
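
A minimal standalone sketch of that control flow, for anyone following along. It does not use the LLVM headers: ModelUnit, getAddrItem, and the field names are hypothetical stand-ins (AddrBase models AddrOffsetSectionBase, NormalUnits models Context.info_section_units(), and AddrTable reduces the .debug_addr read to a vector index).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a DWARF unit; not the LLVM DWARFUnit class.
struct ModelUnit {
  bool IsDWO = false;
  std::optional<uint64_t> AddrBase;            // models AddrOffsetSectionBase
  std::vector<uint64_t> AddrTable;             // addresses visible to this unit
  std::vector<const ModelUnit *> NormalUnits;  // models info_section_units()
};

// Proposed ordering: use the unit's own base when it is set, and only fall
// back to a lone normal (skeleton) unit when the base is missing.
std::optional<uint64_t> getAddrItem(const ModelUnit &U, uint32_t Index) {
  if (!U.AddrBase) {
    if (!U.IsDWO)
      return std::nullopt;
    if (U.NormalUnits.size() != 1)
      return std::nullopt;
    return getAddrItem(*U.NormalUnits.front(), Index);
  }
  if (Index >= U.AddrTable.size())
    return std::nullopt;
  return U.AddrTable[Index];
}
```

With this ordering, a split unit whose base was already set answers from its own table, and the single-skeleton fallback only runs when no base is available.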

If you could update that review with code something like that, and include
a test case - I'd be happy to review it.

On Wed, Mar 3, 2021 at 1:01 PM Alexander Yermolovich <ayermolo at fb.com>
wrote:

> Hello David
>
> Thank you for the example. I was able to reproduce your results.
> llvm-symbolizer 0x400611 -obj=a.out
> f2()
> /home/ayermolo/local/tasks/T83058825/test.cpp:7:3
> main
> /home/ayermolo/local/tasks/T83058825/test.cpp:13:3
>
> I was wrong to lump .dwo files (split mode) into this. I have primarily been
> looking at it in -gsplit-dwarf=single mode, where .dwo sections are left in
> the .o files.
> In single mode, the .o file contains the Skeleton CU, which gets relocated by
> the linker, and the dwo sections. In -gsplit-dwarf=split mode, where debug
> information is in the .dwo files, there are only .dwo sections. I am probably
> repeating what you already know, but in case others who are not familiar
> read this. 🙂
>
> The reason your example works is that -gsplit-dwarf defaults to
> -gsplit-dwarf=split. When the code gets to
> DWARFUnit::getAddrOffsetSectionItem, just like before, we are in the DWO
> Context/DWO CU, so IsDWO is set. It tries to parse NormalUnits:
> auto R = Context.info_section_units();
>
> But since we are dealing with .dwo files, there is nothing there, just dwo
> sections.
> It then goes to what looks like a sanity check, hasSingleElement,
> which returns false because NormalUnits is empty.
> At that point it retrieves the address in the DWO CU, the same path as
> with my exploratory changes.
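
To make the difference between the two modes concrete, here is a small model of the lookup order as described above (IsDWO is checked first, and the unit's own base is only used when there is not exactly one normal unit). SketchUnit and lookupOldOrder are hypothetical names, not LLVM API, and the .debug_addr read is reduced to a vector index.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a DWARF unit; not the LLVM DWARFUnit class.
struct SketchUnit {
  bool IsDWO = false;
  std::optional<uint64_t> AddrBase;             // models AddrOffsetSectionBase
  std::vector<uint64_t> AddrTable;              // addresses visible to this unit
  std::vector<const SketchUnit *> NormalUnits;  // models info_section_units()
};

// Ordering described in the thread: a DWO unit first defers to a lone normal
// unit in the same file, even if its own base has already been set.
std::optional<uint64_t> lookupOldOrder(const SketchUnit &U, uint32_t Index) {
  if (U.IsDWO && U.NormalUnits.size() == 1)
    return lookupOldOrder(*U.NormalUnits.front(), Index);
  if (!U.AddrBase || Index >= U.AddrTable.size())
    return std::nullopt;
  return U.AddrTable[Index];
}
```

In split mode NormalUnits is empty, so the DWO unit's own base is used and the lookup succeeds. In single mode the lone skeleton unit from the .o has no base, so the same lookup returns nothing.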
>
> Building in single mode, keeping .o around, should reproduce:
> clang++ -g -gsplit-dwarf=single test.cpp -O3 -c
> ld.lld -out a.out test.o
>
> Sorry for the confusion.
>
> Alex
>
> ------------------------------
> *From:* David Blaikie <dblaikie at gmail.com>
> *Sent:* Tuesday, March 2, 2021 6:06 PM
> *To:* Alexander Yermolovich <ayermolo at fb.com>
> *Cc:* Pavel Labath <pavel at labath.sk>; llvm-dev at lists.llvm.org <
> llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] Extracting LocList address ranges from DWO
> .debug_info
>
> Could you provide more detailed repro steps - so far as I can see,
> llvm-symbolizer is correctly reading dwo files:
>
> *$ cat test.cpp*
>
> __attribute__((nodebug)) __attribute__((optnone)) void f1() {
>
> }
>
> __attribute__((always_inline)) inline void f2() {
>
>   f1();
>
> }
>
> int main() {
>
>   f2();
>
> }
>
> *$ clang++-tot -g -gsplit-dwarf test.cpp -O3*
>
> *$ llvm-symbolizer 0x401121 -obj=a.out*
>
> f2()
>
> /usr/local/google/home/blaikie/dev/scratch/test.cpp:4:3
>
> main
>
> /usr/local/google/home/blaikie/dev/scratch/test.cpp:7:3
>
>
> *$ rm test.dwo*
>
> *$ llvm-symbolizer 0x401121 -obj=a.out*
>
> main
>
> /usr/local/google/home/blaikie/dev/scratch/test.cpp:4:3
>
>
>
>
>
> On Tue, Mar 2, 2021 at 5:40 PM Alexander Yermolovich <ayermolo at fb.com>
> wrote:
>
> Hello David
>
> Thank you for the pointer.
> I looked at llvm-symbolizer and I think it suffers from the same problem.
> First output:
> Case 1: Monolithic debug information:
> llvm-symbolizer --obj bzip2 --print-address 0x00000000004014b7
> 0x4014b7
> copyFileName
> /home/ayermolo/local/bzip2_base/bzip2.c:941:3
> main
> /home/ayermolo/local/bzip2_base/bzip2.c:1823:4
>
> Case 2: Debug fission with upstream build
> llvm-symbolizer --obj bzip2 --print-address 0x00000000004014b7
> 0x4014b7
> main
> /home/ayermolo/local/bzip2_DF/bzip2.c:941:3
>
> Case 3: Debug fission with changes (either one will work) proposed
> llvm-symbolizer --obj bzip2 --print-address 0x00000000004014b7
> 0x4014b7
> copyFileName
> /home/ayermolo/local/bzip2_DF/bzip2.c:941:3
> main
> /home/ayermolo/local/bzip2_DF/bzip2.c:1823:4
>
> For reference
> Debug entry in Monolithic format
>
> 0x00000784:     DW_TAG_inlined_subroutine [29] *
>
>                   DW_AT_abstract_origin [DW_FORM_ref4] (cu + 0x06a9 =>
> {0x000006a9} "copyFileName")
>
>                   DW_AT_low_pc [DW_FORM_addr] (0x00000000004014b7)
>
>                   DW_AT_high_pc [DW_FORM_data4] (0x0000001b)
>
>                   DW_AT_call_file [DW_FORM_data1]
> ("/home/ayermolo/local/bzip2_base/bzip2.c")
>
>                   DW_AT_call_line [DW_FORM_data2] (1823)
>
>                   DW_AT_call_column [DW_FORM_data1] (0x04)
>
>
> Debug entry in Debug fission format
> 0x0000052e:     DW_TAG_inlined_subroutine [29] *
>                   DW_AT_abstract_origin [DW_FORM_ref4] (cu + 0x047e =>
> {0x0000047e} "copyFileName")
>                   DW_AT_low_pc [DW_FORM_GNU_addr_index] (indexed
> (0000001a) address = 0x00000000000000a7 ".text.main")
>                   DW_AT_high_pc [DW_FORM_data4] (0x0000001b)
>                   DW_AT_call_file [DW_FORM_data1] (0x01)
>                   DW_AT_call_line [DW_FORM_data2] (1823)
>                   DW_AT_call_column [DW_FORM_data1] (0x04)
>
>
>
> To dig into APIs.
>
> SymbolizableObjectFile::symbolizeInlinedCode →
> DWARFContext::getInliningInfoForAddress →
> DWARFUnit::getInlinedChainForAddress → DWARFUnit::parseDWO
>
> At which point DWO Context is created, DWO CU is created and DWO field is
> set in Skeleton CU.
>
> By comparison this is how I get DWO CU:
> DWARFUnit::getNonSkeletonUnitDIE --> DWARFUnit::parseDWO()
>
> After parseDWO a DWARFUnit::getSubroutineForAddress is called on DWO CU
> (since we are dealing with debug fission).
> DWARFDie SubroutineDIE =
>       (DWO ? *DWO : *this).getSubroutineForAddress(Address);
>
>
> getSubroutineForAddress calls DWARFUnit::updateAddressDieMap.
> As part of DWARFUnit::updateAddressDieMap we get this sequence of calls:
> DWARFDie::getAddressRanges() → DWARFDie::getLowAndHighPC →
> toSectionedAddress → DWARFFormValue::getAsAddress() →
> DWARFUnit::getAddrOffsetSectionItem
>
> DWARFUnit::getAddrOffsetSectionItem returns None (to circle back to the
> original post) because in this DWO CU the IsDWO flag is set, so it tries to
> parse NormalUnits. Except now it gets the un-relocated Skeleton CU from
> .o/.dwo, and it invokes DWARFUnit::getAddrOffsetSectionItem on that. Since
> AddrOffsetSectionBase is not set, it returns None.
>
> So we basically start from the relocated Skeleton CU we got from the
> binary's debug information, create the DWO CU from .o/.dwo, then create a
> Skeleton CU from .o/.dwo and try to get the address from it. Since
> .debug_addr is in the binary, and we never set the correct section/offset
> in that Skeleton CU, clearly that doesn't work.
>
> My usage model.
> It is more from the bottom up, as you have mentioned, because of what
> needs to be done. BOLT moves functions around, hoists cold sections out
> into their own functions, etc. It also converts low_pc/high_pc to ranges,
> so .debug_ranges, .debug_addr, and .debug_loc are completely rewritten. We
> then update every reference in a DIE with the new value of DW_AT_low_pc, or
> change DW_AT_low_pc/DW_AT_high_pc to range semantics. This means that we
> need to iterate over every DIE, get the original address, map it to the new
> address/addresses, and update the DIE, both in CUs in the binary (in the
> case of monolithic or fission + -fsplit-dwarf-inlining) and in
> .debug_info.dwo CUs.
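
As an illustration of the "map it to new address/addresses" step, here is a rough standalone sketch of turning one original low_pc/high_pc pair into one or more output ranges. This is not BOLT's implementation: MovedChunk, AddressRange, and remapRange are hypothetical names, and the sketch assumes each moved piece of code is described by its old extent and new start address.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical record: the bytes [OldStart, OldEnd) now live at NewStart.
struct MovedChunk {
  uint64_t OldStart;
  uint64_t OldEnd;
  uint64_t NewStart;
};

using AddressRange = std::pair<uint64_t, uint64_t>;  // half-open [low, high)

// Intersect the original [Low, High) with every moved chunk and translate
// each overlapping piece to its new location.
std::vector<AddressRange> remapRange(uint64_t Low, uint64_t High,
                                     const std::vector<MovedChunk> &Chunks) {
  std::vector<AddressRange> Result;
  for (const MovedChunk &C : Chunks) {
    const uint64_t Begin = std::max(Low, C.OldStart);
    const uint64_t End = std::min(High, C.OldEnd);
    if (Begin >= End)
      continue;  // no overlap with this chunk
    Result.push_back(AddressRange(C.NewStart + (Begin - C.OldStart),
                                  C.NewStart + (End - C.OldStart)));
  }
  std::sort(Result.begin(), Result.end());
  return Result;
}
```

When the code was split into two pieces, a single DIE range comes back as two ranges, which is why a low_pc/high_pc pair may have to become range semantics.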
>
> For example, when processing DW_TAG_inlined_subroutine,
> DWARFDie::getAddressRanges() is invoked, which follows the same
> execution path as when it is invoked in the symbolizer and hits the same
> problem.
>
> Now I can get raw index Value from DIE, then look up address in Skeleton
> CU with getAddrOffsetSectionItem, but it exposes extra complexity.
>
> I think if we can do a fix "under the hood" it will simplify things and
> looks like will help tools like symbolizer also.
>
> To reiterate, the two patches are just to start the discussion; maybe a more
> extensive refactoring is necessary. Working on BOLT and looking at
> symbolizer (or at least part of it), I don't quite understand the logic in
> getAddrOffsetSectionItem.
> It doesn't seem to work, at least in those usage models.
>
> Alex
>
>
> ------------------------------
> *From:* David Blaikie <dblaikie at gmail.com>
> *Sent:* Wednesday, February 24, 2021 4:42 PM
> *To:* Alexander Yermolovich <ayermolo at fb.com>
> *Cc:* Pavel Labath <pavel at labath.sk>; llvm-dev at lists.llvm.org <
> llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] Extracting LocList address ranges from DWO
> .debug_info
>
>
>
> On Mon, Feb 22, 2021 at 10:50 AM Alexander Yermolovich <ayermolo at fb.com>
> wrote:
>
> Hello David.
>
> My apologies, let me provide some context. I am helping with BOLT binary
> optimizer (soon to be upstreamed). As part of its functionality it updates
> debug information to reflect the changes it had made to the binary. Moving
> functions around, extracting cold blocks, ICF, etc.
> Right now, it works with monolithic Debug information, but not with
> Fission one.
>
> It completely re-writes debug line, ranges/aranges, and patches relevant
> DIEs entries to point to new offsets within those sections. Which means
> finding what current addresses are in DIE, mapping them to new addresses
> and from that new offsets within sections. For debug fission it also will
> need to re-write .debug_addr and update indices that point to it.
>
> I looked at llvm-symbolizer and this seems a bit high level.
>
>
> It is, but somewhere down there it has to follow from executable to
> dwo/dwp files - that part of its implementation might be able to be reused
> (may benefit/require some refactoring to make it more reusable) for the
> purposes you have. I'd suggest looking there first, if you have a chance.
>
> Perhaps that looks like refactoring llvm-symbolizer to use a codepath that
> looks like the ones you're already using, and making that work with dwos in
> a way that it doesn't already - or changing your code to more like some
> aspects of llvm-symbolizer's implementation and follow that codepath.
>
> So llvm-symbolizer goes down through
> LLVMSymbolizer::symbolizeInlinedCodeCommon ->
> SymbolizableObjectFile::symbolizeInlinedCode ->
> DWARFContext::getInliningInfoForAddress
>
> It looks like this code does correctly stitch together the addr and ranges
> tables in "parseDWO" (where it calls setAddrOffsetSection/setRangesSection).
>
> But it sounds like you're trying to go from loading DWARFContext for
> dwo/dwp files directly, back to the skeleton/executable - it may be better
> to go forward instead of backwards? Load up the DWARFContext for the linked
> executable, then walk the (possibly skeleton) units there, and
> parseDWO/getDWO to walk into the split units - and those split units,
> loaded that way, should have their addr table working correctly due to the
> parseDWO code?
>
>
> So usage model is closer to 1) I think.
>
> Right now there is no link, but one solution would be to add it when
> *getNonSkeletonUnitDIE/parseDWO* is called. This reflects the code in
> getAddrOffsetSectionItem that tries to parse normal CUs when the current
> DWARFUnit is DWO. I don't know what the original intent of that code was,
> but as it stands, I don't think it works because it parses the
> non-relocated skeleton CU in A.o.
> Rough idea: *
> https://reviews.llvm.org/D96826
>
> Alternative, that whole code can be skipped entirely. *
> https://reviews.llvm.org/D96827
>
> This works because in parseDWO we set AddrOffsetSectionBase, and
> AddrOffsetSection from .debug_addr in binary. Then in
> getAddrOffsetSectionItem we have all the information to get addresses from
> indices. One weird part is that DWARFDataExtractor is created with A.o
> file, while AddrOffsetSection is from A binary.
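
For readers less familiar with the format: an indexed address form such as DW_FORM_GNU_addr_index stores only an index, and the address itself lives in .debug_addr at base + index * address size. A minimal sketch of that arithmetic, assuming little-endian data; readIndexedAddress and its parameters are hypothetical names, not the LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Reads entry `Index` from a raw .debug_addr image. Returns nullopt when no
// base is known or when the entry would fall outside the section.
std::optional<uint64_t> readIndexedAddress(const std::vector<uint8_t> &DebugAddr,
                                           std::optional<uint64_t> Base,
                                           uint32_t Index,
                                           uint8_t AddrSize = 8) {
  if (!Base || AddrSize == 0 || AddrSize > 8)
    return std::nullopt;
  const uint64_t Size = DebugAddr.size();
  if (*Base > Size)
    return std::nullopt;
  const uint64_t Rel = uint64_t(Index) * AddrSize;
  if (Rel > Size - *Base || Size - *Base - Rel < AddrSize)
    return std::nullopt;
  const uint64_t Offset = *Base + Rel;
  uint64_t Value = 0;
  for (uint8_t I = 0; I < AddrSize; ++I)  // assemble little-endian bytes
    Value |= uint64_t(DebugAddr[static_cast<size_t>(Offset + I)]) << (8 * I);
  return Value;
}
```

Without a base the index cannot be resolved at all, which matches the None result discussed in this thread. Whether the bytes read are a final address still depends on reading the relocated .debug_addr from the linked binary rather than the one in the .o.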
>
> getAddrOffsetSectionItem is an important low-level API. For example,
> it is also used by DWARFDie::getLowAndHighPC, along with
> DWARFDie::getLocations and DWARFUnit::findLoclistFromOffset. So making a
> fix at that level would make other, higher-level APIs work for DWO
> contents.
>
> *Diffs are same ones as previously mentioned.
>
> Alex
>
> "
> Sorry I'm not really following all these pieces.
>
> There's two basic ways these APIs are predominantly used:
>
> 1) llvm-dwarfdump: This opens one file/context at a time, and generally
> doesn't open other files - such as dwos or o/exe for skeleton. (indeed,
> there's no reliable way to find a skeleton, given a dwo - only to find dwos
> given skeletons)
> 2) llvm-symbolizer: this opens executable files (or .o files) and from
> there can load dwo/dwp/dsym related files as needed
>
> What sort of use case do you have? I guess it can/should look something
> like (2) so can you use the LLVM debug info APIs in a similar manner to
> llvm-symbolizer to achieve your goals?
>
> "
> ------------------------------
> *From:* David Blaikie <dblaikie at gmail.com>
> *Sent:* Monday, February 15, 2021 10:09 PM
> *To:* Alexander Yermolovich <ayermolo at fb.com>; Pavel Labath <
> pavel at labath.sk>
> *Cc:* llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] Extracting LocList address ranges from DWO
> .debug_info
>
> This stuff is a bit ad-hoc at best.
>
> I believe some of these APIs have been generalized enough to be usable
> for your use-case, but it might be at a lower level - specifically I
> think the loclist infrastructure is used by lldb when parsing DWARFv5.
> But it might be used without some of the LLVM DWARF Unit abstractions
> you're using. (those abstractions are used in llvm-dwarfdump - which
> often isn't dealing with both .o and .dwo, but only dumping one of the
> files & doing what it can (or sometimes dumping one file containing
> both sets of sections, in which case it can do some address lookup,
> etc, more conveniently))
>
> On Fri, Feb 12, 2021 at 6:07 PM Alexander Yermolovich via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> > Hello
> >
> > I am wondering if this is a bug, or more likely something I am doing
> > wrong/using the wrong APIs.
> > I have binary A, and object file A.o, compiled with Clang debug fission
> > single mode. So .dwo sections are in the object file. Although with split
> > mode it would be the same behavior.
> > Relevant parts of the code:
> > for (const auto &CU : DwCtx->compile_units()) {
> >   auto *const DwarfUnit = CU.get();
> >   if (llvm::Optional<uint64_t> DWOId = DwarfUnit->getDWOId()) {
> >     auto *CUDWO = static_cast<DWARFCompileUnit *>(
> >         DwarfUnit->getNonSkeletonUnitDIE(false).getDwarfUnit());
> >     ...
> >   }
> > }
> >
> > Later in the code I iterate over DIEs for .debug_info.dwo and call
> > DIE.getLocations(dwarf::DW_AT_location);
> >
> > Alternatively can manually extract offset and call
> > CUnit->findLoclistFromOffset(Offset);
> >
> > It fails because it tries to look up the address using the DWARFUnit in
> > NormalUnits that it extracts from A.o.
> > Under the hood visitAbsoluteLocationList is called with
> > getAddrOffsetSectionItem passed in.
> > Since this DWARFUnit is DWO, it invokes Context.info_section_units(),
> > which uses A.o to create DW_SECT_INFO and DW_SECT_EXT_TYPES.
> > It then calls itself, but from the newly constructed DWARFUnit: the
> > skeleton CU that is in A.o.
> >
> > Because of the way it's constructed, AddrOffsetSectionBase is never set,
> > so getAddrOffsetSectionItem returns None. Eventually an error is returned
> > from the high-level API call.
> >
> > I ended up doing this to get address ranges:
> >           DWARFLocationExpressionsVector LocEVector;
> >           auto CallBack = [&](const DWARFLocationEntry &Entry) -> bool {
> >             auto StartAddress =
> >                 BaseUnit->getAddrOffsetSectionItem(Entry.Value0);
> >             if (!StartAddress) {
> >               //TODO: Handle Error
> >               return false;
> >             }
> >             LocEVector.emplace_back(DWARFLocationExpression{
> >                 DWARFAddressRange{(*StartAddress).Address,
> >                                   (*StartAddress).Address + Entry.Value1,
> >                                   Entry.SectionIndex},
> >                 Entry.Loc});
> >             return true;
> >           };
> >
> >           if(Unit->getLocationTable().visitLocationList(&Offset,
> >                                                         CallBack))
> >             ...
> >
> >
> > But back to the original API calls. Are they just not designed to work
> > with DWO CUs, or am I missing something?
> >
> > Even if AddrOffsetSectionBase were set to 0, the address section it is
> > accessing is in A.o and is not relocated. One would still need to get the
> > base address from the Skeleton CU to get fully resolved address ranges, or
> > do what I did and use the index to access the binary's .debug_addr section
> > directly (with the appropriate AddrOffsetSectionBase).
> >
> > Thank You
> > Alex
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>