[llvm-dev] [LLD] Support DWARF64, debug_info "sorting"

Wed Nov 11 17:46:32 PST 2020

On Wed, Nov 11, 2020 at 2:59 PM Fangrui Song <maskray at google.com> wrote:
>
> (Adding back Cc: which got dropped)
>
> > (Igor - I don't know what happened, but your email split the mail thread in gmail for me.)
>
> The problem is that https://lists.llvm.org/pipermail/llvm-dev/2020-November/146528.html does not have an In-Reply-To: header.
> Added Igor to the Cc: list.
>
> If we go down this route (sorting DWARF64 after DWARF32), then compared with a
> lightweight parse I'd prefer the relocation-based approach: treat a .debug_*
> section as DWARF64 if it has a 64-bit absolute relocation type (e.g. R_X86_64_64).
>
> In LLD, for an input section, we don't know its associated SHT_REL[A] section.
> So when adding an orphan section we would have another loop iterating
> over inputSections. We can reuse the dependentSections to have this
> piece of information (generalizing the existing special case for -r/--emit-relocs)
>
> > This way as long as DWARF32 sections don't themselves go over 4gigs, the final binary can contain debug information that exceeds 4gig.
> > Which I think will be the common case.
>
> I would not expect linking a few additional sections to change the linker's behavior so drastically
> in a not-easily-explainable way. This deserves a dedicated linker option (see below, I have a concern about the inconsistency
> with an input section description).
>
> On 2020-11-11, Robinson, Paul via llvm-dev wrote:
> >
> >
> >> -----Original Message-----
> >> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of David
> >> Blaikie via llvm-dev
> >> Sent: Wednesday, November 11, 2020 12:46 PM
> >> To: James Henderson <jh7370.2008 at my.bristol.ac.uk>
> >> Cc: llvm-dev at lists.llvm.org
> >> Subject: Re: [llvm-dev] [LLD] Support DWARF64, debug_info "sorting"
> >>
> >> On Wed, Nov 11, 2020 at 12:55 AM James Henderson
> >> <jh7370.2008 at my.bristol.ac.uk> wrote:
> >> >
> >> >
> >> >
> >> > On Wed, 11 Nov 2020 at 05:41, David Blaikie <dblaikie at gmail.com> wrote:
> >> >>
> >> >> +James for context too (always good to include the folks from the
> >> >> original threads for continuity)
> >> >>
> >> >> Yeah, my general attitude there was just twofold, one that the
> >> >> discussion had strayed fairly far from the review (so interested
> >> >> parties might not see it, both because it's a targeted review thread
> >> >> on the noisy llvm-commits, and because of the title not having much
> >> >> connection to the discussion) and it seemed to be somewhat
> >> >> abstract/general - and there's a balance there. "We should do this
> >> >> because I need it" (we shouldn't be implementing features for
> >> >> especially niche use cases/if they don't generalize) isn't always a
> >> >> compelling motivation but "we should do this because someone might
> >> >> need it" isn't either (we shouldn't be implementing features that have
> >> >> no users).
> >> >>
> >> >> The major drawback of sorting is the need to parse DWARF, even a
> >> >> little bit of it (only the first 4 bytes of a section to tell which
> >> >> version it is - first 12 if you want to be able to jump over
> >> >> contributions and check /all/ contributions coming from a given input
> >> >> object file, which might contain a combination of DWARFv4 and DWARFv5)
> >> >> and then the hairy uncertainty of which sections to check (do you
> >> >> check them all? well, all the ones with length prefixes that
> >> >> communicate DWARF32/64 - some sections don't
> >> >> (debug_ranges/loc/str/macro for instance, if I recall correctly))...
> >> >> and if something has some 4 and 5, does it get sorted to the start? I
> >> >> guess so.
> >> >>
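
For reference, the lightweight parse being discussed could look roughly like the sketch below. It relies on the DWARF rule that an initial length of 0xffffffff marks the 64-bit format and is followed by an 8-byte length. The function name, the Python rendering, and the little-endian default are assumptions made for illustration; this is not LLD code.

```python
import struct

DWARF64_ESCAPE = 0xFFFFFFFF   # initial length value that marks the 64-bit format
RESERVED_START = 0xFFFFFFF0   # 0xfffffff0..0xfffffffe are reserved by DWARF


def unit_formats(section: bytes, little_endian: bool = True):
    """Walk the length-prefixed contributions of one section.

    Returns a list of (offset, "DWARF32" | "DWARF64") pairs.
    Hypothetical helper: only the initial length of each contribution is read.
    """
    endian = "<" if little_endian else ">"
    formats = []
    offset = 0
    while offset + 4 <= len(section):
        (length32,) = struct.unpack_from(endian + "I", section, offset)
        if length32 == DWARF64_ESCAPE:
            # 4-byte escape plus 8-byte length: the "first 12 bytes" case.
            if offset + 12 > len(section):
                raise ValueError("truncated DWARF64 initial length")
            (length64,) = struct.unpack_from(endian + "Q", section, offset + 4)
            formats.append((offset, "DWARF64"))
            offset += 12 + length64
        elif length32 >= RESERVED_START:
            raise ValueError("reserved initial length value")
        else:
            formats.append((offset, "DWARF32"))
            offset += 4 + length32
    return formats
```

Reading only the first 4 bytes classifies the first contribution; stepping over each length is what makes it possible to notice a mixed DWARF32/DWARF64 section.
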
> >> > I assume this comment is meant to say DWARF32/DWARF64, not DWARFv4 and
> >> DWARFv5, as the DWARF version (as opposed to the 32/64 bit style) is
> >> irrelevant to this, I believe, at least for the current known DWARF
> >> standards.
> >>
> >> Yep! thanks for the correction - had a lot of DWARFv4/v5 on my mind
> >> due to other work, so got the terms jumbled up.
> >>
> >> >  Whilst the majority of objects will only have a single CU in them,
> >> there will be exceptions (LTO-generated objects, -r merged objects etc),
> >> so we do need to consider this approach. Mixtures would certainly be
> >> possible, and there's no guarantee the CUs would be in a nice order with
> >> 32-bit blocks before 64-bit blocks. If I follow this to its full
> >> conclusion, you could potentially end up with a single .debug_info
> >> (.debug_line, .debug_rnglists etc) input section with a mixture of
> >> DWARF32/DWARF64 sub-sections, which, if following the reordering approach,
> >> the linker might have to split up internally in order to rearrange (aside
> >> - there's some interesting crossover with ideas I've been considering
> >> regarding the Fragmented DWARF topic discussed elsewhere).
>
> I'm still learning the internals but would expect that mixed DWARF32/DWARF64 is
> a problem for LTO. A relocatable link (-r) can combine DWARF32/DWARF64 object
> files and potentially nullify the aforementioned relocation based approach
> (we probably just want to check the first relocation to save time;
> if we link DWARF64 before DWARF32 we may create a .debug_info
> which looks like DWARF64 but is actually restricted by DWARF32 relocations)
>
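
A rough sketch of the "check the first relocation" idea follows. It assumes that the lowest-offset relocation in a section such as .debug_info is the section offset in the unit header, which would be R_X86_64_32 for DWARF32 and R_X86_64_64 for DWARF64. The relocation type numbers are the x86-64 psABI values; the function name and the (r_offset, r_type) tuple layout are hypothetical.

```python
R_X86_64_64 = 1    # 64-bit absolute relocation
R_X86_64_32 = 10   # 32-bit absolute relocation


def looks_like_dwarf64(relocs):
    """Guess the DWARF format of one .debug_* section from its relocations.

    relocs is a list of (r_offset, r_type) tuples (hypothetical layout).
    Returns True/False, or None when there is nothing to inspect.
    Only the lowest-offset relocation is examined, to save time.
    """
    if not relocs:
        return None
    _, first_type = min(relocs)  # tuple order: lowest r_offset first
    return first_type == R_X86_64_64
```

As noted above, this shortcut has a blind spot: after an -r link that placed DWARF64 before DWARF32, the first relocation says DWARF64 even though later 32-bit relocations still limit the section.
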
> >> I think given this is a pragmatic feature I'd be inclined to say "eh,
> >> sort any input object containing at least one DWARF32 contribution
> >> before input objects not containing any DWARF32 contribution" - if that
> >> doesn't solve some real world issues/situations, I'd be willing to
> >> revisit this direction/consider more invasive/expensive solutions.
> >
> >I was under the impression that *object* order meant a lot to people,
> >and changing that would have all sorts of unpleasant fallout.  If I'm
> >remembering that correctly, sorting DWARF sections really should be its
> >own thing, separate from object order.  Shoving DWARF-64 sections to
> >the end of the line seems like it would be less problematic than
> >reordering entire objects, if the linker can handle that in some
> >reasonably efficient way.
> >--paulr
>
> This behavior does add some inconsistency to the system:
>
> For an output section description .debug_info 0 : { *(.debug_info) } ,
> should the linker sort DWARF32 and DWARF64 components? If it does, the behavior
> will be inconsistent with other input section descriptions *(foo)
>
> If there is a magic keyword, say, SORT_BY_MAGIC_DEBUG, and the internal
> linker script does something similar to
>
>    *(SORT_BY_MAGIC_DEBUG(.debug_info))
>
> then the system is still consistent.
>
> >>
> >> Though, as Eric said - some of this conversation might be better had
> >> in terms of concrete patches with concrete performance measurements.
> >>
> >> > Maybe the solution here would be to change producers to produce separate
> >> .debug_info sections containing DWARF32 and DWARF64.
> >>
> >> That'd involve changing how certain objects were generated - if that's
> >> possible, then I assume it'd be possible to change that generation to
> >> use DWARF64 anyway - in the limit: one might have precompiled binaries
> >> with debug info that one cannot recompile, so any new format options I
> >> doubt are able to address the original/likely use case for this
> >> functionality.
> >>
> >> > I used the -u option more as an example that it might be possible to get
> >> things to work the way we want without needing to have the linker do the
> >> work. The linker currently has a --symbol-ordering-file option which can
> >> be used to request an order for the specified list of symbols. The linker
> >> does this by rearranging the input sections to get as close as it can to
> >> the requested order. We could maybe implement the same on a file/section
> >> basis. It would avoid needing to read the sections themselves, but doesn't
> >> solve the "what to do about mixed single input" case directly (though
> >> might allow the user to dodge the decision at least).
>
> Yeah, --symbol-ordering-file applies on both global and local symbols.
> Unfortunately no symbols are defined relative to .debug_* sections
> (if we don't consider the STT_SECTION symbols, which cannot be used
> anyway because .debug_* do not have unique names).
>
> (The usage of -u still requires the user to add archives (they want to
> change order) before other object files. In LLD this requires https://reviews.llvm.org/D81052 )
>
> >> > Other ideas I had involved changing the section header properties.
> >> Currently DWARF sections are all SHT_PROGBITS, but we could change that to
> >> e.g. SHT_DWARF_32 or similar, and/or use the sh_info field to contain a
> >> value that would indicate the 32/64 bit nature. I'm not convinced by these
> >> ideas though, as a) I don't know if it translates well to other non-ELF
> >> formats, and b) we can't really control the producers of DWARF at this
> >> stage to conform.
>
> Inventing a new section type is not bad at a first glance. Leveraging it
> can remove the inconsistency in the system as well.
> Unfortunately linker scripts (as implemented by GNU ld and emulated by LLD)
> don't provide a way to match input sections by section type.
>
> If we are going to have many thoughts on the linker side design, might
> be worth asking on https://groups.google.com/g/generic-abi as well.
> That would have to be a separate discussion because the list is moderated
> and users who haven't joined the group cannot reply there. If there are
> opinions, we can share them with llvm-dev.

I'm not sure/don't think this rises to that level - if a user is able
to regenerate their object files with some new object
feature/flag/attribute/etc, then they are probably able to generate
them with DWARF64. So this seems more about a linker doing something
that might help users who have DWARF32 baked into some precompiled
objects/libraries/things where they otherwise can't change the way
they're built. So it seems to me it's more a linker-doing-something-nice
than linker/object files defining a new mode of interaction.

- Dave

>
> >> > It would be nice if there was a solution that could be consistently
> >> applied across all build systems, linkers and DWARF producers. I don't
> >> have one as yet though.
> >> >
> >> >>
> >> >> On Tue, Nov 10, 2020 at 9:30 PM Eric Christopher via llvm-dev
> >> >> <llvm-dev at lists.llvm.org> wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Wed, Nov 11, 2020 at 12:19 AM Alexander Yermolovich via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >> >> >>
> >> >> >> This year Igor Kudrin put in a lot of work in enabling DWARF64
> >> support in LLVM. At Facebook we are looking into it as one of the options
> >> for handling debug information over 4gigs in production environment. One
> >> concern is that due to mix of third party libraries and llvm compiled code
> >> the final library/binary will have a mix of CU that are DWARF32/64. This
> >> is supported by DWARF format. With this mix it is possible that even with
> >> DWARF64 enabled one can still encounter relocation overflow errors in LLD
> >> if DWARF32 sections happen to be processed towards the end.
> >> >> >>
> >> >> >> One proposal that was discussed in
> >> https://reviews.llvm.org/D87011, is to modify the LLD linker to arrange debug_info sections so that DWARF32
> >> comes first, and DWARF64 after them. This way as long as DWARF32 sections
> >> don't themselves go over 4gigs, the final binary can contain debug
> >> information that exceeds 4gig. Which I think will be the common case.
> >> >> >>
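
To make the ordering argument concrete, here is a small sketch under a simplified model: a 32-bit offset can only address positions below 4 GiB, so a DWARF32 contribution is treated as safe only if it ends at or below that limit in the output section. The DebugInput type and both function names are hypothetical and only illustrate the proposal.

```python
from dataclasses import dataclass

LIMIT = 1 << 32  # positions reachable through a 32-bit offset


@dataclass(frozen=True)
class DebugInput:
    name: str         # input object the contribution comes from
    size: int         # size of its contribution in bytes
    is_dwarf64: bool  # format of the contribution


def order_dwarf32_first(inputs):
    """Stable sort: DWARF32 first, original order kept within each group."""
    return sorted(inputs, key=lambda s: s.is_dwarf64)


def dwarf32_fits(inputs):
    """True if every DWARF32 contribution ends at or below the 4 GiB limit."""
    offset = 0
    for s in inputs:
        end = offset + s.size
        if not s.is_dwarf64 and end > LIMIT:
            return False
        offset = end
    return True
```

With 3 GiB and 2 GiB of DWARF64 input interleaved with two 1 GiB DWARF32 inputs, the original order fails the check while the sorted order passes, even though the total is 7 GiB.
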
> >> >> >> An alternative approach that was proposed by James Henderson is for
> >> build system to take care of it, and to use -u to enforce order.
> >> >> >
> >> >> >
> >> >> > +Fangrui Song here for thread visibility
> >> >> >
> >> >> > Of these two approaches I think that the linker sorting is probably
> >> the one I'd go with for the reasons you list below - I'm particularly
> >> sympathetic to not wanting the unintended consequences of using -u here :)
> >> >> >
> >> >> > I do worry about slowing down general debug links so a "debug info
> >> sorting" option may make sense, or it may not be worth it after measuring
> >> the speed difference.
> >> >> >
> >> >> > Thanks for bringing this up on the list! :)
> >> >> >
> >> >> > -eric
> >> >> >
> >> >> >>
> >> >> >>
> >> >> >> As, I would imagine, most projects of scale are using configurable
> >> build system that pulls in all the various dependencies automatically in a
> >> multi-language environment. I think the alternative approach will be more
> >> fragile than modifying LLD as it relies on a more complex system, and each
> >> customer of LLD will have to implement this "sorting" in their own build
> >> systems. The use of -u also kind of abuses this flag, and might have
> >> unintended consequences, as was pointed out by Wen Lei.
> >> >> >> From an overhead perspective we only need to access a few bytes of DWARF
> >> to determine if it's 32 or 64 bits. Customers who need DWARF64, already
> >> accept the overhead that it entails.
> >> >> >>
> >> >> >> Any thoughts?
> >> >> >>
> >> >> >> Thank You
> >> >> >> Alex
> >> >> >> _______________________________________________
> >> >> >> LLVM Developers mailing list
> >> >> >> llvm-dev at lists.llvm.org
> >> >> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev