[llvm-dev] Switch to ld.bfd tombstone behavior by default
Petr Hosek via llvm-dev
llvm-dev at lists.llvm.org
Wed Jul 29 22:46:23 PDT 2020
I can weigh in as a maintainer of the Clang toolchain on a project
(Fuchsia) that lives at HEAD. We are still dealing with the fallout of this
change and our toolchain roll has been blocked since it landed, while
normally we would roll the toolchain every week. All the issues in Breakpad
that were affecting us seem to have been addressed, but there are issues in
other tools that we still haven't addressed and we're still debugging. This
has been difficult because each of the issues we found so far have
manifested differently and we had to do bisect multiple times, because we
weren't sure whether it was introduced by this one or some other change.
Furthermore, as the time since our last roll increases while we're blocked,
the range over which we have to bisect is also extending making this
process more and more time consuming.
Fangrui suggested passing the appropriate flag in our build, but that's
just infeasible for us. I cannot go into details, but there are too many
projects, too many repositories, too many build systems involved. Doing
such a coordinated change would likely take longer than addressing all the
issues. We have considered passing the flag in the Fuchsia driver
<https://reviews.llvm.org/D84482> (which we control) but this doesn't
entirely address the issue because some of the failures we're seeing are in
the host binaries (we use the same toolchain for both the host and the
target) so we would need to make the same change in the Linux driver as
well.
I agree that this change is the right long-term direction, but I wish this
was communicated better and it was done as an opt-in rather than opt-out to
give us enough time to iron out all the issues without the stress.
On Wed, Jul 29, 2020 at 6:32 PM Fāng-ruì Sòng via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On 2020-07-29, Eric Christopher wrote:
> >I think the arguments are largely compatibility for software that's
> already
> >deployed and can't easily upgrade, and wanting to ensure a larger time
> >frame for migration with a fallback if things go wrong. A bridge basically
> >from what we had to where we'd like to be.
> >
> >I think we also need to make the change in mainline lld as well - as you
> >might know it's occasionally easier to upgrade a linker than to upgrade a
> >debugger. I don't have any strong opinions on what we should name the
> >command line option, but something that makes it unnecessary to actually
> >know the values you want to put in would make it easier.
> >-debug-comdat-relocation-filler=compat, bfd, <something> (yes, that's not
> >great, but it's at least a start :)
> >
> >-eric
>
> .debug_* => 0 looks good to me and I'd like to make the change because some
> users in this thread expressed preference, while some are bystanders, and
> I don't want
> to be the only one on the side opposite to the side of users expressing
> preferece.
>
> An argument I have in mind but haven't raised is that I don't want
> releases or
> HEAD to change behaviors too many times. This was one argument I did not
> want
> .debug_* => 0 for the upcoming release and .debug_* => -1 for a next one:
> there are 2
> behavior changes rather than one: old LLD: addend, LLD 11: 0, LLD 11: -1.
>
> >I think we also need to make the change in mainline lld as well - as you
> >might know it's occasionally easier to upgrade a linker than to upgrade a
>
> This is now a new request about HEAD behavior. I would honestly be a bit
> concerned about making such kind of back-and-forth change. This would
> invalidate
> some comments people have already contributed to some projects (LLD since
> D81784
> use -1 ......) Do we know who are the other LLD HEAD users? I still want to
> remind people that so far the only problematic behaviors were caused by
> .debug_line: -1, not any other .debug_*. With lack of more input, I'd want
> to
> avoid changes mitigating hypothetical issues. It is possible that making
> more
> revert-restore changes could make the situation more confusing.
> Ultimately,
> this is an open source project and we make changes for the interests of
> thecommunity. Maybe it is time for more folks to share opinions?
>
> If I had to make a name, I would call it --dead-reloc-in-debug={bfd,new},
> not -z
> as we don't expect GNU ld to adopt. The option comes with a large notice
> that
> this option will go away. Moreover, the user has to look up the semantics
> of
> 'bfd'. And the 'bfd' semantics may get stale if GNU ld updates its
> behaviors.
>
> We can also take it from another perspective, do the opt-in options look
> intimidating just because they are too long?
>
> -z 'dead-reloc-in-nonalloc=.debug_ranges=0xfffffffffffffffe'
> -z 'dead-reloc-in-nonalloc=.debug_loc=0xfffffffffffffffe'
> -z 'dead-reloc-in-nonalloc=.debug_*=0xffffffffffffffff'
>
> We can support signed integers:
>
> -z 'dead-reloc-in-nonalloc=.debug_ranges=-2'
> -z 'dead-reloc-in-nonalloc=.debug_loc=-2'
> -z 'dead-reloc-in-nonalloc=.debug_*=-1'
>
> Three options seem too long?
>
> We can support comma separated list:
>
> -z 'dead-reloc-in-nonalloc=.debug_ranges=-2,.debug_loc=-2,.debug_*=-1'
>
> I'm glad that people raised a request for a easy-to-toggle option, but I'd
> only
> be more comfortable accepting it if there are sufficiently strong reasons
> for
> it, with the acknowledgement that the option will be another GNU
> ld/gold/LLD
> difference (-z dead-reloc-in-nonalloc= has somewhat a larger chance that
> it can
> be adopted by other linkers) and will go away in two or three releases.
>
> >On Tue, Jul 28, 2020 at 10:00 PM Fāng-ruì Sòng <maskray at google.com>
> wrote:
> >
> >> Created https://reviews.llvm.org/D84825 to be used for release/11.x
> >>
> >> I haven't seen a strong argument for changing other .debug_* but in
> >> any case I don't want to continue debating on this topic.
> >>
> >> * .debug_ranges & .debug_loc: -2 (lld<11: 0+addend)
> >> * .debug_*: 0 (lld<11: 0+addend, lld HEAD: -1)
> >>
> >> On Mon, Jul 27, 2020 at 12:47 PM David Blaikie <dblaikie at gmail.com>
> wrote:
> >> >
> >> >
> >> >
> >> > On Mon, Jul 27, 2020 at 9:11 AM Robinson, Paul <
> paul.robinson at sony.com>
> >> wrote:
> >> >>
> >> >> > I still think that we do bfd locs with a decent option to change
> for
> >> at least the current release and sources and then, once we're a little
> more
> >> certain we have everything that might want to parse dwarf (say by
> working
> >> with dwarf-discuss), we can change the default.
> >> >>
> >> >>
> >> >>
> >> >> Given what’s been found, I think Eric/Dave are correct, use bfd
> >> behavior by default with an option to do the new thing. The option can
> be
> >> coded to let Sony (SCE debugger tuning) have it on by default.
> >> >>
> >> >>
> >> >>
> >> >> Bringing it up on dwarf-discuss seems like a good idea too, you can
> >> refer to http://dwarfstd.org/ShowIssue.php?issue=200609.1 as part of
> that.
> >> >
> >> >
> >> > Thanks for the pointer! Posted here:
> >>
> http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2020-July/004683.html
> >> >
> >> >>
> >> >> --paulr
> >> >>
> >> >>
> >> >>
> >> >> From: Eric Christopher <echristo at gmail.com>
> >> >> Sent: Saturday, July 25, 2020 12:56 AM
> >> >> To: Fāng-ruì Sòng <maskray at google.com>
> >> >> Cc: Hans Wennborg <hans at chromium.org>; llvm-dev at lists.llvm.org;
> Alexey
> >> Lapshin <a.v.lapshin at mail.ru>; George Rimar <grimar at accesssoftek.com>;
> >> Robinson, Paul <paul.robinson at sony.com>; Adrian Prantl <
> aprantl at apple.com>;
> >> Jonas Devlieghere <jdevlieghere at apple.com>; James Henderson <
> >> jh7370.2008 at my.bristol.ac.uk>; Peter Smith <Peter.Smith at arm.com>; David
> >> Blaikie <dblaikie at gmail.com>; Pavel Labath <pavel at labath.sk>
> >> >> Subject: Re: [llvm-dev] Switch to ld.bfd tombstone behavior by
> default
> >> >>
> >> >>
> >> >>
> >> >> Hi Ray :)
> >> >>
> >> >>
> >> >>
> >> >> While I understand the desire to say "we've updated everything we've
> >> found" the problem is that we don't know that we've found even much of
> the
> >> uses or can guarantee that people can upgrade, say their gdb or
> breakpad,
> >> as fast as their compiler. Even worse it's a change in behavior without
> >> much notice at all or even a particularly friendly way to recognize what
> >> they should do to fix it.
> >> >>
> >> >>
> >> >>
> >> >> I still think that we do bfd locs with a decent option to change for
> at
> >> least the current release and sources and then, once we're a little more
> >> certain we have everything that might want to parse dwarf (say by
> working
> >> with dwarf-discuss), we can change the default.
> >> >>
> >> >>
> >> >>
> >> >> -eric
> >> >>
> >> >>
> >> >>
> >> >> On Fri, Jul 24, 2020 at 7:33 PM Fāng-ruì Sòng <maskray at google.com>
> >> wrote:
> >> >>
> >> >> From my understanding the breakpad bug was also only related to
> >> .debug_line and has been fixed by
> >> https://chromium-review.googlesource.com/c/breakpad/breakpad/+/2317730
> >> >>
> >> >>
> >> >>
> >> >> > a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_*
> >> -> -1
> >> >> > b) .debug_ranges&.debug_loc => -2, other .debug_* => 0
> >> >>
> >> >>
> >> >>
> >> >> I am still of the opinion that we should just do a), not b).
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Fri, Jul 24, 2020 at 9:33 AM Hans Wennborg <hans at chromium.org>
> >> wrote:
> >> >>
> >> >> On Fri, Jul 24, 2020 at 5:12 PM Fangrui Song <maskray at google.com>
> >> wrote:
> >> >> >
> >> >> > On 2020-07-24, Hans Wennborg via llvm-dev wrote:
> >> >> > >Sounds good to me from a release perspective.
> >> >> >
> >> >> > I think we need more input from the triage of
> >> >> > https://chromium-review.googlesource.com/c/chromium/src/+/2291352
> >> >> > whether it is just .debug_line or .debug_*
> >> >>
> >> >> The issue we hit in Chromium is tracked here:
> >> >> https://bugs.chromium.org/p/chromium/issues/detail?id=1105559
> Doesn't
> >> >> look like it has any more info at the moment.
> >> >>
> >> >> >
> >> >> > >On Fri, Jul 24, 2020 at 7:53 AM Eric Christopher via llvm-dev
> >> >> > ><llvm-dev at lists.llvm.org> wrote:
> >> >> > >>
> >> >> > >> Hi All,
> >> >> > >>
> >> >> > >> In general I think we should adopt Dave's plan here. The number
> of
> >> consumers that can (and have) been caught off guard by this change is
> just
> >> too high.
> >> >> > >>
> >> >> > >> At the very least I think we should move this to opt in to the
> new
> >> tombstoning behavior by default and at most migrate to bfd's behavior
> for
> >> both the current release and in the current tree. If we want to make
> this
> >> sort of change in the future by default I think we're going to need to
> >> provide release notes about this and do aggressive outreach towards the
> >> consumers we do know before making the change. If anyone wants to drive
> >> that effort I'll happily provide any help or assistance in getting you
> in
> >> touch with at least the consumers I know about.
> >> >> > >>
> >> >> > >> This change and the need for it is also probably worth a quick
> >> message to dwarf-discuss at dwarfstd.org as well. Most of the major
> >> consumers and producers are on that list and it's probably one of the
> >> easiest ways to get this change out there.
> >> >> > >>
> >> >> > >> Any strong objections to this path in the meantime?
> >> >> > >>
> >> >> > >> Thanks!
> >> >> > >>
> >> >> > >> -eric
> >> >> > >>
> >> >> > >> On Tue, Jul 21, 2020 at 12:18 PM David Blaikie via llvm-dev <
> >> llvm-dev at lists.llvm.org> wrote:
> >> >> > >>>
> >> >> > >>> On Tue, Jul 21, 2020 at 5:34 AM Alexey Lapshin
> >> >> > >>> <alapshin at accesssoftek.com> wrote:
> >> >> > >>> >
> >> >> > >>> >
> >> >> > >>> > >On Mon, Jul 20, 2020 at 10:32 AM Alexey Lapshin
> >> >> > >>> > ><alapshin at accesssoftek.com> wrote:
> >> >> > >>> > >>
> >> >> > >>> > >> >On Fri, Jul 17, 2020 at 1:55 PM Alexey Lapshin <
> >> a.v.lapshin at mail.ru> wrote:
> >> >> > >>> > >> >>
> >> >> > >>> > >> >> >Пятница, 17 июля 2020, 19:42 +03:00 от David Blaikie <
> >> dblaikie at gmail.com>:
> >> >> > >>> > >> >> >
> >> >> > >>> > >> >> >On Fri, Jul 17, 2020 at 12:03 AM Fangrui Song <
> >> maskray at google.com> wrote:
> >> >> > >>> > >> >> >>
> >> >> > >>> > >> >> >> Thanks for the write-up!
> >> >> > >>> > >> >> >>
> >> >> > >>> > >> >> >> On 2020-07-16, David Blaikie wrote:
> >> >> > >>> > >> >> >> >In short: Perhaps we should switch lld to the
> >> bfd-style tombstoning
> >> >> > >>> > >> >> >> >behavior for a release or two, letting users opt-in
> >> to testing with the new
> >> >> > >>> > >> >> >> >-1/-2 tombstoning in the interim, before switching
> to
> >> the new tombstone by
> >> >> > >>> > >> >> >> >default (while still having the flag to switch back
> >> when users find
> >> >> > >>> > >> >> >> >surprise places that can't handle the new
> behavior).
> >> >> > >>> > >> >> >> >
> >> >> > >>> > >> >> >> >In long:
> >> >> > >>> > >> >> >> >https://reviews.llvm.org/D81784 and follow-on
> >> patches modified the behavior
> >> >> > >>> > >> >> >> >of lld with regards to resolving relocations from
> >> debug sections to dead
> >> >> > >>> > >> >> >> >code (either comdat deduplicated, or gc-sections
> use).
> >> >> > >>> > >> >> >> >
> >> >> > >>> > >> >> >> >A very quick summary of the situation:
> >> >> > >>> > >> >> >> >
> >> >> > >>> > >> >> >> >Original Behavior:
> >> >> > >>> > >> >> >> >
> >> >> > >>> > >> >> >> > - bfd: 1 for debug_ranges(0 would prematurely
> >> terminate the list), 0
> >> >> > >>> > >> >> >> > elsewhere
> >> >> > >>> > >> >> >> > - gold/lld: 0+addend everywhere
> >> >> > >>> > >> >> >> >
> >> >> > >>> > >> >> >> >Limitations/bugs:
> >> >> > >>> > >> >> >> >
> >> >> > >>> > >> >> >> > - bfd/gold/lld
> >> >> > >>> > >> >> >> > - doesn't support 0 as a valid executable address
> >> without ambiguities
> >> >> > >>> > >> >> >> > - gold/lld
> >> >> > >>> > >> >> >> > - ambiguities with large gc'd functions combined
> >> with a .text mapping
> >> >> > >>> > >> >> >> > that starts in relative low addresses
> >> >> > >>> > >> >> >> > - premature debug_range termination with
> zero-length
> >> functions (Clang
> >> >> > >>> > >> >> >> > produces these with __builtin_unreachable or
> >> non-void return
> >> >> > >>> > >> >> >> >type functions
> >> >> > >>> > >> >> >> > without a return statement)
> >> >> > >>> > >> >> >> >
> >> >> > >>> > >> >> >> >New behavior:
> >> >> > >>> > >> >> >> >
> >> >> > >>> > >> >> >> > - -2 for DWARFv4 debug_loc, debug_ranges (-1 is a
> >> base address specifier
> >> >> > >>> > >> >> >> > there)
> >> >> > >>> > >> >> >> > - -1 elsewhere
> >> >> > >>> > >> >> >> > - linker flag to customize to other values if
> desired
> >> >> > >>> > >> >> >> >
> >> >> > >>> > >> >> >> >Known issues:
> >> >> > >>> > >> >> >> >
> >> >> > >>> > >> >> >> > - lldb's line table parsing can't handle -1 well
> at
> >> all (essentially
> >> >> > >>> > >> >> >> > unusable)
> >> >> > >>> > >> >> >>
> >> >> > >>> > >> >> >> Pavel Labath will fix this soon
> >> https://reviews.llvm.org/D83957
> >> >> > >>> > >> >> >> This is an unhandled address-space wraparound
> problem.
> >> >> > >>> > >> >> >> This pattern is potentially common - and other
> >> downstream DWARF
> >> >> > >>> > >> >> >> consumers might make similar line table handling
> >> mistakes.
> >> >> > >>> > >> >> >
> >> >> > >>> > >> >> >That's the thing - I'm not sure we can really classify
> >> them as
> >> >> > >>> > >> >> >"mistakes". I think bfd.ld's tombstoning behavior is
> >> about the only
> >> >> > >>> > >> >> >thing we can reasonably say DWARF consumers should
> >> /probably/ be
> >> >> > >>> > >> >> >expected to handle - and I'd imagine in many cases
> they
> >> haven't been
> >> >> > >>> > >> >> >written intentionally to handle it, but whatever
> >> behavior they have
> >> >> > >>> > >> >> >has accidentally been sufficient for their needs.
> >> >> > >>> > >> >>
> >> >> > >>> > >> >> >> > - gdb's line table parsing ends up with different
> >> handling when breaking
> >> >> > >>> > >> >> >> > on gc'd functions (minor functionality issue)
> >> >> > >>> > >> >> >>
> >> >> > >>> > >> >> >> This is just a behavior difference, not affecting
> >> users.
> >> >> > >>> > >> >> >> It did break a test if linked with LLD (gdb
> >> intrinsically has lots of
> >> >> > >>> > >> >> >> failing tests even if built with GCC+GNU ld).
> >> >> > >>> > >> >> >>
> >> >> > >>> > >> >> >> Previous behavior (when an address is zero): a
> >> breakpoint on a
> >> >> > >>> > >> >> >> --gc-sections discarded function will be redirected
> to
> >> a larger line
> >> >> > >>> > >> >> >> number with debug info, even if that line can be an
> >> unrelated different
> >> >> > >>> > >> >> >> function.
> >> >> > >>> > >> >> >> New behavior is that the breakpoint is on a
> >> wrapped-around small address.
> >> >> > >>> > >> >> >>
> >> >> > >>> > >> >> >> GDB 9.3 will restore the previous behavior
> >> >> > >>> > >> >> >> (
> >>
> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a8caed5d7faa639a1e6769eba551d15d8ddd9510
> >> )
> >> >> > >>> > >> >> >>
> >> >> > >>> > >> >> >> >
> >> >> > >>> > >> >> >> >I think there's enough risk in this work (even
> given
> >> the small number of
> >> >> > >>> > >> >> >> >bugs found so far), given there's a pretty wide
> array
> >> of debug info
> >> >> > >>> > >> >> >> >consumers out there, that we should change lld's
> >> default to match the
> >> >> > >>> > >> >> >> >long-lived bfd strategy. This would address my
> >> original motivation for
> >> >> > >>> > >> >> >> >raising all this (empty functions prematurely
> >> terminating the list), while
> >> >> > >>> > >> >> >> >letting users who want to experiment with it, or
> need
> >> it (like Alexey), can
> >> >> > >>> > >> >> >> >opt-in to the -1/-2 behavior.
> >> >> > >>> > >> >> >>
> >> >> > >>> > >> >> >> I think we can only confidently say that there is
> >> enough risk in using
> >> >> > >>> > >> >> >> tombstone value -1 in .debug_line, but I'd not say
> >> tombstone value -1 in
> >> >> > >>> > >> >> >> other .debug_* can cause problems.
> >> >> > >>> > >> >>
> >> >> > >>> > >> >> >Given how many DWARF parsers we've had to cleanup or
> >> migrate off in
> >> >> > >>> > >> >> >transitioning to DWARFv5 inside Google, I think
> that's a
> >> fair bit of
> >> >> > >>> > >> >> >evidence the set of parsers isn't as narrow/closed as
> >> we'd like and
> >> >> > >>> > >> >> >thus the number of places that might have issues isn't
> >> known/easily
> >> >> > >>> > >> >> >exhaustively tested. So I'm not so concerned about the
> >> bugs we've
> >> >> > >>> > >> >> >seen, but what that might indicate about the things
> that
> >> we don't know
> >> >> > >>> > >> >> >about and can't test (because we don't know about
> them).
> >> (of course,
> >> >> > >>> > >> >> >the flipside of that is that if we don't know about
> >> them, we can't
> >> >> > >>> > >> >> >tell people who own them to go check if they work with
> >> this opt-in
> >> >> > >>> > >> >> >feature - so they'll break whenever we turn this on by
> >> default - but
> >> >> > >>> > >> >> >perhaps in the interim we can get at least a few big
> LLD
> >> customers to
> >> >> > >>> > >> >> >deploy the feature and flush out some of the issues -
> >> happy for Google
> >> >> > >>> > >> >> >to use this internally, hopefully we can encourage
> Apple
> >> folks, Sony
> >> >> > >>> > >> >> >of course already has this semantic so nothing to find
> >> there most
> >> >> > >>> > >> >> >likely - Chromium, maybe Firefox, etc)
> >> >> > >>> > >> >>
> >> >> > >>> > >> >> From the other side — when we already switched new
> >> behavior ON as default —
> >> >> > >>> > >> >> then it is easier to discover all these unknown cases.
> We
> >> have an option restoring
> >> >> > >>> > >> >> old behavior(which should be fine for all of the
> current
> >> users). Thus, everybody
> >> >> > >>> > >> >> who needs old behavior is able to continue using it.
> >> >> > >>> > >>
> >> >> > >>> > >> >The cost to those users figuring out that it's a problem
> >> and finding
> >> >> > >>> > >> >the flags/adding them, etc, is non-zero.
> >> >> > >>> > >> >But, yes, as I said - eventually some of that will
> happen,
> >> sooner or
> >> >> > >>> > >> >later. But reducing how much of it happens is valuable
> too.
> >> >> > >>> > >>
> >> >> > >>> > >> I agree. Reducing the impact on customers is important.
> >> >> > >>> > >> Though in this case, as you said, it would be done
> >> anyway(sooner or later).
> >> >> > >>> > >> Finally, I do not know what is more important: a) get
> >> feedback faster with
> >> >> > >>> > >> the cost of involved users and requiring them to make
> efforts
> >> >> > >>> > >> to figure out the problem. or b) test with a smaller set
> of
> >> users,
> >> >> > >>> > >> find more problematic cases, make new tombstone value to
> be
> >> default later.
> >> >> > >>> > >> I am kind of preferring a). But, b) is more safe and is
> OK.
> >> >> > >>> > >>
> >> >> > >>> > >>
> >> >> > >>> > >> >> That makes transition more understandable. All problems
> >> would be explicitly seen. Then,
> >> >> > >>> > >> >> If we will know about new problems - we could adapt the
> >> current solution. Similar to this >suggestion:
> >> >> > >>> > >> >>
> >> >> > >>> > >> >> a) .debug_ranges&.debug_loc => -2, .debug_line => 0,
> >> other .debug_* -> -1
> >> >> > >>> > >>
> >> >> > >>> > >> >The issue I have with this is that it's reflective of the
> >> known
> >> >> > >>> > >> >breakage in some tools that happen to have a quick
> >> turnaround/feedback
> >> >> > >>> > >> >- and assumes all the other sections are significantly
> less
> >> likely to
> >> >> > >>> > >> >have problems with this format - and I don't think we
> have
> >> enough data
> >> >> > >>> > >> >to make that estimate.
> >> >> > >>> > >>
> >> >> > >>> > >> We could extend that rule when/if new problems become
> known.
> >> >> > >>> > >> Or, customers could fix their code or use option returning
> >> old behavior
> >> >> > >>> > >> and it would probably not be necessary to extend the rule.
> >> >> > >>> > >>
> >> >> > >>> > >> >I think having a clear flag for bfd behavior or new
> >> behavior would be
> >> >> > >>> > >> >good, keeping it bfd-by-default for now, evangelizing
> >> (patch notes,
> >> >> > >>> > >> >outreach to other projects and tool developers) the new
> >> functionality
> >> >> > >>> > >> >& then changing the default shortly after an LLVM release
> >> (perhaps 12
> >> >> > >>> > >> >- but perhaps longer) to give it time to bake for
> >> trunk-following
> >> >> > >>> > >> >users (ample time for them to pick it up on their
> schedule,
> >> see what
> >> >> > >>> > >> >other tools might break, whether they're isolated enough
> to
> >> expect a
> >> >> > >>> > >> >quick turnaround or whether we should get them updated
> and
> >> then wait a
> >> >> > >>> > >> >while longer.
> >> >> > >>> > >>
> >> >> > >>> > >> yes, doing clear flag for bfd behavior or new behavior,
> >> >> > >>> > >> and keep it bfd-by-default for now would be OK and is more
> >> safe.
> >> >> > >>> > >>
> >> >> > >>> > >>
> >> >> > >>> > >> >> This would help while dwarf users are preparing their
> >> code to the new solution.
> >> >> > >>> > >> >>
> >> >> > >>> > >> >> Is option restoring old behavior not enough to solve
> >> problems caused by new tombstone >value?
> >> >> > >>> > >>
> >> >> > >>> > >> >I don't personally think it is enough - given some of the
> >> known
> >> >> > >>> > >> >breakage - I think that points to a fair bit of
> reasonably
> >> possible
> >> >> > >>> > >> >unknown breakage. Having a few big (like Apple and
> Google)
> >> users
> >> >> > >>> > >> >switch over by default & potentially flush out a few more
> >> issues (or
> >> >> > >>> > >> >not -but build confidence that there aren't more issues)
> I
> >> think is an
> >> >> > >>> > >> >appropriate way to roll out a change like this.
> >> >> > >>> > >>
> >> >> > >>> > >> >(minor doubt: I wonder how well the bfd tombstoning works
> >> in DWARFv5
> >> >> > >>> > >> >(rnglists/loclists) or in debug_range/debug_loc that uses
> >> base address
> >> >> > >>> > >> >specifiers, where the zero-without-addend doesn't have a
> >> chance to
> >> >> > >>> > >> >make an empty range (because it's not a start/end pair,
> >> it's a
> >> >> > >>> > >> >start+offset pair - so the offset remains non-zero)... if
> >> bfd
> >> >> > >>> > >> >tombstoning breaks DWARFv5 parsing (or base address
> >> specifiers in
> >> >> > >>> > >> >range/loc) in common consumers I might be more inclined
> to
> >> support
> >> >> > >>> > >> >enabling new tombstoning by default sooner (though if we
> >> could enable
> >> >> > >>> > >> >it just for DWARFv5 that might be nice - but not
> practical,
> >> since the
> >> >> > >>> > >> >linker doesn't know what DWARF it's linking & could be
> >> linking
> >> >> > >>> > >> >multiple versions))
> >> >> > >>> > >>
> >> >> > >>> > >> - From the point of view overlapping address ranges,
> >> >> > >>> > >> both bfd-solution(using 1) and old lld solution(using 0)
> are
> >> equal.
> >> >> > >>> >
> >> >> > >>> > >gold and lld's solution was a bit more complicated than
> using
> >> zero (&
> >> >> > >>> > >bfd's solution uses 1 in debug_ranges, but not in other
> debug
> >> >> > >>> > >sections). lld and gold use 0+addend in ranges.
> >> >> > >>> >
> >> >> > >>> > right. I am talking about the same(though assumed that
> context
> >> >> > >>> > instead of detailed explanation). it should be
> >> "bfd-solution(using 1
> >> >> > >>> > for debug_ranges) and lld/gold solution(using 0+addend)".
> >> >> > >>> > Thank you for pointing into that.
> >> >> > >>> >
> >> >> > >>> > >0 alone in ranges
> >> >> > >>> > >would be more problematic than any of the other deployed
> >> solutions,
> >> >> > >>> > >because it would lead to the debug_ranges contributions
> being
> >> >> > >>> > >terminated early (0, 0 terminates the list - so debug_ranges
> >> for a CU
> >> >> > >>> > >with one gc'd function mid-way through the range list would
> >> drop the
> >> >> > >>> > >rest of the range lit) - that's why bfd uses 1 for
> >> debug_ranges rather
> >> >> > >>> > >than 0 which it uses elsewhere (though the same fix should,
> >> >> > >>> > >technically, be applied to debug_loc too for the same
> reasons
> >> as
> >> >> > >>> > >debug_ranges). the 0+addend approach of gold/lld works OK
> >> (except for
> >> >> > >>> > >the low address overlap) except when there's a zero-length
> >> function,
> >> >> > >>> > >then 0+addend == 0 for the end of the range, andy ou get a
> 0,0
> >> entry
> >> >> > >>> > >with the premature range list termination issues.
> >> >> > >>> > >
> >> >> > >>> > >> for the "start+offset pair" case they could result in
> >> overlapping address ranges.
> >> >> > >>> >
> >> >> > >>> > >Not quite - the way bfd does things, at least in
> debug_ranges
> >> without
> >> >> > >>> > >base address specifiers (bit hard to isolate for a consumer,
> >> really -
> >> >> > >>> > >though they could special case zero base addresses... sort
> of)
> >> the
> >> >> > >>> > >empty ranges are (1, 1) - they're actually empty. Whereas
> lld's
> >> >> > >>> > >approach ends up with (0+addend, 0+addend) which is usually
> >> (0, size)
> >> >> > >>> > >which has issues with overlap on systems that have a valid
> low
> >> address
> >> >> > >>> > >range (or if you have sufficiently large functions).
> >> >> > >>> > >
> >> >> > >>> >
> >> >> > >>> > There is a case when size of address range is set not by
> >> relocated value but by constant value:
> >> >> > >>> >
> >> >> > >>> > DWARF5:
> >> >> > >>> > DW_RLE_startx_length
> >> >> > >>> > DW_RLE_start_length
> >> >> > >>> >
> >> >> > >>> > {address of deleted code, length}
> >> >> > >>> > {address of existing code, length}
> >> >> > >>> >
> >> >> > >>> > DWARF4:
> >> >> > >>> > base address selection entry: {-1, address of deleted code}
> >> >> > >>> > following range list entry: {0, length}
> >> >> > >>> > base address selection entry: {-1, address of existing code}
> >> >> > >>> > following range list entry: {0, length}
> >> >> > >>> >
> >> >> > >>> > In these cases linker do not see the length of address range
> >> >> > >>> > and could not change it(using 1, or 0, or 0+addend) to be
> zero
> >> length
> >> >> > >>> > address range. Thus address ranges could overlap.
> >> >> > >>> >
> >> >> > >>> > So in both DWARF5 and DWARF4 there are cases when address
> ranges
> >> >> > >>> > could be overlapped and it could not be solved without new
> >> meaning for tombstone value.
> >> >> > >>> >
> >> >> > >>> > >> - From the point of view correct parsing of DWARF 5 - it
> >> looks like they are equally good.
> >> >> > >>> > >
> >> >> > >>> > >Yep, DWARFv5 is more complicated - bfd's approach would be
> >> marginally
> >> >> > >>> > >better even for addr+offset encodings in debug_rnglists -
> >> since it'd
> >> >> > >>> > >always produce a 0 for the addr, whereas gold/lld's behavior
> >> can in
> >> >> > >>> > >some cases (eg: "void f1() { } __attribute__((nodebug)) void
> >> f2() { }
> >> >> > >>> > >void f3() { }" with -fno-function-sections - you'll end up
> >> with the
> >> >> > >>> > >range for f3 starting at a non-zero addend - this debug info
> >> will be
> >> >> > >>> > >worse for bfd/ld (if the DWARF for this object was included,
> >> but the
> >> >> > >>> > >whole .text section was discarded (eg: maybe there's a
> global
> >> variable
> >> >> > >>> > >in this object which is linked in, but -gc-section is still
> >> able to
> >> >> > >>> > >discard the whole .text section as unreferenced) then
> >> gold/lld's
> >> >> > >>> > >approach would make f3 unidentifiable as dead code (because
> it
> >> doesn't
> >> >> > >>> > >have a tombstone start) but bfd's approach would work
> >> (assuming zero
> >> >> > >>> > >isn't a valid address))
> >> >> > >>> >
> >> >> > >>> > I see. thanks.
> >> >> > >>> >
> >> >> > >>> > >
> >> >> > >>> > >> - From the point of view correct parsing of DWARF 4 - it
> >> looks bfd-solution(using 1)
> >> >> > >>> > >> is better than the old lld solution(using 0). When range
> >> list entry contains
> >> >> > >>> > >> addresses(start+end) which should be relocated and for the
> >> zero-length functions,
> >> >> > >>> > >> bfd-solution would result in range list entry: {1, 1},
> while
> >> old lld solution
> >> >> > >>> > >> would result in {0, 0}, and match with the end of list
> entry.
> >> >> > >>> > >> That is the original problem that started this thread.
> >> >> > >>> > >
> >> >> > >>> > >Only comes up for zero-length functions, because gold/lld's
> >> approach
> >> >> > >>> > >was 0+addend, not straight 0.
> >> >> > >>> > >
> >> >> > >>> >
> >> >> > >>> > right. for zero-length functions.
> >> >> > >>> >
> >> >> > >>> > >> Though it looks like there still exist case when range
> list
> >> could be terminated earlier:
> >> >> > >>> > >>
> >> >> > >>> > >> base address selection entry: {-1, address of deleted
> code}
> >> >> > >>> > >> following range list entry: {0, 0} << points to the same
> >> address as set by base address selection entry and has zero size.
> >> >> > >>> > >
> >> >> > >>> > >That's a bug in the producer (though a good point - I've
> >> probably made
> >> >> > >>> > >that bug in LLVM) - the linker can't solve that problem,
> since
> >> the
> >> >> > >>> > >linker can't touch the literal unrelocated 0, 0.
> >> >> > >>> > >
> >> >> > >>> > >> after linker resolved relocations it would look like this,
> >> for bfd case:
> >> >> > >>> > >>
> >> >> > >>> > >> base address selection entry: {-1, 1}
> >> >> > >>> > >> following range list entry: {0, 0} <<<<<<<<<<<
> >> >> > >>> > >>
> >> >> > >>> > >> So there still exists {0,0} entry which could be
> considered
> >> as the end of list entry.
> >> >> > >>> > >>
> >> >> > >>> > >> But old lld solution has the same problem, thus it would
> not
> >> be new.
> >> >> > >>> > >>
> >> >> > >>> > >> - Additionally, AFAIK gdb has special processing for
> >> overlapped address
> >> >> > >>> > >> ranges starting from 0. Using bfd tombstone value could
> >> break that processing - I would check it.
> >> >> > >>> > >
> >> >> > >>> > >Not sure I understand - presumably gdb's special processing
> is
> >> >> > >>> > >intended to work with bfd's tombstoning, since it's been the
> >> most
> >> >> > >>> > >common/prolific unix linker, the one intended to work with
> gdb
> >> (they
> >> >> > >>> > >exist in the same repository) for decades, right?
> >> >> > >>> >
> >> >> > >>> > I think I misunderstood this: "I wonder how well the bfd
> >> tombstoning works in DWARFv5
> >> >> > >>> > (rnglists/loclists)". I read it as we would like to use bfd
> >> tombstoning(1 for ranges)
> >> >> > >>> > for rnglists/loclists also.
> >> >> > >>> >
> >> >> > >>> > So, right. bfd's tombstoning works correctly with gdb until 1
> >> is not used
> >> >> > >>> > for rnglists/loclists as tombstone value.
> >> >> > >>>
> >> >> > >>> Not quite parsing this, but I think we're on the same page -
> that
> >> >> > >>> bfd's tombstoning "1 for debug_ranges/debug_loc, 0 (not
> 0+addend,
> >> but
> >> >> > >>> absolute 0) for everything else (including debug_loclists and
> >> >> > >>> debug_rnglists)" is probably the most likely to work for gdb,
> >> since
> >> >> > >>> it's been deployed for a long time.
> >> >> > >>>
> >> >> > >>> - Dave
> >> >> > >>>
> >> >> > >>> PS: Fair point about base address specifiers being able to
> >> produce 0,0
> >> >> > >>> entries - wouldn't mind fixing that in LLVM if I knew of an
> easy
> >> way
> >> >> > >>> to test at compile-time whether the difference between two
> labels
> >> was
> >> >> > >>> zero, then skip that entry in the lists entirely. Would save
> >> space and
> >> >> > >>> address the original issue I had with debug_ranges terminating
> >> early.
> >> >> > >>> (should've thought about that much earlier than my more
> alarmist
> >> "oh
> >> >> > >>> deer, we need to fundamentally change how linkers resolve
> >> relocations
> >> >> > >>> because everything's been broken and we just didn't realize
> it" -
> >> >> > >>> fixing the compiler not to produce zero-length ranges would've
> >> been
> >> >> > >>> less risky & probably still worth doing - though addressing the
> >> >> > >>> broader issue to help with your situation of 0 as a valid
> address
> >> I
> >> >> > >>> think is still good too)
> >> >> > >>>
> >> >> > >>> >
> >> >> > >>> > Alexey.
> >> >> > >>> >
> >> >> > >>> > > > >> With consideration for satefy for the upcoming
> >> release/11.x, we can make
> >> >> > >>> > > > >> two choices:
> >> >> > >>> > > > >>
> >> >> > >>> > > > >> a) .debug_ranges&.debug_loc => -2, .debug_line => 0,
> >> other .debug_* -> -1
> >> >> > >>> > > > >> b) .debug_ranges&.debug_loc => -2, other .debug_* => 0
> >> >> > >>> > > > >>
> >> >> > >>> > > > >> Delaying .debug_line => -1 for one or two release
> sounds
> >> good to me.
> >> >> > >>> > > > >> So LLD 11 or 12 linked binaries can be debugged by
> LLDB
> >> 10. This is a
> >> >> > >>> > > > >> nice property.
> >> >> > >>> > > > >>
> >> >> > >>> > > > >> This write-up proposes b), but I'd say a) is likely
> >> sufficient. With the
> >> >> > >>> > > > >> available information, I cannot yet say that a) will
> >> have more risk.
> >> >> > >>> > > > >
> >> >> > >>> > > > >Risk is about the unknowns - and it still seems like a
> lot
> >> of
> >> >> > >>> > > > >unknowns. While there are probably many more consumers
> >> that read
> >> >> > >>> > > > >.debug_line than other sections, reading debug_info (for
> >> instance) is
> >> >> > >>> > > > >necessary for inline frames in symbolizing - still
> >> probably one of the
> >> >> > >>> > > > >most common uses of DWARF I'd guess. (what about stack
> >> unwinding using
> >> >> > >>> > > > >debug_frame? that'd worry me a bit if anyone got /that/
> >> wrong because
> >> >> > >>> > > > >of this change)
> >> >> > >>> > > > >
> >> >> > >>> > > > >> > - chromium/firefox have some tools that were broken:
> >> >> > >>> > > > >> >
> >> https://bugs.chromium.org/p/chromium/issues/detail?id=1102223#c5
> >> >> > >>> > > > >>
> >> >> > >>> > > > >> This is potentially related to other .debug_* (not
> >> .debug_line)
> >> >> > >>> > > > >> I hope Chromium developers can chime in here:) The
> >> breakage was
> >> >> > >>> > > > >> unfortunate but I don't know how we could have avoided
> >> that. IMHO this
> >> >> > >>> > > > >> is no different from "clang started to emit a new
> >> DW_FORM_* and a
> >> >> > >>> > > > >> postprocessing tool of .debug chokes on that" Whether
> we
> >> want to
> >> >> > >>> > > > >> suppress that particular DW_FORM_* definitely should
> >> depend on how
> >> >> > >>> > > > >> likely it can cause problems, but we can't yet say we
> >> have to hold off
> >> >> > >>> > > > >> on a feature for a solved (precisely, mitigated)
> problem.
> >> >> > >>> > > > >
> >> >> > >>> > > > >LLVM has no custom forms and I'd be super cautious about
> >> adding any
> >> >> > >>> > > > >that were on by default because of how bad that breakage
> >> would be.
> >> >> > >>> > > > >
> >> >> > >>> > > > >I'm not so concerned about the problems we know - but
> what
> >> they tell
> >> >> > >>> > > > >us about the problems that might arise from use cases we
> >> don't know.
> >> >> > >>> > > > >All the other projects out there that might have custom
> >> DWARF parsers
> >> >> > >>> > > > >to do some ad-hoc things.
> >> >> > >>> > > > >
> >> >> > >>> > > > >(also, ultimately - given how far-reaching this is, I
> >> think we'll want
> >> >> > >>> > > > >some tidier flags that are more user-focussed. I'd hope
> >> for a flag
> >> >> > >>> > > > >that gives BFD-like semantics (though I'd be OK with
> >> fixing debug_loc
> >> >> > >>> > > > >(using 1 instead of 0) to work the same as debug_ranges
> >> while we're
> >> >> > >>> > > > >there - a minor divergence from BFD, but highly likely
> to
> >> not cause
> >> >> > >>> > > > >problems/fall out naturally from a simple implementation
> >> of parsing
> >> >> > >>> > > > >that section) - something that's been in-use and tested
> by
> >> basically
> >> >> > >>> > > > >everyone for decades. And another flag for the new
> >> semantics (-2 for
> >> >> > >>> > > > >debug_loc/debug_ranges, -1 everywhere else).
> Customizable
> >> per-section
> >> >> > >>> > > > >expression-based support I think is a recipe for
> platform
> >> divergence &
> >> >> > >>> > > > >I'd rather it not be available/supported at all, but if
> >> you really
> >> >> > >>> > > > >want to keep it in, I'd at least rather it not be the
> >> feature we
> >> >> > >>> > > > >promote to users about how they can test/opt in/out of
> the
> >> behavior
> >> >> > >>> > > > >when they're seeing breakages or want to test the future
> >> semantics)
> >> >> > >>> > > > >
> >> >> > >>> > > > >> >I'm not sure how to get the word out to DWARF
> consumers
> >> that they should
> >> >> > >>> > > > >> >consider this new experimental behavior. Ray's done a
> >> good job
> >> >> > >>> > > > >> >evangelizing/discussing this with gdb and lldb at
> least
> >> - and of course
> >> >> > >>> > > > >> >having turned it on by default briefly has found some
> >> users (like Chromium)
> >> >> > >>> > > > >> >that we probably wouldn't have found no matter how
> long
> >> we left this as an
> >> >> > >>> > > > >> >experimental option... so some things are going to
> >> break when we switch no
> >> >> > >>> > > > >> >matter what.
> >> >> > >>> > > > >>
> >> >> > >>> > > > >> Thank you for following up with some GNU folks on
> their
> >> lists!
> >> >> > >>> > > > >> If folks want to follow along the thread:
> >> >> > >>> > > > >>
> >> https://sourceware.org/pipermail/binutils/2020-June/111376.html
> >> >> > >>> > > > >>
> >> >> > >>> > > > >> We have informed binutils, elfutils-devel (elfutils
> has
> >> a few debug
> >> >> > >>> > > > >> tools) and gdb. I don't recall that anyone has thought
> >> about problems
> >> >> > >>> > > > >> with a tombstone value.
> >> >> > >>> > > > >>
> >> >> > >>> > > > >> >
> >> >> > >>> > > > >> >P.S: Sony's already been using the -1 technique with
> >> their debugger and
> >> >> > >>> > > > >> >linker for a while, so they may want to keep this on
> by
> >> default for SCE -
> >> >> > >>> > > > >> >but I'm not sure how to do that in-tree.
> >> >> > >>> > > > >> >
> >> >> > >>> > > > >> >
> >> >> > >>> > > > >> >Clang doesn't know which lld
> >> >> > >>> > > > >> >version it's running, so whether the flag can be
> >> specified, I would think?
> >> >> > >>> > > > >> >(so it'd be hard to have Clang go "if SCE and LLD,
> pass
> >> the flag to use
> >> >> > >>> > > > >> >-1", I think) - if there is a way to make that
> decision
> >> in the compiler
> >> >> > >>> > > > >.> >driver+linker, then we'd have a question of "default
> >> new behavior except
> >> >> > >>> > > > >> >when tuning for LLDB and GDB" or "default bfd
> behavior
> >> except when tuning
> >> >> > >>> > > > >> >for SCE".
> >> >> > >>> > > > >>
> >> >> > >>> > > > >> I've been involed in another thread on SHF_LINK_ORDER
> (
> >> https://sourceware.org/pipermail/binutils/2020-
> >> >> > >>> > > > >July/112415.html ).
> >> >> > >>> > > > >> We may need a way to tell codegen about the used
> linker.
> >> >> > >>> > > > >>
> >> >> > >>> > > > >> pcc proposed -mbinutils-version= - This is nice in
> that
> >> some MC
> >> >> > >>> > > > >> decisions related to -fno-integrated-as can use this
> >> option as well.
> >> >> > >>> > > > >> jyknight proposed -mlinker-version= and syntax like
> >> -fuse-ld=bfd:2.34
> >> >> > >>> > > > >>
> >> >> > >>> > > > >> This may get more complex if the generated object file
> >> want to be linked
> >> >> > >>> > > > >> with more than one linker. This discussion probably
> >> deserves its own
> >> >> > >>> > > > >> thread.
> >> >> > >>> > > >
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200729/2f35783e/attachment-0001.html>
More information about the llvm-dev
mailing list