[llvm-dev] Switch to ld.bfd tombstone behavior by default

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Fri Jul 17 14:39:40 PDT 2020


On Fri, Jul 17, 2020 at 1:55 PM Alexey Lapshin <a.v.lapshin at mail.ru> wrote:
>
> >Пятница, 17 июля 2020, 19:42 +03:00 от David Blaikie <dblaikie at gmail.com>:
> >
> >On Fri, Jul 17, 2020 at 12:03 AM Fangrui Song <maskray at google.com> wrote:
> >>
> >> Thanks for the write-up!
> >>
> >> On 2020-07-16, David Blaikie wrote:
> >> >In short: Perhaps we should switch lld to the bfd-style tombstoning
> >> >behavior for a release or two, letting users opt-in to testing with the new
> >> >-1/-2 tombstoning in the interim, before switching to the new tombstone by
> >> >default (while still having the flag to switch back when users find
> >> >surprise places that can't handle the new behavior).
> >> >
> >> >In long:
> >> >https://reviews.llvm.org/D81784 and follow-on patches modified the behavior
> >> >of lld with regards to resolving relocations from debug sections to dead
> >> >code (either comdat deduplicated, or gc-sections use).
> >> >
> >> >A very quick summary of the situation:
> >> >
> >> >Original Behavior:
> >> >
> >> > - bfd: 1 for debug_ranges(0 would prematurely terminate the list), 0
> >> > elsewhere
> >> > - gold/lld: 0+addend everywhere
> >> >
> >> >Limitations/bugs:
> >> >
> >> > - bfd/gold/lld
> >> > - doesn't support 0 as a valid executable address without ambiguities
> >> > - gold/lld
> >> > - ambiguities with large gc'd functions combined with a .text mapping
> >> > that starts in relative low addresses
> >> > - premature debug_range termination with zero-length functions (Clang
> >> > produces these with __builtin_unreachable or non-void return
> >> >type functions
> >> > without a return statement)
> >> >
> >> >New behavior:
> >> >
> >> > - -2 for DWARFv4 debug_loc, debug_ranges (-1 is a base address specifier
> >> > there)
> >> > - -1 elsewhere
> >> > - linker flag to customize to other values if desired
> >> >
> >> >Known issues:
> >> >
> >> > - lldb's line table parsing can't handle -1 well at all (essentially
> >> > unusable)
> >>
> >> Pavel Labath will fix this soon https://reviews.llvm.org/D83957
> >> This is an unhandled address-space wraparound problem.
> >> This pattern is potentially common - and other downstream DWARF
> >> consumers might make similar line table handling mistakes.
> >
> >That's the thing - I'm not sure we can really classify them as
> >"mistakes". I think bfd.ld's tombstoning behavior is about the only
> >thing we can reasonably say DWARF consumers should /probably/ be
> >expected to handle - and I'd imagine in many cases they haven't been
> >written intentionally to handle it, but whatever behavior they have
> >has accidentally been sufficient for their needs.
>
> >> > - gdb's line table parsing ends up with different handling when breaking
> >> > on gc'd functions (minor functionality issue)
> >>
> >> This is just a behavior difference, not affecting users.
> >> It did break a test if linked with LLD (gdb intrinsically has lots of
> >> failing tests even if built with GCC+GNU ld).
> >>
> >> Previous behavior (when an address is zero): a breakpoint on a
> >> --gc-sections discarded function will be redirected to a larger line
> >> number with debug info, even if that line can be an unrelated different
> >> function.
> >> New behavior is that the breakpoint is on a wrapped-around small address.
> >>
> >> GDB 9.3 will restore the previous behavior
> >> (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a8caed5d7faa639a1e6769eba551d15d8ddd9510)
> >>
> >> >
> >> >I think there's enough risk in this work (even given the small number of
> >> >bugs found so far), given there's a pretty wide array of debug info
> >> >consumers out there, that we should change lld's default to match the
> >> >long-lived bfd strategy. This would address my original motivation for
> >> >raising all this (empty functions prematurely terminating the list), while
> >> >letting users who want to experiment with it, or need it (like Alexey), can
> >> >opt-in to the -1/-2 behavior.
> >>
> >> I think we can only confidently say that there is enough risk in using
> >> tombstone value -1 in .debug_line, but I'd not say tombstone value -1 in
> >> other .debug_* can cause problems.
>
> >Given how many DWARF parsers we've had to cleanup or migrate off in
> >transitioning to DWARFv5 inside Google, I think that's a fair bit of
> >evidence the set of parsers isn't as narrow/closed as we'd like and
> >thus the number of places that might have issues isn't known/easily
> >exhaustively tested. So I'm not so concerned about the bugs we've
> >seen, but what that might indicate about the things that we don't know
> >about and can't test (because we don't know about them). (of course,
> >the flipside of that is that if we don't know about them, we can't
> >tell people who own them to go check if they work with this opt-in
> >feature - so they'll break whenever we turn this on by default - but
> >perhaps in the interim we can get at least a few big LLD customers to
> >deploy the feature and flush out some of the issues - happy for Google
> >to use this internally, hopefully we can encourage Apple folks, Sony
> >of course already has this semantic so nothing to find there most
> >likely - Chromium, maybe Firefox, etc)
>
> From the other side — when we already switched new behavior ON as default —
> then it is easier to discover all these unknown cases. We have an option restoring
> old behavior(which should be fine for all of the current users). Thus, everybody
> who needs old behavior is able to continue using it.

The cost to those users figuring out that it's a problem and finding
the flags/adding them, etc, is non-zero.
But, yes, as I said - eventually some of that will happen, sooner or
later. But reducing how much of it happens is valuable too.

> That makes transition more understandable. All problems would be explicitly seen. Then,
> If we will know about new problems - we could adapt the current solution. Similar to this suggestion:
>
> a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* -> -1

The issue I have with this is that it's reflective of the known
breakage in some tools that happen to have a quick turnaround/feedback
- and assumes all the other sections are significantly less likely to
have problems with this format - and I don't think we have enough data
to make that estimate.

I think having a clear flag for bfd behavior or new behavior would be
good, keeping it bfd-by-default for now, evangelizing (patch notes,
outreach to other projects and tool developers) the new functionality
& then changing the default shortly after an LLVM release (perhaps 12
- but perhaps longer) to give it time to bake for trunk-following
users (ample time for them to pick it up on their schedule, see what
other tools might break, whether they're isolated enough to expect a
quick turnaround or whether we should get them updated and then wait a
while longer.

> This would help while dwarf users are preparing their code to the new solution.
>
> Is option restoring old behavior not enough to solve problems caused by new tombstone value?

I don't personally think it is enough - given some of the known
breakage - I think that points to a fair bit of reasonably possible
unknown breakage. Having a few big (like Apple and Google) users
switch over by default & potentially flush out a few more issues (or
not -but build confidence that there aren't more issues) I think is an
appropriate way to roll out a change like this.

(minor doubt: I wonder how well the bfd tombstoning works in DWARFv5
(rnglists/loclists) or in debug_range/debug_loc that uses base address
specifiers, where the zero-without-addend doesn't have a chance to
make an empty range (because it's not a start/end pair, it's a
start+offset pair - so the offset remains non-zero)... if bfd
tombstoning breaks DWARFv5 parsing (or base address specifiers in
range/loc) in common consumers I might be more inclined to support
enabling new tombstoning by default sooner (though if we could enable
it just for DWARFv5 that might be nice - but not practical, since the
linker doesn't know what DWARF it's linking & could be linking
multiple versions))

- Dave

> >> With consideration for satefy for the upcoming release/11.x, we can make
> >> two choices:
> >>
> >> a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* -> -1
> >> b) .debug_ranges&.debug_loc => -2, other .debug_* => 0
> >>
> >> Delaying .debug_line => -1 for one or two release sounds good to me.
> >> So LLD 11 or 12 linked binaries can be debugged by LLDB 10. This is a
> >> nice property.
> >>
> >> This write-up proposes b), but I'd say a) is likely sufficient. With the
> >> available information, I cannot yet say that a) will have more risk.
> >
> >Risk is about the unknowns - and it still seems like a lot of
> >unknowns. While there are probably many more consumers that read
> >.debug_line than other sections, reading debug_info (for instance) is
> >necessary for inline frames in symbolizing - still probably one of the
> >most common uses of DWARF I'd guess. (what about stack unwinding using
> >debug_frame? that'd worry me a bit if anyone got /that/ wrong because
> >of this change)
> >
> >> > - chromium/firefox have some tools that were broken:
> >> > https://bugs.chromium.org/p/chromium/issues/detail?id=1102223#c5
> >>
> >> This is potentially related to other .debug_* (not .debug_line)
> >> I hope Chromium developers can chime in here:) The breakage was
> >> unfortunate but I don't know how we could have avoided that. IMHO this
> >> is no different from "clang started to emit a new DW_FORM_* and a
> >> postprocessing tool of .debug chokes on that" Whether we want to
> >> suppress that particular DW_FORM_* definitely should depend on how
> >> likely it can cause problems, but we can't yet say we have to hold off
> >> on a feature for a solved (precisely, mitigated) problem.
> >
> >LLVM has no custom forms and I'd be super cautious about adding any
> >that were on by default because of how bad that breakage would be.
> >
> >I'm not so concerned about the problems we know - but what they tell
> >us about the problems that might arise from use cases we don't know.
> >All the other projects out there that might have custom DWARF parsers
> >to do some ad-hoc things.
> >
> >(also, ultimately - given how far-reaching this is, I think we'll want
> >some tidier flags that are more user-focussed. I'd hope for a flag
> >that gives BFD-like semantics (though I'd be OK with fixing debug_loc
> >(using 1 instead of 0) to work the same as debug_ranges while we're
> >there - a minor divergence from BFD, but highly likely to not cause
> >problems/fall out naturally from a simple implementation of parsing
> >that section) - something that's been in-use and tested by basically
> >everyone for decades. And another flag for the new semantics (-2 for
> >debug_loc/debug_ranges, -1 everywhere else). Customizable per-section
> >expression-based support I think is a recipe for platform divergence &
> >I'd rather it not be available/supported at all, but if you really
> >want to keep it in, I'd at least rather it not be the feature we
> >promote to users about how they can test/opt in/out of the behavior
> >when they're seeing breakages or want to test the future semantics)
> >
> >> >I'm not sure how to get the word out to DWARF consumers that they should
> >> >consider this new experimental behavior. Ray's done a good job
> >> >evangelizing/discussing this with gdb and lldb at least - and of course
> >> >having turned it on by default briefly has found some users (like Chromium)
> >> >that we probably wouldn't have found no matter how long we left this as an
> >> >experimental option... so some things are going to break when we switch no
> >> >matter what.
> >>
> >> Thank you for following up with some GNU folks on their lists!
> >> If folks want to follow along the thread:
> >> https://sourceware.org/pipermail/binutils/2020-June/111376.html
> >>
> >> We have informed binutils, elfutils-devel (elfutils has a few debug
> >> tools) and gdb. I don't recall that anyone has thought about problems
> >> with a tombstone value.
> >>
> >> >
> >> >P.S: Sony's already been using the -1 technique with their debugger and
> >> >linker for a while, so they may want to keep this on by default for SCE -
> >> >but I'm not sure how to do that in-tree.
> >> >
> >> >
> >> >Clang doesn't know which lld
> >> >version it's running, so whether the flag can be specified, I would think?
> >> >(so it'd be hard to have Clang go "if SCE and LLD, pass the flag to use
> >> >-1", I think) - if there is a way to make that decision in the compiler
> >.> >driver+linker, then we'd have a question of "default new behavior except
> >> >when tuning for LLDB and GDB" or "default bfd behavior except when tuning
> >> >for SCE".
> >>
> >> I've been involed in another thread on SHF_LINK_ORDER (https://sourceware.org/pipermail/binutils/2020-
> >July/112415.html ).
> >> We may need a way to tell codegen about the used linker.
> >>
> >> pcc proposed -mbinutils-version= - This is nice in that some MC
> >> decisions related to -fno-integrated-as can use this option as well.
> >> jyknight proposed -mlinker-version= and syntax like -fuse-ld=bfd:2.34
> >>
> >> This may get more complex if the generated object file want to be linked
> >> with more than one linker. This discussion probably deserves its own
> >> thread.
>
>
> --
> Alexey Lapshin
>


More information about the llvm-dev mailing list