[llvm-dev] Switch to ld.bfd tombstone behavior by default

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Fri Jul 17 15:11:47 PDT 2020


On Fri, Jul 17, 2020 at 2:58 PM Fangrui Song <maskray at google.com> wrote:
>
> On 2020-07-17, David Blaikie wrote:
> >On Fri, Jul 17, 2020 at 1:55 PM Alexey Lapshin <a.v.lapshin at mail.ru> wrote:
> >>
> >> >Пятница, 17 июля 2020, 19:42 +03:00 от David Blaikie <dblaikie at gmail.com>:
> >> >
> >> >On Fri, Jul 17, 2020 at 12:03 AM Fangrui Song <maskray at google.com> wrote:
> >> >>
> >> >> Thanks for the write-up!
> >> >>
> >> >> On 2020-07-16, David Blaikie wrote:
> >> >> >In short: Perhaps we should switch lld to the bfd-style tombstoning
> >> >> >behavior for a release or two, letting users opt-in to testing with the new
> >> >> >-1/-2 tombstoning in the interim, before switching to the new tombstone by
> >> >> >default (while still having the flag to switch back when users find
> >> >> >surprise places that can't handle the new behavior).
> >> >> >
> >> >> >In long:
> >> >> >https://reviews.llvm.org/D81784 and follow-on patches modified the behavior
> >> >> >of lld with regards to resolving relocations from debug sections to dead
> >> >> >code (either comdat deduplicated, or gc-sections use).
> >> >> >
> >> >> >A very quick summary of the situation:
> >> >> >
> >> >> >Original Behavior:
> >> >> >
> >> >> > - bfd: 1 for debug_ranges(0 would prematurely terminate the list), 0
> >> >> > elsewhere
> >> >> > - gold/lld: 0+addend everywhere
> >> >> >
> >> >> >Limitations/bugs:
> >> >> >
> >> >> > - bfd/gold/lld
> >> >> > - doesn't support 0 as a valid executable address without ambiguities
> >> >> > - gold/lld
> >> >> > - ambiguities with large gc'd functions combined with a .text mapping
> >> >> > that starts in relative low addresses
> >> >> > - premature debug_range termination with zero-length functions (Clang
> >> >> > produces these with __builtin_unreachable or non-void return
> >> >> >type functions
> >> >> > without a return statement)
> >> >> >
> >> >> >New behavior:
> >> >> >
> >> >> > - -2 for DWARFv4 debug_loc, debug_ranges (-1 is a base address specifier
> >> >> > there)
> >> >> > - -1 elsewhere
> >> >> > - linker flag to customize to other values if desired
> >> >> >
> >> >> >Known issues:
> >> >> >
> >> >> > - lldb's line table parsing can't handle -1 well at all (essentially
> >> >> > unusable)
> >> >>
> >> >> Pavel Labath will fix this soon https://reviews.llvm.org/D83957
> >> >> This is an unhandled address-space wraparound problem.
> >> >> This pattern is potentially common - and other downstream DWARF
> >> >> consumers might make similar line table handling mistakes.
> >> >
> >> >That's the thing - I'm not sure we can really classify them as
> >> >"mistakes". I think bfd.ld's tombstoning behavior is about the only
> >> >thing we can reasonably say DWARF consumers should /probably/ be
> >> >expected to handle - and I'd imagine in many cases they haven't been
> >> >written intentionally to handle it, but whatever behavior they have
> >> >has accidentally been sufficient for their needs.
> >>
> >> >> > - gdb's line table parsing ends up with different handling when breaking
> >> >> > on gc'd functions (minor functionality issue)
> >> >>
> >> >> This is just a behavior difference, not affecting users.
> >> >> It did break a test if linked with LLD (gdb intrinsically has lots of
> >> >> failing tests even if built with GCC+GNU ld).
> >> >>
> >> >> Previous behavior (when an address is zero): a breakpoint on a
> >> >> --gc-sections discarded function will be redirected to a larger line
> >> >> number with debug info, even if that line can be an unrelated different
> >> >> function.
> >> >> New behavior is that the breakpoint is on a wrapped-around small address.
> >> >>
> >> >> GDB 9.3 will restore the previous behavior
> >> >> (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a8caed5d7faa639a1e6769eba551d15d8ddd9510)
> >> >>
> >> >> >
> >> >> >I think there's enough risk in this work (even given the small number of
> >> >> >bugs found so far), given there's a pretty wide array of debug info
> >> >> >consumers out there, that we should change lld's default to match the
> >> >> >long-lived bfd strategy. This would address my original motivation for
> >> >> >raising all this (empty functions prematurely terminating the list), while
> >> >> >letting users who want to experiment with it, or need it (like Alexey), can
> >> >> >opt-in to the -1/-2 behavior.
> >> >>
> >> >> I think we can only confidently say that there is enough risk in using
> >> >> tombstone value -1 in .debug_line, but I'd not say tombstone value -1 in
> >> >> other .debug_* can cause problems.
> >>
> >> >Given how many DWARF parsers we've had to cleanup or migrate off in
> >> >transitioning to DWARFv5 inside Google, I think that's a fair bit of
> >> >evidence the set of parsers isn't as narrow/closed as we'd like and
> >> >thus the number of places that might have issues isn't known/easily
> >> >exhaustively tested. So I'm not so concerned about the bugs we've
> >> >seen, but what that might indicate about the things that we don't know
> >> >about and can't test (because we don't know about them). (of course,
> >> >the flipside of that is that if we don't know about them, we can't
> >> >tell people who own them to go check if they work with this opt-in
> >> >feature - so they'll break whenever we turn this on by default - but
> >> >perhaps in the interim we can get at least a few big LLD customers to
> >> >deploy the feature and flush out some of the issues - happy for Google
> >> >to use this internally, hopefully we can encourage Apple folks, Sony
> >> >of course already has this semantic so nothing to find there most
> >> >likely - Chromium, maybe Firefox, etc)
> >>
> >> From the other side — when we already switched new behavior ON as default —
> >> then it is easier to discover all these unknown cases. We have an option restoring
> >> old behavior(which should be fine for all of the current users). Thus, everybody
> >> who needs old behavior is able to continue using it.
> >
> >The cost to those users figuring out that it's a problem and finding
> >the flags/adding them, etc, is non-zero.
> >But, yes, as I said - eventually some of that will happen, sooner or
> >later. But reducing how much of it happens is valuable too.
> >
> >> That makes transition more understandable. All problems would be explicitly seen. Then,
> >> If we will know about new problems - we could adapt the current solution. Similar to this suggestion:
> >>
> >> a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* -> -1
> >
> >The issue I have with this is that it's reflective of the known
> >breakage in some tools that happen to have a quick turnaround/feedback
> >- and assumes all the other sections are significantly less likely to
> >have problems with this format - and I don't think we have enough data
> >to make that estimate.
> >
> >I think having a clear flag for bfd behavior or new behavior would be
> >good, keeping it bfd-by-default for now, evangelizing (patch notes,
> >outreach to other projects and tool developers) the new functionality
> >& then changing the default shortly after an LLVM release (perhaps 12
> >- but perhaps longer) to give it time to bake for trunk-following
> >users (ample time for them to pick it up on their schedule, see what
> >other tools might break, whether they're isolated enough to expect a
> >quick turnaround or whether we should get them updated and then wait a
> >while longer.
> >> This would help while dwarf users are preparing their code to the new solution.
> >>
> >> Is option restoring old behavior not enough to solve problems caused by new tombstone value?
> >
> >I don't personally think it is enough - given some of the known
> >breakage - I think that points to a fair bit of reasonably possible
> >unknown breakage. Having a few big (like Apple and Google) users
> >switch over by default & potentially flush out a few more issues (or
> >not -but build confidence that there aren't more issues) I think is an
> >appropriate way to roll out a change like this.
> >
> >(minor doubt: I wonder how well the bfd tombstoning works in DWARFv5
> >(rnglists/loclists) or in debug_range/debug_loc that uses base address
> >specifiers, where the zero-without-addend doesn't have a chance to
> >make an empty range (because it's not a start/end pair, it's a
> >start+offset pair - so the offset remains non-zero)... if bfd
> >tombstoning breaks DWARFv5 parsing (or base address specifiers in
> >range/loc) in common consumers I might be more inclined to support
> >enabling new tombstoning by default sooner (though if we could enable
> >it just for DWARFv5 that might be nice - but not practical, since the
> >linker doesn't know what DWARF it's linking & could be linking
> >multiple versions))
> >
> >- Dave
>
> The argument that (.debug_line=>-1 is risky so other .debug_*=>-1 are risky) looks a bit
> like a slippery slope argument to me.

I'm not suggesting that one leads to the other, which is a slippery
slope (if you do X, then Y will be inevitable - when X and Y aren't
linked). I'm suggesting it's a good reminder that this change can
break consumers & I think the scope is large enough that we should be
more conservative here.

> a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* => -1
>
> has the merit that people will be aware that .debug_* sections can generally have -1
> as tombstone values. This is a step towards the full transition (.debug_line => -1).
> We keep it for one or two release so that LLD 11 or 12 linked binaries can be debugged by LLDB 10.
>
> If we do b), I don't see how the situation will be different in llvm-project
> 12.x or 13.x and the same unsafety argument could be made against the
> transition.

I think opt-in deployment experience (eg: at Google, possibly through
Apple, ) would help provide confidence that at least a reasonable set
of tools are robust to this change.

>   I don't see how the half transition is more risky than compiler
> generating a new DW_FORM_* which wasn't used before.

LLVM hasn't added any custom forms that I know of, and I'd push back
pretty hard on adding one that was on-by-default. (even one that works
in gdb or lldb current release and is only used when tuning for the
respective debugger would be questionable to me - given existing
installations of debuggers that aren't necessarily the latest release)

> With -z dead-reloc-in-nonalloc=.debug_*=0 and a detailed release note, I don't
> find a) more problematic than b).
>
> >> >> With consideration for satefy for the upcoming release/11.x, we can make
> >> >> two choices:
> >> >>
> >> >> a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* -> -1
> >> >> b) .debug_ranges&.debug_loc => -2, other .debug_* => 0
> >> >>
> >> >> Delaying .debug_line => -1 for one or two release sounds good to me.
> >> >> So LLD 11 or 12 linked binaries can be debugged by LLDB 10. This is a
> >> >> nice property.
> >> >>
> >> >> This write-up proposes b), but I'd say a) is likely sufficient. With the
> >> >> available information, I cannot yet say that a) will have more risk.
> >> >
> >> >Risk is about the unknowns - and it still seems like a lot of
> >> >unknowns. While there are probably many more consumers that read
> >> >.debug_line than other sections, reading debug_info (for instance) is
> >> >necessary for inline frames in symbolizing - still probably one of the
> >> >most common uses of DWARF I'd guess. (what about stack unwinding using
> >> >debug_frame? that'd worry me a bit if anyone got /that/ wrong because
> >> >of this change)
> >> >
> >> >> > - chromium/firefox have some tools that were broken:
> >> >> > https://bugs.chromium.org/p/chromium/issues/detail?id=1102223#c5
> >> >>
> >> >> This is potentially related to other .debug_* (not .debug_line)
> >> >> I hope Chromium developers can chime in here:) The breakage was
> >> >> unfortunate but I don't know how we could have avoided that. IMHO this
> >> >> is no different from "clang started to emit a new DW_FORM_* and a
> >> >> postprocessing tool of .debug chokes on that" Whether we want to
> >> >> suppress that particular DW_FORM_* definitely should depend on how
> >> >> likely it can cause problems, but we can't yet say we have to hold off
> >> >> on a feature for a solved (precisely, mitigated) problem.
> >> >
> >> >LLVM has no custom forms and I'd be super cautious about adding any
> >> >that were on by default because of how bad that breakage would be.
> >> >
> >> >I'm not so concerned about the problems we know - but what they tell
> >> >us about the problems that might arise from use cases we don't know.
> >> >All the other projects out there that might have custom DWARF parsers
> >> >to do some ad-hoc things.
> >> >
> >> >(also, ultimately - given how far-reaching this is, I think we'll want
> >> >some tidier flags that are more user-focussed. I'd hope for a flag
> >> >that gives BFD-like semantics (though I'd be OK with fixing debug_loc
> >> >(using 1 instead of 0) to work the same as debug_ranges while we're
> >> >there - a minor divergence from BFD, but highly likely to not cause
> >> >problems/fall out naturally from a simple implementation of parsing
> >> >that section) - something that's been in-use and tested by basically
> >> >everyone for decades. And another flag for the new semantics (-2 for
> >> >debug_loc/debug_ranges, -1 everywhere else). Customizable per-section
> >> >expression-based support I think is a recipe for platform divergence &
> >> >I'd rather it not be available/supported at all, but if you really
> >> >want to keep it in, I'd at least rather it not be the feature we
> >> >promote to users about how they can test/opt in/out of the behavior
> >> >when they're seeing breakages or want to test the future semantics)
> >> >
> >> >> >I'm not sure how to get the word out to DWARF consumers that they should
> >> >> >consider this new experimental behavior. Ray's done a good job
> >> >> >evangelizing/discussing this with gdb and lldb at least - and of course
> >> >> >having turned it on by default briefly has found some users (like Chromium)
> >> >> >that we probably wouldn't have found no matter how long we left this as an
> >> >> >experimental option... so some things are going to break when we switch no
> >> >> >matter what.
> >> >>
> >> >> Thank you for following up with some GNU folks on their lists!
> >> >> If folks want to follow along the thread:
> >> >> https://sourceware.org/pipermail/binutils/2020-June/111376.html
> >> >>
> >> >> We have informed binutils, elfutils-devel (elfutils has a few debug
> >> >> tools) and gdb. I don't recall that anyone has thought about problems
> >> >> with a tombstone value.
> >> >>
> >> >> >
> >> >> >P.S: Sony's already been using the -1 technique with their debugger and
> >> >> >linker for a while, so they may want to keep this on by default for SCE -
> >> >> >but I'm not sure how to do that in-tree.
> >> >> >
> >> >> >
> >> >> >Clang doesn't know which lld
> >> >> >version it's running, so whether the flag can be specified, I would think?
> >> >> >(so it'd be hard to have Clang go "if SCE and LLD, pass the flag to use
> >> >> >-1", I think) - if there is a way to make that decision in the compiler
> >> >.> >driver+linker, then we'd have a question of "default new behavior except
> >> >> >when tuning for LLDB and GDB" or "default bfd behavior except when tuning
> >> >> >for SCE".
> >> >>
> >> >> I've been involed in another thread on SHF_LINK_ORDER (https://sourceware.org/pipermail/binutils/2020-
> >> >July/112415.html ).
> >> >> We may need a way to tell codegen about the used linker.
> >> >>
> >> >> pcc proposed -mbinutils-version= - This is nice in that some MC
> >> >> decisions related to -fno-integrated-as can use this option as well.
> >> >> jyknight proposed -mlinker-version= and syntax like -fuse-ld=bfd:2.34
> >> >>
> >> >> This may get more complex if the generated object file want to be linked
> >> >> with more than one linker. This discussion probably deserves its own
> >> >> thread.
> >>
> >>
> >> --
> >> Alexey Lapshin
> >>


More information about the llvm-dev mailing list