[llvm-dev] Switch to ld.bfd tombstone behavior by default

Fri Jul 17 14:58:48 PDT 2020

On 2020-07-17, David Blaikie wrote:
>On Fri, Jul 17, 2020 at 1:55 PM Alexey Lapshin <a.v.lapshin at mail.ru> wrote:
>>
>> >Пятница, 17 июля 2020, 19:42 +03:00 от David Blaikie <dblaikie at gmail.com>:
>> >
>> >On Fri, Jul 17, 2020 at 12:03 AM Fangrui Song <maskray at google.com> wrote:
>> >>
>> >> Thanks for the write-up!
>> >>
>> >> On 2020-07-16, David Blaikie wrote:
>> >> >In short: Perhaps we should switch lld to the bfd-style tombstoning
>> >> >behavior for a release or two, letting users opt-in to testing with the new
>> >> >-1/-2 tombstoning in the interim, before switching to the new tombstone by
>> >> >default (while still having the flag to switch back when users find
>> >> >surprise places that can't handle the new behavior).
>> >> >
>> >> >In long:
>> >> >https://reviews.llvm.org/D81784 and follow-on patches modified the behavior
>> >> >of lld with regards to resolving relocations from debug sections to dead
>> >> >code (either comdat deduplicated, or gc-sections use).
>> >> >
>> >> >A very quick summary of the situation:
>> >> >
>> >> >Original Behavior:
>> >> >
>> >> > - bfd: 1 for debug_ranges(0 would prematurely terminate the list), 0
>> >> > elsewhere
>> >> > - gold/lld: 0+addend everywhere
>> >> >
>> >> >Limitations/bugs:
>> >> >
>> >> > - bfd/gold/lld
>> >> > - doesn't support 0 as a valid executable address without ambiguities
>> >> > - gold/lld
>> >> > - ambiguities with large gc'd functions combined with a .text mapping
>> >> > that starts in relative low addresses
>> >> > - premature debug_range termination with zero-length functions (Clang
>> >> > produces these with __builtin_unreachable or non-void return
>> >> >type functions
>> >> > without a return statement)
>> >> >
>> >> >New behavior:
>> >> >
>> >> > - -2 for DWARFv4 debug_loc, debug_ranges (-1 is a base address specifier
>> >> > there)
>> >> > - -1 elsewhere
>> >> > - linker flag to customize to other values if desired
>> >> >
>> >> >Known issues:
>> >> >
>> >> > - lldb's line table parsing can't handle -1 well at all (essentially
>> >> > unusable)
>> >>
>> >> Pavel Labath will fix this soon https://reviews.llvm.org/D83957
>> >> This is an unhandled address-space wraparound problem.
>> >> This pattern is potentially common - and other downstream DWARF
>> >> consumers might make similar line table handling mistakes.
>> >
>> >That's the thing - I'm not sure we can really classify them as
>> >"mistakes". I think bfd.ld's tombstoning behavior is about the only
>> >thing we can reasonably say DWARF consumers should /probably/ be
>> >expected to handle - and I'd imagine in many cases they haven't been
>> >written intentionally to handle it, but whatever behavior they have
>> >has accidentally been sufficient for their needs.
>>
>> >> > - gdb's line table parsing ends up with different handling when breaking
>> >> > on gc'd functions (minor functionality issue)
>> >>
>> >> This is just a behavior difference, not affecting users.
>> >> It did break a test if linked with LLD (gdb intrinsically has lots of
>> >> failing tests even if built with GCC+GNU ld).
>> >>
>> >> Previous behavior (when an address is zero): a breakpoint on a
>> >> --gc-sections discarded function will be redirected to a larger line
>> >> number with debug info, even if that line can be an unrelated different
>> >> function.
>> >> New behavior is that the breakpoint is on a wrapped-around small address.
>> >>
>> >> GDB 9.3 will restore the previous behavior
>> >> (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a8caed5d7faa639a1e6769eba551d15d8ddd9510)
>> >>
>> >> >
>> >> >I think there's enough risk in this work (even given the small number of
>> >> >bugs found so far), given there's a pretty wide array of debug info
>> >> >consumers out there, that we should change lld's default to match the
>> >> >long-lived bfd strategy. This would address my original motivation for
>> >> >raising all this (empty functions prematurely terminating the list), while
>> >> >letting users who want to experiment with it, or need it (like Alexey), can
>> >> >opt-in to the -1/-2 behavior.
>> >>
>> >> I think we can only confidently say that there is enough risk in using
>> >> tombstone value -1 in .debug_line, but I'd not say tombstone value -1 in
>> >> other .debug_* can cause problems.
>>
>> >Given how many DWARF parsers we've had to cleanup or migrate off in
>> >transitioning to DWARFv5 inside Google, I think that's a fair bit of
>> >evidence the set of parsers isn't as narrow/closed as we'd like and
>> >thus the number of places that might have issues isn't known/easily
>> >exhaustively tested. So I'm not so concerned about the bugs we've
>> >seen, but what that might indicate about the things that we don't know
>> >about and can't test (because we don't know about them). (of course,
>> >the flipside of that is that if we don't know about them, we can't
>> >tell people who own them to go check if they work with this opt-in
>> >feature - so they'll break whenever we turn this on by default - but
>> >perhaps in the interim we can get at least a few big LLD customers to
>> >deploy the feature and flush out some of the issues - happy for Google
>> >to use this internally, hopefully we can encourage Apple folks, Sony
>> >of course already has this semantic so nothing to find there most
>> >likely - Chromium, maybe Firefox, etc)
>>
>> From the other side — when we already switched new behavior ON as default —
>> then it is easier to discover all these unknown cases. We have an option restoring
>> old behavior(which should be fine for all of the current users). Thus, everybody
>> who needs old behavior is able to continue using it.
>
>The cost to those users figuring out that it's a problem and finding
>the flags/adding them, etc, is non-zero.
>But, yes, as I said - eventually some of that will happen, sooner or
>later. But reducing how much of it happens is valuable too.
>
>> That makes transition more understandable. All problems would be explicitly seen. Then,
>> If we will know about new problems - we could adapt the current solution. Similar to this suggestion:
>>
>> a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* -> -1
>
>The issue I have with this is that it's reflective of the known
>breakage in some tools that happen to have a quick turnaround/feedback
>- and assumes all the other sections are significantly less likely to
>have problems with this format - and I don't think we have enough data
>to make that estimate.
>
>I think having a clear flag for bfd behavior or new behavior would be
>good, keeping it bfd-by-default for now, evangelizing (patch notes,
>outreach to other projects and tool developers) the new functionality
>& then changing the default shortly after an LLVM release (perhaps 12
>- but perhaps longer) to give it time to bake for trunk-following
>users (ample time for them to pick it up on their schedule, see what
>other tools might break, whether they're isolated enough to expect a
>quick turnaround or whether we should get them updated and then wait a
>while longer.
>> This would help while dwarf users are preparing their code to the new solution.
>>
>> Is option restoring old behavior not enough to solve problems caused by new tombstone value?
>
>I don't personally think it is enough - given some of the known
>breakage - I think that points to a fair bit of reasonably possible
>unknown breakage. Having a few big (like Apple and Google) users
>switch over by default & potentially flush out a few more issues (or
>not -but build confidence that there aren't more issues) I think is an
>appropriate way to roll out a change like this.
>
>(minor doubt: I wonder how well the bfd tombstoning works in DWARFv5
>(rnglists/loclists) or in debug_range/debug_loc that uses base address
>specifiers, where the zero-without-addend doesn't have a chance to
>make an empty range (because it's not a start/end pair, it's a
>start+offset pair - so the offset remains non-zero)... if bfd
>tombstoning breaks DWARFv5 parsing (or base address specifiers in
>range/loc) in common consumers I might be more inclined to support
>enabling new tombstoning by default sooner (though if we could enable
>it just for DWARFv5 that might be nice - but not practical, since the
>linker doesn't know what DWARF it's linking & could be linking
>multiple versions))
>
>- Dave

The argument that (.debug_line=>-1 is risky so other .debug_*=>-1 are risky) looks a bit
like a slippery slope argument to me.

a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* => -1

has the merit that people will be aware that .debug_* sections can generally have -1
as tombstone values. This is a step towards the full transition (.debug_line => -1).
We keep it for one or two release so that LLD 11 or 12 linked binaries can be debugged by LLDB 10.

If we do b), I don't see how the situation will be different in llvm-project
12.x or 13.x and the same unsafety argument could be made against the
transition.  I don't see how the half transition is more risky than compiler
generating a new DW_FORM_* which wasn't used before.

With -z dead-reloc-in-nonalloc=.debug_*=0 and a detailed release note, I don't
find a) more problematic than b).

>> >> With consideration for satefy for the upcoming release/11.x, we can make
>> >> two choices:
>> >>
>> >> a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* -> -1
>> >> b) .debug_ranges&.debug_loc => -2, other .debug_* => 0
>> >>
>> >> Delaying .debug_line => -1 for one or two release sounds good to me.
>> >> So LLD 11 or 12 linked binaries can be debugged by LLDB 10. This is a
>> >> nice property.
>> >>
>> >> This write-up proposes b), but I'd say a) is likely sufficient. With the
>> >> available information, I cannot yet say that a) will have more risk.
>> >
>> >Risk is about the unknowns - and it still seems like a lot of
>> >unknowns. While there are probably many more consumers that read
>> >.debug_line than other sections, reading debug_info (for instance) is
>> >necessary for inline frames in symbolizing - still probably one of the
>> >most common uses of DWARF I'd guess. (what about stack unwinding using
>> >debug_frame? that'd worry me a bit if anyone got /that/ wrong because
>> >of this change)
>> >
>> >> > - chromium/firefox have some tools that were broken:
>> >> > https://bugs.chromium.org/p/chromium/issues/detail?id=1102223#c5
>> >>
>> >> This is potentially related to other .debug_* (not .debug_line)
>> >> I hope Chromium developers can chime in here:) The breakage was
>> >> unfortunate but I don't know how we could have avoided that. IMHO this
>> >> is no different from "clang started to emit a new DW_FORM_* and a
>> >> postprocessing tool of .debug chokes on that" Whether we want to
>> >> suppress that particular DW_FORM_* definitely should depend on how
>> >> likely it can cause problems, but we can't yet say we have to hold off
>> >> on a feature for a solved (precisely, mitigated) problem.
>> >
>> >LLVM has no custom forms and I'd be super cautious about adding any
>> >that were on by default because of how bad that breakage would be.
>> >
>> >I'm not so concerned about the problems we know - but what they tell
>> >us about the problems that might arise from use cases we don't know.
>> >All the other projects out there that might have custom DWARF parsers
>> >to do some ad-hoc things.
>> >
>> >(also, ultimately - given how far-reaching this is, I think we'll want
>> >some tidier flags that are more user-focussed. I'd hope for a flag
>> >that gives BFD-like semantics (though I'd be OK with fixing debug_loc
>> >(using 1 instead of 0) to work the same as debug_ranges while we're
>> >there - a minor divergence from BFD, but highly likely to not cause
>> >problems/fall out naturally from a simple implementation of parsing
>> >that section) - something that's been in-use and tested by basically
>> >everyone for decades. And another flag for the new semantics (-2 for
>> >debug_loc/debug_ranges, -1 everywhere else). Customizable per-section
>> >expression-based support I think is a recipe for platform divergence &
>> >I'd rather it not be available/supported at all, but if you really
>> >want to keep it in, I'd at least rather it not be the feature we
>> >promote to users about how they can test/opt in/out of the behavior
>> >when they're seeing breakages or want to test the future semantics)
>> >
>> >> >I'm not sure how to get the word out to DWARF consumers that they should
>> >> >consider this new experimental behavior. Ray's done a good job
>> >> >evangelizing/discussing this with gdb and lldb at least - and of course
>> >> >having turned it on by default briefly has found some users (like Chromium)
>> >> >that we probably wouldn't have found no matter how long we left this as an
>> >> >experimental option... so some things are going to break when we switch no
>> >> >matter what.
>> >>
>> >> Thank you for following up with some GNU folks on their lists!
>> >> If folks want to follow along the thread:
>> >> https://sourceware.org/pipermail/binutils/2020-June/111376.html
>> >>
>> >> We have informed binutils, elfutils-devel (elfutils has a few debug
>> >> tools) and gdb. I don't recall that anyone has thought about problems
>> >> with a tombstone value.
>> >>
>> >> >
>> >> >P.S: Sony's already been using the -1 technique with their debugger and
>> >> >linker for a while, so they may want to keep this on by default for SCE -
>> >> >but I'm not sure how to do that in-tree.
>> >> >
>> >> >
>> >> >Clang doesn't know which lld
>> >> >version it's running, so whether the flag can be specified, I would think?
>> >> >(so it'd be hard to have Clang go "if SCE and LLD, pass the flag to use
>> >> >-1", I think) - if there is a way to make that decision in the compiler
>> >.> >driver+linker, then we'd have a question of "default new behavior except
>> >> >when tuning for LLDB and GDB" or "default bfd behavior except when tuning
>> >> >for SCE".
>> >>
>> >> I've been involed in another thread on SHF_LINK_ORDER (https://sourceware.org/pipermail/binutils/2020-
>> >July/112415.html ).
>> >> We may need a way to tell codegen about the used linker.
>> >>
>> >> pcc proposed -mbinutils-version= - This is nice in that some MC
>> >> decisions related to -fno-integrated-as can use this option as well.
>> >> jyknight proposed -mlinker-version= and syntax like -fuse-ld=bfd:2.34
>> >>
>> >> This may get more complex if the generated object file want to be linked
>> >> with more than one linker. This discussion probably deserves its own
>> >> thread.
>>
>>
>> --
>> Alexey Lapshin
>>