[llvm-dev] Range lists, zero-length functions, linker gc
David Blaikie via llvm-dev
llvm-dev at lists.llvm.org
Fri May 29 14:00:14 PDT 2020
On Fri, May 29, 2020 at 9:21 AM Robinson, Paul <paul.robinson at sony.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Fangrui Song <maskray at google.com>
> > Sent: Friday, May 29, 2020 1:07 AM
> > To: David Blaikie <dblaikie at gmail.com>
> > Cc: Robinson, Paul <paul.robinson at sony.com>; Alexey Lapshin
> > <alapshin at accesssoftek.com>; Sriraman Tallam <tmsriram at google.com>; Wei Mi
> > <wmi at google.com>; Adrian Prantl <aprantl at apple.com>; Jonas Devlieghere
> > <jdevlieghere at apple.com>; Alexey Lapshin <a.v.lapshin at mail.ru>; Eric
> > Christopher <echristo at gmail.com>; peter.smith at arm.com;
> > grimar at accesssoftek.com; llvm-dev at lists.llvm.org
> > Subject: Re: [llvm-dev] Range lists, zero-length functions, linker gc
> >
> > On 2020-05-28, David Blaikie wrote:
> > >On Thu, May 28, 2020 at 2:52 PM Robinson, Paul <paul.robinson at sony.com>
> > >wrote:
> > >
> > >> As has been mentioned elsewhere, Sony generally fixes up references from
> > >> debug info to stripped functions (of any length) using -1, because that’s a
> > >> less-likely-to-be-real address than 0x0 or 0x1. (0x0 is a typical base
> > >> address for shared libraries, I’d think using it has the potential to
> > >> mislead various consumers.) For .debug_ranges we use -2, because both a
> > >> 0/0 pair and a -1/-1 pair have a reserved meaning in that section.
> > >>
> > >
> > >Any harm in using -2 everywhere, for consistency?
> >
> > When resolving a relocation, in certain cases we have to give an undefined
> > symbol a value.
> > This can happen with:
> >
> > * an undefined weak symbol
> > * an undefined global symbol in --noinhibit-exec mode (a buggy
> >   --gc-sections implementation can trigger this as well)
> > * a relocation referencing an undefined symbol in a non-SHF_ALLOC section
> >
> > We always respect the addend in a relocation entry for an absolute/PC-relative
> > (I can use "most" here) relocation (R_ARM_THM_PC8, R_AARCH64_ADR_PREL_PG_HI21,
> > R_X86_64_64, local exec TLS relocation types, ...)
> > Ignoring the addend (using -2 everywhere) will break this consistency.
> >
> > The relocated code may do pointer subtraction which would work if addends
> > were respected, but will break using -2 everywhere.
>
> I suspect David meant "any harm to using -2 in all .debug_* sections?"
> and not literally everywhere. Sony does special cases only for the
> .debug_* sections.
Right - thanks for the clarification.
> I've been meaning to propose that DWARF v6 reserve a special address for
> this kind of situation. Whether the committee would be willing to make
> it be -1 or -2 for all targets, or make it target-defined, I don't know.
> (Dreading the inevitable argument over whether addresses are signed or
> unsigned, or more to the point whether they wrap. They've been unsigned
> and wrapping was undefined on the small set of machines I'm familiar with.)
> Certainly the toolchain community would benefit from making it be the
> same everywhere.
>
> Personally I'd vote for -1, and make pre-v5 .debug_loc/.debug_ranges
> sections be an extra-special case using -2. We can (I hope) standardize
> on -1 for v6 onward, and document -1/-2 on the DWARF wiki as recommended
> practice for prior versions.
That'd make linking difficult - the unix linkers at least, currently
don't have to identify the DWARF version when linking - having to pass
an extra linking flag or have the linker parse any DWARF (what if an
object file contains more than one CU & the linker has to apply
different relocations in different parts of the object file because of
that?) would be a significant cost/problem, I think.
Much as I like the tidiness of -1 everywhere, backwards compatibility
with debug_ranges (& debug_loc similarly) is a problem. But ld.bfd
already special cases debug_ranges (& should special case debug_loc),
so perhaps that's the solution: -2 for debug_ranges and debug_loc, -1
everywhere else (which effectively means everywhere in DWARFv5
onwards)?
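
A minimal sketch of that per-section rule, as a linker might apply it
when a debug relocation targets discarded code. The function name
`tombstone_for` and the `addr_size` parameter are hypothetical, not
taken from any linker; -1 and -2 are written as the unsigned all-ones
address and one less than that:

```python
def tombstone_for(section_name: str, addr_size: int = 8) -> int:
    """Pick the value to write for a debug relocation that targets dead code.

    Hypothetical helper: applies "-2 in debug_ranges and debug_loc,
    -1 in every other debug section", encoded as unsigned integers of
    addr_size bytes.
    """
    if not section_name.startswith(".debug_"):
        raise ValueError("only .debug_* sections get a tombstone value")
    max_addr = (1 << (8 * addr_size)) - 1  # -1 as an unsigned address
    if section_name in (".debug_ranges", ".debug_loc"):
        # 0/0 ends a list and a leading -1 is a base address selection in
        # these sections, so fall back to -2 here.
        return max_addr - 1
    return max_addr
```

The point of keying only on the section name is that the linker never
has to look at the DWARF version or parse any DWARF to apply it.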
>
>
> >
> > The relocated code can be allocatable or not. Non-allocatable non-debug
> > code can have meaningful pointer subtraction as well. This is why I am
> > not too fond of (using a fixed value everywhere).
> >
> > >(also, I had a silly idea, but what would happen if we added a CU attribute
> > >with an address value that was a reference to a weak always-unused symbol,
> > >that way the linker would fix it up with whatever its preferred magic value
> > >was, and the consumer would then know what the magic value was that
> > >represented dead code? (though this would only work if the value were used
> > >consistently everywhere - which is zero for gold/lld (well, almost... you
> > >can still create situations where a non-zero value is used even for a
> > >low_pc), but wouldn't work for binutils ld (1 in debug_ranges, 0 elsewhere)
> > >or Sony (-2 in debug_ranges, -1 elsewhere)... - so, wouldn't actually work
> > >for any producer currently, so maybe there's little value in that as a
> > >feature))
> >
> > For a non-SHF_ALLOC section, LLD currently considers it a GC root if all
> > the conditions below are satisfied:
> >
> > * not SHT_REL[A]
> > * not SHF_LINK_ORDER
> > * not in a section group
> >
> > (I managed to lobby the ideas to GNU ld. GNU ld from binutils 2.35
> > onwards will have mostly compatible semantics with LLD)
> >
> > There is a cost fragmenting a .debug_* section: sizeof(Elf64_Shdr)=64 ->
> > each section takes 64 bytes in the section header table.
Sorry, I missed a step here - you're talking about the cost of
fragmenting .debug_* sections as an alternative to choosing a special
address value to resolve for dead code? (by removing the DWARF that
refers to the dead code, instead of keeping it and having to write a
special address value into it?)
Unfortunately, no matter the cost - that solution doesn't apply to
Split DWARF. Maybe at some point we'll want to have some output from
the linker that lists the dead/live code, and use that for building a
dwp (like dsymutil) but I don't think we can predicate correctness on
such a thing - in part because we'd still want to be able to read the
.dwo files without post-processing for more interactive/iterative
development scenarios. So we'd still need a special address value to
write into debug_addr when using Split DWARF, and I think it's
important to allow the non-split case to look like the split case
where it doesn't /have/ to diverge - if divergence provides benefits,
that's nice, but I don't think it'd be good to make that divergence
/necessary/.
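
Whichever value is chosen, consumers have to recognise it. A rough
sketch for a pre-v5 debug_ranges list, assuming the linker wrote a
special dead-code value into the begin address. `read_range_list` and
its `tombstone` parameter are hypothetical; the list is modelled as
already-decoded (begin, end) pairs rather than raw section bytes:

```python
def read_range_list(pairs, tombstone, addr_size=8):
    """Return live [begin, end) ranges from decoded debug_ranges pairs.

    pairs: iterable of (begin, end) unsigned ints, in section order.
    tombstone: value the linker is assumed to write for dead code.
    """
    max_addr = (1 << (8 * addr_size)) - 1
    base = 0
    live = []
    for begin, end in pairs:
        if begin == 0 and end == 0:
            break  # end-of-list entry
        if begin == max_addr:
            base = end  # base address selection entry
            continue
        if begin == tombstone:
            continue  # range refers to discarded code
        if begin == end:
            continue  # empty range, has no effect
        live.append((base + begin, base + end))
    return live
```

With "resolve to addend", a dead zero-length function shows up as
(0, 0), which this loop cannot tell apart from the terminator, so
every entry after it is lost. With a tombstone such as -2 the pair is
skipped and parsing continues to the real terminator.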
> > SHF_LINK_ORDER
> > has semantics of a lightweight section group. Assume we don't want to
> > have one .debug_* for each function section, this .debug_* will be a GC
> > root. Relocations from it (even if the symbol is weak) will retain the
> > sections defining the symbols.
>
> We did some quick research into per-function .debug_info fragments a
> while back, putting the subprogram info into the same section group as
> the function; it was not an unqualified win. The very large number of
> sections costs processing time, and cross-section references added to
> the relocation count (I believe these can generally be resolved by MC
> in a non-fragmented .debug_info section).
Yeah, you either pay more relocations (or size cost to use signatures
instead) and/or more size to duplicate some DIEs (like type units
currently duplicate fundamental/non-user-defined types into the type
unit).
(& it's a non-starter for Split DWARF anyway)
> James Henderson might have
> the actual results stashed somewhere.
>
> That approach *might* still be faster than post-processing a unified
> section, which IIUC is what D59553 does.
>
> >
> > So, this trick can't work without refining the --gc-sections rules
> > further.
>
> If I understand the objection, yeah, we can't have .debug_* sections
> being gc roots.
I'm not sure I follow, here - "this trick" being "splitting debug info
into droppable chunks for each function/subprogram and putting those
chunks in the same comdat group as the function code itself" - that
trick wouldn't work because the current rules would move that debug
info from (where it currently is in one big debug_info section) a
non-gc root to (where it would go - becoming part of a comdat group) a
gc-root? OK, right. Agreed that's another reason (apart from the Split
DWARF one, and the size/reloc tradeoff ones) that would be
problematic.
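
For reference, the rule Fangrui quoted for non-SHF_ALLOC sections can
be written down as a predicate. This is only a sketch of the three
conditions as stated in this thread, not LLD's actual code; the
`Section` type and its `in_group` field are hypothetical, while the
numeric constants are the standard ELF values:

```python
from dataclasses import dataclass

SHT_RELA = 4
SHT_REL = 9
SHF_ALLOC = 0x2
SHF_LINK_ORDER = 0x80


@dataclass
class Section:
    name: str
    sh_type: int
    sh_flags: int
    in_group: bool = False  # hypothetical: member of a section group


def is_nonalloc_gc_root(sec: Section) -> bool:
    """True if a non-SHF_ALLOC section meets all three quoted conditions."""
    if sec.sh_flags & SHF_ALLOC:
        # Allocatable sections are covered by other rules not modelled here.
        return False
    return (
        sec.sh_type not in (SHT_REL, SHT_RELA)
        and not (sec.sh_flags & SHF_LINK_ORDER)
        and not sec.in_group
    )
```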
- Dave
>
> --paulr
>
> >
> > >
> > >> If you’re looking only at zero-length functions, you can stop there; but
> > >> I’m not sure why stopping there solves much of a real problem, as
> > >> zero-length functions seem like a weird corner case.
> > >>
> > >
> > >They're the case that breaks existing usage by terminating the range list
> > >early - the other existing usage seems to be fine with "resolve to addend"
> > >strategy that lld and gold use - in that it moves most dead/deduplicated
> > >functions outside the executable range and so consumers never come asking
> > >for "what code is at instruction 5" because they're never executing code at
> > >a pc of 5. But, yes, this existing solution doesn't work once you have code
> > >mapped into low address spaces or have utterly massive functions that might
> > >have a length that would reach into the executable address space even when
> > >their start is remapped to zero.
> >
> > For posterity, David gave me an example offline: void f1() { } void f2() {
> > } int main() { f1(); }
> >
> > clang -fuse-ld=bfd -ffunction-sections -Wl,--gc-sections -g a.c -o a.bfd
> > llvm-dwarfdump -debug-ranges a.bfd
> > =>
> > R_X86_64_64 relocations in .debug_ranges are resolved to 1, ignoring the
> > addend
> >
> > (Behavior introduced in
> > https://sourceware.org/git/?p=binutils-
> > gdb.git;a=blobdiff;f=bfd*ChangeLog;h=8fbaed21fa2c8238459acb637545583f3cfbb
> > fdf;hp=18a3a67be3a5980998c4461b5a739e54f3551b17;hb=e4067dbb2a3368dbf908b39
> > c5435c84d51abc9f3;hpb=c0621d88b096cc046adf6ed484baea9ba5bfe721)
> >
> > The comments below are also insightful. I need to ponder more (and need
> > to read the DWARF v4 and v5 specs more as I am not so familiar with these
> > DWARF constructs). But it is too late now. Will probably comment
> > another day :)
> >
> > >
> > >> Linkers know how to strip dead functions (gc) or deduplicate them (icf,
> > >> COMDAT) and people do this all the time, in some cases (COMDAT) without
> > >> explicitly asking for it, so non-zero-length functions seem like the much
> > >> more interesting case. In that situation, -1 (or -2) seems like a much
> > >> wiser choice of blessed-as-not-real address, versus 0x0 or 0x1.
> > >>
> > >>
> > >>
> > >> Stripping non-zero-length functions does mean you have to care about
> > >> more sections. For example .debug_loc would want to be fixed up the
> > >> same way as .debug_ranges, not because a debugger would care but so
> > >> that dumpers would not run into the 0/0 brick wall.
> > >>
> > >
> > >Yep - in theory a consumer could actually use a loclist across multiple
> > >sections (if a global variable got hoisted into a register for a function
> > >for instance), but I don't know of any producers doing this today - until
> > >then, yeah, it's just a dumping problem and ld.bfd does produce DWARF that
> > >has that problem (because it resolves both relocations to dead code
> > >(begin/end of a range) to zero in all sections except debug_ranges, so
> > >terminates the location list early) - binutils objdump avoids dumping the
> > >following corrupted fragment by only dumping hunks of debug_loc starting at
> > >places referenced from debug_info. Without debug_info it won't dump
> > >anything from debug_loc - and if the references from debug_info, parsed
> > >until the 0,0 terminator, don't cover the whole debug_loc section, it prints
> > >messages saying there are "gaps".
> > >
> > >Agreed that you'd want debug_loc to have the same special handling as
> > >debug_ranges if it has special handling. Though ideally we'd pick a value
> > >that works equally everywhere? (-2, by the sounds of it)
> > >
> > >
> > >> We also fix up lengths in .debug_aranges to zero, although there might be
> > >> history behind that tactic that I’m not aware of; it seems like it ought to
> > >> be unnecessary, if consumers are aware of the special address(es).
> > >>
> > >
> > >Yeah, no idea about debug_aranges... I'd have thought it'd be fine with the
> > >same approach as debug_ranges, but I haven't looked at debug_aranges in a
> > >long time.
> > >
> > >I guess the only remaining question is: Since it's possible to have code on
> > >some systems down at address zero, or close enough to it that [0, length)
> > >might overlap with real executable code addresses - does anyone know of
> > >the inverse: where code is mapped up near uint32 max? Such that that usage
> > >wouldn't be able to sacrifice uint32 max - 1 to use as a blessed value here?
> > >
> > >- Dave
> > >
> > >
> > >>
> > >>
> > >> --paulr
> > >>
> > >>
> > >>
> > >> *From:* Alexey Lapshin <alapshin at accesssoftek.com>
> > >> *Sent:* Thursday, May 28, 2020 9:03 AM
> > >> *To:* Sriraman Tallam <tmsriram at google.com>; Wei Mi <wmi at google.com>;
> > >> Robinson, Paul <paul.robinson at sony.com>; Adrian Prantl
> > <aprantl at apple.com>;
> > >> Jonas Devlieghere <jdevlieghere at apple.com>; Alexey Lapshin <
> > >> a.v.lapshin at mail.ru>; Eric Christopher <echristo at gmail.com>; Fangrui
> > Song
> > >> <maskray at google.com>; David Blaikie <dblaikie at gmail.com>;
> > >> llvm-dev at lists.llvm.org
> > >> *Subject:* Re: [llvm-dev] Range lists, zero-length functions, linker gc
> > >>
> > >>
> > >>
> > >> Hi David,
> > >>
> > >>
> > >>
> > >> >So there have been several recent discussions about the issues around
> > >> >DWARF-agnostic linking and gc-sections, linkonce function definitions
> > >> >being dropped, etc - and just how much DWARF-awareness would be suitable
> > >> >in a linker to help with this situation.
> > >>
> > >> > I'd like to discuss a narrower instance of this issue: Zero length
> > >> > gc'd/deduplicated functions.
> > >>
> > >> > LLVM seems to at least produce zero length functions in a few cases:
> > >> > * non-void function without a return statement
> > >> > * function definition containing only llvm_unreachable
> > >> > (both of these trap at -O0, but at higher optimization levels even the
> > >> > trap instruction is removed & you get the full power UB of control
> > >> > flowing off the end of the function into whatever other bytes are after
> > >> > that function)
> > >>
> > >> > So, for context, debug_ranges (this whole issue doesn't exist in
> > >> > DWARFv5, FWIW) is a list of address pairs, terminated by a pair of zeros.
> > >> > With function sections, or even just with normal C++ inline functions,
> > >> > the CU will have a range entry for that function that consists of two
> > >> > relocations - to the start and end of the function. Generally the start
> > >> > of the function is the start of the section, and the end is "start of
> > >> > function + length of function (aka addend)".
> > >>
> > >> > Usually any relocation to the section would keep that section "alive"
> > >> > during linking - but that would cause debug info to defeat linker GC and
> > >> > deduplication. So there's special rules for how linkers handle these
> > >> > relocations in debug info to allow the sections to be dropped - what do
> > >> > you write in the bytes that requested the relocation?
> > >>
> > >> > Binutils ld: Special cases only debug_ranges, resolving all relocations
> > >> > to dead code to 1. In other debug sections, these values are all
> > >> > resolved to zero.
> > >>
> > >> > Gold and lld: Special cases all debug info sections - resolving all
> > >> > relocations to "addend" (so begin usually goes to zero, end goes to
> > >> > "size of function")
> > >>
> > >> > These special rules are designed to ensure omitted/gc'd/deduplicated
> > >> > functions don't cause the range list to terminate prematurely (which
> > >> > would happen if begin/end were both resolved to zero).
> > >>
> > >> > But with an empty function, gold and lld's strategy here fails to avoid
> > >> > terminating a range list by accident.
> > >>
> > >> > What should we do about it?
> > >>
> > >> > 1) Ensure no zero-length functions exist? (doesn't address backwards
> > >> > compatibility/existing functions/other compilers)
> > >> > 2) adopt the binutils approach to this (at least in debug_ranges - maybe
> > >> > in all debug sections? (doing it in other sections could break )
> > >> > 3) Revisit the discussion about using an even more 'blessed' value,
> > >> > like int max-1? (https://reviews.llvm.org/D59553)
> > >>
> > >> > (I don't have links to all the recent threads about this discussion - I
> > >> > think D59553 might've spawned a separate broader discussion/non-review -
> > >> > oh, Alexey wrote a good summary with links to other discussions here:
> > >> > http://lists.llvm.org/pipermail/llvm-dev/2019-September/135068.html )
> > >>
> > >> > Thoughts?
> > >>
> > >>
> > >>
> > >> I think for the problem of "zero length functions and .debug_ranges"
> > >> binutils approach looks good:
> > >>
> > >> >Special cases only debug_ranges, resolving all relocations to
> > >> >dead code to 1. In other debug sections, these values are all resolved
> > >> >to zero.
> > >>
> > >> But, this would not completely solve the problem from
> > >> https://reviews.llvm.org/D59553 - Overlapped address ranges. Binutils
> > >> approach will solve the problem if the address range is specified as
> > >> start_address:end_address. While resolving relocations, it would
> > >> replace such a range with 1:1.
> > >> However, it would not work if address ranges were specified as
> > >> start_address:length since the length is not relocated. This case could
> > >> be additionally fixed by a fast scan of debug_info for High_PC defined
> > >> as length and changing it to 1. Something which you suggested here:
> > >> http://lists.llvm.org/pipermail/llvm-dev/2020-May/141599.html .
> > >>
> > >> So it looks like the following solution could fix both problems and be
> > >> relatively fast:
> > >>
> > >> "Resolve all relocations from debug sections into dead code to 1. Parse
> > >> debug sections and replace HighPc of an address range pointing to dead
> > >> code and specified as length to 1".
> > >>
> > >> As a result, all address ranges pointing into dead code would be marked
> > >> as zero length.
> > >>
> > >> There still exists another problem:
> > >>
> > >> DWARF4: "A range list entry (but not a base address selection or end of
> > >> list entry) whose beginning and ending addresses are equal has no effect
> > >> because the size of the range covered by such an entry is zero."
> > >>
> > >> DWARF5: "A bounded range entry whose beginning and ending address offsets
> > >> are equal (including zero) indicates an empty range and may be ignored."
> > >>
> > >> These rules allow us to ignore zero-length address ranges. I.e., a tool
> > >> reading DWARF is permitted to ignore the related DWARF entries. In that
> > >> case, essential descriptions could be ignored. That problem could happen
> > >> with the -flto=thin example https://reviews.llvm.org/D54747#1503720 .
> > >> In this example, all type definitions except one were replaced with
> > >> declarations by thinlto. The definition, which was left, is in a piece of
> > >> debug info related to deleted code. According to the zero-length rule,
> > >> that definition could be ignored, and finally, incomplete debug info
> > >> could be used.
> > >>
> > >> So, it probably should be forbidden to generate debug_info, which could
> > >> become incomplete after removing pieces related to zero length address
> > >> ranges. Otherwise, creating zero-length address ranges could lead to
> > >> incomplete debug info.
> > >>
> > >>
> > >>
> > >> Thank you, Alexey.
> > >>
> > >>
> > >>