[llvm-dev] Range lists, zero-length functions, linker gc

Thu May 28 22:06:35 PDT 2020

On 2020-05-28, David Blaikie wrote:
>On Thu, May 28, 2020 at 2:52 PM Robinson, Paul <paul.robinson at sony.com>
>wrote:
>
>> As has been mentioned elsewhere, Sony generally fixes up references from
>> debug info to stripped functions (of any length) using -1, because that’s a
>> less-likely-to-be-real address than 0x0 or 0x1.  (0x0 is a typical base
>> address for shared libraries, I’d think using it has the potential to
>> mislead various consumers.)  For .debug_ranges we use -2, because both a
>> 0/0 pair and a -1/-1 pair have a reserved meaning in that section.
>>
>
>Any harm in using -2 everywhere, for consistency?

When resolving a relocation, in certain cases we have to give an undefined symbol a value.
This can happen with:

* an undefined weak symbol
* an undefined global symbol in --noinhibit-exec mode (a buggy --gc-sections implementation can trigger this as well)
* a relocation referencing an undefined symbol in a non-SHF_ALLOC section

For most absolute/PC-relative relocation types (R_ARM_THM_PC8,
R_AARCH64_ADR_PREL_PG_HI21, R_X86_64_64, local exec TLS relocation types, ...)
we respect the addend in the relocation entry.
Ignoring the addend (using -2 everywhere) will break this consistency.

The relocated code may do pointer subtraction, which would work if addends
were respected but will break if -2 is used everywhere.

The relocated code can be allocatable or not. Non-allocatable non-debug
code can have meaningful pointer subtraction as well. This is why I am
not too fond of using a fixed value everywhere.
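
To illustrate the pointer-subtraction point, here is a minimal Python sketch. It is not linker code: `resolve_with_addend` and `resolve_fixed` are hypothetical helper names, the addends 0 and 16 are made up, and the 64-bit wraparound is an assumption about how a value like -2 would be stored. Two relocations reference the same undefined symbol; the difference between the relocated values survives only when the addend is respected.

```python
MASK64 = (1 << 64) - 1  # assume 64-bit relocated fields


def resolve_with_addend(symbol_value, addend):
    """Resolve to symbol value + addend (undefined symbol value taken as 0)."""
    return (symbol_value + addend) & MASK64


def resolve_fixed(tombstone):
    """Resolve to a fixed value, ignoring the addend entirely."""
    return tombstone & MASK64


# Two relocations against the same undefined symbol, addends 0 and 16.
begin = resolve_with_addend(0, 0)
end = resolve_with_addend(0, 16)
size_with_addend = end - begin       # the distance between the two is kept

fixed_begin = resolve_fixed(-2)
fixed_end = resolve_fixed(-2)
size_fixed = fixed_end - fixed_begin  # the distance collapses to zero

print(size_with_addend, size_fixed)
```

With a fixed value, any consumer that computes `end - begin` sees 0 regardless of the original addends.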

>(also, I had a silly idea, but what would happen if we added a CU attribute
>with an address value that was a reference to a weak always-unused symbol,
>that way the linker would fix it up with whatever its preferred magic value
>was, and the consumer would then know what the magic value was that
>represented dead code? (though this would only work if the value were used
>consistently everywhere - which is zero for gold/lld (well, almost... you
>can still create situations where a non-zero value is used even for a
>low_pc), but wouldn't work for binutils ld (1 in debug_ranges, 0 elsewhere)
>or Sony (-2 in debug_ranges, -1 elsewhere)... - so, wouldn't actually work
>for any producer currently, so maybe there's little value in that as a
>feature))

For a non-SHF_ALLOC section, LLD currently considers it a GC root if all
the conditions below are satisfied:

* not SHT_REL[A]
* not SHF_LINK_ORDER
* not in a section group

(I managed to lobby for these ideas in GNU ld. GNU ld from binutils 2.35
onwards will have semantics mostly compatible with LLD.)
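
The three conditions above can be sketched as a small predicate. This is only an illustration in Python, not LLD's actual code: the `Section` class, the `in_group` field and the function name `is_nonalloc_gc_root` are hypothetical, while the numeric constants are the standard ELF values.

```python
from dataclasses import dataclass

# Standard ELF constants.
SHT_PROGBITS = 1
SHT_RELA = 4
SHT_REL = 9
SHF_ALLOC = 0x2
SHF_LINK_ORDER = 0x80


@dataclass
class Section:
    # Hypothetical minimal model of an input section.
    name: str
    sh_type: int
    sh_flags: int
    in_group: bool = False


def is_nonalloc_gc_root(sec):
    """Apply the three listed conditions to a non-SHF_ALLOC section."""
    if sec.sh_flags & SHF_ALLOC:
        return False  # the rule here only covers non-SHF_ALLOC sections
    if sec.sh_type in (SHT_REL, SHT_RELA):
        return False
    if sec.sh_flags & SHF_LINK_ORDER:
        return False
    if sec.in_group:
        return False
    return True


debug_info = Section(".debug_info", SHT_PROGBITS, 0)
rela_debug = Section(".rela.debug_info", SHT_RELA, 0)
grouped = Section(".debug_types", SHT_PROGBITS, 0, in_group=True)
link_order = Section(".meta", SHT_PROGBITS, SHF_LINK_ORDER)
```

Under this sketch a plain `.debug_info` is a root, which is why relocations from it keep the referenced sections alive.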

There is a cost to fragmenting a .debug_* section: sizeof(Elf64_Shdr)=64, so
each section takes 64 bytes in the section header table. SHF_LINK_ORDER
has the semantics of a lightweight section group. Assuming we don't want
one .debug_* section for each function section, the single .debug_*
section will be a GC root. Relocations from it (even if the symbol is weak)
will retain the sections defining the symbols.

So, this trick can't work without refining the --gc-sections rules further.
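
As a rough illustration of the section header cost mentioned above, the Python sketch below derives the 64-byte figure from the Elf64_Shdr field layout and multiplies it out. The function count of 100,000 is a made-up number for the example, and `header_table_overhead` is a hypothetical helper name.

```python
import struct

# Elf64_Shdr fields in order: sh_name, sh_type (4 bytes each), sh_flags,
# sh_addr, sh_offset, sh_size (8 bytes each), sh_link, sh_info (4 bytes
# each), sh_addralign, sh_entsize (8 bytes each). "<" means no padding.
ELF64_SHDR_SIZE = struct.calcsize("<IIQQQQIIQQ")


def header_table_overhead(num_sections):
    """Bytes of section header table needed for num_sections sections."""
    return num_sections * ELF64_SHDR_SIZE


# Hypothetical: one extra .debug_* fragment per function, 100,000 functions.
overhead = header_table_overhead(100_000)
print(ELF64_SHDR_SIZE, overhead)
```

This counts only the header table entries for one fragmented .debug_* section kind; relocation sections and section names would add more.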

>
>>  If you’re looking only at zero-length functions, you can stop there; but
>> I’m not sure why stopping there solves much of a real problem, as
>> zero-length functions seem like a weird corner case.
>>
>
>They're the case that breaks existing usage by terminating the range list
>early - the other existing usage seems to be fine with "resolve to addend"
>strategy that lld and gold use - in that it moves most dead/deduplicated
>functions outside the executable range and so consumers never come asking
>for "what code is at instruction 5" because they're never executing code at
>a pc of 5. But, yes, this existing solution doesn't work once you have code
>mapped into low address spaces or have utterly massive functions that might
>have a length that would reach into the executable address space even when
>their start is remapped to zero.

For posterity, David gave me an example offline:

void f1() { }
void f2() { }
int main() { f1(); }

clang -fuse-ld=bfd -ffunction-sections -Wl,--gc-sections -g a.c -o a.bfd
llvm-dwarfdump -debug-ranges a.bfd
=>
R_X86_64_64 relocations in .debug_ranges are resolved to 1, ignoring the addend

(Behavior introduced in
https://sourceware.org/git/?p=binutils-gdb.git;a=blobdiff;f=bfd/ChangeLog;h=8fbaed21fa2c8238459acb637545583f3cfbbfdf;hp=18a3a67be3a5980998c4461b5a739e54f3551b17;hb=e4067dbb2a3368dbf908b39c5435c84d51abc9f3;hpb=c0621d88b096cc046adf6ed484baea9ba5bfe721)
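
To make the difference between the strategies concrete, here is a small Python sketch of a DWARF v4 style .debug_ranges walk. It is a simplification under stated assumptions: base address selection entries are not modelled, the live range 0x401000..0x401010 is invented, and `resolve_dead`, `build_list` and `read_range_list` are hypothetical names. The "addend" strategy models gold/lld (dead symbol treated as 0), and "bfd" models GNU ld resolving to 1 in .debug_ranges.

```python
LIVE = (0x401000, 0x401010)  # made-up range for a function that survived


def resolve_dead(strategy, addend):
    """Value written for a relocation that points into discarded code."""
    if strategy == "addend":   # gold/lld: 0 + addend
        return addend
    if strategy == "bfd":      # GNU ld in .debug_ranges: 1, addend ignored
        return 1
    raise ValueError(strategy)


def build_list(strategy, dead_size):
    """Range list: one dead function, one live function, then terminator."""
    dead = (resolve_dead(strategy, 0), resolve_dead(strategy, dead_size))
    return [dead, LIVE, (0, 0)]


def read_range_list(pairs):
    """Collect (begin, end) pairs until the first 0/0 end-of-list entry."""
    result = []
    for begin, end in pairs:
        if begin == 0 and end == 0:
            break
        result.append((begin, end))
    return result


zero_len_addend = read_range_list(build_list("addend", 0))
nonzero_addend = read_range_list(build_list("addend", 6))
zero_len_bfd = read_range_list(build_list("bfd", 0))
```

In this sketch the zero-length dead function under the "addend" strategy produces a 0/0 pair, so the reader stops before reaching the live range; the "bfd" strategy yields a 1/1 pair and the live range is still seen.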

The comments below are also insightful. I need to ponder more (and need
to read the DWARF v4 and v5 specs more, as I am not so familiar with these
DWARF constructs). But it is too late now. Will probably comment
another day :)

>
>> Linkers know how to strip dead functions (gc) or deduplicate them (icf,
>> COMDAT) and people do this all the time, in some cases (COMDAT) without
>> explicitly asking for it, so non-zero-length functions seem like the much
>> more interesting case.  In that situation, -1 (or -2) seems like a much
>> wiser choice of blessed-as-not-real address, versus 0x0 or 0x1.
>>
>>
>>
>> Stripping non-zero-length functions does mean you have to care about more
>> sections.  For example .debug_loc would want to be fixed up the same way
>> as .debug_ranges, not because a debugger would care but so that dumpers
>> would not run into the 0/0 brick wall.
>>
>
>Yep - in theory a consumer could actually use a loclist across multiple
>sections (if a global variable got hoisted into a register for a function
>for instance), but I don't know of any producers doing this today - until
>then, yeah, it's just a dumping problem and ld.bfd does produce DWARF that
>has that problem (because it resolves both relocations to dead code
>(begin/end of a range) to zero in all sections except debug_ranges, so
>terminates the loclist early) - binutils objdump avoids dumping the
>following corrupted fragment by only dumping hunks of debug_loc starting at
>places referenced from debug_info. Without debug_info it won't dump
>anything from debug_loc - and if the references from debug_info, parsed
>until the 0,0 terminator don't cover the whole debug_loc section, it prints
>messages saying there are "gaps".
>
>Agreed that you'd want debug_loc to have the same special handling as
>debug_ranges if it has special handling. Though ideally we'd pick a value
>that works equally everywhere? (-2, by the sounds of it)
>
>
>> We also fix up lengths in .debug_aranges to zero, although there might be
>> history behind that tactic that I’m not aware of; it seems like it ought to
>> be unnecessary, if consumers are aware of the special address(es).
>>
>
>Yeah, no idea about debug_aranges... I'd have thought it'd be fine with the
>same approach as debug_ranges, but I haven't looked at debug_aranges in a
>long time.
>
>I guess the only remaining question is: Since it's possible to have code on
>some systems down at address zero, or close enough to it that [0, length)
>might overlap with real executable code addresses - does anyone know of
>the inverse: where code is mapped up near uint32 max? Such that that usage
>wouldn't be able to sacrifice uint32 max - 1 to use as a blessed value here?
>
>- Dave
>
>
>>
>>
>> --paulr
>>
>>
>>
>> *From:* Alexey Lapshin <alapshin at accesssoftek.com>
>> *Sent:* Thursday, May 28, 2020 9:03 AM
>> *To:* Sriraman Tallam <tmsriram at google.com>; Wei Mi <wmi at google.com>;
>> Robinson, Paul <paul.robinson at sony.com>; Adrian Prantl <aprantl at apple.com>;
>> Jonas Devlieghere <jdevlieghere at apple.com>; Alexey Lapshin <
>> a.v.lapshin at mail.ru>; Eric Christopher <echristo at gmail.com>; Fangrui Song
>> <maskray at google.com>; David Blaikie <dblaikie at gmail.com>;
>> llvm-dev at lists.llvm.org
>> *Subject:* Re: [llvm-dev] Range lists, zero-length functions, linker gc
>>
>>
>>
>> Hi David,
>>
>>
>>
>> > So there have been several recent discussions about the issues around
>> > DWARF-agnostic linking and gc-sections, linkonce function definitions
>> > being dropped, etc - and just how much DWARF-awareness would be suitable
>> > in a linker to help with this situation.
>> >
>> > I'd like to discuss a narrower instance of this issue: Zero length
>> > gc'd/deduplicated functions.
>> >
>> > LLVM seems to at least produce zero length functions in a few cases:
>> > * non-void function without a return statement
>> > * function definition containing only llvm_unreachable
>> > (both of these trap at -O0, but at higher optimization levels even the
>> > trap instruction is removed & you get the full power UB of control
>> > flowing off the end of the function into whatever other bytes are after
>> > that function)
>> >
>> > So, for context, debug_ranges (this whole issue doesn't exist in
>> > DWARFv5, FWIW) is a list of address pairs, terminated by a pair of zeros.
>> > With function sections, or even just with normal C++ inline functions,
>> > the CU will have a range entry for that function that consists of two
>> > relocations - to the start and end of the function. Generally the start
>> > of the function is the start of the section, and the end is "start of
>> > function + length of function (aka addend)".
>> >
>> > Usually any relocation to the section would keep that section "alive"
>> > during linking - but that would cause debug info to defeat linker GC and
>> > deduplication. So there's special rules for how linkers handle these
>> > relocations in debug info to allow the sections to be dropped - what do
>> > you write in the bytes that requested the relocation?
>> >
>> > Binutils ld: Special cases only debug_ranges, resolving all relocations
>> > to dead code to 1. In other debug sections, these values are all
>> > resolved to zero.
>> >
>> > Gold and lld: Special cases all debug info sections - resolving all
>> > relocations to "addend" (so begin usually goes to zero, end goes to
>> > "size of function")
>> >
>> > These special rules are designed to ensure omitted/gc'd/deduplicated
>> > functions don't cause the range list to terminate prematurely (which
>> > would happen if begin/end were both resolved to zero).
>> >
>> > But with an empty function, gold and lld's strategy here fails to avoid
>> > terminating a range list by accident.
>> >
>> > What should we do about it?
>> >
>> > 1) Ensure no zero-length functions exist? (doesn't address backwards
>> > compatibility/existing functions/other compilers)
>> > 2) adopt the binutils approach to this (at least in debug_ranges - maybe
>> > in all debug sections? (doing it in other sections could break )
>> > 3) Revisit the discussion about using an even more 'blessed' value,
>> > like int max-1? ( https://reviews.llvm.org/D59553
>>  )
>>
>> > (I don't have links to all the recent threads about this discussion - I
>> > think D59553 might've spawned a separate broader discussion/non-review -
>> > oh, Alexey wrote a good summary with links to other discussions here:
>> > http://lists.llvm.org/pipermail/llvm-dev/2019-September/135068.html
>>  )
>>
>> > Thoughts?
>>
>>
>>
>> I think for the problem of "zero length functions and .debug_ranges"
>> the binutils approach looks good:
>>
>> >Special cases only debug_ranges, resolving all relocations to
>> >dead code to 1. In other debug sections, these values are all resolved to
>> >zero.
>>
>> But, this would not completely solve the problem from
>> https://reviews.llvm.org/D59553
>> - Overlapped address ranges. The binutils approach will solve the problem if
>> the address range is specified as start_address:end_address. While resolving
>> relocations, it would replace such a range with 1:1.
>> However, it would not work if address ranges were specified as
>> start_address:length since the length is not relocated. This case could be
>> additionally fixed by a fast scan of debug_info for High_PC defined as length
>> and changing it to 1. Something which you suggested here:
>> http://lists.llvm.org/pipermail/llvm-dev/2020-May/141599.html
>> .
>>
>> So it looks like the following solution could fix both problems and be
>> relatively fast:
>>
>> "Resolve all relocations from debug sections into dead code to 1. Parse
>> debug sections and replace HighPc of an address range pointing to dead code
>> and specified as length to 1".
>>
>> As a result, all address ranges pointing into dead code would be marked
>> as zero length.
>>
>> There still exists another problem:
>>
>> DWARF4: "A range list entry (but not a base address selection or end of
>> list entry) whose beginning and
>> ending addresses are equal has no effect because the size of the range
>> covered by such an
>> entry is zero."
>>
>> DWARF5: "A bounded range entry whose beginning and ending address offsets
>> are equal
>> (including zero) indicates an empty range and may be ignored."
>>
>> These rules allow us to ignore zero-length address ranges. I.e., a tool
>> reading DWARF is permitted to ignore the related DWARF entries. In that case,
>> essential descriptions could be ignored. That problem could happen
>> with the -flto=thin example https://reviews.llvm.org/D54747#1503720
>> . In this example, all type definitions except one were replaced with
>> declarations by ThinLTO. The definition that was left is in a piece of
>> debug info related to deleted code. According to the zero-length rule, that
>> definition could be ignored, and as a result, incomplete debug info could be
>> used.
>>
>> So, it probably should be forbidden to generate debug_info, which could
>> become incomplete after removing pieces related to zero length address
>> ranges. Otherwise, creating zero-length address ranges could lead to
>> incomplete debug info.
>>
>>
>>
>> Thank you, Alexey.
>>
>>
>>