[cfe-dev] [llvm-dev] -fpic ELF default: reclaim some -fno-semantic-interposition optimization opportunities?

Fangrui Song via cfe-dev cfe-dev at lists.llvm.org
Sun Jun 6 21:22:01 PDT 2021


Personally I care more about the function case.
The function case improves performance (cf. making
-Bsymbolic-non-weak-functions the ld default:
https://sourceware.org/pipermail/binutils/2021-May/116748.html).

I care less about the variable case (copy relocations); I just don't want
the GNU folks to make the scheme too complex.
https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/8

Anyway, my replies regarding copy relocations are below.

On 2021-06-06, Joerg Sonnenberger via cfe-dev wrote:
>On Sun, Jun 06, 2021 at 10:50:41AM -0700, Fāng-ruì Sòng wrote:
>> On Sun, Jun 6, 2021 at 7:08 AM Joerg Sonnenberger via cfe-dev
>> <cfe-dev at lists.llvm.org> wrote:
>> >
>> > On Sat, Jun 05, 2021 at 06:08:57PM -0700, Fāng-ruì Sòng via llvm-dev wrote:
>> > > On 2021-06-06, Joerg Sonnenberger wrote:
>> > > > On Fri, Jun 04, 2021 at 03:26:53PM -0700, Fāng-ruì Sòng via llvm-dev wrote:
>> > > > > Fixing the last point is actually easy: let -fno-pic use GOT when
>> > > > > taking the address of a non-definition function.
>> > > >
>> > > > I'd far prefer to have an attribute to explicitly say that the address
>> > > > of a given symbol should always be computed indirectly (e.g. via GOT).
>> > > > That gives the explicit control necessary for libraries without
>> > > > penalizing the larger executables like clang.
>> > > >
>> > > > Joerg
>> > >
>> > > Taking the address (in code) of a non-definition function is rare,
>> > > rarer after optimization. At least when building clang, I cannot find
>> > > any penalty.
>> >
>> > I was not talking about just functions. I can't even think of a case
>> > where pointer equality for function pointers matters. But the case I
>> > care far more about is being able to avoid copy relocations for global
>> > variables and that's the same problem (loading the address of a symbol).
>> >
>> > Joerg
>>
>> On the Clang side, `-fno-pic -fno-direct-access-external-data` uses
>> GOT to access a default visibility global variable today.
>> If all TUs use this option and assembly files do the right thing, copy
>> relocations can be avoided.
>
>Most code in the wild doesn't use visibility flags and would be
>penalized by that. An attribute would allow explicitly opting out of
>direct access for system headers and other libraries.

OpenBSD has enabled PIE by default on most architectures since OpenBSD 5.3.
All (or almost all?) major Linux distributions now configure their GCC with
--enable-default-pie.
FreeBSD has switched to default PIE for 64-bit architectures this year.
Very few users care about -fno-pic performance now.

The static linking scheme is shifting to the static PIE model as well
(the trend was led by OpenBSD, followed by musl in 2015 and then glibc
in 2017; see https://sourceware.org/bugzilla/show_bug.cgi?id=19574).

Global variable access rarely takes even 1% of an application's execution
time. Choosing between direct access and indirect access via a prefilled GOT
entry is only an optimization within that 0.xx%.

```
extern int var;
int foo() { return var; }
```
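To make the two access paths for the `extern int var` example concrete, here is a toy model in portable C. It is not what a compiler emits: `var_storage` and `got_slot_var` are hypothetical stand-ins for the definition of `var` and for a GOT entry the dynamic loader has already filled in. The point is only that the GOT path adds one dependent load of the address before the value load.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the definition of `var`. In a real program it
   would live in a shared object (or be duplicated into the executable by
   a copy relocation). */
static int var_storage = 42;

/* Hypothetical stand-in for a GOT entry filled in at load time:
   a pointer-sized slot holding the final address of `var`. */
static int *const got_slot_var = &var_storage;

/* Direct access, as -fno-pic code does by default: a single load from an
   address fixed at link time. */
int foo_direct(void) { return var_storage; }

/* Indirect access, as -fpic (or -fno-pic -fno-direct-access-external-data)
   code does for default-visibility external data: first load the address
   from the GOT slot, then load the value through it. */
int foo_via_got(void) {
  assert(got_slot_var != NULL); /* the slot must be filled before use */
  return *got_slot_var;
}
```

Both functions return the same value; the indirect version just pays for the extra load, which is why the choice matters little when global variable access is a small fraction of run time.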

I know i386 and ppc32 can take a large performance hit if we use the GOT.
If we want -fno-pic to imply -fno-direct-access-external-data by default,
we can leave such architectures behind. I just checked: with -target i386 and
-target ppc32, -fno-direct-access-external-data does not use the GOT; the
backends have not implemented the non-PIC GOT scheme.

>> I know some folks prefer eliminating copy relocations for ABI and
>> security reasons.
>> I deliberately make the scope narrow to functions because functions
>> are where we can improve performance.
>
>For functions there are two cases: "unnamed" address use and "named"
>address use. Kind of similar to what we have already for global
>variables on whether they can be merged or not. Unnamed as in "I don't
>care if it is the canonical address", so the linker is free to introduce
>a PLT slot. This works fine on all architectures and without any
>penalties if the binding is local. There might be some flag needed here
>because the glibc implementation of the dynamic linker wants to do some
>wonky fixup on the PLT, but that's a glibc specific issue and outside
>the scope of LLVM. For the named address use we do care about the
>canonical address and that's where the distinction of attributed vs
>default assumption makes a difference: loading a pointer from the GOT vs
>doing a (PC relative) address load. On i386 the former didn't have
>patchable relocation support for a long time and I'm not sure it exists
>nowadays, i.e. allow the linker to relax the mov into lea.

On x86-64 the mov->lea relaxation is called the GOTPCRELX optimization.

i386 has the `mov foo@GOT(%reg1), %reg2` => `lea foo@GOTOFF(%reg1), %reg2` optimization.
Anyway, i386 performance probably doesn't matter much for anything now.
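A minimal sketch of what the x86-64 GOTPCRELX mov->lea rewrite does to the instruction bytes, assuming the 7-byte form `mov foo@GOTPCREL(%rip), %reg` (REX.W `0x48`, opcode `0x8b`, a RIP-relative ModRM, disp32) that carries an R_X86_64_REX_GOTPCRELX relocation. The function names `relax_gotpcrelx` and `read_disp32` are hypothetical; real linkers such as ld.lld and GNU ld handle more instruction forms and conditions, and the i386 `@GOT` => `@GOTOFF` rewrite is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { INSN_LEN = 7, OPC_MOV = 0x8b, OPC_LEA = 0x8d };

/* Toy relaxation: turn
     mov foo@GOTPCREL(%rip), %reg   (48 8b ModRM disp32 -> GOT slot)
   into
     lea foo(%rip), %reg            (48 8d ModRM disp32 -> foo itself)
   when foo is non-preemptible. Returns true if the bytes were rewritten. */
bool relax_gotpcrelx(uint8_t insn[INSN_LEN], uint64_t insn_addr,
                     uint64_t sym_addr, bool preemptible) {
  /* Assumed layout: REX.W prefix and a RIP-relative ModRM (mod=00, rm=101). */
  assert(insn[0] == 0x48 && (insn[2] & 0xc7) == 0x05);
  if (preemptible || insn[1] != OPC_MOV)
    return false; /* keep loading the address from the GOT */
  /* RIP-relative displacements are measured from the next instruction. */
  int64_t disp = (int64_t)(sym_addr - (insn_addr + INSN_LEN));
  if (disp < INT32_MIN || disp > INT32_MAX)
    return false; /* target out of +-2GiB range: cannot relax */
  insn[1] = OPC_LEA;
  /* Store disp32 little-endian regardless of host byte order. */
  uint32_t u = (uint32_t)(int32_t)disp;
  for (int i = 0; i < 4; i++)
    insn[3 + i] = (uint8_t)(u >> (8 * i));
  return true;
}

/* Reads back the little-endian disp32 field of the instruction. */
int32_t read_disp32(const uint8_t insn[INSN_LEN]) {
  uint32_t u = 0;
  for (int i = 0; i < 4; i++)
    u |= (uint32_t)insn[3 + i] << (8 * i);
  return (int32_t)u;
}
```

The preemptibility check is the crux: if the symbol may be interposed at run time, its address must still come from the GOT, so only the load through the slot can be kept.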

>It can be
>even more complicated on other archs where address computations are
>complicated like Sparc. The attribute infrastructure here is the same as
>would be needed for global variables and those are where the more
>expensive issues are. Copy relocations e.g. for a constant array can be
>arbitrarily expensive and are an ABI maintenance nightmare, so finally
>having a way that is cheap to avoid them would be a great step forward.

Yes, I have seen such a large constant array, perhaps from some old ffmpeg
assembly code, or something like that.

There is also a minor security risk: relro data can become writable (ld.lld has
fixed the problem for the non-linker-script case).
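To illustrate why copy relocations for a large array are costly and an ABI hazard, here is a toy model of the work a dynamic loader does for one copy relocation. It is a sketch under simplifying assumptions, not glibc's code: `apply_copy_reloc` is a hypothetical name, `exe_size` stands for the size the executable reserved at link time, and `lib_size` for the object's size in the shared library loaded now. After this copy, the executable's copy is the one every module uses, so its size is effectively frozen into the executable. As far as I know, glibc copies the smaller of the two sizes and warns ("consider re-linking") when the library's object has grown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy copy relocation: copy the library's initial contents of a variable
   into the space the executable reserved for it at link time.
   Returns the number of bytes copied; sets *size_mismatch when the
   library's current size differs from what the executable was linked
   against (the ABI problem). */
size_t apply_copy_reloc(void *exe_copy, size_t exe_size,
                        const void *lib_data, size_t lib_size,
                        bool *size_mismatch) {
  assert(exe_copy != NULL && lib_data != NULL && size_mismatch != NULL);
  size_t n = exe_size < lib_size ? exe_size : lib_size;
  memcpy(exe_copy, lib_data, n); /* load-time cost grows with object size */
  *size_mismatch = (exe_size != lib_size);
  return n;
}
```

If the library later grows the array, the old executable still has room only for the old size, which is the maintenance trap described above.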

>Proposal for this would be to have an attribute to specify the "owner"
>of the implementation as a string and a matching clang option to specify
>a non-default owner (e.g. __attribute__((definedby("libc"))) and
>-fdefining=libc) and the empty string being the default, meaning the
>main binary.

How does your "definedby" scheme improve external variable access performance?

Windows/macOS/Solaris do record which library each symbol is imported from,
but that information is only recorded at link time.
Object files don't record where symbols are imported from, which provides
flexibility for reorganizing libraries without needing to fix up the code.

>Joerg

