[PATCH] D71100: [lld][RISCV] Fixup PC-relative relocations to undefined weak symbols.
John Baldwin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 17 17:19:14 PST 2019
bsdjhb marked an inline comment as done.
bsdjhb added inline comments.
================
Comment at: lld/ELF/InputSection.cpp:982
break;
+ case R_PC:
+ if (config->emachine == EM_RISCV && rel.sym->isUndefWeak() &&
----------------
MaskRay wrote:
> MaskRay wrote:
> > jrtc27 wrote:
> > > jrtc27 wrote:
> > > > jrtc27 wrote:
> > > > > MaskRay wrote:
> > > > > > Adding the code here is not correct. The most appropriate place is `getRelocTargetVA`. However, I don't recommend adding any logic at all because weak undefined symbols do not have defined semantics. This means multiple things:
> > > > > >
> > > > > > 1) their behaviors may be different in different toolchains
> > > > > > 2) their behavior may change with various configuration (-no-pie, -pie, -shared) even in the same toolchain
> > > > > >
> > > > > > The binutils behaviors may sometime not easy to mimic in lld because their internals are so different.
> > > > > >
> > > > > > I did add the following two lines for EM_PPC, but that was because I had no choice but to support that glibc use case.
> > > > > >
> > > > > > else if (config->emachine == EM_PPC)
> > > > > > dest = p;
> > > > > >
> > > > > > For anything newly developed, we should try our best to avoid weak undefined symbols.
> > > > > >
> > > > > > Can you share with me some instructions to build FreeBSD RISC-V?
> > > > > *Calling* weak undefined symbols does not necessarily have well-defined semantics, depending on the architecture, but referencing them absolutely does (namely that the result of the code sequence is NULL) and is required for many systems, not just FreeBSD. The issue here is that, with `-mcmodel=medany` (LLVM's medium) in position-dependent code generation, the addressing mode is PC-relative, which breaks if linked at an address beyond 2 GiB as the 32-bit immediate pair is too small to negate PC, but the whole point of medany is to allow linking code at any address, including above the 2 GiB mark (otherwise you'd just use medlow).
> > > > The alternative is to generate code using the GOT for extern weak symbols, as is done on AArch64 (which also otherwise uses PC-relative addressing), but that would require changes for both GCC and LLVM.
> > > Moreover, even if calling undefined weak functions is not well-defined in the ABI, it still needs to link without error to *something*, otherwise you can't write the perfectly-legitimate common construct that is:
> > >
> > > ```
> > > extern void foo(void) __attribute__((weak));
> > > if (&foo)
> > > foo();
> > > ```
> > >
> > > The call can be turned into whatever you like at link time if the symbol is undefined, because it should be guarded by an if (and it's undefined behaviour otherwise, I would assume), but the linker needs to do something that's not failing with an error.
> > No. http://www.sco.com/developers/gabi/latest/ch4.symtab.html
> >
> > > The behavior of weak symbols in areas not specified by this document is implementation defined. Weak symbols are intended primarily for use in system software. Applications using weak symbols are unreliable since changes in the runtime environment might cause the execution to fail.
> >
> > You may argue that we are targeting system software, but I still want to say that the GNU ld behavior may vary according to (1) instruction types, or (2) executable vs shared objects. lld's treatment is still different from GNU ld and gold in some cases on other architectures.
> >
> > ```
> > extern void foo(void) __attribute__((weak));
> > if (&foo)
> > foo();
> > ```
> >
> > I hope we can make some efforts to get rid of such constructs in FreeBSD first, before trying a linker workaround.
> >
> > I haven't checked the logic below clearly. It appears to be more of a hack.
> >
> > ```hiRel && hiRel->type == R_RISCV_PCREL_HI20
> > && hiRel->sym->isUndefWeak() && hiRel->addend == 0
> > ```
> > The alternative is to generate code using the GOT for extern weak symbols, as is done on AArch64 (which also otherwise uses PC-relative addressing), but that would require changes for both GCC and LLVM.
>
> Fixing GCC/llvm sounds a good idea to me. I can check how to fix it in llvm. Apparently PPC64 also decided to use GOT when they migrated from PPC32. (I am now just very concerned that some binutils/gcc decisions were made before people think hard about the implications... from the binutils bugs I reported, I at least can confirm that many bugs were copied from elf32-arm.c and elf*-mips.c ...)
We can't really remove all weak symbols from FreeBSD's kernel. FreeBSD's kernel uses linker sets extensively to build link-time generated arrays of pointers to objects. If a linker set is empty, the start and stop symbols for that linker set are weak-undefined symbols that FreeBSD requires to resolve to NULL. Removing linker sets from FreeBSD and/or mandating that all linker sets are non-empty are not feasible.
I've opened an issue on the RISC-V psabi doc to track the ABI question:
https://github.com/riscv/riscv-elf-psabi-doc/issues/126
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D71100/new/
https://reviews.llvm.org/D71100
More information about the llvm-commits
mailing list