[lld] Add target support for SystemZ (s390x) (PR #75643)

Fri Feb 16 17:51:15 PST 2024

rui314 wrote:

In mold, the .plt entry loads the target from .got.plt, and for s390x, mold initializes all .got.plt entries to point to the beginning of .plt or the PLT stub.
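
As a rough illustration of that initial state (not mold's actual code), the link-time contents of .got.plt could be modeled like this. The helper name `initial_got_plt` and the example address are hypothetical, and any reserved slots at the start of .got.plt are ignored here:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not taken from mold: builds the link-time contents
// of .got.plt for `num_entries` lazily bound functions.
// `plt_start` is the assumed address of the beginning of .plt.
std::vector<std::uint64_t> initial_got_plt(std::uint64_t plt_start,
                                           std::size_t num_entries) {
  // Every slot starts out pointing at the beginning of .plt, so the first
  // call through any .plt entry ends up in the lazy resolution path.
  return std::vector<std::uint64_t>(num_entries, plt_start);
}
```

For example, `initial_got_plt(0x1000, 3)` yields three slots that all hold `0x1000`; the dynamic loader would later overwrite each slot with the resolved function address.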

Likewise, in mold, the .plt.got entry loads the target from .got. In this case, no lazy resolution is involved.

.plt and .plt.got are mutually exclusive. You can think of .plt.got in mold as an optimization: if we already have a .got entry for a symbol, we don't need to resolve it again lazily at runtime, because its address is already available in .got at load time. That's why we have .plt.got in addition to .plt.

Because a function gets an entry in only one of the two sections, each function takes 16 bytes for its PLT entry.
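
A minimal sketch of that selection, assuming a simplified symbol model. The names `Symbol`, `PltKind`, `choose_plt_kind`, and `compute_sizes` are hypothetical and not taken from mold, and the size of any .plt header is left out:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of a function symbol that is called through a PLT.
struct Symbol {
  bool has_got_entry;  // true if the symbol already needs a .got slot
};

enum class PltKind { Plt, PltGot };

// Per-function PLT entry size mentioned in the thread.
constexpr std::uint64_t kPltEntrySize = 16;

// A symbol that already has a .got entry goes to .plt.got (no lazy
// resolution); every other symbol goes to .plt (lazy via .got.plt).
PltKind choose_plt_kind(const Symbol &sym) {
  return sym.has_got_entry ? PltKind::PltGot : PltKind::Plt;
}

struct PltSizes {
  std::uint64_t plt;      // bytes of per-function entries in .plt
  std::uint64_t plt_got;  // bytes of per-function entries in .plt.got
};

// Each symbol contributes exactly one 16-byte entry to exactly one section.
PltSizes compute_sizes(const std::vector<Symbol> &syms) {
  PltSizes sizes{0, 0};
  for (const Symbol &sym : syms) {
    if (choose_plt_kind(sym) == PltKind::PltGot)
      sizes.plt_got += kPltEntrySize;
    else
      sizes.plt += kPltEntrySize;
  }
  return sizes;
}
```

With three symbols of which one already has a .got entry, this gives 32 bytes of .plt entries and 16 bytes of .plt.got entries, so the total stays at 16 bytes per function.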

> Even in the "fast" path (once lazy resolution has happened), we now always use basr. This clobbers r0 - which is allowed by the ABI, but there might be some complications with other stubs maybe (e.g. mcount stubs? we'd need to check). Also, the basr might have performance implications as on some microarchitecture implementations it might confuse the call/return stack tracking by the branch predictor.

That's what I thought too. My reasoning was that, since the ABI requires r14 to be used as the return address, a basr that uses a different register for the return address wouldn't be treated as a function call by the microarchitecture and so wouldn't confuse the Return Address Stack, but that's just my assumption. It'd be awesome if you could ask the processor team how the RAS works on s390x.

> The addresses in .got.plt now no longer implement the target function ABI, but implicitly require r0 and r1 to be set up correctly. This would break "PLT inlining" via the R_390_GOTPLT family of instructions. This is not currently used by the default toolchains, but as long as the relocations are there, I guess they need to work as expected ...

PLT inlining implies disabling lazy symbol resolution, no? I believe that if PLT inlining is enabled, the inlined PLT refers to the function's GOT entry instead of the PLT. Therefore, in mold's scheme, the inlined PLT would branch to .plt.got instead of .plt, and .plt.got doesn't assume anything about r0/r1 because it doesn't do lazy symbol resolution. So I guess that'd just work?

https://github.com/llvm/llvm-project/pull/75643