[libc-dev] [musl] Powerpc Linux 'scv' system call ABI proposal take 2
Rich Felker via libc-dev
libc-dev at lists.llvm.org
Thu Apr 16 09:52:57 PDT 2020
On Thu, Apr 16, 2020 at 06:42:32PM +0200, Florian Weimer wrote:
> * Rich Felker:
> > On Thu, Apr 16, 2020 at 06:48:44AM +0200, Florian Weimer wrote:
> >> * Rich Felker:
> >> > My preference would be that it work just like the i386 AT_SYSINFO
> >> > where you just replace "int $128" with "call *%%gs:16" and the kernel
> >> > provides a stub in the vdso that performs either scv or the old
> >> > mechanism with the same calling convention.
> >> The i386 mechanism has received some criticism because it provides an
> >> effective means to redirect execution flow to anyone who can write to
> >> the TCB. I am not sure if it makes sense to copy it.
> > Indeed that's a good point. Do you have ideas for making it equally
> > efficient without use of a function pointer in the TCB?
> We could add a shared non-writable mapping at a 64K offset from the
> thread pointer and store the function pointer or the code there. Then
> it would be safe.
> However, since this is apparently tied to POWER9 and we already have a
> POWER9 multilib, and assuming that we are going to backport the kernel
> change, I would tweak the selection criterion for that multilib to
> include the new HWCAP2 flag. If a user runs this glibc on a kernel
> which does not have support, they will get set baseline (POWER8)
> multilib, which still works. This way, outside the dynamic loader, no
> run-time dispatch is needed at all. I guess this is not at all the
> answer you were looking for. 8-)
How does this work with -static? :-)
> If a single binary is needed, I would perhaps follow what Arm did for
> -moutline-atomics: lay out the code so that its easy to execute for
> the non-POWER9 case, assuming that POWER9 machines will be better at
> predicting things than their predecessors.
> Or you could also put the function pointer into a RELRO segment. Then
> there's overlap with the __libc_single_threaded discussion, where
> people objected to this kind of optimization (although I did not
> propose to change the TCB ABI, that would be required for
> __libc_single_threaded because it's an external interface).
Of course you can use a normal global, but now every call point needs
to setup a TOC pointer (= two entry points and more icache lines for
otherwise trivial functions).
I think my choice would be just making the inline syscall be a single
call insn to an asm source file that out-of-lines the loading of TOC
pointer and call through it or branch based on hwcap so that it's not
repeated all over the place.
Alternatively, it would perhaps work to just put hwcap in the TCB and
branch on it rather than making an indirect call to a function pointer
in the TCB, so that the worst you could do by clobbering it is execute
the wrong syscall insn and thereby get SIGILL.
More information about the libc-dev