[libc-dev] [musl] Powerpc Linux 'scv' system call ABI proposal take 2
Florian Weimer via libc-dev
libc-dev at lists.llvm.org
Thu Apr 16 11:12:19 PDT 2020
* Rich Felker:
> On Thu, Apr 16, 2020 at 06:42:32PM +0200, Florian Weimer wrote:
>> * Rich Felker:
>> > On Thu, Apr 16, 2020 at 06:48:44AM +0200, Florian Weimer wrote:
>> >> * Rich Felker:
>> >> > My preference would be that it work just like the i386 AT_SYSINFO
>> >> > where you just replace "int $128" with "call *%%gs:16" and the kernel
>> >> > provides a stub in the vdso that performs either scv or the old
>> >> > mechanism with the same calling convention.
>> >> The i386 mechanism has received some criticism because it provides an
>> >> effective means to redirect execution flow to anyone who can write to
>> >> the TCB. I am not sure if it makes sense to copy it.
>> > Indeed that's a good point. Do you have ideas for making it equally
>> > efficient without use of a function pointer in the TCB?
>> We could add a shared non-writable mapping at a 64K offset from the
>> thread pointer and store the function pointer or the code there. Then
>> it would be safe.
>> However, since this is apparently tied to POWER9 and we already have a
>> POWER9 multilib, and assuming that we are going to backport the kernel
>> change, I would tweak the selection criterion for that multilib to
>> include the new HWCAP2 flag. If a user runs this glibc on a kernel
>> which does not have support, they will get set baseline (POWER8)
>> multilib, which still works. This way, outside the dynamic loader, no
>> run-time dispatch is needed at all. I guess this is not at all the
>> answer you were looking for. 8-)
> How does this work with -static? :-)
-static is not supported. 8-) (If you use the unsupported static
libraries, you get POWER8 code.)
(Just to be clear, in case someone doesn't get the joke: This is about
a potential approach for a heavily constrained, vertically integrated
environment. It does not reflect general glibc recommendations.)
>> If a single binary is needed, I would perhaps follow what Arm did for
>> -moutline-atomics: lay out the code so that its easy to execute for
>> the non-POWER9 case, assuming that POWER9 machines will be better at
>> predicting things than their predecessors.
>> Or you could also put the function pointer into a RELRO segment. Then
>> there's overlap with the __libc_single_threaded discussion, where
>> people objected to this kind of optimization (although I did not
>> propose to change the TCB ABI, that would be required for
>> __libc_single_threaded because it's an external interface).
> Of course you can use a normal global, but now every call point needs
> to setup a TOC pointer (= two entry points and more icache lines for
> otherwise trivial functions).
> I think my choice would be just making the inline syscall be a single
> call insn to an asm source file that out-of-lines the loading of TOC
> pointer and call through it or branch based on hwcap so that it's not
> repeated all over the place.
I don't know how problematic control flow out of an inline asm is on
POWER. But this is basically the -moutline-atomics approach.
> Alternatively, it would perhaps work to just put hwcap in the TCB and
> branch on it rather than making an indirect call to a function pointer
> in the TCB, so that the worst you could do by clobbering it is execute
> the wrong syscall insn and thereby get SIGILL.
The HWCAP is already in the TCB. I expect this is what generic glibc
builds are going to use (perhaps with a bit of tweaking favorable to
POWER8 implementations, but we'll see).
More information about the libc-dev