[libc-dev] [musl] Powerpc Linux 'scv' system call ABI proposal take 2

David Laight via libc-dev libc-dev at lists.llvm.org
Tue Apr 21 08:31:08 PDT 2020


From: Adhemerval Zanella
> Sent: 21 April 2020 16:01
> 
> On 21/04/2020 11:39, Rich Felker wrote:
> > On Tue, Apr 21, 2020 at 12:28:25PM +0000, David Laight wrote:
> >> From: Nicholas Piggin
> >>> Sent: 20 April 2020 02:10
> >> ...
> >>>>> Yes, but does it really matter to optimize this specific usage case
> >>>>> for size? glibc, for instance, tries to leverage the syscall mechanism
> >>>>> by adding some complex pre-processor asm directives.  It optimizes
> >>>>> the syscall code size in most cases.  For instance, kill in static case
> >>>>> generates on x86_64:
> >>>>>
> >>>>> 0000000000000000 <__kill>:
> >>>>>    0:   b8 3e 00 00 00          mov    $0x3e,%eax
> >>>>>    5:   0f 05                   syscall
> >>>>>    7:   48 3d 01 f0 ff ff       cmp    $0xfffffffffffff001,%rax
> >>>>>    d:   0f 83 00 00 00 00       jae    13 <__kill+0x13>
> >>
> >> Hmmm... that cmp + jae is unnecessary here.
> >
> > It's not.. Rather the objdump was just mistakenly done without -r so
> > it looks like a nop jump rather than a conditional tail call to the
> > function that sets errno.
> >
> 
> Indeed, the output with -r is:
> 
> 0000000000000000 <__kill>:
>    0:   b8 3e 00 00 00          mov    $0x3e,%eax
>    5:   0f 05                   syscall
>    7:   48 3d 01 f0 ff ff       cmp    $0xfffffffffffff001,%rax
>    d:   0f 83 00 00 00 00       jae    13 <__kill+0x13>
>                         f: R_X86_64_PLT32       __syscall_error-0x4
>   13:   c3                      retq

Yes, I probably should have remembered it looked like that :-)
...
> >> I also suspect it gets predicted very badly.
> >
> > I doubt that. This is a very standard idiom and the size of the offset
> > (which is necessarily 32-bit because it has a relocation on it) is
> > orthogonal to the condition on the jump.

Yes, it only gets mispredicted as badly as any other conditional jump.
I believe modern intel x86 will randomly predict it taken (regardless
of the direction) and then hit a TLB fault on text.unlikely :-)

> > FWIW a syscall like kill takes global kernel-side locks to be able to
> > address a target process by pid, and the rate of meaningful calls you
> > can make to it is very low (since it's bounded by time for target
> > process to act on the signal). Trying to optimize it for speed is
> > pointless, and even size isn't important locally (although in
> > aggregate, lots of wasted small size can add up to more pages = more
> > TLB entries = ...).
> 
> I agree and I would prefer to focus on code simplicity to have a
> platform neutral way to handle error and let the compiler optimize
> it than messy with assembly macros to squeeze this kind of
> micro-optimizations.

syscall entry does get micro-optimised.
Real speed-ups can probably be found by optimising other places.
I've a patch i need to resumbit that should improve the reading
of iov[] from user space.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


More information about the libc-dev mailing list