[cfe-dev] Even abs() comes with a performance penalty

Craig Topper via cfe-dev cfe-dev at lists.llvm.org
Sun Sep 6 14:06:55 PDT 2020


I was mostly speaking to abssi2. What data dependency exists for cmov that
doesn’t exist for cdq+add+xor?
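
To make the comparison concrete, here is a minimal C sketch of the two strategies for a 32-bit value. It is an illustration only, not the compiler-rt source: the names abs_select, abs_mask and check_agree are hypothetical, and the result is returned as unsigned so that the most negative input wraps instead of triggering undefined behavior. Which instructions a given compiler actually emits for either form is not guaranteed. In both versions the result depends on the input through a short chain of ALU operations, and neither contains a branch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: select form, which a compiler may lower to
 * mov+neg+cmov on x86. Unsigned result keeps INT32_MIN well defined. */
static inline uint32_t abs_select(int32_t x) {
    uint32_t ux = (uint32_t)x;
    uint32_t negated = 0u - ux;          /* two's-complement negation, wraps */
    return x < 0 ? negated : ux;         /* pick one of two ready values */
}

/* Hypothetical helper: sign-mask form, the C analogue of cdq+add+xor. */
static inline uint32_t abs_mask(int32_t x) {
    uint32_t ux = (uint32_t)x;
    uint32_t mask = x < 0 ? UINT32_MAX : 0u;  /* all ones if negative, like edx after cdq */
    return (ux + mask) ^ mask;           /* (ux - 1) with all bits flipped == -ux */
}

/* Hypothetical helper: asserts that both forms agree for one input. */
static inline void check_agree(int32_t x) {
    assert(abs_select(x) == abs_mask(x));
}
```

Calling either helper with -5 yields 5, and with INT32_MIN it yields 2147483648, which only fits because the return type is unsigned.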

On Sun, Sep 6, 2020 at 1:19 PM Stefan Kanthak <stefan.kanthak at nexgo.de>
wrote:

> "Craig Topper" <craig.topper at gmail.com> wrote:
>
> > Sorry. I made a mistake. Cmov has been 1 cycle since Broadwell.
>
> Doesn't matter, no need to worry: all instructions used below run in 1
> cycle on recent CPUs ... just like Jcc. The point, however, is whether
> the CPU can and does speculate ahead.
>
> Stefan
>
>
> On Sun, Sep 6, 2020 at 12:39 PM Craig Topper <craig.topper at gmail.com>
> wrote:
>
> > cmov has been 1 cycle since Sandy Bridge. Moves execute in the register
> > renamer since Ivy Bridge. So mov+neg+cmov should be faster than cdq+add+xor
> > on modern CPUs. Furthermore, cdq really ties the hands of the register
> > allocator so probably doesn't make sense in a larger function with abs
> > mixed with other code.
> >
> > ~Craig
> >
> > On Sun, Sep 6, 2020 at 12:30 PM Stefan Kanthak via cfe-dev <
> > cfe-dev at lists.llvm.org> wrote:
> >
> >> --- bugs-bunny.c ---
> >> // Copyleft © 2014-2020, Stefan Kanthak <stefan.kanthak at nexgo.de>
> >>
> >> #ifdef __amd64__
> >> __int128_t __absti2(__int128_t argument) {
> >>     return argument < 0 ? -argument : argument;
> >> }
> >> #else
> >> long long __absdi2(long long argument) {
> >> #ifdef BUNNY
> >>     return __builtin_llabs(argument);
> >> #else
> >>     return argument < 0 ? -argument : argument;
> >> #endif // BUNNY
> >> }
> >>
> >> long __abssi2(long argument) {
> >> #ifdef BUNNY
> >>     return __builtin_labs(argument);
> >> #else
> >>     return argument < 0 ? -argument : argument;
> >> #endif // BUNNY
> >> }
> >> #endif // __amd64__
> >> --- EOF ---
> >>
> >> Run clang -c -o- -O3 -S -target amd64-pc-linux bugs-bunny.c
> >>
> >> Left: original code (slower)     # right: proper code,
> >>                                  #        faster and 3 bytes shorter
> >>
> >> __absti2:      # @__absti2
> >> # %bb.0:                         # .intel_syntax noprefix
> >>       xorl     %edx, %edx        #        mov    rax, rsi
> >>       movq     %rdi, %rax        #        cqo
> >>       negq     %rax              #        mov    rax, rdx
> >>       sbbq     %rsi, %rdx        #        add    rdi, rdx
> >>       testq    %rsi, %rsi        #        adc    rsi, rdx
> >>       cmovnsq  %rdi, %rax        #        xor    rax, rdi
> >>       cmovnsq  %rsi, %rdx        #        xor    rdx, rsi
> >>       retq                       #        ret
>
> >>
> >> CMOVcc introduces a data dependency here, WITHOUT necessity!
> >>
> >> Run clang -c -o- -O3 -S -target i386-pc-linux bugs-bunny.c
> >>
> >> Left: original code (slower)     # right: proper code, runs even on real
> >>                                  #        i386, not just PentiumPro+
> >>
> >> ___abssi2:    # @__abssi2
> >> # %bb.0:                         # .intel_syntax noprefix
> >>       movl    4(%esp), %ecx      #        mov    eax, [esp+4]
> >>       movl    %ecx, %eax         #        cdq
> >>       negl    %eax               #        add    eax, edx
> >>       cmovll  %ecx, %eax         #        xor    eax, edx
> >>       retl                       #        ret
>
> >>
> >> Writing shorter code for __absdi2() for i386 is left as an
> >> exercise to the reader.
> >>
> >> _______________________________________________
> >> cfe-dev mailing list
> >> cfe-dev at lists.llvm.org
> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >>
>
> --
~Craig

