[cfe-dev] Even abs() comes with a performance penalty
Craig Topper via cfe-dev
cfe-dev at lists.llvm.org
Sun Sep 6 14:06:55 PDT 2020
I was mostly speaking to abssi2. What data dependency exists for cmov that
doesn’t exist for cdq+add+xor?
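
To make the comparison concrete, here is a minimal C sketch of the two sequences under discussion. It is an illustration only, not code taken from Clang or compiler-rt, and the names abs_mask, abs_select and check_abs_variants are made up for this example. The arithmetic is done on uint32_t so that INT32_MIN wraps around instead of overflowing. In both versions every step consumes the input or a value derived from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models cdq+add+xor: build a sign mask, add it, then xor with it.
   Hypothetical helper name, chosen for this sketch. */
static uint32_t abs_mask(int32_t x) {
    uint32_t ux = (uint32_t)x;
    uint32_t mask = -(uint32_t)(x < 0); /* 0 or 0xFFFFFFFF, like EDX after cdq */
    return (ux + mask) ^ mask;          /* add depends on mask, xor on the add */
}

/* Models mov+neg+cmov: compute the negation, then select on the sign.
   Hypothetical helper name, chosen for this sketch. */
static uint32_t abs_select(int32_t x) {
    uint32_t ux = (uint32_t)x;
    uint32_t neg = 0u - ux;             /* neg depends on x */
    return x < 0 ? neg : ux;            /* the select depends on x and neg */
}

/* Checks that both variants agree on a few edge cases. */
static void check_abs_variants(void) {
    const int32_t samples[] = {0, 1, -1, 42, -42, INT32_MAX, INT32_MIN};
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        assert(abs_mask(samples[i]) == abs_select(samples[i]));
    }
}
```

Calling check_abs_variants() runs both versions over the sample values and asserts that they agree. Whether a compiler actually lowers either function to cdq+add+xor or to mov+neg+cmov depends on the target and the optimization level.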
On Sun, Sep 6, 2020 at 1:19 PM Stefan Kanthak <stefan.kanthak at nexgo.de> wrote:
> "Craig Topper" <craig.topper at gmail.com> wrote:
>
> > Sorry. I made a mistake. Cmov has been 1 cycle since Broadwell.
>
> Doesn't matter, no need to worry: all instructions used below run in 1 cycle
> on recent CPUs, just like Jcc. The question is rather whether the CPU can
> and does speculate ahead.
>
> Stefan
>
> On Sun, Sep 6, 2020 at 12:39 PM Craig Topper <craig.topper at gmail.com> wrote:
>
> > cmov has been 1 cycle since Sandy Bridge. Moves execute in the register
> > renamer since Ivy Bridge. So mov+neg+cmov should be faster than cdq+add+xor
> > on modern CPUs. Furthermore, cdq really ties the hands of the register
> > allocator so probably doesn't make sense in a larger function with abs
> > mixed with other code.
> >
> > ~Craig
> >
> > On Sun, Sep 6, 2020 at 12:30 PM Stefan Kanthak via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> >
> >> --- bugs-bunny.c ---
> >> // Copyleft © 2014-2020, Stefan Kanthak <stefan.kanthak at nexgo.de>
> >>
> >> #ifdef __amd64__
> >> __int128_t __absti2(__int128_t argument) {
> >>     return argument < 0 ? -argument : argument;
> >> }
> >> #else
> >> long long __absdi2(long long argument) {
> >> #ifdef BUNNY
> >>     return __builtin_llabs(argument);
> >> #else
> >>     return argument < 0 ? -argument : argument;
> >> #endif // BUNNY
> >> }
> >>
> >> long __abssi2(long argument) {
> >> #ifdef BUNNY
> >>     return __builtin_labs(argument);
> >> #else
> >>     return argument < 0 ? -argument : argument;
> >> #endif // BUNNY
> >> }
> >> #endif // __amd64__
> >> --- EOF ---
> >>
> >> Run clang -c -o- -O3 -S -target amd64-pc-linux bugs-bunny.c
> >>
> >> Left: poorly performing original code   # right: proper code,
> >>                                         # faster and 3 bytes shorter
> >>
> >> __absti2:                               # @__absti2
> >> # %bb.0:                                #   .intel_syntax noprefix
> >>         xorl    %edx, %edx              #   mov     rax, rsi
> >>         movq    %rdi, %rax              #   cqo
> >>         negq    %rax                    #   mov     rax, rdx
> >>         sbbq    %rsi, %rdx              #   add     rdi, rdx
> >>         testq   %rsi, %rsi              #   adc     rsi, rdx
> >>         cmovnsq %rdi, %rax              #   xor     rax, rdi
> >>         cmovnsq %rsi, %rdx              #   xor     rdx, rsi
> >>         retq                            #   ret
> >>
> >> CMOVcc introduces a data dependency here, WITHOUT necessity!
> >>
> >> Run clang -c -o- -O3 -S -target i386-pc-linux bugs-bunny.c
> >>
> >> Left: poorly performing original code   # right: proper code, runs even on real
> >>                                         # i386, not just PentiumPro+
> >>
> >> ___abssi2:                              # @__abssi2
> >> # %bb.0:                                #   .intel_syntax noprefix
> >>         movl    4(%esp), %ecx           #   mov     eax, [esp+4]
> >>         movl    %ecx, %eax              #   cdq
> >>         negl    %eax                    #   add     eax, edx
> >>         cmovll  %ecx, %eax              #   xor     eax, edx
> >>         retl                            #   ret
> >>
> >> Writing shorter code for __absdi2() for i386 is left as an
> >> exercise to the reader.
--
~Craig