[cfe-dev] Even abs() comes with a performance penalty
Craig Topper via cfe-dev
cfe-dev at lists.llvm.org
Sun Sep 6 12:39:49 PDT 2020
cmov has been 1 cycle since Sandy Bridge. Moves execute in the register
renamer since Ivy Bridge. So mov+neg+cmov should be faster than cdq+add+xor
on modern CPUs. Furthermore, cdq really ties the hands of the register
allocator, so it probably doesn't make sense in a larger function where abs
is mixed with other code.
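
For reference, the two sequences being compared correspond roughly to the two C formulations below. This is only an illustrative sketch, not code from the thread: the names abs_select and abs_mask are hypothetical, the negation is done in unsigned arithmetic so that INT32_MIN does not overflow, and how a particular compiler lowers each function is not guaranteed.

```c
#include <stdint.h>

/* Select form: the shape a compiler may lower to mov + neg + cmov. */
static int32_t abs_select(int32_t x) {
    /* Negate in unsigned arithmetic so INT32_MIN cannot overflow. */
    uint32_t neg = 0u - (uint32_t)x;
    /* Converting back to int32_t wraps on GCC and Clang
       (implementation-defined in ISO C). */
    return x < 0 ? (int32_t)neg : x;
}

/* Sign-mask form: mirrors cdq + add + xor. */
static int32_t abs_mask(int32_t x) {
    uint32_t ux = (uint32_t)x;
    /* All ones when x is negative, zero otherwise: the value cdq
       leaves in edx. */
    uint32_t mask = 0u - (ux >> 31);
    /* (x + mask) ^ mask negates x when mask is all ones and leaves
       it unchanged when mask is zero. */
    return (int32_t)((ux + mask) ^ mask);
}
```

Both functions return the same value for every input; they differ only in the instruction sequence they suggest to the compiler.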
~Craig
On Sun, Sep 6, 2020 at 12:30 PM Stefan Kanthak via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> --- bugs-bunny.c ---
> // Copyleft © 2014-2020, Stefan Kanthak <stefan.kanthak at nexgo.de>
>
> #ifdef __amd64__
> __int128_t __absti2(__int128_t argument) {
>     return argument < 0 ? -argument : argument;
> }
> #else
> long long __absdi2(long long argument) {
> #ifdef BUNNY
>     return __builtin_llabs(argument);
> #else
>     return argument < 0 ? -argument : argument;
> #endif // BUNNY
> }
>
> long __abssi2(long argument) {
> #ifdef BUNNY
>     return __builtin_labs(argument);
> #else
>     return argument < 0 ? -argument : argument;
> #endif // BUNNY
> }
> #endif // __amd64__
> --- EOF ---
>
> Run clang -c -o- -O3 -S -target amd64-pc-linux bugs-bunny.c
>
> Left: inefficient original code     # right: proper code,
>                                     #        faster and 3 bytes shorter
>
> __absti2:                           # @__absti2
> # %bb.0:                            #     .intel_syntax noprefix
>     xorl    %edx, %edx              #     mov rax, rsi
>     movq    %rdi, %rax              #     cqo
>     negq    %rax                    #     mov rax, rdx
>     sbbq    %rsi, %rdx              #     add rdi, rdx
>     testq   %rsi, %rsi              #     adc rsi, rdx
>     cmovnsq %rdi, %rax              #     xor rax, rdi
>     cmovnsq %rsi, %rdx              #     xor rdx, rsi
>     retq                            #     ret
>
> CMOVcc introduces a data dependency here, WITHOUT necessity!
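
As a rough C rendering of the right-hand column above: cqo builds a sign mask from the high half, add/adc add that mask across both halves, and the two xor instructions finish the conditional negation. The sketch below is an illustration only. The name abs128_mask is hypothetical, and it assumes the GCC/Clang __int128 extension on a 64-bit target.

```c
/* unsigned __int128 is a GCC/Clang extension on 64-bit targets. */
typedef unsigned __int128 u128;

static __int128 abs128_mask(__int128 x) {
    u128 ux = (u128)x;
    /* All ones when x is negative, zero otherwise: the value cqo
       produces from the high half. */
    u128 mask = (u128)0 - (ux >> 127);
    /* The 128-bit add corresponds to add/adc, the xor to the two
       xor instructions. Unsigned arithmetic avoids signed overflow
       for the most negative value. */
    return (__int128)((ux + mask) ^ mask);
}
```

Whether a compiler turns this into the exact seven-instruction sequence shown is not guaranteed; it only expresses the same arithmetic.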
>
>
> Run clang -c -o- -O3 -S -target i386-pc-linux bugs-bunny.c
>
> Left: inefficient original code     # right: proper code, runs even on real
>                                     #        i386, not just PentiumPro+
>
> ___abssi2:                          # @__abssi2
> # %bb.0:                            #     .intel_syntax noprefix
>     movl    4(%esp), %ecx           #     mov eax, [esp+4]
>     movl    %ecx, %eax              #     cdq
>     negl    %eax                    #     add eax, edx
>     cmovll  %ecx, %eax              #     xor eax, edx
>     retl                            #     ret
>
>
> Writing shorter code for __absdi2() for i386 is left as an
> exercise to the reader.