<div dir="ltr"><div>cmov has had 1-cycle latency since Sandy Bridge, and register-to-register moves have been handled in the register renamer since Ivy Bridge, so mov+neg+cmov should be faster than cdq+add+xor on modern CPUs. Furthermore, cdq really ties the hands of the register allocator, because it takes its input in eax and always writes edx, so it probably doesn't make sense in a larger function where abs is mixed with other code.</div><br clear="all"><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">~Craig</div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Sep 6, 2020 at 12:30 PM Stefan Kanthak via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">--- bugs-bunny.c ---<br>
// Copyleft © 2014-2020, Stefan Kanthak <<a href="mailto:stefan.kanthak@nexgo.de" target="_blank">stefan.kanthak@nexgo.de</a>><br>
<br>
#ifdef __amd64__<br>
__int128_t __absti2(__int128_t argument) {<br>
return argument < 0 ? -argument : argument;<br>
}<br>
#else<br>
long long __absdi2(long long argument) {<br>
#ifdef BUNNY<br>
return __builtin_llabs(argument);<br>
#else<br>
return argument < 0 ? -argument : argument;<br>
#endif // BUNNY<br>
}<br>
<br>
long __abssi2(long argument) {<br>
#ifdef BUNNY<br>
return __builtin_labs(argument);<br>
#else<br>
return argument < 0 ? -argument : argument;<br>
#endif // BUNNY<br>
}<br>
#endif // __amd64__<br>
--- EOF ---<br>
<br>
Run clang -c -o- -O3 -S -target amd64-pc-linux bugs-bunny.c<br>
<br>
Left: poorly performing original code # right: proper code,<br>
# faster and 3 bytes shorter<br>
<br>
__absti2: # @__absti2<br>
# %bb.0: # .intel_syntax noprefix<br>
xorl %edx, %edx # mov rax, rsi<br>
movq %rdi, %rax # cqo<br>
negq %rax # mov rax, rdx<br>
sbbq %rsi, %rdx # add rdi, rdx<br>
testq %rsi, %rsi # adc rsi, rdx<br>
cmovnsq %rdi, %rax # xor rax, rdi<br>
cmovnsq %rsi, %rdx # xor rdx, rsi<br>
retq # ret<br>
<br>
CMOVcc introduces a data dependency here, WITHOUT necessity!<br>
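<br>
For readers who prefer C to assembly, the right-hand column can be sketched roughly as follows. This is only an illustration, not compiler output: abs128_mask is a hypothetical name, __int128 assumes GCC or Clang on a 64-bit target, and the right shift of a negative value is assumed to be arithmetic (true for GCC and Clang, implementation-defined in ISO C).<br>

```c
/* Sketch of the cqo / add / adc / xor idea on a 128-bit value.
 * Assumption: GCC or Clang on a 64-bit target (provides __int128_t). */
typedef __int128_t  i128;
typedef __uint128_t u128;

/* Hypothetical helper, not part of any library. */
static i128 abs128_mask(i128 x) {
    /* Sign mask: all ones if x is negative, zero otherwise.
     * Assumes >> on a negative value is an arithmetic shift. */
    u128 mask = (u128)(x >> 127);

    /* (x + mask) ^ mask negates x when mask is all ones and
     * leaves it unchanged when mask is zero. Unsigned arithmetic
     * avoids signed-overflow undefined behaviour. */
    return (i128)(((u128)x + mask) ^ mask);
}
```

As with the assembly, the most negative value maps to itself, since its magnitude is not representable.<br>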
<br>
<br>
Run clang -c -o- -O3 -S -target i386-pc-linux bugs-bunny.c<br>
<br>
Left: poorly performing original code # right: proper code, runs even on real<br>
# i386, not just PentiumPro+<br>
<br>
__abssi2: # @__abssi2<br>
# %bb.0: # .intel_syntax noprefix<br>
movl 4(%esp), %ecx # mov eax, [esp+4]<br>
movl %ecx, %eax # cdq<br>
negl %eax # add eax, edx<br>
cmovll %ecx, %eax # xor eax, edx<br>
retl # ret<br>
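<br>
The two i386 sequences above correspond roughly to the two C formulations below. This is a sketch under assumptions: abs_select and abs_mask are hypothetical names, int32_t stands in for the 32-bit long of i386, and the shift assumes an arithmetic right shift for negative values (as GCC and Clang provide). Whether a compiler actually emits cmov or cdq for either form is up to its instruction selection.<br>

```c
#include <stdint.h>

/* Select form, in the spirit of mov / neg / cmov:
 * compute the negation, then pick one of the two values. */
static int32_t abs_select(int32_t x) {
    uint32_t negated = 0u - (uint32_t)x;   /* neg, done unsigned to avoid overflow UB */
    return x < 0 ? (int32_t)negated : x;   /* cmov-style select */
}

/* Mask form, in the spirit of cdq / add / xor:
 * build a sign mask, then add and xor with it. */
static int32_t abs_mask(int32_t x) {
    /* cdq equivalent: 0xFFFFFFFF if x is negative, else 0.
     * Assumes >> on a negative value is an arithmetic shift. */
    uint32_t mask = (uint32_t)(x >> 31);
    return (int32_t)(((uint32_t)x + mask) ^ mask);
}
```

Both forms return INT32_MIN unchanged, matching the behaviour of the assembly on that input.<br>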
<br>
<br>
Writing shorter code for __absdi2() for i386 is left as an<br>
exercise to the reader.<br>
<br>
</blockquote></div>