<div dir="ltr">Clang never generates calls to ___paritysi2, ___paritydi2, ___cmpdi2, or ___ucmpdi2 on X86, so it's not clear that the performance of these routines matters at all.<div><br clear="all"><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">~Craig</div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Sep 6, 2020 at 12:31 PM Stefan Kanthak via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><<a href="https://compiler-rt.llvm.org/index.html" rel="noreferrer" target="_blank">https://compiler-rt.llvm.org/index.html</a>> boasts:<br>
<br>
| The builtins library provides optimized implementations of this<br>
| and other low-level routines, either in target-independent C form,<br>
| or as a heavily-optimized assembly.<br>
<br>
Really?<br>
<br>
Left: poorly performing code shipped in clang_rt.builtins-*<br>
Right (after the #): slightly improved code, which the optimiser REALLY<br>
should have generated<br>
<br>
___cmpdi2:<br>
mov ecx, [esp+16] # mov ecx, [esp+16]<br>
xor eax, eax # xor eax, eax<br>
cmp [esp+8], ecx # cmp ecx, [esp+8]<br>
jl @f # jg @f<br>
mov eax, 2 # mov eax, 2<br>
jg @f # jl @f<br>
mov ecx, [esp+4] #<br>
mov edx, [esp+12] # mov ecx, [esp+12]<br>
mov eax, 0 # xor eax, eax<br>
cmp ecx, edx # cmp ecx, [esp+4]<br>
jb @f # ja @f<br>
cmp edx, ecx #<br>
mov eax, 1 #<br>
adc eax, 0 # adc eax, 1<br>
@@: # @@:<br>
ret # ret<br>
<br>
# 3 instructions fewer, 10 bytes saved<br>
<br>
___ucmpdi2:<br>
mov ecx, [esp+16] # mov ecx, [esp+16]<br>
xor eax, eax # xor eax, eax<br>
cmp [esp+8], ecx # cmp ecx, [esp+8]<br>
jb @f # ja @f<br>
mov eax, 2 # mov eax, 2<br>
ja @f # jb @f<br>
mov ecx, [esp+4] #<br>
mov edx, [esp+12] # mov ecx, [esp+12]<br>
mov eax, 0 # xor eax, eax<br>
cmp ecx, edx # cmp ecx, [esp+4]<br>
jb @f # ja @f<br>
cmp edx, ecx #<br>
mov eax, 1 #<br>
adc eax, 0 # adc eax, 1<br>
@@: # @@:<br>
ret # ret<br>
<br>
# 3 instructions fewer, 10 bytes saved<br>
<br>
<br>
Now properly written code, of course branch-free, faster and shorter:<br>
<br>
# Copyright (C) 2004-2020, Stefan Kanthak <<a href="mailto:stefan.kanthak@nexgo.de" target="_blank">stefan.kanthak@nexgo.de</a>><br>
<br>
___cmpdi2:<br>
mov ecx, [esp+4]<br>
mov edx, [esp+12]<br>
cmp ecx, edx<br>
mov eax, [esp+8]<br>
sbb eax, [esp+16]<br>
setl ah<br>
cmp edx, ecx<br>
mov edx, [esp+16]<br>
sbb edx, [esp+8]<br>
setl al<br>
sub al, ah<br>
movsx eax, al<br>
inc eax<br>
ret<br>
<br>
___ucmpdi2:<br>
mov ecx, [esp+4]<br>
mov edx, [esp+12]<br>
cmp ecx, edx<br>
mov eax, [esp+8]<br>
sbb eax, [esp+16]<br>
sbb eax, eax<br>
cmp edx, ecx<br>
mov edx, [esp+16]<br>
sbb edx, [esp+8]<br>
adc eax, 1<br>
ret<br>
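<br>
For reference, a minimal C sketch of the value both routines compute, assuming the usual contract of 0 when a is less than b, 1 when they are equal, and 2 when a is greater than b.<br>
The names cmpdi2_sketch and ucmpdi2_sketch are hypothetical and this is not the compiler-rt source; whether a given compiler turns it into a branch-free sequence like the one above is not guaranteed.<br>

```c
/* Hypothetical helper names, for illustration only. */

/* Signed 64-bit three-way compare: 0 if a < b, 1 if a == b, 2 if a > b. */
int cmpdi2_sketch(long long a, long long b)
{
    /* (a > b) and (a < b) each evaluate to 0 or 1, so the
       difference is -1, 0 or +1; adding 1 maps it to 0, 1 or 2. */
    return (a > b) - (a < b) + 1;
}

/* Unsigned 64-bit three-way compare with the same result encoding. */
int ucmpdi2_sketch(unsigned long long a, unsigned long long b)
{
    return (a > b) - (a < b) + 1;
}
```

For example, cmpdi2_sketch(-1, 1) yields 0 and ucmpdi2_sketch(1, 1) yields 1.<br>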
<br>
<br>
AGAIN:<br>
Remove every occurrence of the word "optimized" on the above web page.<br>
<br>
'nuff said<br>
Stefan<br>
</blockquote></div>