<div dir="ltr">The -O0 code isn't the same as the builtins library; it's worse.<div><br clear="all"><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">~Craig</div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Sep 9, 2020 at 8:50 AM Craig Topper <<a href="mailto:craig.topper@gmail.com">craig.topper@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Turn on optimizations.<div><br clear="all"><div><div dir="ltr">~Craig</div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Sep 9, 2020 at 4:28 AM Stefan Kanthak <<a href="mailto:stefan.kanthak@nexgo.de" target="_blank">stefan.kanthak@nexgo.de</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">"Craig Topper" <<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>> wrote:<br>
<br>
> __builtin_parity uses setnp on older x86 and popcnt with sse4.2<br>
<br>
Reality check, PLEASE:<br>
<br>
--- bug.c ---<br>
int main(int argc, char *argv[]) {<br>
return __builtin_parity(argc);<br>
}<br>
--- EOF ---<br>
<br>
clang -o- -target i386-pc-linux -S bug.c<br>
clang version 10.0.0<br>
Target: i386-pc-linux<br>
<br>
pushl %ebp<br>
movl %esp, %ebp<br>
subl $8, %esp<br>
movl 8(%ebp), %eax<br>
movl $0, -4(%ebp)<br>
movl 8(%ebp), %ecx<br>
movl %ecx, %edx<br>
shrl %edx<br>
andl $1431655765, %edx # imm = 0x55555555<br>
subl %edx, %ecx<br>
movl %ecx, %edx<br>
andl $858993459, %edx # imm = 0x33333333<br>
shrl $2, %ecx<br>
andl $858993459, %ecx # imm = 0x33333333<br>
addl %ecx, %edx<br>
movl %edx, %ecx<br>
shrl $4, %ecx<br>
addl %ecx, %edx<br>
andl $252645135, %edx # imm = 0xF0F0F0F<br>
imull $16843009, %edx, %ecx # imm = 0x1010101<br>
shrl $24, %ecx<br>
andl $1, %ecx<br>
movl %eax, -8(%ebp) # 4-byte Spill<br>
movl %ecx, %eax<br>
addl $8, %esp<br>
popl %ebp<br>
retl<br>
<br>
<br>
clang -o- -target amd64-pc-linux -S bug.c<br>
<br>
pushq %rbp<br>
.cfi_def_cfa_offset 16<br>
.cfi_offset %rbp, -16<br>
movq %rsp, %rbp<br>
.cfi_def_cfa_register %rbp<br>
movl $0, -4(%rbp)<br>
movl %edi, -8(%rbp)<br>
movl -8(%rbp), %eax<br>
movl %eax, %ecx<br>
shrl %ecx<br>
andl $1431655765, %ecx # imm = 0x55555555<br>
subl %ecx, %eax<br>
movl %eax, %ecx<br>
andl $858993459, %ecx # imm = 0x33333333<br>
shrl $2, %eax<br>
andl $858993459, %eax # imm = 0x33333333<br>
addl %eax, %ecx<br>
movl %ecx, %eax<br>
shrl $4, %eax<br>
addl %eax, %ecx<br>
andl $252645135, %ecx # imm = 0xF0F0F0F<br>
imull $16843009, %ecx, %eax # imm = 0x1010101<br>
shrl $24, %eax<br>
andl $1, %eax<br>
popq %rbp<br>
.cfi_def_cfa %rsp, 8<br>
retq<br>
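<br>
For reference, both listings compute a full 32-bit population count and then keep only bit 0. A minimal C sketch of that sequence, next to an XOR-folding variant that needs no multiply, might look as follows. The names parity_popcount, parity_fold and check_parity_agreement are made up for illustration; they are not compiler-rt or Clang identifiers.<br>

```c
#include <assert.h>
#include <stdint.h>

/* Population count followed by "& 1": the same constants and steps
   that appear in the unoptimised listings (0x55555555, 0x33333333,
   0x0F0F0F0F, multiply by 0x01010101, shift right by 24). */
static unsigned parity_popcount(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums  */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums  */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums  */
    return (unsigned)(((uint32_t)(x * 0x01010101u)) >> 24) & 1u;
}

/* XOR folding: each step halves the width that still matters,
   so bit 0 ends up holding the XOR of all 32 input bits. */
static unsigned parity_fold(uint32_t x) {
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return (unsigned)(x & 1u);
}

/* Hypothetical self-check: the two variants agree on a few samples. */
static void check_parity_agreement(void) {
    static const uint32_t samples[] = {
        0u, 1u, 3u, 7u, 0x80000000u, 0xFFFFFFFFu, 0x12345678u
    };
    unsigned i;
    for (i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(parity_popcount(samples[i]) == parity_fold(samples[i]));
}
```

Both functions return 1 for an odd number of set bits and 0 otherwise. Which form is faster depends on the target and on the optimisation level, so this is only a sketch of the two approaches, not a measurement.<br>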
<br>
JFTR: this is the same unoptimised code as shipped in the builtins library!<br>
<br>
Stefan<br>
<br>
> On Sun, Sep 6, 2020 at 1:32 PM Stefan Kanthak <<a href="mailto:stefan.kanthak@nexgo.de" target="_blank">stefan.kanthak@nexgo.de</a>><br>
> wrote:<br>
> <br>
>> "Craig Topper" <<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>> wrote:<br>
>><br>
>><br>
>><br>
>> > Clang never generates calls to ___paritysi2, ___paritydi2, ___cmpdi2, or<br>
>> > ___ucmpdi2 on X86, so it's not clear the performance of this matters at all.<br>
>><br>
>><br>
>><br>
>> So you can safely remove them for X86 and all the other targets where such<br>
>><br>
>> unoptimized code is never called!<br>
>><br>
>> But fix these routines for targets where they are called.<br>
>><br>
>><br>
>><br>
>> The statement does NOT make any exceptions, and it does not say<br>
>><br>
>> | ships unoptimized routines the compiler never calls<br>
>><br>
>> but<br>
>><br>
>> | optimized target-independent implementations<br>
>><br>
>><br>
>><br>
>> Stefan<br>
>><br>
>><br>
>><br>
>> BTW: do builtins like __builtin_*parity* exist?<br>
>><br>
>> If yes: do they generate the same bad code?<br>
>><br>
>><br>
>><br>
>> > On Sun, Sep 6, 2020 at 12:31 PM Stefan Kanthak via cfe-dev <<br>
>><br>
>> > <a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>> wrote:<br>
>><br>
>> ><br>
>><br>
>> >> <<a href="https://compiler-rt.llvm.org/index.html" rel="noreferrer" target="_blank">https://compiler-rt.llvm.org/index.html</a>> boasts:<br>
>><br>
>> >><br>
>><br>
>> >> | The builtins library provides optimized implementations of this<br>
>><br>
>> >> | and other low-level routines, either in target-independent C form,<br>
>><br>
>> >> | or as a heavily-optimized assembly.<br>
>><br>
>> >><br>
>><br>
>> >> Really?<br>
>><br>
>> >><br>
>><br>
>> >> Left: poorly performing code shipped in # Right: slightly improved code,<br>
>> >> clang_rt.builtins-* # which the optimiser REALLY<br>
>> >> # should have generated<br>
>><br>
>> >><br>
>><br>
>> >> ___cmpdi2:<br>
>><br>
>> >> mov ecx, [esp+16] # mov ecx, [esp+16]<br>
>><br>
>> >> xor eax, eax # xor eax, eax<br>
>><br>
>> >> cmp [esp+8], ecx # cmp ecx, [esp+8]<br>
>><br>
>> >> jl @f # jg @f<br>
>><br>
>> >> mov eax, 2 # mov eax, 2<br>
>><br>
>> >> jg @f # jl @f<br>
>><br>
>> >> mov ecx, [esp+4] #<br>
>><br>
>> >> mov edx, [esp+12] # mov ecx, [esp+12]<br>
>><br>
>> >> mov eax, 0 # xor eax, eax<br>
>><br>
>> >> cmp ecx, edx # cmp ecx, [esp+4]<br>
>><br>
>> >> jb @f # ja @f<br>
>><br>
>> >> cmp edx, ecx #<br>
>><br>
>> >> mov eax, 1 #<br>
>><br>
>> >> adc eax, 0 # adc eax, 1<br>
>><br>
>> >> @@: # @@:<br>
>><br>
>> >> ret # ret<br>
>><br>
>> >><br>
>><br>
>> >> # 3 instructions fewer, 10 bytes saved<br>
>><br>
>> >><br>
>><br>
>> >> ___ucmpdi2:<br>
>><br>
>> >> mov ecx, [esp+16] # mov ecx, [esp+16]<br>
>><br>
>> >> xor eax, eax # xor eax, eax<br>
>><br>
>> >> cmp [esp+8], ecx # cmp ecx, [esp+8]<br>
>><br>
>> >> jb @f # ja @f<br>
>><br>
>> >> mov eax, 2 # mov eax, 2<br>
>><br>
>> >> ja @f # jb @f<br>
>><br>
>> >> mov ecx, [esp+4] #<br>
>><br>
>> >> mov edx, [esp+12] # mov ecx, [esp+12]<br>
>><br>
>> >> mov eax, 0 # xor eax, eax<br>
>><br>
>> >> cmp ecx, edx # cmp ecx, [esp+4]<br>
>><br>
>> >> jb @f # ja @f<br>
>><br>
>> >> cmp edx, ecx #<br>
>><br>
>> >> mov eax, 1 #<br>
>><br>
>> >> adc eax, 0 # adc eax, 1<br>
>><br>
>> >> @@: # @@:<br>
>><br>
>> >> ret # ret<br>
>><br>
>> >><br>
>><br>
>> >> # 3 instructions fewer, 10 bytes saved<br>
>><br>
>> >><br>
>><br>
>> >><br>
>><br>
>> >> Now properly written code, of course branch-free, faster and shorter:<br>
>><br>
>> >><br>
>><br>
>> >> # Copyright (C) 2004-2020, Stefan Kanthak <<a href="mailto:stefan.kanthak@nexgo.de" target="_blank">stefan.kanthak@nexgo.de</a>><br>
>><br>
>> >><br>
>><br>
>> >> ___cmpdi2:<br>
>><br>
>> >> mov ecx, [esp+4]<br>
>><br>
>> >> mov edx, [esp+12]<br>
>><br>
>> >> cmp ecx, edx<br>
>><br>
>> >> mov eax, [esp+8]<br>
>><br>
>> >> sbb eax, [esp+16]<br>
>><br>
>> >> setl ah<br>
>><br>
>> >> cmp edx, ecx<br>
>><br>
>> >> mov edx, [esp+16]<br>
>><br>
>> >> sbb edx, [esp+8]<br>
>><br>
>> >> setl al<br>
>><br>
>> >> sub al, ah<br>
>><br>
>> >> movsx eax, al<br>
>><br>
>> >> inc eax<br>
>><br>
>> >> ret<br>
>><br>
>> >><br>
>><br>
>> >> ___ucmpdi2:<br>
>><br>
>> >> mov ecx, [esp+4]<br>
>><br>
>> >> mov edx, [esp+12]<br>
>><br>
>> >> cmp ecx, edx<br>
>><br>
>> >> mov eax, [esp+8]<br>
>><br>
>> >> sbb eax, [esp+16]<br>
>><br>
>> >> sbb eax, eax<br>
>><br>
>> >> cmp edx, ecx<br>
>><br>
>> >> mov edx, [esp+16]<br>
>><br>
>> >> sbb edx, [esp+8]<br>
>><br>
>> >> adc eax, 1<br>
>><br>
>> >> ret<br>
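<br>
Read as arithmetic, the two branch-free routines above compute (a > b) - (a < b) + 1, giving 0 for less, 1 for equal and 2 for greater. A C sketch of that identity, assuming the hypothetical names cmp_s64, cmp_u64 and check_cmp_convention rather than the real compiler-rt entry points:<br>

```c
#include <assert.h>
#include <stdint.h>

/* Signed 64-bit three-way compare: 0 if a < b, 1 if a == b, 2 if a > b.
   Each relational operator yields 0 or 1, so no branch is required
   at the source level. */
static int cmp_s64(int64_t a, int64_t b) {
    return (a > b) - (a < b) + 1;
}

/* Unsigned counterpart with the same 0/1/2 result convention. */
static int cmp_u64(uint64_t a, uint64_t b) {
    return (a > b) - (a < b) + 1;
}

/* Hypothetical self-check of the 0/1/2 convention. */
static void check_cmp_convention(void) {
    assert(cmp_s64(-1, 0) == 0);
    assert(cmp_s64(5, 5) == 1);
    assert(cmp_s64(1, -1) == 2);
    assert(cmp_u64(0u, 1u) == 0);
    assert(cmp_u64(UINT64_MAX, 0u) == 2);
}
```

Whether a compiler turns this into the sbb/setl/adc sequences shown above depends on the target and optimisation level; the sketch only pins down the result convention.<br>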
>><br>
>> >><br>
>><br>
>> >><br>
>><br>
>> >> AGAIN:<br>
>><br>
>> >> Remove every occurrence of the word "optimized" on the above web page.<br>
>><br>
>> >><br>
>><br>
>> >> 'nuff said<br>
>><br>
>> >> Stefan<br>
>><br>
>> >> _______________________________________________<br>
>><br>
>> >> cfe-dev mailing list<br>
>><br>
>> >> <a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>
>><br>
>> >> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>
>><br>
>> >><br>
>><br>
>><br>
> <br>
> -- <br>
> ~Craig<br>
><br>
</blockquote></div>
</blockquote></div>