[cfe-dev] "Optimized implementations"?
Craig Topper via cfe-dev
cfe-dev at lists.llvm.org
Wed Sep 9 08:50:42 PDT 2020
Turn on optimizations.
~Craig
On Wed, Sep 9, 2020 at 4:28 AM Stefan Kanthak <stefan.kanthak at nexgo.de>
wrote:
> "Craig Topper" <craig.topper at gmail.com> wrote:
>
> > __builtin_parity uses setnp on older x86 and popcnt with sse4.2
>
> Reality check, PLEASE:
>
> --- bug.c ---
> int main(int argc, char *argv[]) {
> return __builtin_parity(argc);
> }
> --- EOF ---
>
> clang -o- -target i386-pc-linux -S bug.c
> clang version 10.0.0
> Target: i386-pc-linux
>
> pushl %ebp
> movl %esp, %ebp
> subl $8, %esp
> movl 8(%ebp), %eax
> movl $0, -4(%ebp)
> movl 8(%ebp), %ecx
> movl %ecx, %edx
> shrl %edx
> andl $1431655765, %edx # imm = 0x55555555
> subl %edx, %ecx
> movl %ecx, %edx
> andl $858993459, %edx # imm = 0x33333333
> shrl $2, %ecx
> andl $858993459, %ecx # imm = 0x33333333
> addl %ecx, %edx
> movl %edx, %ecx
> shrl $4, %ecx
> addl %ecx, %edx
> andl $252645135, %edx # imm = 0xF0F0F0F
> imull $16843009, %edx, %ecx # imm = 0x1010101
> shrl $24, %ecx
> andl $1, %ecx
> movl %eax, -8(%ebp) # 4-byte Spill
> movl %ecx, %eax
> addl $8, %esp
> popl %ebp
> retl
>
>
> clang -o- -target amd64-pc-linux -S bug.c
>
> pushq %rbp
> .cfi_def_cfa_offset 16
> .cfi_offset %rbp, -16
> movq %rsp, %rbp
> .cfi_def_cfa_register %rbp
> movl $0, -4(%rbp)
> movl %edi, -8(%rbp)
> movl -8(%rbp), %eax
> movl %eax, %ecx
> shrl %ecx
> andl $1431655765, %ecx # imm = 0x55555555
> subl %ecx, %eax
> movl %eax, %ecx
> andl $858993459, %ecx # imm = 0x33333333
> shrl $2, %eax
> andl $858993459, %eax # imm = 0x33333333
> addl %eax, %ecx
> movl %ecx, %eax
> shrl $4, %eax
> addl %eax, %ecx
> andl $252645135, %ecx # imm = 0xF0F0F0F
> imull $16843009, %ecx, %eax # imm = 0x1010101
> shrl $24, %eax
> andl $1, %eax
> popq %rbp
> .cfi_def_cfa %rsp, 8
> retq
>
> JFTR: this is the same unoptimised code as shipped in the builtins library!
>
> Stefan
>
> > On Sun, Sep 6, 2020 at 1:32 PM Stefan Kanthak <stefan.kanthak at nexgo.de>
> > wrote:
> >
> >> "Craig Topper" <craig.topper at gmail.com> wrote;
> >>
> >>
> >>
> >> > Clang never generates calls to ___paritysi2, ___paritydi2, ___cmpdi2,
> or
> >>
> >> > ___ucmpdi2 on X86 so its not clear the performance of this matters at
> >> all.
> >>
> >>
> >>
> >> So you can safely remove them for X86 and all the other targets where
> such
> >>
> >> unoptimized code is never called!
> >>
> >> But fix these routines for targets where they are called.
> >>
> >>
> >>
> >> The statement does NOT make any exceptions, and it does not say
> >>
> >> | ships unoptimized routines the compiler never calls
> >>
> >> but
> >>
> >> | optimized target-independent implementations
> >>
> >>
> >>
> >> Stefan
> >>
> >>
> >>
> >> BTW: do builtins like __builtin_*parity* exist?
> >>
> >> If yes: do they generate the same bad code?
> >>
> >>
> >>
> >> > On Sun, Sep 6, 2020 at 12:31 PM Stefan Kanthak via cfe-dev <
> >>
> >> > cfe-dev at lists.llvm.org> wrote:
> >>
> >> >
> >>
> >> >> <https://compiler-rt.llvm.org/index.html> boasts:
> >>
> >> >>
> >>
> >> >> | The builtins library provides optimized implementations of this
> >>
> >> >> | and other low-level routines, either in target-independent C form,
> >>
> >> >> | or as a heavily-optimized assembly.
> >>
> >> >>
> >>
> >> >> Really?
> >>
> >> >>
> >>
> >> >> Left: inperformant code shipped in # Right: slightly improved
> code,
> >>
> >> >> clang_rt.builtins-* # which the optimiser
> >> REALLY
> >>
> >> >> # should have generated
> >>
> >> >>
> >>
> >> >> ___cmpdi2:
> >>
> >> >> mov ecx, [esp+16] # mov ecx, [esp+16]
> >>
> >> >> xor eax, eax # xor eax, eax
> >>
> >> >> cmp [esp+8], ecx # cmp ecx, [esp+8]
> >>
> >> >> jl @f # jg @f
> >>
> >> >> mov eax, 2 # mov eax, 2
> >>
> >> >> jg @f # jl @f
> >>
> >> >> mov ecx, [esp+4] #
> >>
> >> >> mov edx, [esp+12] # mov ecx, [esp+12]
> >>
> >> >> mov eax, 0 # xor eax, eax
> >>
> >> >> cmp ecx, edx # cmp ecx, [esp+4]
> >>
> >> >> jb @f # ja @f
> >>
> >> >> cmp edx, ecx #
> >>
> >> >> mov eax, 1 #
> >>
> >> >> adc eax, 0 # adc eax, 1
> >>
> >> >> @@: # @@:
> >>
> >> >> ret # ret
> >>
> >> >>
> >>
> >> >> # 3 instructions less, 10 bytes
> >> saved
> >>
> >> >>
> >>
> >> >> ___ucmpdi2:
> >>
> >> >> mov ecx, [esp+16] # mov ecx, [esp+16]
> >>
> >> >> xor eax, eax # xor eax, eax
> >>
> >> >> cmp [esp+8], ecx # cmp ecx, [esp+8]
> >>
> >> >> jb @f # ja @f
> >>
> >> >> mov eax, 2 # mov eax, 2
> >>
> >> >> ja @f # jb @f
> >>
> >> >> mov ecx, [esp+4] #
> >>
> >> >> mov edx, [esp+12] # mov ecx, [esp+12]
> >>
> >> >> mov eax, 0 # xor eax, eax
> >>
> >> >> cmp ecx, edx # cmp ecx, [esp+4]
> >>
> >> >> jb @f # ja @f
> >>
> >> >> cmp edx, ecx #
> >>
> >> >> mov eax, 1 #
> >>
> >> >> adc eax, 0 # adc eax, 1
> >>
> >> >> @@: # @@:
> >>
> >> >> ret # ret
> >>
> >> >>
> >>
> >> >> # 3 instructions less, 10 bytes
> >> saved
> >>
> >> >>
> >>
> >> >>
> >>
> >> >> Now properly written code, of course branch-free, faster and shorter:
> >>
> >> >>
> >>
> >> >> # Copyright (C) 2004-2020, Stefan Kanthak <stefan.kanthak at nexgo.de>
> >>
> >> >>
> >>
> >> >> ___cmpdi2:
> >>
> >> >> mov ecx, [esp+4]
> >>
> >> >> mov edx, [esp+12]
> >>
> >> >> cmp ecx, edx
> >>
> >> >> mov eax, [esp+8]
> >>
> >> >> sbb eax, [esp+16]
> >>
> >> >> setl ah
> >>
> >> >> cmp edx, ecx
> >>
> >> >> mov edx, [esp+16]
> >>
> >> >> sbb edx, [esp+8]
> >>
> >> >> setl al
> >>
> >> >> sub al, ah
> >>
> >> >> movsx eax, al
> >>
> >> >> inc eax
> >>
> >> >> ret
> >>
> >> >>
> >>
> >> >> ___ucmpdi2:
> >>
> >> >> mov ecx, [esp+4]
> >>
> >> >> mov edx, [esp+12]
> >>
> >> >> cmp ecx, edx
> >>
> >> >> mov eax, [esp+8]
> >>
> >> >> sbb eax, [esp+16]
> >>
> >> >> sbb eax, eax
> >>
> >> >> cmp edx, ecx
> >>
> >> >> mov edx, [esp+16]
> >>
> >> >> sbb edx, [esp+8]
> >>
> >> >> adc eax, 1
> >>
> >> >> ret
> >>
> >> >>
> >>
> >> >>
> >>
> >> >> AGAIN:
> >>
> >> >> Remove every occurance of the word "optimized" on the above web page.
> >>
> >> >>
> >>
> >> >> 'nuff said
> >>
> >> >> Stefan
> >>
> >> >> _______________________________________________
> >>
> >> >> cfe-dev mailing list
> >>
> >> >> cfe-dev at lists.llvm.org
> >>
> >> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >>
> >> >>
> >>
> >>
> >
> > --
> > ~Craig
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200909/ea334430/attachment.html>
More information about the cfe-dev
mailing list