[cfe-dev] "Optimized implementations"?
Stefan Kanthak via cfe-dev
cfe-dev at lists.llvm.org
Sun Sep 6 13:21:10 PDT 2020
"Craig Topper" <craig.topper at gmail.com> wrote:
> Clang never generates calls to ___paritysi2, ___paritydi2, ___cmpdi2, or
> ___ucmpdi2 on X86, so it's not clear the performance of this matters at all.
So you can safely remove them for X86 and all the other targets where such
unoptimized code is never called!
But fix these routines for targets where they are called.
The statement on the web page does NOT make any exceptions; it does not say
| ships unoptimized routines the compiler never calls
but
| optimized target-independent implementations
Stefan
BTW: do builtins like __builtin_*parity* exist?
If yes: do they generate the same bad code?
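For reference: GCC and Clang do provide `__builtin_parity`, `__builtin_parityl` and `__builtin_parityll`. The C sketch below contrasts the builtin with a portable XOR-fold of the kind a target-independent parity routine could use. The helper names are hypothetical, and whether the builtin becomes inline code or a library call depends on target and flags, so inspect the generated assembly (for example with `clang -O2 -S`) rather than assuming either way.

```c
#include <stdint.h>

/* Portable parity of a 32-bit value: 1 if an odd number of bits are set.
   Each XOR-fold halves the width while preserving parity; the final
   nibble indexes the constant 0x6996, whose bit n is the parity of n. */
static int parity32_fold(uint32_t x)
{
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    return (int)((0x6996u >> (x & 0xFu)) & 1u);
}

/* Same result via the GCC/Clang builtin; the compiler picks the code. */
static int parity32_builtin(uint32_t x)
{
    return __builtin_parity(x);
}

/* 64-bit variant: __builtin_parityll takes an unsigned long long. */
static int parity64_builtin(uint64_t x)
{
    return __builtin_parityll(x);
}
```

Comparing the assembly of `parity32_fold` and `parity32_builtin` for a given target is a quick way to answer the second question empirically.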
> On Sun, Sep 6, 2020 at 12:31 PM Stefan Kanthak via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> <https://compiler-rt.llvm.org/index.html> boasts:
>>
>> | The builtins library provides optimized implementations of this
>> | and other low-level routines, either in target-independent C form,
>> | or as a heavily-optimized assembly.
>>
>> Really?
>>
>> Left: inefficient code shipped in # Right: slightly improved code,
>>       clang_rt.builtins-*         #        which the optimiser REALLY
>>                                   #        should have generated
>>
>> ___cmpdi2:
>>     mov ecx, [esp+16]             #     mov ecx, [esp+16]
>>     xor eax, eax                  #     xor eax, eax
>>     cmp [esp+8], ecx              #     cmp ecx, [esp+8]
>>     jl @f                         #     jg @f
>>     mov eax, 2                    #     mov eax, 2
>>     jg @f                         #     jl @f
>>     mov ecx, [esp+4]              #
>>     mov edx, [esp+12]             #     mov ecx, [esp+12]
>>     mov eax, 0                    #     xor eax, eax
>>     cmp ecx, edx                  #     cmp ecx, [esp+4]
>>     jb @f                         #     ja @f
>>     cmp edx, ecx                  #
>>     mov eax, 1                    #
>>     adc eax, 0                    #     adc eax, 1
>> @@:                               # @@:
>>     ret                           #     ret
>>
>> # 3 fewer instructions, 10 bytes saved
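The contract both columns implement: for signed 64-bit operands, `__cmpdi2` returns 0 if a < b, 1 if a == b and 2 if a > b. A minimal C sketch of that contract, under the hypothetical name `ref_cmpdi2` so it cannot clash with the runtime symbol, mirroring the structure of the listings: signed compare of the high words first, unsigned compare of the low words only on a tie. It assumes GCC/Clang, where `>>` on a negative `int64_t` is an arithmetic shift.

```c
#include <stdint.h>

/* Hypothetical reference for the __cmpdi2 contract:
   0 if a < b, 1 if a == b, 2 if a > b (signed 64-bit). */
static int ref_cmpdi2(int64_t a, int64_t b)
{
    int32_t  a_hi = (int32_t)(a >> 32), b_hi = (int32_t)(b >> 32);
    uint32_t a_lo = (uint32_t)a,        b_lo = (uint32_t)b;

    if (a_hi < b_hi) return 0;   /* jl: signed high word less       */
    if (a_hi > b_hi) return 2;   /* jg: signed high word greater    */
    if (a_lo < b_lo) return 0;   /* jb: unsigned low word below     */
    return 1 + (a_lo > b_lo);    /* cmp/adc: 1 if equal, 2 if above */
}
```

A pair such as `0x80000000` vs `1` (equal high words) shows why the low words must be compared unsigned even in the signed routine.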
>>
>> ___ucmpdi2:
>>     mov ecx, [esp+16]             #     mov ecx, [esp+16]
>>     xor eax, eax                  #     xor eax, eax
>>     cmp [esp+8], ecx              #     cmp ecx, [esp+8]
>>     jb @f                         #     ja @f
>>     mov eax, 2                    #     mov eax, 2
>>     ja @f                         #     jb @f
>>     mov ecx, [esp+4]              #
>>     mov edx, [esp+12]             #     mov ecx, [esp+12]
>>     mov eax, 0                    #     xor eax, eax
>>     cmp ecx, edx                  #     cmp ecx, [esp+4]
>>     jb @f                         #     ja @f
>>     cmp edx, ecx                  #
>>     mov eax, 1                    #
>>     adc eax, 0                    #     adc eax, 1
>> @@:                               # @@:
>>     ret                           #     ret
>>
>> # 3 fewer instructions, 10 bytes saved
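`__ucmpdi2` has the same 0/1/2 contract for unsigned 64-bit operands, and the listings differ from the signed ones only in the branches on the high words (`jb`/`ja` instead of `jl`/`jg`). A corresponding C sketch, again under a hypothetical name:

```c
#include <stdint.h>

/* Hypothetical reference for the __ucmpdi2 contract:
   0 if a < b, 1 if a == b, 2 if a > b (unsigned 64-bit). */
static int ref_ucmpdi2(uint64_t a, uint64_t b)
{
    uint32_t a_hi = (uint32_t)(a >> 32), b_hi = (uint32_t)(b >> 32);
    uint32_t a_lo = (uint32_t)a,         b_lo = (uint32_t)b;

    if (a_hi < b_hi) return 0;   /* jb: high word below (unsigned)  */
    if (a_hi > b_hi) return 2;   /* ja: high word above (unsigned)  */
    if (a_lo < b_lo) return 0;   /* jb: low word below              */
    return 1 + (a_lo > b_lo);    /* cmp/adc: 1 if equal, 2 if above */
}
```

With `0x8000000000000000` vs `1` this returns 2, whereas the signed routine would return 0 for the same bit patterns.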
>>
>>
>> Now properly written code, of course branch-free, faster and shorter:
>>
>> # Copyright (C) 2004-2020, Stefan Kanthak <stefan.kanthak at nexgo.de>
>>
>> ___cmpdi2:
>>     mov ecx, [esp+4]
>>     mov edx, [esp+12]
>>     cmp ecx, edx
>>     mov eax, [esp+8]
>>     sbb eax, [esp+16]
>>     setl ah
>>     cmp edx, ecx
>>     mov edx, [esp+16]
>>     sbb edx, [esp+8]
>>     setl al
>>     sub al, ah
>>     movsx eax, al
>>     inc eax
>>     ret
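In C, the same 0/1/2 result can be written without control flow as `(a > b) - (a < b) + 1`, which maps onto the listing step by step: the two `setl` instructions produce the bits a < b and a > b, `sub` combines them into -1, 0 or +1, and `movsx`/`inc` turn that into 0, 1 or 2. A sketch with a hypothetical name; compilers often lower this expression without branches, but that is not guaranteed.

```c
#include <stdint.h>

/* Hypothetical branch-free formulation of the __cmpdi2 contract.
   (a > b) - (a < b) is -1, 0 or +1; adding 1 maps it to 0, 1 or 2. */
static int cmpdi2_branchless(int64_t a, int64_t b)
{
    return (a > b) - (a < b) + 1;
}
```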
>>
>> ___ucmpdi2:
>>     mov ecx, [esp+4]
>>     mov edx, [esp+12]
>>     cmp ecx, edx
>>     mov eax, [esp+8]
>>     sbb eax, [esp+16]
>>     sbb eax, eax
>>     cmp edx, ecx
>>     mov edx, [esp+16]
>>     sbb edx, [esp+8]
>>     adc eax, 1
>>     ret
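The unsigned variant relies on the borrow (CF) out of a 64-bit subtraction performed as `cmp` on the low words followed by `sbb` on the high words: that final borrow is 1 exactly when a < b as unsigned values. The C model below spells out that flag arithmetic; the names are hypothetical, and the model documents the logic rather than being guaranteed to compile branch-free.

```c
#include <stdint.h>

/* Borrow out of (a_hi:a_lo) - (b_hi:b_lo), i.e. CF after CMP lo / SBB hi. */
static unsigned borrow_sub64(uint32_t a_lo, uint32_t a_hi,
                             uint32_t b_lo, uint32_t b_hi)
{
    unsigned cf = a_lo < b_lo;   /* CMP low words: borrow out */
    /* SBB high words borrows iff a_hi < b_hi + cf (no wraparound). */
    return a_hi < b_hi || (a_hi == b_hi && cf);
}

/* Hypothetical model of the branch-free __ucmpdi2 listing. */
static int ucmpdi2_branchless(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    int lt = (int)borrow_sub64(a_lo, a_hi, b_lo, b_hi); /* CF of a - b */
    int gt = (int)borrow_sub64(b_lo, b_hi, a_lo, a_hi); /* CF of b - a */
    /* SBB EAX,EAX leaves -lt; ADC EAX,1 then adds 1 + gt. */
    return -lt + 1 + gt;
}
```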
>>
>>
>> AGAIN:
>> Remove every occurrence of the word "optimized" from the above web page.
>>
>> 'nuff said
>> Stefan