[llvm-dev] Where's the optimiser gone (part 11): use the proper instruction for sign extension
Stefan Kanthak via llvm-dev
llvm-dev at lists.llvm.org
Mon Mar 4 01:08:44 PST 2019
"Craig Topper" <craig.topper at gmail.com> wrote:
> It's fairly difficult to use CDQ in LLVM without tying the hands of the
> register allocator.
Its hands are already tied, though, whenever it has to return a quadword in
EDX:EAX, use the DIV/IDIV and MUL/IMUL instructions, or perform shifts with
a variable shift count (which must be in CL).
In the case of llsign(), it uses three registers (EAX, ECX and EDX),
although the job can be done with just EAX and EDX.
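To illustrate that claim, here is a hypothetical C model of the hand-written llsign() sequence from the original post, with variables named after the two registers. It is a sketch, not compiler output. The function names are made up. The signed test 0 < x, which cmp/sbb/setl derive from the flags, is written as a plain C comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the EDX:EAX-only llsign() sequence:
 *   xor edx, edx / mov eax, [esp+8] / cmp edx, [esp+4] / sbb edx, eax
 *   cdq / setl al / movzx eax, al / add eax, edx / ret
 * The EDX:EAX result is packed into a uint64_t so that all wrap-around
 * is well-defined unsigned arithmetic. */
uint64_t llsign_edx_eax(int64_t x)
{
    uint64_t bits = (uint64_t)x;
    uint32_t eax = (uint32_t)(bits >> 32);        /* mov eax, high word of x */

    /* cmp/sbb compute 0 - x into EDX only for the flags; setl then reads
     * the signed test 0 < x. The C comparison stands in for those flags. */
    uint32_t gt = (x > 0);

    uint32_t edx = (eax >> 31) ? UINT32_MAX : 0u; /* cdq: sign of the high word */
    eax = gt;                                     /* setl al; movzx eax, al */
    eax += edx;                                   /* add eax, edx */

    /* EDX already holds the high word of the result: no mov/sar needed. */
    return ((uint64_t)edx << 32) | eax;
}

/* Compare the model against the C source of llsign() for a few inputs. */
void check_llsign_model(void)
{
    static const int64_t samples[] = {
        0, 1, -1, 42, -42, INT64_C(0xFFFFFFFF), -INT64_C(0x100000000),
        INT64_MAX, INT64_MIN
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        int64_t x = samples[i];
        int64_t expected = (x > 0) - (x < 0);
        assert(llsign_edx_eax(x) == (uint64_t)expected);
    }
}
```

Because cdq fills EDX before the add, and the result is always -1, 0 or 1 with the same sign as x, EDX is already the correct high word when the function returns.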
> It would potentially require a new post-RA combine pass
> to detect the "mov edx, eax; sar edx, 31" pattern. It's going to be even
> harder to bias register allocation in hopes of using CDQ for the lsign case.
>
> CDQ is implemented in the shifter unit on at least the last several
> generations of Intel CPUs, so it's going to perform similarly to SAR. And the
> move only requires decoder bandwidth and no execution resources on recent
> CPUs. Do you have performance data for this optimization?
No, I don't have such data.
Regarding the llsign() function: instead of "mov edx, eax; sar edx, 31",
the compiler SHOULD generate EITHER a "cdq" OR a "mov edx, ecx" here.
Except for this final step AND the use of ECX instead of EDX, it did a
pretty good job; compare the generated code against that of GCC, ICC or
MSVC, which emit AWFUL code in this instance.
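The reason "mov edx, eax; sar edx, 31" and "cdq" are interchangeable is that both leave in EDX the upper half of EAX sign-extended to 64 bits. The following C sketch models each sequence and checks it against that upper half for a few sample values. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of EAX sign-extended to 64 bits: what EDX must hold.
 * Converting a negative int64_t to uint64_t is well-defined (modulo 2^64). */
uint32_t sext_high(int32_t eax)
{
    uint64_t wide = (uint64_t)(int64_t)eax;
    return (uint32_t)(wide >> 32);
}

/* "mov edx, eax; sar edx, 31". Note: >> on a negative signed value is
 * implementation-defined in C; GCC and Clang shift arithmetically, as SAR does. */
uint32_t via_mov_sar(int32_t eax)
{
    int32_t edx = eax;   /* mov edx, eax */
    edx >>= 31;          /* sar edx, 31 */
    return (uint32_t)edx;
}

/* "cdq": EDX is filled with copies of EAX's sign bit; EAX is left untouched. */
uint32_t via_cdq(int32_t eax)
{
    return eax < 0 ? UINT32_MAX : 0u;
}

/* Both models must agree with the sign-extension reference. */
void check_sign_extension(void)
{
    static const int32_t samples[] = { 0, 1, -1, 12345, -12345, INT32_MAX, INT32_MIN };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(via_mov_sar(samples[i]) == sext_high(samples[i]));
        assert(via_cdq(samples[i]) == sext_high(samples[i]));
    }
}
```

The difference in the actual machine code is encoding and register constraints: cdq is a single one-byte instruction but only works with EAX as source and EDX as destination, whereas mov/sar can use any register pair.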
Regarding the lsign() function: SETcc r8 and other operations on partial
registers are typically slower than operations on full registers, or
introduce false dependencies.
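For comparison, here is a hypothetical C model of the SETcc-free lsign() sequence from the original post (cdq; neg eax; adc edx, edx). It assumes a 32-bit long, as with -m32. Registers are modelled as uint32_t so that wrap-around is well-defined. The model only illustrates the data flow and says nothing about speed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of:
 *   mov eax, dword ptr [esp + 4]
 *   cdq            ; edx = (eax < 0) ? -1 : 0
 *   neg eax        ; only its carry flag matters: CF = (eax != 0)
 *   adc edx, edx   ; edx = 2 * edx + CF
 *   mov eax, edx
 */
uint32_t lsign_cdq_neg_adc(uint32_t eax)
{
    uint32_t edx = (eax >> 31) ? UINT32_MAX : 0u; /* cdq: replicate the sign bit */
    uint32_t cf = (eax != 0u);                    /* neg eax: CF = 1 unless eax was 0 */
    edx = edx + edx + cf;                         /* adc edx, edx (mod 2^32) */
    return edx;                                   /* mov eax, edx */
}

/* Compare the model against the C source of lsign() for a few inputs. */
void check_lsign_model(void)
{
    static const int32_t samples[] = { 0, 1, -1, 42, -42, INT32_MAX, INT32_MIN };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        int32_t x = samples[i];
        int32_t expected = (x > 0) - (x < 0);
        assert(lsign_cdq_neg_adc((uint32_t)x) == (uint32_t)expected);
    }
}
```

"adc edx, edx" maps the three possible pairs (EDX, CF) = (-1, 1), (0, 0) and (0, 1) to -1, 0 and 1, so no byte register is written at all.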
regards
Stefan
> On Sun, Mar 3, 2019 at 11:08 PM Stefan Kanthak via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> Compile with -O3 -m32 (see <https://godbolt.org/z/yCpBpM>):
>>
>> long lsign(long x)
>> {
>> return (x > 0) - (x < 0);
>> }
>>
>>
>> long long llsign(long long x)
>> {
>> return (x > 0) - (x < 0);
>> }
>>
>>
>> While the code generated for the "long" version of this function is quite
>> OK, the code for the "long long" version misses an obvious optimisation
>> (left: clang's output; right: a hand-written alternative):
>>
>>
>> lsign: # @lsign
>> mov eax, dword ptr [esp + 4] | mov eax, dword ptr [esp + 4]
>> xor ecx, ecx                 |
>> test eax, eax                | cdq
>> setg cl                      | neg eax
>> sar eax, 31                  | adc edx, edx
>> add eax, ecx                 | mov eax, edx
>> ret                          | ret
>>
>> llsign: # @llsign
>> xor ecx, ecx                 | xor edx, edx
>> mov eax, dword ptr [esp + 8] | mov eax, dword ptr [esp + 8]
>> cmp ecx, dword ptr [esp + 4] | cmp edx, dword ptr [esp + 4]
>> sbb ecx, eax                 | sbb edx, eax
>> setl cl                      | cdq
>> sar eax, 31                  | setl al
>> movzx ecx, cl                | movzx eax, al
>> add eax, ecx                 | add eax, edx
>> mov edx, eax                 | ret
>> sar edx, 31
>> ret
>>
>> NOTE: not just here, this sequence SHOULD be replaced with
>>
>> mov edx, eax | cdq
>> sar edx, 31
>>
>> Although CDQ is the proper instruction for sign-extending EAX into
>> EDX:EAX, LLVM/clang doesn't seem to like it.
>>
>> stay tuned
>> Stefan Kanthak