[llvm-dev] Where's the optimiser gone (part 11): use the proper instruction for sign extension

Stefan Kanthak via llvm-dev llvm-dev at lists.llvm.org
Mon Mar 4 01:08:44 PST 2019


"Craig Topper" <craig.topper at gmail.com> wrote:

> It's fairly difficult to use CDQ in LLVM without tying the hands of the
> register allocator.

Its hands are already tied, however, whenever it has to return a quadword
in EDX:EAX, or use the DIV/IDIV and MUL/IMUL instructions or any shift with
a variable shift count.
In the case of llsign() it uses 3 registers, although the job can be done
with just EAX and EDX.
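
To illustrate the two-register claim, here is a minimal C sketch that models
the EAX/EDX-only sequence from the right-hand column of the llsign listing
quoted below and compares it with the C expression. The helper names
(as_signed32, llsign_model, llsign_of, llsign_check) are made up for this
illustration; the flag handling is a simplified model of CMP/SBB/SETL, not
output from any compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Signed value of a 32-bit register, without implementation-defined casts. */
static int64_t as_signed32(uint32_t r)
{
    return (r & 0x80000000u) ? (int64_t)r - INT64_C(0x100000000) : (int64_t)r;
}

/* Reference: the expression from the original post, on a 64-bit value. */
static int64_t llsign_ref(int64_t x)
{
    return (x > 0) - (x < 0);
}

/* Hypothetical model; lo and hi stand for [esp + 4] and [esp + 8]. */
static int64_t llsign_model(uint32_t lo, uint32_t hi)
{
    uint32_t eax = hi;                      /* mov eax, [esp + 8] */
    uint32_t borrow = (lo != 0u);           /* xor edx, edx; cmp edx, [esp + 4] */
    /* sbb edx, eax computes 0 - hi - borrow. SETL tests SF != OF,
       which holds exactly when the true signed result is negative. */
    uint32_t less = (-as_signed32(eax) - (int64_t)borrow) < 0;
    uint32_t edx = (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u; /* cdq, flags untouched */
    eax = less;                             /* setl al; movzx eax, al */
    eax += edx;                             /* add eax, edx */
    uint64_t bits = ((uint64_t)edx << 32) | eax;           /* result in EDX:EAX */
    return (bits == UINT64_MAX) ? -1 : (int64_t)bits;
}

/* Split a 64-bit argument into the two 32-bit stack slots. */
static int64_t llsign_of(int64_t x)
{
    uint64_t u = (uint64_t)x;
    return llsign_model((uint32_t)u, (uint32_t)(u >> 32));
}

/* Compare the model with the reference on a few boundary values. */
static void llsign_check(void)
{
    const int64_t samples[] = { 0, 1, -1, INT64_MAX, INT64_MIN,
                                INT64_C(0x80000000), INT64_C(0x100000000) };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(llsign_of(samples[i]) == llsign_ref(samples[i]));
}
```

Calling llsign_check() from a test program exercises the model; it says
nothing about timing, only that the two-register sequence computes the same
values.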

> It would potentially require a new post-RA combine pass
> to detect the "mov edx, eax; sar edx, 31" pattern. It's going to be even
> harder to bias register allocation in hopes of using CDQ for the lsign case.
> 
> CDQ is implemented in the shifter unit on at least the last several
> generations of Intel CPUs, so it's going to perform similarly to SAR. And the
> move only requires decoder bandwidth and no execution resources on recent
> CPUs. Do you have performance data for this optimization?

No, I don't have such data.

Regarding the llsign() function: instead of "mov edx, eax; sar edx, 31"
the compiler SHOULD generate EITHER a "cdq" OR a "mov edx, ecx" here.
Except for this final step AND the use of ECX instead of EDX it did a
pretty good job; compare the generated code against GCC's, ICC's or MSVC's,
which emit AWFUL code in that instance.
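
As a small sanity check of the equivalence claimed here, the following C
sketch models "mov edx, eax; sar edx, 31" and "cdq" as functions that return
the resulting EDX value. The function names are invented for this
illustration, and the code avoids shifting negative values so that it does
not depend on implementation-defined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Models "mov edx, eax; sar edx, 31": every bit of EDX becomes the sign bit of EAX. */
static uint32_t high_via_mov_sar(uint32_t eax)
{
    return (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}

/* Models "cdq": sign-extend EAX to 64 bits (EDX:EAX) and return the upper half. */
static uint32_t high_via_cdq(uint32_t eax)
{
    int64_t wide = (eax & 0x80000000u) ? (int64_t)eax - INT64_C(0x100000000)
                                       : (int64_t)eax;
    return (uint32_t)((uint64_t)wide >> 32);
}

/* Both models must agree for every sampled register value. */
static void cdq_check(void)
{
    const uint32_t samples[] = { 0u, 1u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(high_via_mov_sar(samples[i]) == high_via_cdq(samples[i]));
}
```

The two differ only in encoding size and register constraints: CDQ is a
single byte but fixes source and destination to EAX and EDX.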

Regarding the lsign() function: setCC r8 and other operations on partial
registers are typically slower than operations on the full registers, or
introduce dependencies.
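
The full-register alternative for lsign() in the listing quoted below (cdq;
neg eax; adc edx, edx) can be modelled in C as follows. This is only a sketch
under the assumption that "long" is 32 bits wide, as it is with -m32;
lsign_ref, lsign_model and lsign_check are names made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the expression from the original post, on a 32-bit value. */
static int32_t lsign_ref(int32_t x)
{
    return (x > 0) - (x < 0);
}

/* Hypothetical model of: cdq; neg eax; adc edx, edx; mov eax, edx */
static int32_t lsign_model(int32_t x)
{
    uint32_t eax = (uint32_t)x;
    uint32_t edx = (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u; /* cdq */
    uint32_t cf = (eax != 0u);       /* neg eax sets CF unless EAX was 0 */
    edx = edx + edx + cf;            /* adc edx, edx (wraps modulo 2^32) */
    /* EDX is now 0, 1 or 0xFFFFFFFF; convert without an implementation-defined cast. */
    return (edx == 0xFFFFFFFFu) ? -1 : (int32_t)edx;
}

/* Compare the model with the reference on a few boundary values. */
static void lsign_check(void)
{
    const int32_t samples[] = { 0, 1, -1, 5, -7, INT32_MAX, INT32_MIN };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(lsign_model(samples[i]) == lsign_ref(samples[i]));
}
```

The model touches only whole 32-bit values, mirroring the point that the
sequence needs no SETcc and no write to an 8-bit register.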

regards
Stefan

> On Sun, Mar 3, 2019 at 11:08 PM Stefan Kanthak via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> 
>> Compile with -O3 -m32 (see <https://godbolt.org/z/yCpBpM>):
>>
>> long lsign(long x)
>> {
>>     return (x > 0) - (x < 0);
>> }
>>
>>
>> long long llsign(long long x)
>> {
>>     return (x > 0) - (x < 0);
>> }
>>
>>
>> While the code generated for the "long" version of this function is quite
>> OK, the code for the "long long" version misses an obvious optimisation:
>>
>>
>> lsign: # @lsign
>>     mov     eax, dword ptr [esp + 4]    |    mov     eax, dword ptr [esp + 4]
>>     xor     ecx, ecx                    |
>>     test    eax, eax                    |    cdq
>>     setg    cl                          |    neg     eax
>>     sar     eax, 31                     |    adc     edx, edx
>>     add     eax, ecx                    |    mov     eax, edx
>>     ret                                 |    ret
>>
>> llsign: # @llsign
>>     xor     ecx, ecx                    |    xor     edx, edx
>>     mov     eax, dword ptr [esp + 8]    |    mov     eax, dword ptr [esp + 8]
>>     cmp     ecx, dword ptr [esp + 4]    |    cmp     edx, dword ptr [esp + 4]
>>     sbb     ecx, eax                    |    sbb     edx, eax
>>     setl    cl                          |    cdq
>>     sar     eax, 31                     |    setl    al
>>     movzx   ecx, cl                     |    movzx   eax, al
>>     add     eax, ecx                    |    add     eax, edx
>>     mov     edx, eax                    |    ret
>>     sar     edx, 31
>>     ret
>>
>> NOTE: not just here, the sequence on the left SHOULD be replaced with
>> the single instruction on the right:
>>
>>     mov     edx, eax                    |    cdq
>>     sar     edx, 31
>>
>> Although CDQ is the proper instruction for sign extension, LLVM/clang
>> doesn't seem to like it.
>>
>> stay tuned
>> Stefan Kanthak

