[PATCH] X86: fold SSE2/AVX2 logical shift by immediate amount into zero vector when possible

Thu Jul 11 04:20:51 PDT 2013

On 10/07/13 22:15, Nadav Rotem wrote:
> The patch LGTM.  I have a few comments:
>
> This is a NOP:

The IR level optimizers know about this one.
>
> +define <8 x i16> @test_srlw_1(<8 x i16> %InVec) {
> +entry:
> +  %shl = lshr <8 x i16> %InVec, <i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0>
> +  ret <8 x i16> %shl
> +}
> +
> +; CHECK: test_srlw_1:
> +; CHECK: psrlw   $0, %xmm0
> +; CHECK-NEXT: ret
> +
>
> I think that this is also a missed optimization.  32 > 31.

I.e. according to LLVM semantics the result is undefined (undef).
Interestingly the IR level optimizers don't turn this into undef.
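
For reference, here is a minimal scalar sketch of what I understand the
hardware to do per 32-bit lane, as opposed to what the IR guarantees. The
helper names (lane_srl32, lane_sll32, lane_sra32, shift_model_self_check)
are made up for illustration and are not part of the patch or of LLVM. The
model assumes the documented SSE2 behaviour: logical shifts with a count
above 31 produce 0, while the arithmetic shift with a count above 31 fills
the lane with the sign bit, so psrad $32 behaves like psrad $31.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 32-bit lane of PSRLD: counts > 31 give 0. */
uint32_t lane_srl32(uint32_t x, unsigned count) {
    return count > 31 ? 0u : x >> count;
}

/* Hypothetical model of one 32-bit lane of PSLLD: counts > 31 give 0. */
uint32_t lane_sll32(uint32_t x, unsigned count) {
    return count > 31 ? 0u : x << count;
}

/* Hypothetical model of one 32-bit lane of PSRAD: counts > 31 are treated
 * like 31, so the lane is filled with copies of the sign bit. */
int32_t lane_sra32(int32_t x, unsigned count) {
    if (count > 31)
        count = 31;
    if (x < 0) {
        /* Shift the complement and complement back (-1 - v == ~v), which
         * avoids relying on implementation-defined >> of negative values. */
        return -1 - (int32_t)(~(uint32_t)x >> count);
    }
    return (int32_t)((uint32_t)x >> count);
}

/* Small self-check covering the cases discussed in this thread. */
void shift_model_self_check(void) {
    assert(lane_sll32(0x12345678u, 35) == 0u);           /* pslld $35 */
    assert(lane_srl32(0x12345678u, 0) == 0x12345678u);   /* shift by 0 */
    assert(lane_sra32(-8, 32) == lane_sra32(-8, 31));    /* psrad $32 */
}
```

Under this model, shifting by 0 is the identity and the ashr-by-32 test
could be emitted as psrad $31, but only the hardware gives that guarantee;
at the IR level the out-of-range shift stays undefined.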

Ciao, Duncan.

>
> +define <4 x i32> @test_srad_3(<4 x i32> %InVec) {
> +entry:
> +  %shl = ashr <4 x i32> %InVec, <i32 32, i32 32, i32 32, i32 32>
> +  ret <4 x i32> %shl
> +}
> +
> +; CHECK: test_srad_3:
> +; CHECK: psrad   $32, %xmm0
> +; CHECK-NEXT: ret
>
>
> Nadav
>
> On Jul 10, 2013, at 1:00 PM, Eric Christopher <echristo at gmail.com> wrote:
>
>> Nadav might be someone good to review this.
>>
>> -eric
>>
>> On Wed, Jul 10, 2013 at 6:46 AM,  <Andrea_DiBiagio at sn.scee.net> wrote:
>>> Ping.
>>>
>>> (See attached file: patch.diff)
>>> Andrea DiBiagio/SN R&D/BS/UK/SCEE wrote on 01/07/2013 12:01:44:
>>>
>>>> Friendly ping.
>>>>
>>>>> From: Andrea DiBiagio/SN R&D/BS/UK/SCEE
>>>>> Hi all,
>>>>>
>>>>> I'd like to contribute a patch that teaches the x86 backend how to
>>>>> combine SSE2/AVX2 packed logical shifts by immediate amount into
>>>>> vectors of all 0s.
>>>>>
>>>>> An SSE2/AVX2 logical shift by an immediate amount that is greater than
>>>>> or equal to the vector element size always returns a vector of all 0s.
>>>>>
>>>>> Example:
>>>>> pslld $35, %xmm0   # SSE2 packed doubleword logical shift left.
>>>>>                    # %xmm0 is a vector of packed int (MVT::v4i32).
>>>>>
>>>>> The shift in this example will return a vector of all zeros in %xmm0,
>>>>> and therefore it could easily be rewritten, for example, as:
>>>>> xorps %xmm0, %xmm0
>>>>>
>>>>> This patch adds a new target combine rule in X86ISelLowering.cpp to
>>>>> make sure that we simplify vector shifts into zero vectors when possible.
>>>>>
>>>>> I added two new tests to verify that vector shifts are correctly folded
>>>>> into vectors of all 0s when the immediate amount equals or exceeds
>>>>> the vector element size.
>>>>>
>>>>> Thanks,
>>>>> Andrea Di Biagio
>>>>> SN Systems - Sony Computer Entertainment
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits