[PATCH] [AArch64] Support ISD::SIGN_EXTEND_INREG

Tue Jan 7 23:21:36 PST 2014

Hi Ana,

I'm attaching a new patch, and with it we can now generate more SSHLL
instructions. Please refer to my test cases.

However, there is still a hole in lowering shuffle_vector, so we cannot
generate the uzp1 instruction yet.

Kevin is following up and will upstream a separate patch to generate
uzp1. He will also add more CHECK lines to my test cases to capture this
instruction.
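
For reference, here is a minimal Python sketch (not part of the patch) of why
the sshll + shuffle_vector form in the three quoted cases below gives the same
values as the shl/sshr pair. It assumes a 64-bit register with little-endian
lane numbering, and the container lane widths (32, 16, 32 bits) are inferred
from the shuffle masks; all function names are made up for illustration.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def split_lanes(reg, lane_bits, count):
    """Split a register image into `count` lanes, lane 0 in the low bits."""
    mask = (1 << lane_bits) - 1
    return [(reg >> (i * lane_bits)) & mask for i in range(count)]

def pack_lanes(values, lane_bits):
    """Inverse of split_lanes: build a register image from lane values."""
    mask = (1 << lane_bits) - 1
    reg = 0
    for i, v in enumerate(values):
        reg |= (v & mask) << (i * lane_bits)
    return reg

def shuffle_mask(container_bits, narrow_bits, num_elems):
    """Lane indexes that hold the wanted results after widening."""
    step = container_bits // narrow_bits
    return [i * step for i in range(num_elems)]

def sext_inreg_via_sshll(reg, container_bits, narrow_bits, num_elems):
    """Model: reinterpret as narrow lanes, sshll #0, then pick strided lanes."""
    narrow = split_lanes(reg, narrow_bits, 64 // narrow_bits)
    widened = [to_signed(v, narrow_bits) for v in narrow]  # sshll #0 per lane
    return [widened[i]
            for i in shuffle_mask(container_bits, narrow_bits, num_elems)]

def sext_inreg_via_shifts(reg, container_bits, narrow_bits, num_elems):
    """Reference: shl then arithmetic shift right on each container lane."""
    shift = container_bits - narrow_bits
    lane_mask = (1 << container_bits) - 1
    result = []
    for lane in split_lanes(reg, container_bits, num_elems):
        shifted = (lane << shift) & lane_mask
        result.append(to_signed(shifted, container_bits) >> shift)
    return result

# (container_bits, narrow_bits, num_elems) for cases 1), 2) and 3).
CASES = [(32, 8, 2), (16, 8, 4), (32, 16, 2)]

reg = pack_lanes([0x1234FFFE, 0xABCD8000], 32)
for container_bits, narrow_bits, num_elems in CASES:
    # Both lowerings must agree on the sign-extended values.
    assert (sext_inreg_via_sshll(reg, container_bits, narrow_bits, num_elems)
            == sext_inreg_via_shifts(reg, container_bits, narrow_bits, num_elems))
```

The model only compares values lane by lane; it says nothing about which
instruction sequence the backend actually selects.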

Thanks,
-Jiangning

2014/1/8 Jiangning Liu <liujiangning1 at gmail.com>

> Hi Ana,
>
> Having thought more about this optimization, I now think the second patch I
> gave is incorrect.
>
> Basically we have the following three cases to cover,
>
> 1) sext_inreg(v2i16, v2i8)
> sshll   v0.8h, v0.8b, #0
> shuffle_vector(<8xi16>, <8xi16>, <i32, i32> (0, 4))
>
> 2) sext_inreg(v4i16, v4i8)
> sshll   v0.8h, v0.8b, #0
> shuffle_vector(<8xi16>, <8xi16>, <i32, i32, i32, i32> (0, 2, 4, 6))
>
> 3) sext_inreg(v2i32, v2i16)
> sshll   v0.4s, v0.4h, #0
> shuffle_vector(<4xi32>, <4xi32>, <i32, i32> (0, 2))
>
> All other cases should have been covered by the 1st patch.
>
> Thanks,
> -Jiangning
>
>
>
> 2014/1/7 Jiangning Liu <liujiangning1 at gmail.com>
>
>> Ana,
>>
>> I see your point now.
>>
>> Actually, with my patch, sign_extend_inreg(v8i16, v8i8) can generate
>> SXTL (8b->8h), as shown by my test case below:
>>
>>
>> define <8 x i8> @test_sext_inreg_v8i8i16(<8 x i8> %v1, <8 x i8> %v2)
>> nounwind readnone {
>> ; CHECK-LABEL: test_sext_inreg_v8i8i16
>> ; CHECK: sshll   v0.8h, v0.8b, #0
>> ; CHECK: sshll   v1.8h, v1.8b, #0
>>   %1 = sext <8 x i8> %v1 to <8 x i16>
>>   %2 = sext <8 x i8> %v2 to <8 x i16>
>>   %3 = shufflevector <8 x i16> %1, <8 x i16> %2, <8 x i32> <i32 0, i32 2,
>> i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
>>   %4 = trunc <8 x i16> %3 to <8 x i8>
>>   ret <8 x i8> %4
>> }
>>
>> And sign_extend_inreg(v2i64, v2i32) doesn't exist, because we always use
>> sign_extend(v2i64, v2i32) to handle it, as shown by the test case below:
>>
>> define <2 x i32> @test_sext_inreg_v2i32i64(<2 x i32> %v1, <2 x i32> %v2)
>> nounwind readnone {
>> ; CHECK-LABEL: test_sext_inreg_v2i32i64
>> ; CHECK: sshll v0.2d, v0.2s, #0
>> ; CHECK: sshll v1.2d, v1.2s, #0
>>   %1 = sext <2 x i32> %v1 to <2 x i64>
>>   %2 = sext <2 x i32> %v2 to <2 x i64>
>>   %3 = shufflevector <2 x i64> %1, <2 x i64> %2, <2 x i32> <i32 0, i32 2>
>>   %4 = trunc <2 x i64> %3 to <2 x i32>
>>   ret <2 x i32> %4
>> }
>>
>> However, yes, sign_extend_inreg(v2i32, v2i16) would be an issue, so I
>> modified my patch as attached and changed the test test_sext_inreg_v2i16i32
>> to use the sshll instruction, as below:
>>
>>
>> define <2 x i16> @test_sext_inreg_v2i16i32(<2 x i16> %v1, <2 x i16> %v2)
>> nounwind readnone {
>> ; CHECK-LABEL: test_sext_inreg_v2i16i32
>> ; CHECK: sshll   v0.4s, v0.4h, #0
>> ; CHECK: sshll   v1.4s, v1.4h, #0
>>
>>   %1 = sext <2 x i16> %v1 to <2 x i32>
>>   %2 = sext <2 x i16> %v2 to <2 x i32>
>>   %3 = shufflevector <2 x i32> %1, <2 x i32> %2, <2 x i32> <i32 0, i32 2>
>>   %4 = trunc <2 x i32> %3 to <2 x i16>
>>   ret <2 x i16> %4
>> }
>>
>> The solution is to do a combine that captures this special
>> arithmetic-shift-right/shift-left pair. Do we have more missing cases?
>>
>> Thanks,
>> -Jiangning
>>
>>
>>
>> 2014/1/7 Ana Pazos <apazos at codeaurora.org>
>>
>>> Hi Jiangning,
>>>
>>>
>>>
>>> The test cases where I see failures are
>>>
>>> sign_extend_inreg(v2i32, v2i16) and
>>>
>>> sign_extend_inreg(v4i16, v4i8)     - sorry, I had a typo: I wrote v8i8 but
>>> meant v4i8, which confused you.
>>>
>>>
>>>
>>> So it seems your patch addresses both cases I was concerned about.
>>>
>>>
>>>
>>> But for such cases I think the SXTL instruction could be used instead of
>>> the combo shift right + shift left.
>>>
>>>
>>>
>>> For example, sign_extend_inreg(v2i32, v2i16):
>>>
>>> - The inputs are 16-bit values in a 2S register
>>> - Reinterpret the register as a 4H register
>>> - SXTL (4S <- 4H)
>>> - Ins/uzp1 (to extract the vector indexes 0 and 2 that we need into a 2S
>>>   register)
>>>
>>>
>>>
>>> The same can be done for sign_extend_inreg(v8i16, v8i8) and
>>> sign_extend_inreg(v2i64, v2i32).
>>>
>>>
>>>
>>> I think in some cases the extraction of vector indexes we are interested
>>> in will be a no-op and an instruction will be saved.
>>>
>>>
>>>
>>> I am just suggesting that we use a hardware instruction that does the sign
>>> extension for those vector types it supports.
>>>
>>>
>>>
>>> Do you agree?
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Ana.
>>>
>>>
>>>
>>> *From:* Jiangning Liu [mailto:liujiangning1 at gmail.com]
>>> *Sent:* Sunday, January 05, 2014 10:44 PM
>>> *To:* Ana Pazos
>>> *Cc:* llvm-commits at cs.uiuc.edu for LLVM; mcrosier at codeaurora.org
>>> *Subject:* Re: [PATCH] [AArch64] Support ISD::SIGN_EXTEND_INREG
>>>
>>>
>>>
>>> Hi Ana,
>>>
>>> Sorry, I don't quite understand what you said. Do you have a small test
>>> that illustrates what you mentioned?
>>>
>>> For sign_extend_inreg(v2i32, v2i16), my test case below should show that my
>>> patch works:
>>>
>>> define <2 x i16> @test_sext_inreg_v2i16i32(<2 x i16> %v1, <2 x i16> %v2)
>>> nounwind readnone {
>>> ; CHECK-LABEL: test_sext_inreg_v2i16i32
>>> ; CHECK: shl     v0.2s, v0.2s, #16
>>> ; CHECK: sshr    v0.2s, v0.2s, #16
>>> ; CHECK: shl     v1.2s, v1.2s, #16
>>> ; CHECK: sshr    v1.2s, v1.2s, #16
>>>   %1 = sext <2 x i16> %v1 to <2 x i32>
>>>   %2 = sext <2 x i16> %v2 to <2 x i32>
>>>   %3 = shufflevector <2 x i32> %1, <2 x i32> %2, <2 x i32> <i32 0, i32 2>
>>>   %4 = trunc <2 x i32> %3 to <2 x i16>
>>>   ret <2 x i16> %4
>>> }
>>>
>>> For sign_extend_inreg(v4i16, v8i8), is this valid? I thought it should
>>> be sign_extend_inreg(v8i16, v8i8). If that is the case, my test below
>>> should also show that my patch works:
>>>
>>> define <8 x i8> @test_sext_inreg_v8i8i16(<8 x i8> %v1, <8 x i8> %v2)
>>> nounwind readnone {
>>> ; CHECK-LABEL: test_sext_inreg_v8i8i16
>>> ; CHECK: sshll   v0.8h, v0.8b, #0
>>> ; CHECK: sshll   v1.8h, v1.8b, #0
>>>   %1 = sext <8 x i8> %v1 to <8 x i16>
>>>   %2 = sext <8 x i8> %v2 to <8 x i16>
>>>   %3 = shufflevector <8 x i16> %1, <8 x i16> %2, <8 x i32> <i32 0, i32
>>> 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
>>>   %4 = trunc <8 x i16> %3 to <8 x i8>
>>>   ret <8 x i8> %4
>>> }
>>>
>>> Thanks,
>>> -Jiangning
>>>

-- 
Thanks,
-Jiangning
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sext_inreg_llvm_3.patch
Type: application/octet-stream
Size: 11587 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140108/1dc7879c/attachment.obj>