[PATCH] ARM/NEON: Improve codegen for long sext/zext operations.

Mon Mar 18 20:21:39 PDT 2013

Hello,

Great, thanks!  Would one of you mind committing for me?
Thanks again,

Pete

On Mon, Mar 18, 2013 at 8:34 AM, Arnold Schwaighofer
<aschwaighofer at apple.com> wrote:
> LGTM, too.
>
> Thanks,
> Arnold
>
> On Mar 18, 2013, at 9:04 AM, Pete Couperus <pjcoup at gmail.com> wrote:
>
>> Hello,
>>
>> OK, thanks, that helps a lot.  Here is the patch with the cost-model
>> changes included.
>> Let me know if it doesn't look right, or if there is anything else
>> that I should fix.
>> Thanks!
>>
>> Pete
>>
>>
>> On Sat, Mar 16, 2013 at 8:43 AM, Arnold Schwaighofer
>> <aschwaighofer at apple.com> wrote:
>>> Yes that sound right. I see for example:
>>>
>>> vmovl.s8
>>> vmovl.s8
>>> vmovl.s16
>>> vmovl.s16
>>> vmovl.s16
>>> vmovl.s16
>>>
>>> for
>>>
>>> %r = sext <16 x i8> %v0 to <16 x i32>
>>>
>>>
>>> That would entail an entry of:
>>>
>>> { ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i8,  6 },
>>>
>>>
>>> On Mar 16, 2013, at 9:28 AM, Renato Golin <renato.golin at linaro.org> wrote:
>>>
>>>> On 16 March 2013 02:51, Pete Couperus <pjcoup at gmail.com> wrote:
>>>> Thanks for the review.  I'd be happy to modify the cost values.
>>>> Is there a guideline for what this should be?  I surmise looking at
>>>> the comments and the other entries that this isn't quite as simple as
>>>> something like the number of vmovl's.  Or should I just remove the
>>>> entries corresponding to the cases covered in the patch?
>>>>
>>>> Hi Pete,
>>>>
>>>> The costs for the operations you're changing are:
>>>>
>>>> +    setOperationAction(ISD::SIGN_EXTEND, MVT::v8i32,  Custom);
>>>> +    setOperationAction(ISD::ZERO_EXTEND, MVT::v8i32,  Custom);
>>>> +    setOperationAction(ISD::SIGN_EXTEND, MVT::v16i32, Custom);
>>>> +    setOperationAction(ISD::ZERO_EXTEND, MVT::v16i32, Custom);
>>>>
>>>> They don't have specific entries (AFAICS) in the cost tables, so it ends up just exploding the cost. Fair enough, they were a bunch of strh instructions.
>>>>
>>>> What you have to do is to add entries in ARMTargetTransformInfo.cpp -> ARMTTI::getCastInstrCost() like:
>>>>
>>>>    { ISD::SIGN_EXTEND, MVT::v16i64, MVT::v16i32, 4 },
>>>>    { ISD::ZERO_EXTEND, MVT::v16i64, MVT::v16i32, 4 },
>>>>    { ISD::SIGN_EXTEND, MVT::v8i64, MVT::v8i32, 3 },
>>>>    { ISD::ZERO_EXTEND, MVT::v8i64, MVT::v8i32, 3 },
>>>>
>>>> ---> Arnold, please correct me if I'm wrong <---
>>>>
>>>> And the ones you're adding new:
>>>>
>>>> +    setOperationAction(ISD::SIGN_EXTEND, MVT::v4i64,  Custom);
>>>> +    setOperationAction(ISD::ZERO_EXTEND, MVT::v4i64,  Custom);
>>>> +    setOperationAction(ISD::SIGN_EXTEND, MVT::v8i64,  Custom);
>>>> +    setOperationAction(ISD::ZERO_EXTEND, MVT::v8i64,  Custom);
>>>>
>>>> That need similar entries on the table in ARMTTI taking each instruction as an ideal cost of 1.
>>>>
>>>> Format is { ISD, DestType, SourceType, Cost };
>>>>
>>>> That should improve some vectorized benchmarks, but I don't expect it to show above the noise level in LNT, so let's look for correctness rather than performance for now.
>>>>
>>>> cheers,
>>>> --renato
>>>
>> <szext_cost.diff>
>