[PATCH] ARM/NEON: Improve codegen for long sext/zext operations.

Mon Mar 18 08:34:38 PDT 2013

LGTM, too.

Thanks,
Arnold

On Mar 18, 2013, at 9:04 AM, Pete Couperus <pjcoup at gmail.com> wrote:

> Hello,
> 
> OK, thanks, that helps a lot.  Here is the patch with the cost-model
> changes included.
> Let me know if it doesn't look right, or if there is anything else
> that I should fix.
> Thanks!
> 
> Pete
> 
> 
> On Sat, Mar 16, 2013 at 8:43 AM, Arnold Schwaighofer
> <aschwaighofer at apple.com> wrote:
>> Yes that sound right. I see for example:
>> 
>> vmovl.s8
>> vmovl.s8
>> vmovl.s16
>> vmovl.s16
>> vmovl.s16
>> vmovl.s16
>> 
>> for
>> 
>> %r = sext <16 x i8> %v0 to <16 x i32>
>> 
>> 
>> That would entail an entry of:
>> 
>> { ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i8,  6 },
>> 
>> 
>> On Mar 16, 2013, at 9:28 AM, Renato Golin <renato.golin at linaro.org> wrote:
>> 
>>> On 16 March 2013 02:51, Pete Couperus <pjcoup at gmail.com> wrote:
>>> Thanks for the review.  I'd be happy to modify the cost values.
>>> Is there a guideline for what this should be?  I surmise looking at
>>> the comments and the other entries that this isn't quite as simple as
>>> something like the number of vmovl's.  Or should I just remove the
>>> entries corresponding to the cases covered in the patch?
>>> 
>>> Hi Pete,
>>> 
>>> The costs for the operations you're changing are:
>>> 
>>> +    setOperationAction(ISD::SIGN_EXTEND, MVT::v8i32,  Custom);
>>> +    setOperationAction(ISD::ZERO_EXTEND, MVT::v8i32,  Custom);
>>> +    setOperationAction(ISD::SIGN_EXTEND, MVT::v16i32, Custom);
>>> +    setOperationAction(ISD::ZERO_EXTEND, MVT::v16i32, Custom);
>>> 
>>> They don't have specific entries (AFAICS) in the cost tables, so it ends up just exploding the cost. Fair enough, they were a bunch of strh instructions.
>>> 
>>> What you have to do is to add entries in ARMTargetTransformInfo.cpp -> ARMTTI::getCastInstrCost() like:
>>> 
>>>    { ISD::SIGN_EXTEND, MVT::v16i64, MVT::v16i32, 4 },
>>>    { ISD::ZERO_EXTEND, MVT::v16i64, MVT::v16i32, 4 },
>>>    { ISD::SIGN_EXTEND, MVT::v8i64, MVT::v8i32, 3 },
>>>    { ISD::ZERO_EXTEND, MVT::v8i64, MVT::v8i32, 3 },
>>> 
>>> ---> Arnold, please correct me if I'm wrong <---
>>> 
>>> And the ones you're adding new:
>>> 
>>> +    setOperationAction(ISD::SIGN_EXTEND, MVT::v4i64,  Custom);
>>> +    setOperationAction(ISD::ZERO_EXTEND, MVT::v4i64,  Custom);
>>> +    setOperationAction(ISD::SIGN_EXTEND, MVT::v8i64,  Custom);
>>> +    setOperationAction(ISD::ZERO_EXTEND, MVT::v8i64,  Custom);
>>> 
>>> That need similar entries on the table in ARMTTI taking each instruction as an ideal cost of 1.
>>> 
>>> Format is { ISD, DestType, SourceType, Cost };
>>> 
>>> That should improve some vectorized benchmarks, but I don't expect it to show above the noise level in LNT, so let's look for correctness rather than performance for now.
>>> 
>>> cheers,
>>> --renato
>> 
> <szext_cost.diff>