[PATCH] ARM/NEON: Improve codegen for long sext/zext operations.

Sat Mar 16 08:43:43 PDT 2013

Yes that sound right. I see for example:

vmovl.s8        
vmovl.s8 
vmovl.s16
vmovl.s16
vmovl.s16
vmovl.s16

for

%r = sext <16 x i8> %v0 to <16 x i32>

That would entail an entry of:

 { ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i8,  6 },

On Mar 16, 2013, at 9:28 AM, Renato Golin <renato.golin at linaro.org> wrote:

> On 16 March 2013 02:51, Pete Couperus <pjcoup at gmail.com> wrote:
> Thanks for the review.  I'd be happy to modify the cost values.
> Is there a guideline for what this should be?  I surmise looking at
> the comments and the other entries that this isn't quite as simple as
> something like the number of vmovl's.  Or should I just remove the
> entries corresponding to the cases covered in the patch?
> 
> Hi Pete,
> 
> The costs for the operations you're changing are:
> 
> +    setOperationAction(ISD::SIGN_EXTEND, MVT::v8i32,  Custom);
> +    setOperationAction(ISD::ZERO_EXTEND, MVT::v8i32,  Custom);
> +    setOperationAction(ISD::SIGN_EXTEND, MVT::v16i32, Custom);
> +    setOperationAction(ISD::ZERO_EXTEND, MVT::v16i32, Custom);
> 
> They don't have specific entries (AFAICS) in the cost tables, so it ends up just exploding the cost. Fair enough, they were a bunch of strh instructions.
> 
> What you have to do is to add entries in ARMTargetTransformInfo.cpp -> ARMTTI::getCastInstrCost() like:
> 
>     { ISD::SIGN_EXTEND, MVT::v16i64, MVT::v16i32, 4 },
>     { ISD::ZERO_EXTEND, MVT::v16i64, MVT::v16i32, 4 },
>     { ISD::SIGN_EXTEND, MVT::v8i64, MVT::v8i32, 3 },
>     { ISD::ZERO_EXTEND, MVT::v8i64, MVT::v8i32, 3 },
> 
> ---> Arnold, please correct me if I'm wrong <---
> 
> And the ones you're adding new:
> 
> +    setOperationAction(ISD::SIGN_EXTEND, MVT::v4i64,  Custom);
> +    setOperationAction(ISD::ZERO_EXTEND, MVT::v4i64,  Custom);
> +    setOperationAction(ISD::SIGN_EXTEND, MVT::v8i64,  Custom);
> +    setOperationAction(ISD::ZERO_EXTEND, MVT::v8i64,  Custom);
> 
> That need similar entries on the table in ARMTTI taking each instruction as an ideal cost of 1.
> 
> Format is { ISD, DestType, SourceType, Cost };
> 
> That should improve some vectorized benchmarks, but I don't expect it to show above the noise level in LNT, so let's look for correctness rather than performance for now.
> 
> cheers,
> --renato