[PATCH] Optimize sext 4xi8,4xi16 to 4xi64
Arnold Schwaighofer
aschwaighofer at apple.com
Tue Mar 5 12:55:28 PST 2013
On Mar 5, 2013, at 2:47 PM, Muhammad Tauqir Ahmad <muhammad.t.ahmad at intel.com> wrote:
>> Basically, you have to add entries or make sure that they have the right cost for
>>
>> ;CHECK: cost of 3 {{.*}} sext
>> %Y = sext <4 x i8> undef to <4 x i64>
>> ;CHECK: cost of 3 {{.*}} sext
>> %Y = sext <4 x i16> undef to <4 x i64>
>>
>
> There already are tests by Elena Demikhovsky which I will update once
> I figure out what to change the costs to.
>
Okay, great, thanks.
>>
>> If we don't already get the cost right - probably we don't - you need to edit the file lib/Target/X86/X86TargetTransformInfo.cpp:X86TTI::getCastInstrCost and add the appropriate costs to the appropriate table.
>>
>> Something like:
>>
>> static const TypeConversionCostTblEntry<MVT> AVXConversionTbl[] = {
>> { ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 1 },
>> + { ISD::SIGN_EXTEND, MVT::v4i8, MVT::v4i64, 3 },
>> + { ISD::SIGN_EXTEND, MVT::v4i16, MVT::v4i64, 3 },
> I think the ordering of the input/output types should be the other way round.
Yes, I probably got it the wrong way :).
> Yes, Nadav asked me to do this yesterday and I am still trying to
> figure out how to change those. :)
>
> The costs for the sign-extend pairs covered by this patch were added
> by Elena Demikhovsky but I am not sure how accurate they need to be
> and since the previous sequence produced 8 instructions, each
> instruction dependent on the previous, and the cost was 8 -- now 6
> instructions are being generated, each instruction dependent on the
> previous, can I just update the costs to 6?
>
Yes.
> In other words, is it arbitrary? Is the above "accurate enough" for
> our purposes assuming a relative scale is being used?
No, it is not arbitrary. We roughly model ideal throughput for the instructions. 6 is fine.
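
To illustrate how such table entries are matched, here is a minimal self-contained sketch. It is not the LLVM code: Opcode, VecType, ConversionCostEntry, ConversionTbl and lookupConversionCost are hypothetical stand-ins for ISD, MVT, TypeConversionCostTblEntry and the lookup done in getCastInstrCost. It assumes the destination type comes before the source type in each entry, as in the existing v8i32/v8i16 row, and uses the cost of 6 discussed above.

```cpp
#include <cstddef>

// Hypothetical stand-ins for LLVM's ISD opcodes and MVT vector types.
enum Opcode { SIGN_EXTEND, ZERO_EXTEND };
enum VecType { v4i8, v4i16, v4i64, v8i16, v8i32 };

// One row of the table: opcode, destination type, source type, cost.
// Assumption: destination precedes source, matching the existing
// { SIGN_EXTEND, v8i32, v8i16, 1 } entry.
struct ConversionCostEntry {
  Opcode Op;
  VecType Dst;
  VecType Src;
  unsigned Cost;
};

static const ConversionCostEntry ConversionTbl[] = {
  { SIGN_EXTEND, v8i32, v8i16, 1 },
  { SIGN_EXTEND, v4i64, v4i8,  6 },  // cost taken from this thread
  { SIGN_EXTEND, v4i64, v4i16, 6 },  // cost taken from this thread
};

// Linear search over the table. Returns the cost of the first matching
// entry, or -1 when there is none, so the caller can fall back to a
// generic estimate.
int lookupConversionCost(Opcode Op, VecType Dst, VecType Src) {
  for (const ConversionCostEntry &E : ConversionTbl)
    if (E.Op == Op && E.Dst == Dst && E.Src == Src)
      return static_cast<int>(E.Cost);
  return -1;
}
```

With the types swapped, as in my snippet above, lookupConversionCost(SIGN_EXTEND, v4i8, v4i64) finds no entry and returns -1, which is why the ordering matters.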
Thanks,
Arnold