[PATCH] Optimize sext 4xi8,4xi16 to 4xi64

Tue Mar 5 12:55:28 PST 2013

On Mar 5, 2013, at 2:47 PM, Muhammad Tauqir Ahmad <muhammad.t.ahmad at intel.com> wrote:

>> Basically, you have to add entries or make sure that they have the right cost for
>> 
>> ;CHECK: cost of 3 {{.*}} sext
>> %Y = sext <4 x i8> undef to <4 x i64>
>> ;CHECK: cost of 3 {{.*}} sext
>> %Y = sext <4 x i16> undef to <4 x i64>
>> 
> 
> There already are tests by Elena Demikhovsky which I will update once
> I figure out what to change the costs to.
> 
Okay, great, thanks.

>> 
>> If we don't already get the cost right - probably we don't - you need to edit the file lib/Target/X86/X86TargetTransformInfo.cpp:X86TTI::getCastInstrCost and add the appropriate costs to the appropriate table.
>> 
>> Something like:
>> 
>> static const TypeConversionCostTblEntry<MVT> AVXConversionTbl[] = {
>>    { ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 1 },
>> +  { ISD::SIGN_EXTEND, MVT::v4i8, MVT::v4i64, 3 },
>> +  { ISD::SIGN_EXTEND, MVT::v4i16, MVT::v4i64, 3 },
> I think the ordering of the input/output types should be the other way round.

Yes, I probably got it the wrong way :).

> Yes, Nadav asked me to do this yesterday and I am still trying to
> figure out how to change those. :)
> 
> The costs for the sign-extend pairs covered by this patch were added
> by Elena Demikhovsky, but I am not sure how accurate they need to be.
> The previous sequence produced 8 instructions, each dependent on the
> previous one, and the cost was 8. Now 6 instructions are generated,
> each dependent on the previous one, so can I just update the costs
> to 6?
> 

Yes.

> In other words, is it arbitrary? Is the above "accurate enough" for
> our purposes assuming a relative scale is being used?

No, it is not arbitrary. We roughly model ideal throughput for the instructions. 6 is fine.
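
Putting the two points from this thread together (destination type before source type, and a cost of 6), the table lookup could be sketched roughly as below. This is only a self-contained illustration, not the actual LLVM code: the `Opcode` and `VT` enums, the `ConversionCostEntry` struct, and `lookupConversionCost` are hypothetical stand-ins for the ISD opcodes, MVT types, `TypeConversionCostTblEntry<MVT>`, and the lookup done in `getCastInstrCost`.

```cpp
// Hypothetical stand-ins for LLVM's ISD opcodes and MVT value types.
// They cover only what this sketch needs.
enum class Opcode { SignExtend, ZeroExtend };
enum class VT { v4i8, v4i16, v4i64, v8i16, v8i32 };

// One table row: opcode, destination type, source type, cost.
// Note the order: the (wider) destination type comes before the source.
struct ConversionCostEntry {
  Opcode Op;
  VT Dst;
  VT Src;
  unsigned Cost;
};

static const ConversionCostEntry AVXConversionTbl[] = {
    {Opcode::SignExtend, VT::v8i32, VT::v8i16, 1},
    // Entries for this patch, with the cost of 6 agreed on in this thread.
    {Opcode::SignExtend, VT::v4i64, VT::v4i8, 6},
    {Opcode::SignExtend, VT::v4i64, VT::v4i16, 6},
};

// Linear search over the table. Returns the cost of the matching entry,
// or -1 if the table has no entry for this conversion.
int lookupConversionCost(Opcode Op, VT Dst, VT Src) {
  for (const ConversionCostEntry &E : AVXConversionTbl)
    if (E.Op == Op && E.Dst == Dst && E.Src == Src)
      return static_cast<int>(E.Cost);
  return -1;
}
```

With the types swapped, as in my earlier snippet, the lookup for sext <4 x i8> to <4 x i64> would simply miss the entry and fall through to whatever default cost applies.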

Thanks,
Arnold