[llvm] r176898 - ARM cost model: Increase the cost for vector casts that use the stack

Wed Mar 13 09:31:07 PDT 2013

Hi Pete,

On Mar 13, 2013, at 9:43 AM, Pete Couperus <pjcoup at gmail.com> wrote:

> Hello Arnold,
> 
> Is this change to work around the ARM backend's poor sext/zext v8i8 ->
> v8i32 lowering (http://llvm.org/bugs/show_bug.cgi?id=14867)?

Yes, although I would not call it a workaround; the cost model is merely stating the facts.

> If so, maybe we should mark this as a place to revisit when that gets fixed.

Any improvement to lowering should cause us to revisit and update the cost model. But yes, the best way to make sure this happens is to add a regression test that will fire once we get rid of the terrible code gen. I will do this. I have also updated the bug above.

r176955 added a regression test.
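
For reference, the cost expressions in the quoted r176898 table entries can be re-evaluated and compared with the CHECK lines added to cast.ll. The following is only a sketch of that arithmetic: the names CastOp and stackCastCost are made up for this illustration and are not part of ARMTargetTransformInfo.cpp, and the function covers only the six entries added by the commit.

```cpp
// Sketch: re-derive the six costs added in r176898 at compile time.
// CastOp and stackCastCost are hypothetical names for illustration only.
enum class CastOp { SignExtend, ZeroExtend, Truncate };

// Returns the value of the table expression for a cast between
// <Lanes x i8> and <Lanes x i32>, or 0 for combinations that the
// commit does not add.
constexpr unsigned stackCastCost(CastOp Op, unsigned Lanes) {
  return Lanes == 16
             ? (Op == CastOp::SignExtend   ? 16 * 2 + 4 * 4
                : Op == CastOp::ZeroExtend ? 16 * 2 + 4 * 3
                                           : 4 * 1 + 16 * 2 + 2 * 1)
         : Lanes == 8
             ? (Op == CastOp::SignExtend   ? 8 * 2 + 2 * 4
                : Op == CastOp::ZeroExtend ? 8 * 2 + 2 * 3
                                           : 2 * 1 + 8 * 2 + 1)
             : 0;
}

// The expected values are the "cost of N" numbers from the cast.ll CHECK lines.
static_assert(stackCastCost(CastOp::SignExtend, 8) == 24, "sext v8i8 -> v8i32");
static_assert(stackCastCost(CastOp::SignExtend, 16) == 48, "sext v16i8 -> v16i32");
static_assert(stackCastCost(CastOp::ZeroExtend, 8) == 22, "zext v8i8 -> v8i32");
static_assert(stackCastCost(CastOp::ZeroExtend, 16) == 44, "zext v16i8 -> v16i32");
static_assert(stackCastCost(CastOp::Truncate, 8) == 19, "trunc v8i32 -> v8i8");
static_assert(stackCastCost(CastOp::Truncate, 16) == 38, "trunc v16i32 -> v16i8");
```

Any C++11 compiler should accept this; a static_assert fails to compile if a table expression and its CHECK value ever disagree.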

> I have a fix for that bug in my tree, although it may not be the best
> way to fix it.
> 

Improving ARM vector code generation is also on my ever-growing list of things to do ;-). Patches are much appreciated.

But first, I want to make sure we are not regressing because of vectorization.

My approach is to first fix glaring issues in our test suite. Next, and in parallel, I use automatically generated code fragments (both LLVM IR and C code) to find issues systematically.
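
To illustrate what generated LLVM IR fragments for vector casts might look like, here is a minimal sketch. It is not the tooling referred to above, which is not shown in this thread; the function names, the lane counts, and the element widths are all assumptions made for the example. The emitted lines follow the same shape as the instructions in cast.ll.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: spells an IR vector type such as "<8 x i32>".
std::string vectorType(unsigned Lanes, unsigned Bits) {
  return "<" + std::to_string(Lanes) + " x i" + std::to_string(Bits) + ">";
}

// Hypothetical helper: one cast instruction on an undef operand,
// e.g. "%r70 = sext <8 x i8> undef to <8 x i32>".
std::string castFragment(const std::string &Op, unsigned Lanes,
                         unsigned SrcBits, unsigned DstBits, unsigned Id) {
  assert(SrcBits != DstBits && "sext/zext/trunc must change the element width");
  return "%r" + std::to_string(Id) + " = " + Op + " " +
         vectorType(Lanes, SrcBits) + " undef to " + vectorType(Lanes, DstBits);
}

// Enumerates sext, zext, and trunc for every narrow/wide width pair and
// every lane count below. The chosen ranges are assumptions for the sketch.
std::vector<std::string> castFragments() {
  const unsigned LaneCounts[] = {2, 4, 8, 16};
  const unsigned BitWidths[] = {8, 16, 32, 64};
  std::vector<std::string> Fragments;
  unsigned Id = 0;
  for (unsigned Lanes : LaneCounts)
    for (unsigned Narrow : BitWidths)
      for (unsigned Wide : BitWidths) {
        if (Narrow >= Wide)
          continue; // keep only pairs where Narrow is strictly smaller
        Fragments.push_back(castFragment("sext", Lanes, Narrow, Wide, Id++));
        Fragments.push_back(castFragment("zext", Lanes, Narrow, Wide, Id++));
        Fragments.push_back(castFragment("trunc", Lanes, Wide, Narrow, Id++));
      }
  return Fragments;
}
```

Each returned string could be placed in a function body and run through the cost model or the backend to look for unexpectedly expensive lowerings.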

Best,
Arnold

> Pete
> 
> 
> 
> On Tue, Mar 12, 2013 at 2:19 PM, Arnold Schwaighofer
> <aschwaighofer at apple.com> wrote:
>> Author: arnolds
>> Date: Tue Mar 12 16:19:22 2013
>> New Revision: 176898
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=176898&view=rev
>> Log:
>> ARM cost model: Increase the cost for vector casts that use the stack
>> 
>> Increase the cost of v8/v16-i8 to v8/v16-i32 casts and truncates as the backend
>> currently lowers those using stack accesses.
>> 
>> This was responsible for a significant degradation on
>> MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1
>> where we vectorize one loop to a vector factor of 16. After this patch we select
>> a vector factor of 4 which will generate reasonable code.
>> 
>> unsigned char cle[32];
>> 
>> void test(short c) {
>>  unsigned short compte;
>>  for (compte = 0; compte <= 31; compte++) {
>>    cle[compte] = cle[compte] ^ c;
>>  }
>> }
>> 
>> radar://13220512
>> 
>> Modified:
>>    llvm/trunk/lib/Target/ARM/ARMTargetTransformInfo.cpp
>>    llvm/trunk/test/Analysis/CostModel/ARM/cast.ll
>> 
>> Modified: llvm/trunk/lib/Target/ARM/ARMTargetTransformInfo.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMTargetTransformInfo.cpp?rev=176898&r1=176897&r2=176898&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/ARM/ARMTargetTransformInfo.cpp (original)
>> +++ llvm/trunk/lib/Target/ARM/ARMTargetTransformInfo.cpp Tue Mar 12 16:19:22 2013
>> @@ -194,6 +194,14 @@ unsigned ARMTTI::getCastInstrCost(unsign
>>     { ISD::TRUNCATE,    MVT::v4i32, MVT::v4i64, 0 },
>>     { ISD::TRUNCATE,    MVT::v4i16, MVT::v4i32, 1 },
>> 
>> +    // Operations that we legalize using load/stores to the stack.
>> +    { ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i8, 16*2 + 4*4 },
>> +    { ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i8, 16*2 + 4*3 },
>> +    { ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i8, 8*2 + 2*4 },
>> +    { ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i8, 8*2 + 2*3 },
>> +    { ISD::TRUNCATE,    MVT::v16i8, MVT::v16i32, 4*1 + 16*2 + 2*1 },
>> +    { ISD::TRUNCATE,    MVT::v8i8, MVT::v8i32, 2*1 + 8*2 + 1 },
>> +
>>     // Vector float <-> i32 conversions.
>>     { ISD::SINT_TO_FP,  MVT::v4f32, MVT::v4i32, 1 },
>>     { ISD::UINT_TO_FP,  MVT::v4f32, MVT::v4i32, 1 },
>> 
>> Modified: llvm/trunk/test/Analysis/CostModel/ARM/cast.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/ARM/cast.ll?rev=176898&r1=176897&r2=176898&view=diff
>> ==============================================================================
>> --- llvm/trunk/test/Analysis/CostModel/ARM/cast.ll (original)
>> +++ llvm/trunk/test/Analysis/CostModel/ARM/cast.ll Tue Mar 12 16:19:22 2013
>> @@ -152,6 +152,20 @@ define i32 @casts() {
>>   ; CHECK: cost of 10 {{.*}} uitofp
>>   %r69 = uitofp i64 undef to double
>> 
>> +  ; Vector cast cost of instructions lowering the cast to the stack.
>> +  ; CHECK: cost of 24 {{.*}} sext
>> +  %r70 = sext <8 x i8> undef to <8 x i32>
>> +  ; CHECK: cost of 48 {{.*}} sext
>> +  %r71 = sext <16 x i8> undef to <16 x i32>
>> +  ; CHECK: cost of 22 {{.*}} zext
>> +  %r72 = zext <8 x i8> undef to <8 x i32>
>> +  ; CHECK: cost of 44 {{.*}} zext
>> +  %r73 = zext <16 x i8> undef to <16 x i32>
>> +  ; CHECK: cost of 19 {{.*}} trunc
>> +  %r74 = trunc <8 x i32> undef to <8 x i8>
>> +  ; CHECK: cost of 38 {{.*}} trunc
>> +  %r75 = trunc <16 x i32> undef to <16 x i8>
>> +
>>   ;CHECK: cost of 0 {{.*}} ret
>>   ret i32 undef
>> }