ARM Cast Cost Table

Tue Jan 29 11:20:51 PST 2013

+1 for smaller tests. To verify that the cost model returns the right cost, you can write something like:

target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:64:128-a0:0:64-n32-S64"
target triple = "armv7--linux-gnueabihf"

%T0 = type <4 x i16>
%T1 = type <4 x i32>
define void @func0(%T0* %loadaddr, %T1* %storeaddr) {
  %v0 = load %T0* %loadaddr
  %r = sext %T0 %v0 to %T1
  store %T1 %r, %T1* %storeaddr
  ret void
}

%T2 = type <4 x i16>
%T3 = type <4 x i32>
define void @func1(%T2* %loadaddr, %T2* %loadaddr2, %T3* %storeaddr) {
  %v0 = load %T2* %loadaddr
  %v1 = load %T2* %loadaddr2
  %r = sext %T2 %v0 to %T3
  %r2 = sext %T2 %v1 to %T3
  %r3 = mul %T3 %r, %r2 
  store %T3 %r3, %T3* %storeaddr
  ret void
}

Now, I chose these two examples for a reason. If we run

> llc -mcpu=cortex-a9 < x.ll

we get:

func0:                                  @ @func0
@ BB#0:
	vldr	d16, [r0]
	vmovl.s16	q8, d16
	vst1.64	{d16, d17}, [r1]
	bx	lr

func1:                                  @ @func1
@ BB#0:
	vldr	d16, [r1]
	vldr	d17, [r0]
	vmull.s16	q8, d17, d16
	vst1.64	{d16, d17}, [r2]
	bx	lr

In func0 we pay for the sign extension (a separate vmovl.s16), while in func1 it is folded into the multiply (vmull.s16). The question now is which cost we should return: the optimistic one or the pessimistic one? I would lean towards the optimistic one (as you do), but I think we should agree on which one to use in such cases.
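To make the trade-off concrete, here is a minimal C++ sketch of a cast cost table that can give either answer. Everything in it is hypothetical: CastOp, VT, Policy, getCastCost, the table names and DefaultCastCost are illustrative stand-ins, not the real ARMTTI/ISD/MVT interfaces, and the costs are simply instruction counts read off the llc output for func0 (one vmovl.s16) and func1 (nothing beyond the vmull.s16).

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-ins for ISD opcodes and MVTs; only the
// cast used in func0/func1 is modelled.
enum class CastOp { SExt };
enum class VT { v4i16, v4i32 };

// Which answer to give when an extension may or may not be folded into
// a widening NEON instruction.
enum class Policy { Optimistic, Pessimistic };

// One row of a cast cost table: casting Src to Dst with Op costs Cost.
struct CastCostEntry {
  CastOp Op;
  VT Dst;
  VT Src;
  unsigned Cost;
};

// Pessimistic: the sext stays standalone and costs one vmovl.s16 (func0).
const std::vector<CastCostEntry> PessimisticTable = {
    {CastOp::SExt, VT::v4i32, VT::v4i16, 1},
};

// Optimistic: the sext is absorbed by vmull.s16 and is free (func1).
const std::vector<CastCostEntry> OptimisticTable = {
    {CastOp::SExt, VT::v4i32, VT::v4i16, 0},
};

// Hypothetical fallback for casts that are missing from the table.
constexpr unsigned DefaultCastCost = 1;

unsigned scalarBits(VT T) { return T == VT::v4i16 ? 16 : 32; }

// Linear scan over the table; returns nothing if no row matches.
std::optional<unsigned> lookupCastCost(const std::vector<CastCostEntry> &Table,
                                       CastOp Op, VT Dst, VT Src) {
  for (const CastCostEntry &E : Table)
    if (E.Op == Op && E.Dst == Dst && E.Src == Src)
      return E.Cost;
  return std::nullopt;
}

// Cost of a cast under the chosen policy, falling back to a default.
unsigned getCastCost(CastOp Op, VT Dst, VT Src, Policy P) {
  // A sign extension must widen its elements.
  assert(scalarBits(Src) < scalarBits(Dst) && "sext must widen");
  const auto &Table =
      P == Policy::Optimistic ? OptimisticTable : PessimisticTable;
  if (std::optional<unsigned> C = lookupCastCost(Table, Op, Dst, Src))
    return *C;
  return DefaultCastCost;
}
```

Because a table keyed only on (opcode, destination type, source type) cannot see whether the sext feeds a multiply, whichever row we pick will be wrong for one of func0 and func1; that is why we need to agree on a policy.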

Thanks,
Arnold

On Jan 29, 2013, at 12:46 PM, Nadav Rotem <nrotem at apple.com> wrote:

> Hi Renato, 
> 
> Thanks for working on this. I have some comments. 
> 
> The changes to ARMTargetMachine.h are unrelated to the cost model. Let's commit them in a separate patch. 
> 
> The code in ARMTTI::getCastInstrCost looks good.
> 
> +                                 ISD, DstTy.getSimpleVT(), SrcTy.getSimpleVT());
> 
> Did we pass the 80-column limit? I am not sure. 
> 
> In your test cases you run both llc and opt. If you are checking that llc generates the right pattern, then this test should be in test/CodeGen/ARM/. Can you make the tests smaller? You can write a two-line function in a .ll file that takes the arguments and performs the operation on them. 
> 
> Thanks,
> Nadav 
> 
> 
> On Jan 29, 2013, at 10:32 AM, Renato Golin <renato.golin at linaro.org> wrote:
> 
>> Hi Nadav,
>> 
>> This is an entry-level change, just to make sure everything is in the right place and the infrastructure is ready. The code change is trivial.
>> 
>> http://llvm-reviews.chandlerc.com/D345
>> 
>> I spent a bit more time on the tests, to make sure the right costs are used and that the instruction I expect to perform the cast is clearly stated. 
>> 
>> Both tests refer to the same source code, but one is vectorized and the other is not. Over time, we can fill them with cost checks for other instructions; I just didn't do it because I wasn't sure those costs were correct (some, I know, aren't).
>> 
>> cheers,
>> --renato
>> <cast-cost.patch>
> 
