<html><head><meta http-equiv="Content-Type" content="text/html charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">+1 for smaller tests. To verify that the cost model returns the right cost, you can write something like:<div><br></div><div><br></div><div><div>target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:64:128-a0:0:64-n32-S64"</div><div>target triple = "armv7--linux-gnueabihf"</div><div><br></div><div>%T0 = type <4 x i16></div><div>%T1 = type <4 x i32></div><div>define void @func0(%T0* %loadaddr, %T1* %storeaddr) {</div><div>  %v0 = load %T0* %loadaddr</div><div>  %r = sext %T0 %v0 to %T1</div><div>  store %T1 %r, %T1* %storeaddr</div><div>  ret void</div><div>}</div><div><br></div><div>%T2 = type <4 x i16></div><div>%T3 = type <4 x i32></div><div>define void @func1(%T2* %loadaddr, %T2* %loadaddr2, %T3* %storeaddr) {</div><div>  %v0 = load %T2* %loadaddr</div><div>  %v1 = load %T2* %loadaddr2</div><div>  %r = sext %T2 %v0 to %T3</div><div>  %r2 = sext %T2 %v1 to %T3</div><div>  %r3 = mul %T3 %r, %r2 </div><div>  store %T3 %r3, %T3* %storeaddr</div><div>  ret void</div><div>}</div></div><div><div><br></div><div>Now, I chose those two examples for a reason. 
If we run</div><div><br></div><div>> llc -mcpu=cortex-a9 < x.ll</div><div><br></div><div>We get:</div><div><br></div><div><div>func0:                                  @ @func0</div><div>@ BB#0:</div><div><span class="Apple-tab-span" style="white-space:pre">        </span>vldr<span class="Apple-tab-span" style="white-space:pre">        </span>d16, [r0]</div><div><span class="Apple-tab-span" style="white-space:pre">    </span>vmovl.s16<span class="Apple-tab-span" style="white-space:pre">   </span>q8, d16</div><div><span class="Apple-tab-span" style="white-space:pre">      </span>vst1.64<span class="Apple-tab-span" style="white-space:pre">     </span>{d16, d17}, [r1]</div><div><span class="Apple-tab-span" style="white-space:pre">     </span>bx<span class="Apple-tab-span" style="white-space:pre">  </span>lr</div><div><br></div><div>func1:                                  @ @func1</div><div>@ BB#0:</div><div><span class="Apple-tab-span" style="white-space:pre">        </span>vldr<span class="Apple-tab-span" style="white-space:pre">        </span>d16, [r1]</div><div><span class="Apple-tab-span" style="white-space:pre">    </span>vldr<span class="Apple-tab-span" style="white-space:pre">        </span>d17, [r0]</div><div><span class="Apple-tab-span" style="white-space:pre">    </span>vmull.s16<span class="Apple-tab-span" style="white-space:pre">   </span>q8, d17, d16</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>vst1.64<span class="Apple-tab-span" style="white-space:pre">     </span>{d16, d17}, [r2]</div><div><span class="Apple-tab-span" style="white-space:pre">     </span>bx<span class="Apple-tab-span" style="white-space:pre">  </span>lr</div></div><div><br></div><div>In the case of func0 we pay for sign-extension while in func1 it can be merged into the arithmetic. The question is now which cost should we return: the optimistic or the pessimistic one? 
I would lean towards the optimistic one (as you do), but I think we should agree on which one to use in such cases.</div><div><br></div><div>Thanks,</div><div>Arnold</div><div><br></div><div><div>On Jan 29, 2013, at 12:46 PM, Nadav Rotem <<a href="mailto:nrotem@apple.com">nrotem@apple.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><meta http-equiv="Content-Type" content="text/html charset=iso-8859-1"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Hi Renato, </div><div><br></div><div>Thanks for working on this. I have some comments. </div><div><br></div><div>The changes to ARMTargetMachine.h are unrelated to the cost model. Let's commit them in a separate patch. </div><div><br></div><div>The code in ARMTTI::getCastInstrCost looks good.</div><div><br></div><div>+                                 ISD, DstTy.getSimpleVT(), SrcTy.getSimpleVT());</div><div><br></div><div>Did we pass the 80-column limit?  I am not sure. </div><div><br></div><div>In your test cases you execute both LLC and OPT. If you are checking that LLC generates the right pattern, then this test should be in test/CodeGen/ARM/.  Can you make the tests smaller?  You can write a two-line function in LL that takes the arguments and performs the operation on them. </div><div><br></div><div>Thanks,</div><div>Nadav </div><div><br></div><div><br></div><div><div>On Jan 29, 2013, at 10:32 AM, Renato Golin <<a href="mailto:renato.golin@linaro.org">renato.golin@linaro.org</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr"><div style="">Hi Nadav,</div><div style=""><br></div><div style="">This is an entry-level change, just to make sure everything is in the right place and the infrastructure is ready. The code change is trivial.</div><div>

<br></div><a href="http://llvm-reviews.chandlerc.com/D345">http://llvm-reviews.chandlerc.com/D345</a><br><div><br></div><div style="">I spent a bit more time on the tests, to make sure the costs are picked up correctly and that the instruction I expect to perform the cast is clearly shown. </div>

<div style=""><br></div><div style="">Both tests refer to the same source code, but one is vectorized and the other is not. With time, we can fill them with cost checks for other instructions; I just didn't do it because I wasn't sure they were correct (some I know aren't).</div>

<div style=""><br></div><div style="">cheers,<br></div><div style="">--renato</div></div>

<span>&lt;cast-cost.patch&gt;</span></blockquote></div><br></div></blockquote></div><br></div></body></html>