<div class="gmail_extra"><div class="gmail_quote">On Tue, Apr 24, 2012 at 11:58 AM, Lang Hames <span dir="ltr"><<a href="mailto:lhames@gmail.com" target="_blank">lhames@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Author: lhames<br>

Date: Tue Apr 24 13:58:36 2012<br>

New Revision: 155468<br>

<br>

URL: <a href="http://llvm.org/viewvc/llvm-project?rev=155468&view=rev" target="_blank">http://llvm.org/viewvc/llvm-project?rev=155468&view=rev</a><br>

Log:<br>

Add support for llvm.arm.neon.vmull* intrinsics to InstCombine. This fixes<br>

<rdar://problem/11291436>.<br></blockquote><div><br></div><div>Sorry I hadn't yet replied, but I really don't think this is the right fix...</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

+  case Intrinsic::arm_neon_vmulls:<br>

+  case Intrinsic::arm_neon_vmullu: {<br>

+    // Zext/sext intrinsic operands according to the intrinsic type, then try to<br>

+    // simplify them. This lets us try a SimplifyMulInst on the extended<br>

+    // operands. If the zext/sext instructions are unused when we're done then<br>

+    // delete them from the block.<br>

+    Value* Arg0 = II->getArgOperand(0);<br>

+    Value* Arg1 = II->getArgOperand(1);<br>

+    bool Zext = (II->getIntrinsicID() == Intrinsic::arm_neon_vmullu);<br>

+    Instruction *Arg0W =<br>

+      Zext ? CastInst::CreateZExtOrBitCast(Arg0, II->getType(), "", II) :<br>

+             CastInst::CreateSExtOrBitCast(Arg0, II->getType(), "", II);<br>

+    Value* Arg0WS = SimplifyInstruction(Arg0W);<br>

+    if (Arg0WS == 0) // If simplification fails just pass through the ext'd val.<br>

+      Arg0WS = Arg0W;<br></blockquote><div><br></div><div>Why are you simplifying these incrementally? SimplifyInstruction is not at all free, it can recurse deeply and add compile time with little benefit.</div><div><br>

</div><div>I would just check if they are the right size, and if not, extend them. We'll simplify everything if there are good ways to do so after-the-fact.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


+    Instruction *Arg1W =<br>

+      Zext ? CastInst::CreateZExtOrBitCast(Arg1, II->getType(), "", II) :<br>

+             CastInst::CreateSExtOrBitCast(Arg1, II->getType(), "", II);<br>

+    Value* Arg1WS = SimplifyInstruction(Arg1W);<br>

+    if (Arg1WS == 0)<br>

+      Arg1WS = Arg1W;<br></blockquote><div><br></div><div>Ditto.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

+    Instruction *SimplifiedInst = 0;<br>

+    if (Value* V = SimplifyMulInst(Arg0WS, Arg1WS, TD)) {<br></blockquote><div><br></div><div>I think this is an unnecessary gymnastic just to re-use the InstSimplify logic.</div><div><br></div><div>InstSimplify's constraints (as you pointed out in the review thread) are that it cannot form new instructions. That doesn't mean that the right approach in InstCombine is to form additional instructions that we think might be "cheap", and then see if InstSimplify can simplify through them. Is this a pattern already established elsewhere in InstCombine? I haven't seen it, but there is a lot of InstCombine I've not looked at...</div>

<div><br></div><div>Instead, I'd rather write the actual analysis your interested in performing on these intrinsics in InstCombine, with the knowledge (and taking the action) of extending as necessary.</div><div><br></div>

<div>The specific advantages I see are:</div><div><br></div><div>- Don't thrash the use lists by adding no-op instructions</div><div>- Don't allocate, insert, use, then delete instructions in a common case (no simplification).</div>

<div>- Able to extend the analysis in other ways that require forming new instructions.</div><div><br></div><div>Is there some particularly complex logic in SimplifyMulInst that you're trying to re-use? Maybe there is a way to factor that into something shared by both InstCombine and InstSimplify?</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

+      SimplifiedInst = ReplaceInstUsesWith(CI, V);<br>

+    }<br>

+    if (Arg0W->use_empty())<br>

+      Arg0W->eraseFromParent();<br>

+    if (Arg1W->use_empty())<br>

+      Arg1W->eraseFromParent();<br>

+    if (SimplifiedInst != 0)<br>

+      return SimplifiedInst;<br>

+    break;<br>

+  }<br>

+<br>

   case Intrinsic::stackrestore: {<br>

     // If the save is right next to the restore, remove the restore.  This can<br>

     // happen when variable allocas are DCE'd.<br>

<br>

Added: llvm/trunk/test/Transforms/InstCombine/2012-04-23-Neon-Intrinsics.ll<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/InstCombine/2012-04-23-Neon-Intrinsics.ll?rev=155468&view=auto" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/InstCombine/2012-04-23-Neon-Intrinsics.ll?rev=155468&view=auto</a><br>


==============================================================================<br>

--- llvm/trunk/test/Transforms/InstCombine/2012-04-23-Neon-Intrinsics.ll (added)<br>

+++ llvm/trunk/test/Transforms/InstCombine/2012-04-23-Neon-Intrinsics.ll Tue Apr 24 13:58:36 2012<br>

@@ -0,0 +1,68 @@<br>

+target datalayout = "e-p:32:32:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:32:64-v128:32:128-a0:0:32-n32-S32"<br>

+target triple = "thumbv7-apple-ios0"<br>

+<br>

+; RUN: opt -S -instcombine < %s | FileCheck %s<br>

+<br>

+define <4 x i32> @mulByZero(<4 x i16> %x) nounwind readnone ssp {<br>

+entry:<br>

+  %a = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> %x, <4 x i16> zeroinitializer) nounwind<br>

+  ret <4 x i32> %a<br>

+; CHECK: entry:<br>

+; CHECK-NEXT: ret <4 x i32> zeroinitializer<br>

+}<br>

+<br>

+define <4 x i32> @mulByOne(<4 x i16> %x) nounwind readnone ssp {<br>

+entry:<br>

+  %a = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> %x, <4 x i16> <i16 1, i16 1, i16 1, i16 1>) nounwind<br>

+  ret <4 x i32> %a<br>

+; CHECK: entry:<br>

+; CHECK-NEXT: %0 = sext <4 x i16> %x to <4 x i32><br>

+; CHECK-NEXT: ret <4 x i32> %0<br>

+}<br>

+<br>

+define <4 x i32> @constantMul() nounwind readnone ssp {<br>

+entry:<br>

+  %a = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> <i16 3, i16 3, i16 3, i16 3>, <4 x i16> <i16 2, i16 2, i16 2, i16 2>) nounwind<br>

+  ret <4 x i32> %a<br>

+; CHECK: entry:<br>

+; CHECK-NEXT: ret <4 x i32> <i32 6, i32 6, i32 6, i32 6><br>

+}<br>

+<br>

+define <4 x i32> @constantMulS() nounwind readnone ssp {<br>

+entry:<br>

+  %b = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> <i16 -1, i16 -1, i16 -1, i16 -1>, <4 x i16> <i16 1, i16 1, i16 1, i16 1>) nounwind<br>

+  ret <4 x i32> %b<br>

+; CHECK: entry:<br>

+; CHECK-NEXT: ret <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1><br>

+}<br>

+<br>

+define <4 x i32> @constantMulU() nounwind readnone ssp {<br>

+entry:<br>

+  %b = tail call <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16> <i16 -1, i16 -1, i16 -1, i16 -1>, <4 x i16> <i16 1, i16 1, i16 1, i16 1>) nounwind<br>

+  ret <4 x i32> %b<br>

+; CHECK: entry:<br>

+; CHECK-NEXT: ret <4 x i32> <i32 65535, i32 65535, i32 65535, i32 65535><br>

+}<br>

+<br>

+define <4 x i32> @complex1(<4 x i16> %x) nounwind readnone ssp {<br>

+entry:<br>

+  %a = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> <i16 2, i16 2, i16 2, i16 2>, <4 x i16> %x) nounwind<br>

+  %b = add <4 x i32> zeroinitializer, %a<br>

+  ret <4 x i32> %b<br>

+; CHECK: entry:<br>

+; CHECK-NEXT: %a = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> <i16 2, i16 2, i16 2, i16 2>, <4 x i16> %x) nounwind<br>

+; CHECK-NEXT: ret <4 x i32> %a<br>

+}<br>

+<br>

+define <4 x i32> @complex2(<4 x i32> %x) nounwind readnone ssp {<br>

+entry:<br>

+  %a = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> <i16 3, i16 3, i16 3, i16 3>, <4 x i16> <i16 2, i16 2, i16 2, i16 2>) nounwind<br>

+  %b = add <4 x i32> %x, %a<br>

+  ret <4 x i32> %b<br>

+; CHECK: entry:<br>

+; CHECK-NEXT: %b = add <4 x i32> %x, <i32 6, i32 6, i32 6, i32 6><br>

+; CHECK-NEXT: ret <4 x i32> %b<br>

+}<br>

+<br>

+declare <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16>, <4 x i16>) nounwind readnone<br>

+declare <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16>, <4 x i16>) nounwind readnone<br>

<br>

<br>

_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

</blockquote></div><br></div>