<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">Hi Jan, <div><br></div><div>The IR may contain <4 x i8> to <4 x i64> sext conversions. This patch optimizes them from 8 cycles to 6. I am not sure why Muhammad is interested in this pattern. </div><div><br></div><div>Nadav </div><div><br></div><div><br><div><div>On Mar 20, 2013, at 6:57 AM, Jan Sjodin <<a href="mailto:jan_sjodin@yahoo.com">jan_sjodin@yahoo.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;">Is there a reason to expand it to <4 x i64> instead of <4 x i32>, and if so, shouldn't <4 x i32> be expanded as well? Would it be equally good to expand to <4 x i32> since not all processors have 256-bit registers?<br><br><br>- Jan<br><br><br><blockquote type="cite">________________________________<br>From: Nadav Rotem <<a href="mailto:nrotem@apple.com">nrotem@apple.com</a>><br>To: <a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><span class="Apple-converted-space"> </span><br>Sent: Tuesday, March 19, 2013 2:38 PM<br>Subject: [llvm] r177421 - Optimize sext <4 x i8> and <4 x i16> to <4 x i64>.<br><br>Author: nadav<br>Date: Tue Mar 19 13:38:27 2013<br>New Revision: 177421<br><br>URL: <a href="http://llvm.org/viewvc/llvm-project?rev=177421&view=rev">http://llvm.org/viewvc/llvm-project?rev=177421&view=rev</a><br>Log:<br>Optimize sext <4 x i8> and <4 x i16> to <4 x i64>.<br>Patch by Ahmad, Muhammad T <<a href="mailto:muhammad.t.ahmad@intel.com">muhammad.t.ahmad@intel.com</a>><br><br><br>Modified:<br>   <span class="Apple-converted-space"> </span>llvm/trunk/lib/Target/X86/X86ISelLowering.cpp<br>   <span 
class="Apple-converted-space"> </span>llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp<br>   <span class="Apple-converted-space"> </span>llvm/trunk/test/Analysis/CostModel/X86/cast.ll<br>   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/avx-sext.ll<br><br>Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp<br>URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=177421&r1=177420&r2=177421&view=diff">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=177421&r1=177420&r2=177421&view=diff</a><br>==============================================================================<br>--- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)<br>+++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Tue Mar 19 13:38:27 2013<br>@@ -11827,8 +11827,23 @@ SDValue X86TargetLowering::LowerSIGN_EXT<br>       // fall through<br>     case MVT::v4i32:<br>     case MVT::v8i16: {<br>-      SDValue Tmp1 = getTargetVShiftNode(X86ISD::VSHLI, dl, VT,<br>-                                         Op.getOperand(0), ShAmt, DAG);<br>+      // (sext (vzext x)) -> (vsext x)<br>+      SDValue Op0 = Op.getOperand(0);<br>+      SDValue Op00 = Op0.getOperand(0);<br>+      SDValue Tmp1;<br>+      // Hopefully, this VECTOR_SHUFFLE is just a VZEXT.<br>+      if (Op0.getOpcode() == ISD::BITCAST &&<br>+          Op00.getOpcode() == ISD::VECTOR_SHUFFLE)<br>+        Tmp1 = LowerVectorIntExtend(Op00, DAG);<br>+      if (Tmp1.getNode()) {<br>+        SDValue Tmp1Op0 = Tmp1.getOperand(0);<br>+        assert(Tmp1Op0.getOpcode() == X86ISD::VZEXT &&<br>+               "This optimization is invalid without a VZEXT.");<br>+        return DAG.getNode(X86ISD::VSEXT, dl, VT, Tmp1Op0.getOperand(0));<br>+      }<br>+<br>+      // If the above didn't work, then just use Shift-Left + Shift-Right.<br>+      Tmp1 = getTargetVShiftNode(X86ISD::VSHLI, dl, VT, Op0, ShAmt, DAG);<br>       return 
getTargetVShiftNode(X86ISD::VSRAI, dl, VT, Tmp1, ShAmt, DAG);<br>     }<br>   }<br><br>Modified: llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp<br>URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp?rev=177421&r1=177420&r2=177421&view=diff">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp?rev=177421&r1=177420&r2=177421&view=diff</a><br>==============================================================================<br>--- llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp (original)<br>+++ llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp Tue Mar 19 13:38:27 2013<br>@@ -257,8 +257,8 @@ unsigned X86TTI::getCastInstrCost(unsign<br>     { ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i1,  6 },<br>     { ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i1,  9 },<br>     { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i1,  8 },<br>-    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i8,  8 },<br>-    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i16, 8 },<br>+    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i8,  6 },<br>+    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i16, 6 },<br>     { ISD::TRUNCATE,    MVT::v8i32, MVT::v8i64, 3 },<br>   };<br><br><br>Modified: llvm/trunk/test/Analysis/CostModel/X86/cast.ll<br>URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/cast.ll?rev=177421&r1=177420&r2=177421&view=diff">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/cast.ll?rev=177421&r1=177420&r2=177421&view=diff</a><br>==============================================================================<br>--- llvm/trunk/test/Analysis/CostModel/X86/cast.ll (original)<br>+++ llvm/trunk/test/Analysis/CostModel/X86/cast.ll Tue Mar 19 13:38:27 2013<br>@@ -44,9 +44,9 @@ define i32 @zext_sext(<8 x i1> %in) {<br>   %B = zext <8 x i16> undef to <8 x i32><br>   ;CHECK: cost of 1 {{.*}} sext<br>   %C = sext <4 x i32> undef to <4 x i64><br>-  ;CHECK: cost of 8 {{.*}} sext<br>+  ;CHECK: 
cost of 6 {{.*}} sext<br>   %C1 = sext <4 x i8> undef to <4 x i64><br>-  ;CHECK: cost of 8 {{.*}} sext<br>+  ;CHECK: cost of 6 {{.*}} sext<br>   %C2 = sext <4 x i16> undef to <4 x i64><br><br>   ;CHECK: cost of 1 {{.*}} zext<br><br>Modified: llvm/trunk/test/CodeGen/X86/avx-sext.ll<br>URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-sext.ll?rev=177421&r1=177420&r2=177421&view=diff">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-sext.ll?rev=177421&r1=177420&r2=177421&view=diff</a><br>==============================================================================<br>--- llvm/trunk/test/CodeGen/X86/avx-sext.ll (original)<br>+++ llvm/trunk/test/CodeGen/X86/avx-sext.ll Tue Mar 19 13:38:27 2013<br>@@ -165,3 +165,24 @@ define <4 x i64> @sext_4i8_to_4i64(<4 x<br>   ret <4 x i64> %extmask<br>}<br><br>+; AVX: sext_4i8_to_4i64<br>+; AVX: vpmovsxbd<br>+; AVX: vpmovsxdq<br>+; AVX: vpmovsxdq<br>+; AVX: ret<br>+define <4 x i64> @load_sext_4i8_to_4i64(<4 x i8> *%ptr) {<br>+ %X = load <4 x i8>* %ptr<br>+ %Y = sext <4 x i8> %X to <4 x i64><br>+ ret <4 x i64>%Y<br>+}<br>+<br>+; AVX: sext_4i16_to_4i64<br>+; AVX: vpmovsxwd<br>+; AVX: vpmovsxdq<br>+; AVX: vpmovsxdq<br>+; AVX: ret<br>+define <4 x i64> @load_sext_4i16_to_4i64(<4 x i16> *%ptr) {<br>+ %X = load <4 x i16>* %ptr<br>+ %Y = sext <4 x i16> %X to <4 x i64><br>+ ret <4 x i64>%Y<br>+}<br><br><br>_______________________________________________<br>llvm-commits mailing list<br><a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</blockquote></div></blockquote></div><br></div></body></html>