[PATCH] D11760: [InstCombine] Fix SSE2/AVX2 vector shift by constant

Wed Aug 5 05:46:35 PDT 2015

andreadb accepted this revision.
andreadb added a comment.
This revision is now accepted and ready to land.

Hi Simon,

As pointed out in the comments below, I suggest to split this patch into two separate patches.
I'd like you to move the code that "combines" packed arithmetic shifts on a separate patch. That patch will also have to remove the target specific DAG combiner rules in the x86 backend (your patch will make those rules redundant).

It is okay in my opinion if this patch also fixes PR23821 (that fix is very small and probably makes sense to just have it in thius patch..) .

If you address the (minor) comments below, then the patch LGTM.

Thanks Simon!


================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:213
@@ +212,3 @@
+  if (CDV) {
+    // Shift amount vectors use the entire lower 64-bits integer.
+    auto VT = cast<VectorType>(CDV->getType());
----------------
I would probably be more specific and explicitly quote the architecture manual which says: "only the first 64-bits of a 128-bit count operand are checked to compute the count".
But it is up to you, That comment is probably already good enough :-).


================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:787-799
@@ -755,2 +786,15 @@
 
+  // Constant fold ashr( <A x Bi>, Ci ).
+  case Intrinsic::x86_sse2_psra_d:
+  case Intrinsic::x86_sse2_psra_w:
+  case Intrinsic::x86_sse2_psrai_d:
+  case Intrinsic::x86_sse2_psrai_w:
+  case Intrinsic::x86_avx2_psra_d:
+  case Intrinsic::x86_avx2_psra_w:
+  case Intrinsic::x86_avx2_psrai_d:
+  case Intrinsic::x86_avx2_psrai_w:
+    if (Value *V = SimplifyX86immshift(*II, *Builder, false, false))
+      return ReplaceInstUsesWith(*II, V);
+    break;
+
   // Constant fold lshr( <A x Bi>, Ci ).
----------------
Can this be committed in a separate patch?
For simplicity I would prefer if you just fix the problem with the shift count on this patch.

You can add rules for combining arithmetic shifts on a next patch. That new patch will also have to get rid of the (horrible) target specific DAG combine rules on psra(i) intrinsics that we currently run on x86 as part of 'PerformINTRINSIC_WO_CHAINCombine'.

================
Comment at: test/Transforms/InstCombine/x86-vector-shifts.ll:9-10
@@ +8,4 @@
+define <8 x i16> @sse2_psrai_w_0(<8 x i16> %v) nounwind readnone uwtable {
+; CHECK-LABEL: @sse2_psrai_w_0
+; CHECK: ret <8 x i16> %v
+  %1 = tail call <8 x i16> @llvm.x86.sse2.psrai.w(<8 x i16> %v, i32 0)
----------------
We should check that we don't have any instruction before the return.
We want to make sure that a shift-by-zero is folded away. In this case you can check that no shift is generated before the return statement (and that the tail call to the intrinsic function is no longer in the code).

You should do the same for all the other tests that  check the instcombine behavior for shift-by-zero.


Repository:
  rL LLVM

http://reviews.llvm.org/D11760