[PATCH] [X86][SSE] Avoid scalarization of v2i64 vector shifts

Wed Mar 18 09:42:33 PDT 2015

Thanks Andrea, I'll upload an updated version of the patch later today.


REPOSITORY
  rL LLVM

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:16194
@@ +16193,3 @@
+  // per-lane and then shuffle the partial results back together.
+  if (VT == MVT::v2i64) {
+    // Splat the shift amounts so the scalar shifts above will catch it.
----------------
andreadb wrote:
> This would generate worse code if `Op.getOpcode() == ISD::SRA`.
> You should check that the opcode is not ISD::SRA. Otherwise, you would end up scalarizing two shifts.
Yes - i64 SRA isn't currently supported (and doesn't go through LowerShift at all at the moment), but I will add the check. Initial tests indicate that SRA would be faster for constant shifts (and for all AVX2 implementations), but as I said, I'll deal with SRA properly in a future patch.
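
For reference, the lowering discussed in this hunk can be modelled in plain C++ roughly as follows. This is only a scalar sketch of the idea (shift the whole vector once per lane amount, then merge the partial results), not the code in the patch: the type and function names are hypothetical, the merge step stands in for whatever shuffle the backend actually selects, and only the logical right shift is shown.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a v2i64 value: two 64-bit lanes.
using V2i64 = std::array<std::uint64_t, 2>;

// Models a vector shift with one count applied to both lanes
// (psrlq-style: a count above 63 produces zero in every lane).
V2i64 shiftRightLogicalUniform(const V2i64 &v, std::uint64_t count) {
  if (count > 63)
    return {0, 0};
  return {v[0] >> count, v[1] >> count};
}

// Models the final shuffle: lane 0 comes from a, lane 1 comes from b.
V2i64 mergeLane0Lane1(const V2i64 &a, const V2i64 &b) {
  return {a[0], b[1]};
}

// Non-uniform shift: one whole-vector shift per lane amount, then
// merge the two partial results back into a single vector.
V2i64 shiftRightLogicalPerLane(const V2i64 &v, const V2i64 &amt) {
  V2i64 byAmt0 = shiftRightLogicalUniform(v, amt[0]); // lane 0 is correct
  V2i64 byAmt1 = shiftRightLogicalUniform(v, amt[1]); // lane 1 is correct
  return mergeLane0Lane1(byAmt0, byAmt1);
}
```

An arithmetic shift (SRA) has no 64-bit whole-vector equivalent in this model, which is why it needs the separate check mentioned above.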

================
Comment at: test/CodeGen/X86/x86-shifts.ll:122-123
@@ -121,4 +121,4 @@
 ; CHECK: shr2_nosplat
-; CHECK-NOT:  psrlq
-; CHECK-NOT:  psrlq
-; CHECK:      ret
+; CHECK: psrlq
+; CHECK: psrlq
+; CHECK: ret
----------------
andreadb wrote:
> Could you please add a check for the shift count?
> Something like
> CHECK-DAG: psrlq $8
> CHECK-DAG: psrlq $1
> 
> It would be nice to also have checks for the two extra 'punpcklqdq' shuffles that would be generated by your patch.
Yes - I'll add more complete CHECK lines for all my changes.

http://reviews.llvm.org/D8416
