[PATCH] D11063: [X86][SSE] Vectorized v4i32 non-uniform shifts.

Thu Jul 9 16:15:46 PDT 2015

qcolombet accepted this revision.
qcolombet added a comment.
This revision is now accepted and ready to land.

Hi Simon,

LGTM with a few nitpicks to help coming back to the code.

Please commit directly your updated version.

Cheers,
-Quentin


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17383
@@ -17382,1 +17382,3 @@
 
+  // v4i32 Non Uniform Shifts
+  // If a constant shift amount we can shift each immediately,
----------------
Period.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17384
@@ +17383,3 @@
+  // v4i32 Non Uniform Shifts
+  // If a constant shift amount we can shift each immediately,
+  // else we need to zero-extend each shift amount to the lower i64 and
----------------
The wording is strange.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17392
@@ +17391,3 @@
+    if (ISD::isBuildVectorOfConstantSDNodes(Amt.getNode())) {
+      Amt0 = DAG.getVectorShuffle(VT, dl, Amt, Amt, { 0, 0, 0, 0 });
+      Amt1 = DAG.getVectorShuffle(VT, dl, Amt, Amt, { 1, 1, 1, 1 });
----------------
We could use UNDEF for the second operand, that should avoid the generic code to have to canonicalize it.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17395
@@ +17394,3 @@
+      Amt2 = DAG.getVectorShuffle(VT, dl, Amt, Amt, { 2, 2, 2, 2 });
+      Amt3 = DAG.getVectorShuffle(VT, dl, Amt, Amt, { 3, 3, 3, 3 });
+    } else {
----------------
Wouldn’t it make sense to leave more freedom to the next optimizer with more undef indexes:
0, -1, -1, -1
-1, 1, -1, -1
etc.

It looks to me that we have a too good idea of what the lowering should look like and we over-specify the data.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17398
@@ +17397,3 @@
+      switch (Opc) {
+      default: llvm_unreachable("Unknown target vector shift node");
+      case ISD::SHL: Opc = X86ISD::VSHL; break;
----------------
Maybe add a note saying that SHL v4i32 is handled earlier in this function.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17399
@@ +17398,3 @@
+      default: llvm_unreachable("Unknown target vector shift node");
+      case ISD::SHL: Opc = X86ISD::VSHL; break;
+      case ISD::SRL: Opc = X86ISD::VSRL; break;
----------------
Is this case reachable?

I thought we were handling SHL v4i32 earlier in this function (line 17300).
Though I guess it does not hurt to have it here.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17407
@@ +17406,3 @@
+      Amt2 = DAG.getVectorShuffle(VT, dl, Amt, Z, { 2, 6, -1, -1 });
+      Amt3 = DAG.getVectorShuffle(VT, dl, Amt, Z, { 3, 7, -1, -1 });
+    }
----------------
I guess you use 0, 4, then 1, 5, etc. instead of 0, 4, then 1, 4 etc. because masks are legal for shuffles.
If that is the case, then add a comment, if not, then maybe just use 4 for all the zero vector.

Also a comment saying that X86 shifts:
- Use only the 64 first bit of the register for the value of the amount.
- Shift all the lanes by the first amount (i.e., the first 64-bit like previously said), unlike LLVM shifts where each lane is shift by the related index.
would help reading the code.

Part of the information is at the being of the block, but I think it is a cryptic unless you know the actual instructions.

Right now, unless you have the intel documentation in front of you, this is not that easy to read.


Repository:
  rL LLVM

http://reviews.llvm.org/D11063