[PATCH] D11063: [X86][SSE] Vectorized v4i32 non-uniform shifts.

Fri Jul 10 06:46:59 PDT 2015

RKSimon added a comment.

Thanks Quentin, I'll commit the patch with updates later today. Comments below.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17395
@@ +17394,3 @@
+      Amt2 = DAG.getVectorShuffle(VT, dl, Amt, Amt, { 2, 2, 2, 2 });
+      Amt3 = DAG.getVectorShuffle(VT, dl, Amt, Amt, { 3, 3, 3, 3 });
+    } else {
----------------
qcolombet wrote:
> Wouldn’t it make sense to leave more freedom to the next optimizer with more undef indexes:
> 0, -1, -1, -1
> -1, 1, -1, -1
> etc.
> 
> It looks to me that we have a too good idea of what the lowering should look like and we over-specify the data.
If we do this then the shuffle gets removed (so we don't remember that the other lanes are undef) meaning that we don't recognise it as a splat.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17399
@@ +17398,3 @@
+      default: llvm_unreachable("Unknown target vector shift node");
+      case ISD::SHL: Opc = X86ISD::VSHL; break;
+      case ISD::SRL: Opc = X86ISD::VSRL; break;
----------------
qcolombet wrote:
> Is this case reachable?
> 
> I thought we were handling SHL v4i32 earlier in this function (line 17300).
> Though I guess it does not hurt to have it here.
The reason that SHL is there is that I've found that this looks like its faster for non-constants than the cvttps2dq/pmuludq approach on older (pre-SSE41) targets. I'm still testing this though (I don't have that wide a range of older hardware these days). so haven't made it the default yet. I'll add a comment.

Repository:
  rL LLVM

http://reviews.llvm.org/D11063