[PATCH] D11063: [X86][SSE] Vectorized v4i32 non-uniform shifts.
Simon Pilgrim
llvm-dev at redking.me.uk
Fri Jul 10 06:46:59 PDT 2015
RKSimon added a comment.
Thanks Quentin, I'll commit the patch with updates later today. Comments below.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17395
@@ +17394,3 @@
+ Amt2 = DAG.getVectorShuffle(VT, dl, Amt, Amt, { 2, 2, 2, 2 });
+ Amt3 = DAG.getVectorShuffle(VT, dl, Amt, Amt, { 3, 3, 3, 3 });
+ } else {
----------------
qcolombet wrote:
> Wouldn’t it make sense to leave more freedom to the next optimizer with more undef indexes:
> 0, -1, -1, -1
> -1, 1, -1, -1
> etc.
>
> It looks to me that we have a too good idea of what the lowering should look like and we over-specify the data.
If we do this then the shuffle gets removed (so we don't remember that the other lanes are undef) meaning that we don't recognise it as a splat.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17399
@@ +17398,3 @@
+ default: llvm_unreachable("Unknown target vector shift node");
+ case ISD::SHL: Opc = X86ISD::VSHL; break;
+ case ISD::SRL: Opc = X86ISD::VSRL; break;
----------------
qcolombet wrote:
> Is this case reachable?
>
> I thought we were handling SHL v4i32 earlier in this function (line 17300).
> Though I guess it does not hurt to have it here.
The reason that SHL is there is that I've found that this looks like its faster for non-constants than the cvttps2dq/pmuludq approach on older (pre-SSE41) targets. I'm still testing this though (I don't have that wide a range of older hardware these days). so haven't made it the default yet. I'll add a comment.
Repository:
rL LLVM
http://reviews.llvm.org/D11063
More information about the llvm-commits
mailing list