[PATCH] Optimization for certain shufflevector by using insertps.
elena.demikhovsky at intel.com
Thu Apr 24 00:18:16 PDT 2014
1) You can check the INSERTPS mask outside NormalizeVectorShuffle(), where all the other masks are checked.
2) You can use insertps for v4i32 as well.
3) I think that folding the load into insertps is not fully correct.
You translate this IR into an (add + insertps) sequence:
%0 = load <4 x float>* %pb, align 16
%vecinit6 = shufflevector <4 x float> %a, <4 x float> %0, <4 x i32> <i32 0, i32 1, i32 2, i32 4>
insertps loads 4 bytes instead of 16, so you lose any exception that the full 16-byte load could raise. That is acceptable for OpenCL, but other compilers can't ignore exceptions.
More generally, I'm not sure that add + insertps (load form) is better than load + insertps.
4) About the tests: why do you check X32 and X64 separately?
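For reference, a minimal C model of the INSERTPS behaviour that points 1-3 rely on, assuming the imm8 layout from the Intel SDM (bits 7:6 select the source lane in the register form, bits 5:4 the destination lane, bits 3:0 the zero mask). The helpers match_insertps_mask, insertps_reg and insertps_mem are hypothetical illustrations, not functions from the patch or from LLVM, and the matcher only covers the "operand A in place, one lane taken from operand B" shape (no commuted or zeroing variants). Lanes are treated as raw 32-bit patterns, so the same logic applies to v4i32. The memory form reads exactly 4 bytes, whereas the original <4 x float> load reads 16.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical matcher: for a 4-lane shuffle mask (0-3 = operand A,
   4-7 = operand B, -1 = undef) that keeps A in place except for one
   lane taken from B, return the INSERTPS imm8; otherwise return -1. */
static int match_insertps_mask(const int mask[4])
{
    int dst_lane = -1;
    for (int i = 0; i < 4; ++i) {
        if (mask[i] < 0 || mask[i] == i)
            continue;              /* undef, or A[i] stays in lane i */
        if (mask[i] < 4 || dst_lane >= 0)
            return -1;             /* A lane moved, or a second insert */
        dst_lane = i;
    }
    if (dst_lane < 0)
        return -1;                 /* identity shuffle: nothing to insert */
    int src_lane = mask[dst_lane] - 4;
    return (src_lane << 6) | (dst_lane << 4);   /* zero mask = 0 */
}

/* Register form: copy src[COUNT_S] into dst[COUNT_D], then apply ZMASK.
   Lanes are raw 32-bit patterns, so this models v4f32 and v4i32 alike. */
static void insertps_reg(uint32_t dst[4], const uint32_t src[4], uint8_t imm)
{
    unsigned count_s = (imm >> 6) & 3u;
    unsigned count_d = (imm >> 4) & 3u;
    dst[count_d] = src[count_s];
    for (unsigned i = 0; i < 4; ++i)
        if (imm & (1u << i))
            dst[i] = 0;
}

/* Memory form: COUNT_S is ignored and only one 32-bit element is read,
   so bytes 4..15 of a 16-byte vector at p are never touched. */
static void insertps_mem(uint32_t dst[4], const void *p, uint8_t imm)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);       /* reads exactly 4 bytes */
    unsigned count_d = (imm >> 4) & 3u;
    dst[count_d] = v;
    for (unsigned i = 0; i < 4; ++i)
        if (imm & (1u << i))
            dst[i] = 0;
}
```

For the mask <0, 1, 2, 4> in the IR above, match_insertps_mask returns 0x30 (destination lane 3, source lane 0). The 4-byte read in insertps_mem is exactly why a fault that the 16-byte load could have raised on the remaining 12 bytes no longer happens.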