[PATCH] Optimization for certain shufflevector by using insertps.

Elena Demikhovsky elena.demikhovsky at intel.com
Thu Apr 24 00:18:16 PDT 2014


Hi Filipe,

1) You can check the INSERTPS mask outside NormalizeVectorShuffle(), where all the other masks are checked.
2) You can use insertps for v4i32 as well.

3) I think that folding the load into insertps is not fully correct.

You translate this IR to an (add + insertps) sequence:

  %0 = load <4 x float>* %pb, align 16
  %vecinit6 = shufflevector <4 x float> %a, <4 x float> %0, <4 x i32> <i32 0, i32 1, i32 2, i32 4>

The load form of insertps reads 4 bytes instead of the 16 bytes of the original <4 x float> load, so you lose any exception the full load could raise. That is OK for OpenCL, but other compilers can't ignore exceptions.

And in general, I'm not sure that add + insertps (load form) is better than load + insertps (register form).

4) About the tests: why do you check X32 and X64 separately?
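
To make points 1) and 3) concrete, here is a minimal sketch of a standalone mask check. It is not the code from the patch: matchInsertPS and InsertPSMatch are hypothetical names, and undef lanes are not handled. It assumes the usual shufflevector convention (indices 0..3 select from the first vector, 4..7 from the second) and the insertps immediate layout (bits 7:6 source element, bits 5:4 destination element, bits 3:0 zero mask). The byte offset is what the "add" would compute if the second operand were folded as a 4-byte load.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical result type, for illustration only.
struct InsertPSMatch {
    bool matched;          // true if the mask fits a single insertps
    std::uint8_t imm8;     // register-form immediate, zero mask left at 0
    int srcByteOffset;     // offset of the selected element in a folded load
};

// Hypothetical helper: accepts masks where exactly one lane comes from the
// second vector and every other lane keeps its position in the first vector.
InsertPSMatch matchInsertPS(const std::array<int, 4>& mask) {
    int dst = -1;
    int src = -1;
    for (int i = 0; i < 4; ++i) {
        if (mask[i] == i)
            continue;                          // lane kept from the first vector
        if (mask[i] < 4 || mask[i] > 7 || dst != -1)
            return InsertPSMatch{false, 0, 0}; // moved lane, undef, or second insert
        dst = i;
        src = mask[i] - 4;
    }
    if (dst == -1)
        return InsertPSMatch{false, 0, 0};     // identity shuffle, nothing to insert
    std::uint8_t imm = static_cast<std::uint8_t>((src << 6) | (dst << 4));
    return InsertPSMatch{true, imm, src * 4};  // 4 bytes per float element
}
```

For the mask <0, 1, 2, 4> from the IR above, this gives imm8 = 0x30 and a byte offset of 0. With a folded load only those 4 bytes are read, not the 16 bytes of the original load.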

-  Elena

http://reviews.llvm.org/D3475





