[PATCH] Added more insertps optimizations

Fri May 16 04:59:59 PDT 2014

Hi Filipe,

I think there is a better way to fix this bug.

For example, instead of adding new target combine rules in X86ISelLowering.cpp, you can simply add the following ISel patterns to X86InstrSSE.td:

  let Predicates = [UseSSE41] in {
    def : Pat<(v4f32 (X86insertps (v4f32 VR128:$src1), (loadv4f32 addr:$src2),
                  imm:$src3)),
              (INSERTPSrm VR128:$src1, addr:$src2, imm:$src3)>;
    def : Pat<(v4f32 (X86insertps (v4f32 VR128:$src1),
                  (X86PShufd (v4f32 (scalar_to_vector (loadf32 addr:$src2))),
                             (i8 0)), imm:$src3)),
              (INSERTPSrm VR128:$src1, addr:$src2, imm:$src3)>;
  }

  let Predicates = [UseAVX] in {
    def : Pat<(v4f32 (X86insertps (v4f32 VR128:$src1), (loadv4f32 addr:$src2),
                  imm:$src3)),
              (VINSERTPSrm VR128:$src1, addr:$src2, imm:$src3)>;
    def : Pat<(v4f32 (X86insertps (v4f32 VR128:$src1),
                  (X86VBroadcast (loadf32 addr:$src2)), imm:$src3)),
              (VINSERTPSrm VR128:$src1, addr:$src2, imm:$src3)>;
  }

I am pretty sure that the four patterns above would cover all the interesting cases.
I personally prefer TableGen patterns over complicated target combine rules that would only trigger on the already-legalized DAG, immediately before instruction selection.
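One detail worth double-checking in the loadv4f32 patterns: per the Intel SDM, the memory form of INSERTPS reads a single 32-bit float and ignores CountS (imm8[7:6]), whereas the register form uses CountS to pick the source lane. Folding a full vector load is therefore only equivalent when CountS is 0 (or when ZMASK zeroes the inserted lane anyway). The splat/broadcast patterns are unaffected, because every lane holds the loaded value. The C++ sketch below models both forms to check this. It is a hedged illustration, not LLVM code; `insertpsReg`, `insertpsMem`, `canFoldFullVectorLoad` and `countFoldMismatches` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Four single-precision lanes, lane 0 first (models an XMM register).
using V4 = std::array<float, 4>;

// Zeroes every lane whose bit is set in ZMASK (imm8[3:0]).
static V4 applyZMask(V4 v, std::uint8_t imm) {
  for (unsigned i = 0; i < 4; ++i)
    if (imm & (1u << i))
      v[i] = 0.0f;
  return v;
}

// Register form, INSERTPS xmm1, xmm2, imm8: CountS (imm8[7:6]) picks the
// source lane, CountD (imm8[5:4]) picks the destination lane.
V4 insertpsReg(V4 dst, const V4 &src, std::uint8_t imm) {
  dst[(imm >> 4) & 3] = src[(imm >> 6) & 3];
  return applyZMask(dst, imm);
}

// Memory form, INSERTPS xmm1, m32, imm8: reads one float from `mem` and
// ignores CountS, i.e. behaves as if CountS were 0.
V4 insertpsMem(V4 dst, const float *mem, std::uint8_t imm) {
  assert(mem != nullptr);
  dst[(imm >> 4) & 3] = mem[0];
  return applyZMask(dst, imm);
}

// Hypothetical guard: folding a full (loadv4f32 addr) into INSERTPSrm without
// adjusting addr is safe whenever CountS selects lane 0 (sufficient, not
// necessary: a ZMASK that zeroes the inserted lane also hides the difference).
bool canFoldFullVectorLoad(std::uint8_t imm) { return (imm & 0xC0) == 0; }

// Counts the immediates for which the folded memory form differs from the
// register form fed by the loaded vector (or by a splat of its lane 0).
int countFoldMismatches(bool splatSource) {
  const float mem[4] = {10.0f, 20.0f, 30.0f, 40.0f};
  const V4 dst = {1.0f, 2.0f, 3.0f, 4.0f};
  const V4 src = splatSource ? V4{mem[0], mem[0], mem[0], mem[0]}
                             : V4{mem[0], mem[1], mem[2], mem[3]};
  int mismatches = 0;
  for (unsigned imm = 0; imm < 256; ++imm) {
    const std::uint8_t i8 = static_cast<std::uint8_t>(imm);
    if (insertpsReg(dst, src, i8) != insertpsMem(dst, mem, i8))
      ++mismatches;
  }
  return mismatches;
}
```

In this model, `countFoldMismatches(true)` finds no mismatches, while `countFoldMismatches(false)` finds some, all with CountS != 0. That suggests the loadv4f32 patterns may need an immediate predicate (or a load address offset by 4 * CountS), while the splat/broadcast patterns can fold unconditionally.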

Also, please add tests that verify the AVX codegen. Currently your new tests only verify that we do the correct thing with SSE4.1, but your patch also affects AVX (in fact, you specifically added combine rules for the case where the inserted element comes from a vbroadcast).

-Andrea

http://reviews.llvm.org/D3581