[PATCH] [X86] transform insertps to blendps when possible for better performance

Sun Mar 1 09:24:50 PST 2015

In http://reviews.llvm.org/D7866#131618, @chandlerc wrote:

> I'm not really sold on doing this as a target combine. Is there some reason we can't just produce the desired insertps or blendps when lowering? This doesn't seem likely to come up only after doing some other shuffle lowering, but maybe I'm not seeing why.

Let me answer this question first, before I wrangle up some data for the other one.
I made this a target combine because I don't know how else to handle this case given our current intrinsic lowering:

  define <4 x float> @blendps(<4 x float> %x, <4 x float> %y) {
    %0 = tail call <4 x float> @llvm.x86.sse41.insertps(<4 x float> %x, <4 x float> %y, i8 0)
    ret <4 x float> %0
  }

This is the IR produced when a programmer uses the SSE4.1 insertps intrinsic (_mm_insert_ps) in C source. The intrinsic call becomes an INSERTPS node directly, with no shuffle lowering in between, via:

  X86_INTRINSIC_DATA(sse41_insertps,    INTR_TYPE_3OP, X86ISD::INSERTPS, 0)
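
For reference, here is a small scalar sketch in C of why the imm8 of 0 in the example above is blendable. It is not LLVM code and not the code in the patch: model_insertps and model_blendps are hand-written models of the register-source forms of the two instructions, and insertps_as_blendps is a hypothetical helper that captures the condition I would expect such a combine to check (source and destination element indices match, and the zero mask is empty). The memory-source form of insertps is not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { float v[4]; } vec4;

/* Scalar model of INSERTPS (register source).
   imm8[7:6] = source element, imm8[5:4] = destination element,
   imm8[3:0] = zero mask applied after the insert. */
static vec4 model_insertps(vec4 dst, vec4 src, uint8_t imm)
{
    unsigned count_s = (imm >> 6) & 3u;
    unsigned count_d = (imm >> 4) & 3u;
    unsigned zmask   = imm & 0xFu;

    dst.v[count_d] = src.v[count_s];
    for (unsigned i = 0; i < 4; ++i)
        if (zmask & (1u << i))
            dst.v[i] = 0.0f;
    return dst;
}

/* Scalar model of BLENDPS: bit i of imm8 set means take element i from src. */
static vec4 model_blendps(vec4 dst, vec4 src, uint8_t imm)
{
    for (unsigned i = 0; i < 4; ++i)
        if (imm & (1u << i))
            dst.v[i] = src.v[i];
    return dst;
}

/* Hypothetical helper (assumption, not taken from the patch): returns 1 and
   writes the equivalent BLENDPS mask when the INSERTPS immediate copies
   element N of the source into element N of the destination without
   zeroing anything; returns 0 otherwise. */
static int insertps_as_blendps(uint8_t imm, uint8_t *blend_imm)
{
    unsigned count_s = (imm >> 6) & 3u;
    unsigned count_d = (imm >> 4) & 3u;
    unsigned zmask   = imm & 0xFu;

    if (zmask != 0 || count_s != count_d)
        return 0;
    *blend_imm = (uint8_t)(1u << count_d);
    return 1;
}

/* Bitwise comparison of two modeled vectors. */
static int vec4_equal(vec4 a, vec4 b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}
```

With imm8 = 0, as in the IR above, the helper yields a blend mask of 1, and both models produce the same vector: element 0 comes from %y and elements 1 to 3 come from %x.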

http://reviews.llvm.org/D7866
