[PATCH] D50992: [InstCombine] try to fold insertelt + vector op into scalar op + insertelt

Mon Aug 20 16:35:10 PDT 2018

spatel added a comment.

In https://reviews.llvm.org/D50992#1206612, @efriedma wrote:

> I'm concerned the backend won't reliably reverse the transform. For integer operations, SelectionDAG heavily depends on IR types to decide whether to perform an operation in integer or SIMD registers, and transferring values between the two register files is slow. Yes, the second operand of an insertelement is a scalar, but many backends pattern-match a load+insertelement to a vector register load.

If we add a load to the sequence, we have something like this:

  define <4 x i32> @vector_add_constant(i32* %p) {
    %x = load i32, i32* %p
    %ins = insertelement <4 x i32> undef, i32 %x, i32 0
    %bo = add <4 x i32> %ins, <i32 42, i32 42, i32 42, i32 42>
    ret <4 x i32> %bo
  }

  define <4 x i32> @scalar_add_constant(i32* %p) {
    %x = load i32, i32* %p
    %b = add i32 %x, 42
    %bo = insertelement <4 x i32> undef, i32 %b, i32 0
    ret <4 x i32> %bo
  }

And for x86, the vector version compiles to:

  vmovd	(%rdi), %xmm0           ## xmm0 = mem[0],zero,zero,zero
  vpaddd	LCPI4_0(%rip), %xmm0, %xmm0

vs.

  movl	(%rdi), %eax
  addl	$42, %eax
  vmovd	%eax, %xmm0
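
As a sanity check on the two IR functions above, here is a minimal Python sketch of why they are interchangeable at the IR level: lane 0 holds the same wrapped i32 sum either way, and the other lanes stay undef. The names MASK32 and UNDEF and the choice to model undef as None are assumptions made for illustration, not anything from the patch.

```python
MASK32 = 0xFFFFFFFF  # i32 arithmetic wraps modulo 2**32
UNDEF = None         # hypothetical stand-in for an undef lane


def vector_add_constant(x, c=42):
    """Model of @vector_add_constant: insert into lane 0, then add a splat."""
    ins = [x & MASK32, UNDEF, UNDEF, UNDEF]
    # Adding a constant to an undef lane is modelled as leaving it undef.
    return [UNDEF if lane is UNDEF else (lane + c) & MASK32 for lane in ins]


def scalar_add_constant(x, c=42):
    """Model of @scalar_add_constant: add as a scalar, then insert into lane 0."""
    b = (x + c) & MASK32
    return [b, UNDEF, UNDEF, UNDEF]
```

This only models the IR semantics; it says nothing about the codegen cost difference shown in the x86 output.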

Let me see about adding a DAG reversal...

https://reviews.llvm.org/D50992