[PATCH] D60852: Fix for bug 41512: lower INSERT_VECTOR_ELT(ZeroVec, 0, Elt) to SCALAR_TO_VECTOR(Elt) for all SSE flavors

Thu Apr 18 07:45:26 PDT 2019

spatel added a comment.

In D60852#1471661 <https://reviews.llvm.org/D60852#1471661>, @Serge_Preis wrote:

> > If we are getting this right sometimes, then we might already have the transform that we want, but it is limited in some way that prevents getting the larger case.
> > I doubt that the loop itself is needed to demonstrate the problem because I see 'movd' codegen even with a loop as long as it is not unrolled.
>
> After more experimentation I tend to agree. Also, the most basic case produces pinsrd even in a small kernel (https://gcc.godbolt.org/z/HAmNha), so I will just create a test out of it.

Not sure if this is minimal, but this seems to show the problem:

  define <2 x i64> @pinsr(i32 %x, i32 %y) {
    %ins1 = insertelement <4 x i32> <i32 undef, i32 0, i32 undef, i32 undef>, i32 %x, i32 0
    %ins2 = insertelement <4 x i32> <i32 undef, i32 0, i32 undef, i32 undef>, i32 %y, i32 0
    %b1 = bitcast <4 x i32> %ins1 to <2 x i64>
    %b2 = bitcast <4 x i32> %ins2 to <2 x i64>
    %r = shufflevector <2 x i64> %b1, <2 x i64> %b2, <2 x i32> <i32 0, i32 2>
    ret <2 x i64> %r
  }

  $ llc -o - pinsr.ll -mattr=sse4.2
  	pxor	%xmm1, %xmm1
  	pxor	%xmm0, %xmm0
  	pinsrd	$0, %edi, %xmm0
  	pinsrd	$0, %esi, %xmm1
  	punpcklqdq	%xmm1, %xmm0    ## xmm0 = xmm0[0],xmm1[0]
  	retq
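
To spell out why 'pinsrd' should not be needed here, this is a rough Python model of what the IR above computes. It is only my sketch, not LLVM code: the helper names are made up, undef lanes are modeled as None, and the bitcast assumes x86 little-endian lane order. Each i64 result lane comes out as the zero-extended 32-bit input, which is also what 'movd' leaves in the low 64 bits of the register, since it clears the upper lanes of the destination:

```python
MASK32 = 0xFFFFFFFF

def insert_lane0(x):
    """Model: insertelement <4 x i32> <undef, 0, undef, undef>, i32 x, i32 0.
    Undef lanes are represented as None (hypothetical convention)."""
    return [x & MASK32, 0, None, None]

def bitcast_v4i32_to_v2i64(v):
    """Little-endian bitcast: i64 lane k = i32 lane 2k | (i32 lane 2k+1 << 32).
    A lane built from any undef half stays undef (None)."""
    out = []
    for k in range(2):
        lo, hi = v[2 * k], v[2 * k + 1]
        out.append(None if lo is None or hi is None else lo | (hi << 32))
    return out

def shufflevector(a, b, mask):
    """Pick lanes from the concatenation of a and b, as shufflevector does."""
    both = a + b
    return [both[i] for i in mask]

def pinsr_model(x, y):
    """Model of @pinsr: returns the two i64 lanes of the result."""
    b1 = bitcast_v4i32_to_v2i64(insert_lane0(x))
    b2 = bitcast_v4i32_to_v2i64(insert_lane0(y))
    return shufflevector(b1, b2, [0, 2])

if __name__ == "__main__":
    # Only lane 0 of each bitcast is used, so no undef reaches the result.
    print(pinsr_model(7, 9))
```

So the undef lanes never reach the result, and only the explicit zero in lane 1 matters. That zero is what a zero-extending scalar move would give us for free.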

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D60852