[PATCH] D38316: [InstCombine] replace bitcast to scalar + insertelement with widening shuffle + vector bitcast

Wed Sep 27 13:30:11 PDT 2017

spatel added a comment.

In https://reviews.llvm.org/D38316#882545, @efriedma wrote:

> I meant, "how do we fix x86 in the general case"? Consider the following (with -mtriple=x86_64 -mattr=+xop):

Ah! Sorry, I misread the question.

> 
> 
>   define <8 x i64> @test(i32 %x0, i32 %x1, <8 x i64> %v) {
>     %1 = insertelement <2 x i32> undef, i32 %x0, i32 0
>     %2 = insertelement <2 x i32> %1, i32 %x1, i32 1
>     %3 = bitcast <2 x i32> %2 to i64
>     %4 = insertelement <8 x i64> %v, i64 %3, i32 0
>     ret <8 x i64> %4
>   }
> 
> 
> We currently generate a five-instruction sequence for something which can be done in two instructions.  And the instcombine here won't trigger.

OK, so this example extends a different instcombine that I originally had in mind for this case (https://bugs.llvm.org/show_bug.cgi?id=34716#c1). We could trade an insert for a bitcast:

  define <8 x i64> @test_not_undef_bc(i32 %x0, i32 %x1, <8 x i64> %v) {
    %bc = bitcast <8 x i64> %v to <16 x i32>
    %i1 = insertelement <16 x i32> %bc, i32 %x0, i32 0
    %i2 = insertelement <16 x i32> %i1, i32 %x1, i32 1
    %bc2 = bitcast <16 x i32> %i2 to <8 x i64>
    ret <8 x i64> %bc2
  }

I think we'd get that by applying a fold like:

  (ins (bitcast (ins ))) --> (bitcast (ins (bitcast )))

...twice. That's still 3 instructions on x86, though?

  vpinsrd	$0, %edi, %xmm0, %xmm2
  vpinsrd	$1, %esi, %xmm2, %xmm2
  vblendps	$15, %ymm2, %ymm0, %ymm0 ## ymm0 = ymm2[0,1,2,3],ymm0[4,5,6,7]
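
For illustration, here is a sketch of what the IR might look like after the first of the two fold applications, assuming the fold rewrites the outermost insert first. The function name and value names are hypothetical, and this is not taken from the patch:

  define <8 x i64> @test_one_fold(i32 %x0, i32 %x1, <8 x i64> %v) {
    ; inner insert of %x0 is still in the narrow <2 x i32> form
    %1 = insertelement <2 x i32> undef, i32 %x0, i32 0
    %b = bitcast <2 x i32> %1 to i64
    %i = insertelement <8 x i64> %v, i64 %b, i32 0
    ; outer insert of %x1 now happens in the wide <16 x i32> form
    %bc = bitcast <8 x i64> %i to <16 x i32>
    %i2 = insertelement <16 x i32> %bc, i32 %x1, i32 1
    %bc2 = bitcast <16 x i32> %i2 to <8 x i64>
    ret <8 x i64> %bc2
  }

Applying the same fold to %i and then cancelling the resulting back-to-back bitcasts between <8 x i64> and <16 x i32> should give @test_not_undef_bc above.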

https://reviews.llvm.org/D38316