[PATCH] D54073: [x86] allow vector load narrowing with multi-use values

Thu Nov 8 07:37:40 PST 2018

spatel added inline comments.

================
Comment at: test/CodeGen/X86/insert-into-constant-vector.ll:277
+; X32AVX-NEXT:    vmovdqa {{.*#+}} ymm1 = <42,1,2,3,4,5,6,u>
+; X32AVX-NEXT:    vinserti128 $1, %xmm0, %ymm1, %ymm0
 ; X32AVX-NEXT:    retl
----------------
RKSimon wrote:
> This doesn't look great?
Right - we can see this even in the existing codegen, because LowerINSERT_VECTOR_ELT does the following:
  // If the vector is wider than 128 bits, extract the 128-bit subvector, insert
  // into that, and then insert the subvector back into the result.

But is there a better way to get a scalar into the high part of a wide vector?

E.g., AVX1 can't even splat from a register to a 256-bit vector, so a shuffle-based alternative would be:

```
	vmovd	%edi, %xmm0
	vpshufd	$36, %xmm0, %xmm0       ## xmm0 = xmm0[0,1,2,0]
	vinsertf128	$1, %xmm0, %ymm0, %ymm0
	vblendps	$127, LCPI0_0(%rip), %ymm0, %ymm0 ## ymm0 = mem[0,1,2,3,4,5,6],ymm0[7]

```
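
To double-check the immediates in that sequence, here is a rough lane-level model in Python. It is only a sketch: the helper names are made up for illustration, lanes are treated as 32-bit elements, and the undef lane 7 of the constant is arbitrarily filled with 0.

```python
# Lane-level model of the shuffle-based sequence above.
# All helper names are hypothetical; each list element is one 32-bit lane.

def pshufd(src, imm):
    """vpshufd: result lane i takes src[(imm >> 2*i) & 3]."""
    return [src[(imm >> (2 * i)) & 3] for i in range(4)]

def insert128(dst, sub, imm):
    """vinsertf128: replace the low (imm=0) or high (imm=1) 128-bit half of dst."""
    out = list(dst)
    base = 4 * (imm & 1)
    out[base:base + 4] = sub
    return out

def blendps(a, b, imm):
    """vblendps: lane i comes from b if bit i of imm is set, else from a."""
    return [b[i] if (imm >> i) & 1 else a[i] for i in range(len(a))]

def shuffle_based_insert(scalar, constants):
    xmm0 = [scalar, 0, 0, 0]           # vmovd: scalar in lane 0, upper lanes zeroed
    xmm0 = pshufd(xmm0, 36)            # xmm0[0,1,2,0]: scalar copied into lane 3
    ymm0 = xmm0 + [0, 0, 0, 0]         # VEX ops zero the upper half of ymm0
    ymm0 = insert128(ymm0, xmm0, 1)    # scalar now also sits in lane 7
    return blendps(ymm0, constants, 127)  # mem[0..6], ymm0[7]

# Assumption: lane 7 of the constant is undef in the test; 0 is a placeholder.
result = shuffle_based_insert(99, [42, 1, 2, 3, 4, 5, 6, 0])
```

With these definitions, only lane 7 of the result comes from the register; lanes 0-6 come from the constant-pool load.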

https://reviews.llvm.org/D54073