[PATCH] D54073: [x86] allow vector load narrowing with multi-use values
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 8 07:37:40 PST 2018
spatel added inline comments.
================
Comment at: test/CodeGen/X86/insert-into-constant-vector.ll:277
+; X32AVX-NEXT: vmovdqa {{.*#+}} ymm1 = <42,1,2,3,4,5,6,u>
+; X32AVX-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0
; X32AVX-NEXT: retl
----------------
RKSimon wrote:
> This doesn't look great?
Right - we can see this even in the existing codegen, because LowerINSERT_VECTOR_ELT does this:
// If the vector is wider than 128 bits, extract the 128-bit subvector, insert
// into that, and then insert the subvector back into the result.
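To make that strategy concrete, here is a rough scalar model of it. This is only a sketch, not the actual LLVM lowering code: the `V4`/`V8` types and the helper names `extractSubvector`, `insertSubvector`, and `insertElementWide` are hypothetical, and a 256-bit vector is assumed to hold eight i32 lanes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V4 = std::array<std::int32_t, 4>;  // models a 128-bit vector of i32
using V8 = std::array<std::int32_t, 8>;  // models a 256-bit vector of i32

// Hypothetical helper: copy out one 128-bit half (0 = low, 1 = high).
V4 extractSubvector(const V8 &wide, std::size_t half) {
  V4 sub{};
  for (std::size_t i = 0; i < 4; ++i)
    sub[i] = wide[half * 4 + i];
  return sub;
}

// Hypothetical helper: write a 128-bit half back into the wide vector.
V8 insertSubvector(V8 wide, const V4 &sub, std::size_t half) {
  for (std::size_t i = 0; i < 4; ++i)
    wide[half * 4 + i] = sub[i];
  return wide;
}

// Extract the half that contains lane `idx`, insert the scalar into that
// half, then insert the half back into the wide vector.
V8 insertElementWide(const V8 &wide, std::int32_t scalar, std::size_t idx) {
  const std::size_t half = idx / 4;
  V4 sub = extractSubvector(wide, half);
  sub[idx % 4] = scalar;
  return insertSubvector(wide, sub, half);
}
```

The three steps in `insertElementWide` mirror the three steps named in the source comment, which is why even a single-lane insert ends with a 128-bit subvector insert.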
But is there a better way to get a scalar into the high part of a wide vector?
E.g., AVX1 can't even splat from a register to a 256-bit vector, so a shuffle-based alternative would be:
```
vmovd %edi, %xmm0
vpshufd $36, %xmm0, %xmm0 ## xmm0 = xmm0[0,1,2,0]
vinsertf128 $1, %xmm0, %ymm0, %ymm0
vblendps $127, LCPI0_0(%rip), %ymm0, %ymm0 ## ymm0 = mem[0,1,2,3,4,5,6],ymm0[7]
```
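For anyone who wants to check the immediates, the sequence above can be modeled lane by lane. This is only an illustrative sketch: the helper names are made up, lanes are modeled as i32 (vblendps selects whole 32-bit lanes, so the lane type does not change which lanes are picked), and the undef lane 7 of the constant is assumed to be 0.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V4 = std::array<std::int32_t, 4>;  // models an xmm register (4 x i32)
using V8 = std::array<std::int32_t, 8>;  // models a ymm register (8 x i32)

// vmovd: scalar goes to lane 0, the remaining lanes are zeroed.
V4 movd(std::int32_t s) { return V4{s, 0, 0, 0}; }

// vpshufd: each 2-bit field of imm picks the source lane for one result lane.
V4 pshufd(const V4 &src, unsigned imm) {
  V4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = src[(imm >> (2 * i)) & 3];
  return r;
}

// vinsertf128 $1: replace the upper 128 bits of `wide` with `sub`.
V8 insertHigh128(V8 wide, const V4 &sub) {
  for (std::size_t i = 0; i < 4; ++i)
    wide[4 + i] = sub[i];
  return wide;
}

// vblendps: a set bit i in imm takes lane i from `mem`, otherwise from `reg`.
V8 blendps(const V8 &reg, const V8 &mem, unsigned imm) {
  V8 r{};
  for (std::size_t i = 0; i < 8; ++i)
    r[i] = ((imm >> i) & 1) ? mem[i] : reg[i];
  return r;
}

// Models the four-instruction sequence: the scalar ends up in lane 7 and the
// other seven lanes come from the constant.
V8 insertIntoHighLane(std::int32_t scalar, const V8 &constants) {
  V4 x = pshufd(movd(scalar), 36);               // x = x[0,1,2,0]
  V8 y = {x[0], x[1], x[2], x[3], 0, 0, 0, 0};   // xmm0 viewed as ymm0
  y = insertHigh128(y, x);                       // scalar is now in lane 7
  return blendps(y, constants, 127);             // lanes 0..6 from constants
}
```

With the constant `<42,1,2,3,4,5,6,u>` from the test, the model leaves lanes 0 through 6 untouched and places the scalar in lane 7.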
https://reviews.llvm.org/D54073