[PATCH] D62498: [x86] split 256-bit store of concatenated vectors

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon May 27 15:05:06 PDT 2019

spatel marked 2 inline comments as done.
spatel added inline comments.

Comment at: llvm/test/CodeGen/X86/oddsubvector.ll:119-126
+; AVX-NEXT:    vmovaps (%rdi), %xmm0
+; AVX-NEXT:    vmovaps 16(%rdi), %xmm1
+; AVX-NEXT:    vmovaps 32(%rdi), %xmm2
+; AVX-NEXT:    vmovaps 48(%rdi), %xmm3
+; AVX-NEXT:    vmovaps %xmm2, 16(%rsi)
+; AVX-NEXT:    vmovaps %xmm3, (%rsi)
+; AVX-NEXT:    vmovaps %xmm0, 48(%rsi)
This seems like a failure of load combining? Even so, the split code has fewer uops than before, even though the instruction count increased.
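
For readers outside the review, here is a minimal portable sketch of the store pattern named in the patch title. It is not code from D62498. The type `v4f` and both function names are hypothetical, and `memcpy` stands in for vector moves. It only shows that one 256-bit store of two concatenated 128-bit halves writes the same bytes as two separate 128-bit stores. The uop advantage presumably comes from skipping the concatenation step (an insert such as vinsertf128 on AVX), but that is an assumption about the lowering, not something this sketch measures.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit vector register (four floats). */
typedef struct { float lane[4]; } v4f;

/* Wide form: concatenate lo and hi into a 256-bit value, then store it once. */
static void store_concat_wide(float *dst, v4f lo, v4f hi) {
    float wide[8];
    memcpy(wide, lo.lane, sizeof lo.lane);      /* low 128 bits  */
    memcpy(wide + 4, hi.lane, sizeof hi.lane);  /* high 128 bits */
    memcpy(dst, wide, sizeof wide);             /* single 256-bit store */
}

/* Split form: two independent 128-bit stores, no concatenation step. */
static void store_concat_split(float *dst, v4f lo, v4f hi) {
    memcpy(dst, lo.lane, sizeof lo.lane);       /* store to dst + 0 bytes  */
    memcpy(dst + 4, hi.lane, sizeof hi.lane);   /* store to dst + 16 bytes */
}
```

Both functions leave identical contents in `dst`, which is the property that makes the split legal. Whether it is profitable depends on the target.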

Comment at: llvm/test/CodeGen/X86/vector-gep.ll:211
 ; CHECK-NEXT:    retl $4
   %A = getelementptr i16, i16* %param, <64 x i32> %off
   ret <64 x i16*> %A
We're obviously spilling here, but I'm not sure what is happening underneath, or whether this test matters for perf rather than just for correctness/crashing.


