[PATCH] D62498: [x86] split 256-bit store of concatenated vectors

Mon May 27 15:05:06 PDT 2019

spatel marked 2 inline comments as done.
spatel added inline comments.

================
Comment at: llvm/test/CodeGen/X86/oddsubvector.ll:119-126
+; AVX-NEXT:    vmovaps (%rdi), %xmm0
+; AVX-NEXT:    vmovaps 16(%rdi), %xmm1
+; AVX-NEXT:    vmovaps 32(%rdi), %xmm2
+; AVX-NEXT:    vmovaps 48(%rdi), %xmm3
+; AVX-NEXT:    vmovaps %xmm2, 16(%rsi)
+; AVX-NEXT:    vmovaps %xmm3, (%rsi)
+; AVX-NEXT:    vmovaps %xmm0, 48(%rsi)
----------------
This seems like a failure of load combining? Even so, the split code has fewer uops than before, even though the instruction count increased.
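
For context, here is a minimal sketch of the pattern named in the patch title: a 256-bit store whose value is a concatenation of two 128-bit vectors. It is not taken from this patch's tests; the function name `@store_concat` and the value names are hypothetical, and it uses the same typed-pointer IR syntax as the tests quoted here.

```llvm
; Hypothetical reduced case (not from the patch): concatenate two
; 128-bit vectors and store the 256-bit result.
define void @store_concat(<4 x float> %lo, <4 x float> %hi, <8 x float>* %p) {
  ; Mask <0..7> selects all of %lo followed by all of %hi, i.e. a concat.
  %v = shufflevector <4 x float> %lo, <4 x float> %hi, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  store <8 x float> %v, <8 x float>* %p, align 32
  ret void
}
```

On an AVX target, storing the concatenated value whole would typically need a 128-bit insert (for example `vinsertf128`) before the 256-bit store, whereas a split store can write each half directly with a 128-bit `vmovaps`. The exact output depends on the subtarget and alignment, so treat this as an assumption rather than a description of the patch's actual codegen.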

================
Comment at: llvm/test/CodeGen/X86/vector-gep.ll:211
 ; CHECK-NEXT:    retl $4
   %A = getelementptr i16, i16* %param, <64 x i32> %off
   ret <64 x i16*> %A
----------------
We're obviously spilling here, but I'm not sure what is happening underneath, or whether this test matters for perf rather than just for correctness (i.e., not crashing).

https://reviews.llvm.org/D62498