[PATCH] D12635: merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711)

Fri Sep 25 13:13:59 PDT 2015

scanon added inline comments.

================
Comment at: test/CodeGen/X86/MergeConsecutiveStores.ll:482
@@ -485,1 +481,3 @@
+; CHECK:      vmovups %ymm0, 48(%rdi)
+; CHECK-NEXT: vmovups %ymm1, 80(%rdi)
 ; CHECK-NEXT: vzeroupper
----------------
Combining these stores is not an unambiguous win.

With 16B alignment (as here), you're introducing a cacheline-crossing penalty where there otherwise would not be one: a 32B store that is only 16B-aligned can straddle a 64B line, while an aligned 16B store cannot. Even with unknown alignment, Intel's optimization manual recommends using two 16B stores rather than a 32B store (11.6.2). Does anyone from Intel want to comment on the considerations here?
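
To make the line-crossing concern concrete, here is a rough sketch that counts how often each store pattern touches two cachelines. It assumes 64B cachelines and a 16B-aligned base pointer; the helper names (`crosses_cacheline`, `count_crossings`) are made up for illustration and are not part of the patch or of LLVM. Offsets 48 and 80 are taken from the CHECK lines above.

```python
CACHELINE = 64  # assumption: 64-byte cachelines, as on typical x86 parts


def crosses_cacheline(addr, size, line=CACHELINE):
    """Return True if the access [addr, addr + size) touches two cachelines."""
    return addr // line != (addr + size - 1) // line


def count_crossings(stores, alignment, line=CACHELINE):
    """Count line-crossing stores over every distinct base address (mod line)
    that satisfies the given alignment.

    `stores` is a list of (offset, size) pairs relative to the base pointer.
    """
    crossings = 0
    for base in range(0, line, alignment):
        for offset, size in stores:
            if crosses_cacheline(base + offset, size, line):
                crossings += 1
    return crossings


# Merged form from the test: two 32B stores at offsets 48 and 80.
merged = [(48, 32), (80, 32)]
# Unmerged form: four 16B stores covering the same 64 bytes.
split = [(48, 16), (64, 16), (80, 16), (96, 16)]

print(count_crossings(merged, 16))  # 2 of the 8 (base, store) pairs cross a line
print(count_crossings(split, 16))   # 0: an aligned 16B store never crosses
```

Under these assumptions, one in four of the possible 16B-aligned placements of each 32B store splits across two lines, while the 16B stores never do. This says nothing about the size of the penalty on any particular microarchitecture, which is the part that needs input from Intel.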


http://reviews.llvm.org/D12635