[PATCH] D12635: merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711)
Steve Canon via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 25 13:13:59 PDT 2015
scanon added inline comments.
Comment at: test/CodeGen/X86/MergeConsecutiveStores.ll:482
@@ -485,1 +481,3 @@
+; CHECK: vmovups %ymm0, 48(%rdi)
+; CHECK-NEXT: vmovups %ymm1, 80(%rdi)
; CHECK-NEXT: vzeroupper
Combining these stores is not an unambiguous win.
With 16B alignment (as here), you're introducing a cacheline-crossing penalty where there otherwise would not be one. Even with unknown alignment, Intel's optimization manual (section 11.6.2) recommends using two 16B stores rather than one 32B store. Does anyone from Intel want to comment on the considerations here?
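To make the cacheline-crossing concern concrete, here is a rough sketch that counts line crossings for the two 32B stores in the CHECK lines above (offsets 48 and 80 from %rdi) against the equivalent four 16B stores. It assumes a 64-byte cache line and a 16B-aligned base; the helper names are made up for illustration and are not part of the patch.

```python
CACHE_LINE = 64  # bytes; assumed cache line size, not stated in the patch


def crosses_line(addr, size, line=CACHE_LINE):
    """Return True if a store of `size` bytes at `addr` spans two cache lines."""
    return addr // line != (addr + size - 1) // line


def count_crossings(base, offsets, size):
    """Count how many stores at base+offset (each `size` bytes) cross a line."""
    return sum(crosses_line(base + off, size) for off in offsets)


if __name__ == "__main__":
    # Every residue a 16B-aligned base pointer can have modulo the line size.
    for base in (0, 16, 32, 48):
        wide = count_crossings(base, (48, 80), 32)            # two 32B stores
        narrow = count_crossings(base, (48, 64, 80, 96), 16)  # four 16B stores
        print(f"base % 64 = {base:2}: 32B crossings = {wide}, 16B crossings = {narrow}")
```

Under these assumptions, a 16B store at a 16B-aligned address never crosses a line, while one of the two 32B stores crosses whenever the base is 0 or 32 modulo 64. This only counts crossings; it says nothing about how large the penalty is on any particular microarchitecture.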