[llvm] r263069 - [x86] fix cost model inaccuracy for vector memory ops
Sanjay Patel via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 9 14:23:33 PST 2016
Author: spatel
Date: Wed Mar 9 16:23:33 2016
New Revision: 263069
URL: http://llvm.org/viewvc/llvm-project?rev=263069&view=rev
Log:
[x86] fix cost model inaccuracy for vector memory ops
The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar
has a completely double-pumped AVX implementation. But getting the cost model to
reflect that is a much bigger problem. The small goal here is simply to improve on
the lie that !AVX2 == SandyBridge.
Differential Revision: http://reviews.llvm.org/D18000
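For readers without the LLVM tree at hand, the change in the cost rule can be sketched in isolation. This is only an illustration under stated assumptions: the `Subtarget` struct, its two boolean fields, and both function names are hypothetical stand-ins, not LLVM's real classes or signatures. The two conditions mirror the removed and added lines of the diff in this message.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget queries used in the patch.
// The real code calls ST->hasAVX2() and ST->isUnalignedMem32Slow().
struct Subtarget {
  bool HasAVX2;
  bool UnalignedMem32Slow;
};

// Old rule (assumed shape): any legalized type wider than 128 bits is
// treated as double-pumped unless the CPU has AVX2.
int oldMemoryOpCost(int NumLegalizedOps, unsigned SizeInBits,
                    const Subtarget &ST) {
  int Cost = NumLegalizedOps * 1; // Each load/store unit costs 1.
  if (SizeInBits > 128 && !ST.HasAVX2)
    Cost *= 2;
  return Cost;
}

// New rule (assumed shape): only 32-byte accesses on CPUs flagged with
// slow unaligned 32-byte memory ops are doubled. The flag serves as a
// proxy for a double-pumped AVX memory interface.
int newMemoryOpCost(int NumLegalizedOps, unsigned StoreSizeInBytes,
                    const Subtarget &ST) {
  int Cost = NumLegalizedOps * 1; // Each load/store unit costs 1.
  if (StoreSizeInBytes == 32 && ST.UnalignedMem32Slow)
    Cost *= 2;
  return Cost;
}

// Small self-check with two hypothetical AVX1-only configurations.
void checkExamples() {
  Subtarget SlowMem32{/*HasAVX2=*/false, /*UnalignedMem32Slow=*/true};
  Subtarget FastMem32{/*HasAVX2=*/false, /*UnalignedMem32Slow=*/false};
  // With the slow flag set, both rules double a 256-bit access.
  assert(oldMemoryOpCost(1, 256, SlowMem32) == 2);
  assert(newMemoryOpCost(1, 32, SlowMem32) == 2);
  // Without the flag, only the old rule still doubles it.
  assert(oldMemoryOpCost(1, 256, FastMem32) == 2);
  assert(newMemoryOpCost(1, 32, FastMem32) == 1);
}
```

The two rules differ only for a CPU that lacks AVX2 but is not flagged as having slow unaligned 32-byte accesses: the old rule doubles the cost of a 256-bit access there, the new one does not. That matches the test update, where the FASTMEM32 run is expected to use a <4 x i64> load.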
Modified:
llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
llvm/trunk/test/Transforms/LoopVectorize/X86/avx1.ll
Modified: llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp?rev=263069&r1=263068&r2=263069&view=diff
==============================================================================
--- llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp (original)
+++ llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp Wed Mar 9 16:23:33 2016
@@ -983,10 +983,10 @@ int X86TTIImpl::getMemoryOpCost(unsigned
// Each load/store unit costs 1.
int Cost = LT.first * 1;
- // On Sandybridge 256bit load/stores are double pumped
- // (but not on Haswell).
- if (LT.second.getSizeInBits() > 128 && !ST->hasAVX2())
- Cost*=2;
+ // This isn't exactly right. We're using slow unaligned 32-byte accesses as a
+ // proxy for a double-pumped AVX memory interface such as on Sandybridge.
+ if (LT.second.getStoreSize() == 32 && ST->isUnalignedMem32Slow())
+ Cost *= 2;
return Cost;
}
Modified: llvm/trunk/test/Transforms/LoopVectorize/X86/avx1.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/X86/avx1.ll?rev=263069&r1=263068&r2=263069&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/LoopVectorize/X86/avx1.ll (original)
+++ llvm/trunk/test/Transforms/LoopVectorize/X86/avx1.ll Wed Mar 9 16:23:33 2016
@@ -26,10 +26,10 @@ define i32 @read_mod_write_single_ptr(fl
ret i32 undef
}
-;;; FIXME: If 32-byte accesses are fast, this should use a <4 x i64> load.
; CHECK-LABEL: @read_mod_i64(
-; CHECK: load <2 x i64>
+; SLOWMEM32: load <2 x i64>
+; FASTMEM32: load <4 x i64>
; CHECK: ret i32
define i32 @read_mod_i64(i64* nocapture %a, i32 %n) nounwind uwtable ssp {
%1 = icmp sgt i32 %n, 0