[llvm] acb4595 - [X86][Costmodel] Load/store i8 Stride=4 VF=32 interleaving costs
Roman Lebedev via llvm-commits
llvm-commits at lists.llvm.org
Sat Oct 2 03:52:45 PDT 2021
Author: Roman Lebedev
Date: 2021-10-02T13:40:21+03:00
New Revision: acb459574afc344bcb676737496f3fa35b1f04c1
URL: https://github.com/llvm/llvm-project/commit/acb459574afc344bcb676737496f3fa35b1f04c1
DIFF: https://github.com/llvm/llvm-project/commit/acb459574afc344bcb676737496f3fa35b1f04c1.diff
LOG: [X86][Costmodel] Load/store i8 Stride=4 VF=32 interleaving costs
While we already model this tuple, the load cost is divergent from reality, so fix it.
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/zWMhhnPYa - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=24.0`
So pick cost of `56`.
For store we have:
https://godbolt.org/z/vnqqjWx51 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `12`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110971
Added:
Modified:
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll
Removed:
################################################################################
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 6848987af900..a37993846fcd 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -5098,7 +5098,7 @@ InstructionCost X86TTIImpl::getInterleavedMemoryOpCostAVX2(
{4, MVT::v4i8, 4}, // (load 16i8 and) deinterleave into 4 x 4i8
{4, MVT::v8i8, 12}, // (load 32i8 and) deinterleave into 4 x 8i8
{4, MVT::v16i8, 24}, // (load 64i8 and) deinterleave into 4 x 16i8
- {4, MVT::v32i8, 80}, // (load 128i8 and) deinterleave into 4 x 32i8
+ {4, MVT::v32i8, 56}, // (load 128i8 and) deinterleave into 4 x 32i8
{4, MVT::v2i16, 6}, // (load 8i16 and) deinterleave into 4 x 2i16
{4, MVT::v4i16, 17}, // (load 16i16 and) deinterleave into 4 x 4i16
diff --git a/llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll b/llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll
index 74b3860ad257..59a64814f6ba 100644
--- a/llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll
+++ b/llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll
@@ -30,7 +30,7 @@ target triple = "x86_64-unknown-linux-gnu"
; AVX2: LV: Found an estimated cost of 5 for VF 4 For instruction: %v0 = load i8, i8* %in0, align 1
; AVX2: LV: Found an estimated cost of 13 for VF 8 For instruction: %v0 = load i8, i8* %in0, align 1
; AVX2: LV: Found an estimated cost of 26 for VF 16 For instruction: %v0 = load i8, i8* %in0, align 1
-; AVX2: LV: Found an estimated cost of 84 for VF 32 For instruction: %v0 = load i8, i8* %in0, align 1
+; AVX2: LV: Found an estimated cost of 60 for VF 32 For instruction: %v0 = load i8, i8* %in0, align 1
;
; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i8, i8* %in0, align 1
; AVX512: LV: Found an estimated cost of 5 for VF 2 For instruction: %v0 = load i8, i8* %in0, align 1
More information about the llvm-commits
mailing list