[PATCH] D110968: [X86][Costmodel] Load/store i8 Stride=4 VF=4 interleaving costs
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 1 13:17:50 PDT 2021
lebedev.ri created this revision.
lebedev.ri added a reviewer: RKSimon.
lebedev.ri added a project: LLVM.
Herald added subscribers: ctetreau, pengfei, hiraditya.
lebedev.ri requested review of this revision.
While we already model this tuple, the store cost is divergent from reality, so fix it.
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/1n4bPh7Tn - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.
For store we have:
https://godbolt.org/z/r8K9sveqo - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D110968
Files:
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll
Index: llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll
===================================================================
--- llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll
+++ llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll
@@ -27,7 +27,7 @@
;
; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: store i8 %v3, i8* %out3, align 1
; AVX2: LV: Found an estimated cost of 5 for VF 2 For instruction: store i8 %v3, i8* %out3, align 1
-; AVX2: LV: Found an estimated cost of 10 for VF 4 For instruction: store i8 %v3, i8* %out3, align 1
+; AVX2: LV: Found an estimated cost of 5 for VF 4 For instruction: store i8 %v3, i8* %out3, align 1
; AVX2: LV: Found an estimated cost of 11 for VF 8 For instruction: store i8 %v3, i8* %out3, align 1
; AVX2: LV: Found an estimated cost of 12 for VF 16 For instruction: store i8 %v3, i8* %out3, align 1
; AVX2: LV: Found an estimated cost of 16 for VF 32 For instruction: store i8 %v3, i8* %out3, align 1
Index: llvm/lib/Target/X86/X86TargetTransformInfo.cpp
===================================================================
--- llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -5145,7 +5145,7 @@
{3, MVT::v32i8, 13}, // interleave 3 x 32i8 into 96i8 (and store)
{4, MVT::v2i8, 4}, // interleave 4 x 2i8 into 8i8 (and store)
- {4, MVT::v4i8, 9}, // interleave 4 x 4i8 into 16i8 (and store)
+ {4, MVT::v4i8, 4}, // interleave 4 x 4i8 into 16i8 (and store)
{4, MVT::v8i8, 10}, // interleave 4 x 8i8 into 32i8 (and store)
{4, MVT::v16i8, 10}, // interleave 4 x 16i8 into 64i8 (and store)
{4, MVT::v32i8, 12}, // interleave 4 x 32i8 into 128i8 (and store)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D110968.376619.patch
Type: text/x-patch
Size: 1783 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20211001/220456d7/attachment.bin>
More information about the llvm-commits
mailing list