[PATCH] D110961: [X86][Costmodel] Load/store i8 Stride=3 VF=32 interleaving costs
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 1 12:28:20 PDT 2021
lebedev.ri created this revision.
lebedev.ri added a reviewer: RKSimon.
lebedev.ri added a project: LLVM.
Herald added subscribers: pengfei, hiraditya.
lebedev.ri requested review of this revision.
For VF=16, costs are correct.
For VF=32, load cost is divergent.
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/qKjevqf4W - for intels `Block RThroughput: <=14.0`; for ryzens, `Block RThroughput: <=4.5`
So pick cost of `14`.
For store we have:
https://godbolt.org/z/xTssTq319 - for intels `Block RThroughput: =13.0`; for ryzens, `Block RThroughput: <=5.5`
So pick cost of `13`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D110961
Files:
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll
Index: llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll
===================================================================
--- llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll
+++ llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll
@@ -30,7 +30,7 @@
; AVX2: LV: Found an estimated cost of 6 for VF 4 For instruction: %v0 = load i8, i8* %in0, align 1
; AVX2: LV: Found an estimated cost of 9 for VF 8 For instruction: %v0 = load i8, i8* %in0, align 1
; AVX2: LV: Found an estimated cost of 13 for VF 16 For instruction: %v0 = load i8, i8* %in0, align 1
-; AVX2: LV: Found an estimated cost of 16 for VF 32 For instruction: %v0 = load i8, i8* %in0, align 1
+; AVX2: LV: Found an estimated cost of 17 for VF 32 For instruction: %v0 = load i8, i8* %in0, align 1
;
; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i8, i8* %in0, align 1
; AVX512: LV: Found an estimated cost of 4 for VF 2 For instruction: %v0 = load i8, i8* %in0, align 1
Index: llvm/lib/Target/X86/X86TargetTransformInfo.cpp
===================================================================
--- llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -5090,7 +5090,7 @@
{3, MVT::v4i8, 3}, // (load 12i8 and) deinterleave into 3 x 4i8
{3, MVT::v8i8, 6}, // (load 24i8 and) deinterleave into 3 x 8i8
{3, MVT::v16i8, 11}, // (load 48i8 and) deinterleave into 3 x 16i8
- {3, MVT::v32i8, 13}, // (load 96i8 and) deinterleave into 3 x 32i8
+ {3, MVT::v32i8, 14}, // (load 96i8 and) deinterleave into 3 x 32i8
{3, MVT::v8i32, 17}, // (load 24i32 and) deinterleave into 3 x 8i32
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D110961.376604.patch
Type: text/x-patch
Size: 1723 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20211001/85d8caa8/attachment.bin>
More information about the llvm-commits
mailing list