[llvm] 8a3c64c - [X86][Costmodel] Load/store i8 Stride=3 VF=2 interleaving costs

Sat Oct 2 03:52:31 PDT 2021

Author: Roman Lebedev
Date: 2021-10-02T13:39:05+03:00
New Revision: 8a3c64c3a2393af4058103e7555a20e22151ca5d

URL: https://github.com/llvm/llvm-project/commit/8a3c64c3a2393af4058103e7555a20e22151ca5d
DIFF: https://github.com/llvm/llvm-project/commit/8a3c64c3a2393af4058103e7555a20e22151ca5d.diff

LOG: [X86][Costmodel] Load/store i8 Stride=3 VF=2 interleaving costs

While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/WYscYMcW4 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=1.5`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/e9qvYdbbs - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110956

Added: 
    

Modified: 
    llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll
    llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-3.ll

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 2e41e61a370b9..4c3f4da0f22a1 100644

--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -5086,7 +5086,7 @@ InstructionCost X86TTIImpl::getInterleavedMemoryOpCostAVX2(
       {2, MVT::v8i64, 8}, // (load 16i64 and) deinterleave into 2 x 8i64
       {2, MVT::v16i64, 16}, // (load 32i64 and) deinterleave into 2 x 16i64
 
-      {3, MVT::v2i8, 10},  // (load 6i8 and) deinterleave into 3 x 2i8
+      {3, MVT::v2i8, 3},  // (load 6i8 and) deinterleave into 3 x 2i8
       {3, MVT::v4i8, 4},   // (load 12i8 and) deinterleave into 3 x 4i8
       {3, MVT::v8i8, 9},   // (load 24i8 and) deinterleave into 3 x 8i8
       {3, MVT::v16i8, 11}, // (load 48i8 and) deinterleave into 3 x 16i8
@@ -5138,7 +5138,7 @@ InstructionCost X86TTIImpl::getInterleavedMemoryOpCostAVX2(
       {2, MVT::v8i64, 8}, // interleave 2 x 8i64 into 16i64 (and store)
       {2, MVT::v16i64, 16}, // interleave 2 x 16i64 into 32i64 (and store)
 
-      {3, MVT::v2i8, 7},   // interleave 3 x 2i8 into 6i8 (and store)
+      {3, MVT::v2i8, 4},   // interleave 3 x 2i8 into 6i8 (and store)
       {3, MVT::v4i8, 8},   // interleave 3 x 4i8 into 12i8 (and store)
       {3, MVT::v8i8, 11},  // interleave 3 x 8i8 into 24i8 (and store)
       {3, MVT::v16i8, 11}, // interleave 3 x 16i8 into 48i8 (and store)

diff  --git a/llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll b/llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll
index 01ae3331458b8..ed6daac56b3a0 100644
--- a/llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll
+++ b/llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll
@@ -26,7 +26,7 @@ target triple = "x86_64-unknown-linux-gnu"
 ; AVX1: LV: Found an estimated cost of 249 for VF 32 For instruction:   %v0 = load i8, i8* %in0, align 1
 ;
 ; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction:   %v0 = load i8, i8* %in0, align 1
-; AVX2: LV: Found an estimated cost of 13 for VF 2 For instruction:   %v0 = load i8, i8* %in0, align 1
+; AVX2: LV: Found an estimated cost of 6 for VF 2 For instruction:   %v0 = load i8, i8* %in0, align 1
 ; AVX2: LV: Found an estimated cost of 7 for VF 4 For instruction:   %v0 = load i8, i8* %in0, align 1
 ; AVX2: LV: Found an estimated cost of 12 for VF 8 For instruction:   %v0 = load i8, i8* %in0, align 1
 ; AVX2: LV: Found an estimated cost of 13 for VF 16 For instruction:   %v0 = load i8, i8* %in0, align 1

diff  --git a/llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-3.ll b/llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-3.ll
index ce436a2501c9e..c0da09516b810 100644
--- a/llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-3.ll
+++ b/llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-3.ll
@@ -26,7 +26,7 @@ target triple = "x86_64-unknown-linux-gnu"
 ; AVX1: LV: Found an estimated cost of 249 for VF 32 For instruction:   store i8 %v2, i8* %out2, align 1
 ;
 ; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction:   store i8 %v2, i8* %out2, align 1
-; AVX2: LV: Found an estimated cost of 10 for VF 2 For instruction:   store i8 %v2, i8* %out2, align 1
+; AVX2: LV: Found an estimated cost of 7 for VF 2 For instruction:   store i8 %v2, i8* %out2, align 1
 ; AVX2: LV: Found an estimated cost of 11 for VF 4 For instruction:   store i8 %v2, i8* %out2, align 1
 ; AVX2: LV: Found an estimated cost of 14 for VF 8 For instruction:   store i8 %v2, i8* %out2, align 1
 ; AVX2: LV: Found an estimated cost of 13 for VF 16 For instruction:   store i8 %v2, i8* %out2, align 1