[llvm] 0e71ae6 - [X86][Costmodel] Load/store i8 Stride=4 VF=16 interleaving costs

Roman Lebedev via llvm-commits llvm-commits at lists.llvm.org
Sat Oct 2 03:52:44 PDT 2021


Author: Roman Lebedev
Date: 2021-10-02T13:40:21+03:00
New Revision: 0e71ae6da8f3142f453267d4f1668b0d6d77bec5

URL: https://github.com/llvm/llvm-project/commit/0e71ae6da8f3142f453267d4f1668b0d6d77bec5
DIFF: https://github.com/llvm/llvm-project/commit/0e71ae6da8f3142f453267d4f1668b0d6d77bec5.diff

LOG: [X86][Costmodel] Load/store i8 Stride=4 VF=16 interleaving costs

While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/TrGW7cKsE - for Intel CPUs, `Block RThroughput: =24.0`; for Ryzen CPUs, `Block RThroughput: <=12.0`
So pick a cost of `24`.

For store we have:
https://godbolt.org/z/Mh7qaqEfe - for Intel CPUs, `Block RThroughput: =8.0`; for Ryzen CPUs, `Block RThroughput: <=4.0`
So pick a cost of `8`.
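
The selection above reads as taking the worst case across the modelled CPUs. A minimal sketch of that rule follows; `pickInterleaveCost` is a hypothetical helper written for illustration, not an LLVM function, and it assumes that an upper bound such as `<=12.0` is passed in as the bound itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of LLVM): derive a single table cost from
// per-CPU llvm-mca "Block RThroughput" values by taking the worst case.
// Assumption: upper bounds such as "<=12.0" are passed as the bound itself.
unsigned pickInterleaveCost(const std::vector<double> &RThroughputs) {
  assert(!RThroughputs.empty() && "need at least one measurement");
  double Worst = *std::max_element(RThroughputs.begin(), RThroughputs.end());
  // Round up so the cost never understates the slowest model.
  return static_cast<unsigned>(std::ceil(Worst));
}
```

With the numbers quoted above, `pickInterleaveCost({24.0, 12.0})` gives 24 for the load and `pickInterleaveCost({8.0, 4.0})` gives 8 for the store.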

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110970

Added: 
    

Modified: 
    llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll
    llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 588b42b7b454..6848987af900 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -5097,7 +5097,7 @@ InstructionCost X86TTIImpl::getInterleavedMemoryOpCostAVX2(
       {4, MVT::v2i8, 4},  // (load 8i8 and) deinterleave into 4 x 2i8
       {4, MVT::v4i8, 4},   // (load 16i8 and) deinterleave into 4 x 4i8
       {4, MVT::v8i8, 12},  // (load 32i8 and) deinterleave into 4 x 8i8
-      {4, MVT::v16i8, 39}, // (load 64i8 and) deinterleave into 4 x 16i8
+      {4, MVT::v16i8, 24}, // (load 64i8 and) deinterleave into 4 x 16i8
       {4, MVT::v32i8, 80}, // (load 128i8 and) deinterleave into 4 x 32i8
 
       {4, MVT::v2i16, 6}, // (load 8i16 and) deinterleave into 4 x 2i16
@@ -5147,7 +5147,7 @@ InstructionCost X86TTIImpl::getInterleavedMemoryOpCostAVX2(
       {4, MVT::v2i8, 4},  // interleave 4 x 2i8 into 8i8 (and store)
       {4, MVT::v4i8, 4},   // interleave 4 x 4i8 into 16i8 (and store)
       {4, MVT::v8i8, 4},  // interleave 4 x 8i8 into 32i8 (and store)
-      {4, MVT::v16i8, 10}, // interleave 4 x 16i8 into 64i8 (and store)
+      {4, MVT::v16i8, 8}, // interleave 4 x 16i8 into 64i8 (and store)
       {4, MVT::v32i8, 12}, // interleave 4 x 32i8 into 128i8 (and store)
 
       {4, MVT::v2i16, 2},  // interleave 4 x 2i16 into 8i16 (and store)

diff  --git a/llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll b/llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll
index 2ef0fc3e3bfe..74b3860ad257 100644
--- a/llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll
+++ b/llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll
@@ -29,7 +29,7 @@ target triple = "x86_64-unknown-linux-gnu"
 ; AVX2: LV: Found an estimated cost of 5 for VF 2 For instruction:   %v0 = load i8, i8* %in0, align 1
 ; AVX2: LV: Found an estimated cost of 5 for VF 4 For instruction:   %v0 = load i8, i8* %in0, align 1
 ; AVX2: LV: Found an estimated cost of 13 for VF 8 For instruction:   %v0 = load i8, i8* %in0, align 1
-; AVX2: LV: Found an estimated cost of 41 for VF 16 For instruction:   %v0 = load i8, i8* %in0, align 1
+; AVX2: LV: Found an estimated cost of 26 for VF 16 For instruction:   %v0 = load i8, i8* %in0, align 1
 ; AVX2: LV: Found an estimated cost of 84 for VF 32 For instruction:   %v0 = load i8, i8* %in0, align 1
 ;
 ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction:   %v0 = load i8, i8* %in0, align 1

diff  --git a/llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll b/llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll
index 962beeb3dbec..477e03c439cc 100644
--- a/llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll
+++ b/llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll
@@ -29,7 +29,7 @@ target triple = "x86_64-unknown-linux-gnu"
 ; AVX2: LV: Found an estimated cost of 5 for VF 2 For instruction:   store i8 %v3, i8* %out3, align 1
 ; AVX2: LV: Found an estimated cost of 5 for VF 4 For instruction:   store i8 %v3, i8* %out3, align 1
 ; AVX2: LV: Found an estimated cost of 5 for VF 8 For instruction:   store i8 %v3, i8* %out3, align 1
-; AVX2: LV: Found an estimated cost of 12 for VF 16 For instruction:   store i8 %v3, i8* %out3, align 1
+; AVX2: LV: Found an estimated cost of 10 for VF 16 For instruction:   store i8 %v3, i8* %out3, align 1
 ; AVX2: LV: Found an estimated cost of 16 for VF 32 For instruction:   store i8 %v3, i8* %out3, align 1
 ;
 ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction:   store i8 %v3, i8* %out3, align 1
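
The AVX2 test expectations in the two hunks above differ from the table entries by a small constant (24 vs. 26, 8 vs. 10). The numbers are consistent with the reported cost being the table entry plus one unit per 32-byte vector memory operation needed to cover `VF * 4` bytes of i8 data. This is an inference from the values in this diff, not a statement about the exact implementation; `expectedCost` below is a hypothetical name used only for this check.

```cpp
// Hypothetical model inferred from the test values in this diff:
// reported cost = table cost + number of 32-byte (256-bit) memory ops,
// each assumed to cost 1. Elements are i8, so bytes == element count.
constexpr unsigned expectedCost(unsigned TableCost, unsigned VF,
                                unsigned Stride = 4, unsigned VecBytes = 32) {
  return TableCost + (VF * Stride + VecBytes - 1) / VecBytes;
}

// Load, stride 4: table entries 12, 24 (was 39), 80.
static_assert(expectedCost(12, 8) == 13, "VF 8 load");
static_assert(expectedCost(24, 16) == 26, "VF 16 load, new value");
static_assert(expectedCost(39, 16) == 41, "VF 16 load, old value");
static_assert(expectedCost(80, 32) == 84, "VF 32 load");

// Store, stride 4: table entries 4, 8 (was 10), 12.
static_assert(expectedCost(4, 8) == 5, "VF 8 store");
static_assert(expectedCost(8, 16) == 10, "VF 16 store, new value");
static_assert(expectedCost(10, 16) == 12, "VF 16 store, old value");
static_assert(expectedCost(12, 32) == 16, "VF 32 store");
```

Under this assumption, lowering the VF 16 table entries by 15 (load) and 2 (store) lowers the reported costs by exactly the same amounts, which matches the updated CHECK lines.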


        

