[all-commits] [llvm/llvm-project] 8a3c64: [X86][Costmodel] Load/store i8 Stride=3 VF=2 inter...
Roman Lebedev via All-commits
all-commits at lists.llvm.org
Sat Oct 2 03:52:38 PDT 2021
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 8a3c64c3a2393af4058103e7555a20e22151ca5d
https://github.com/llvm/llvm-project/commit/8a3c64c3a2393af4058103e7555a20e22151ca5d
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-02 (Sat, 02 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i8 Stride=3 VF=2 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/WYscYMcW4 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=1.5`
So pick cost of `3`.
For store we have:
https://godbolt.org/z/e9qvYdbbs - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110956
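The selection rule used in the message above (take the worst `Block RThroughput` across the relevant sched models) can be sketched as follows. This is a minimal illustration, not code from the patch: the helper name `pick_cost`, the dictionary layout, and the round-up step are assumptions made for the example, and the Ryzen readings are upper bounds as quoted.

```python
import math

def pick_cost(rthroughputs):
    """Return the largest reciprocal throughput, rounded up to an integer.

    `rthroughputs` maps a CPU family label to the Block RThroughput value
    quoted for it. Rounding up is an assumption of this sketch.
    """
    return math.ceil(max(rthroughputs.values()))

# Readings quoted in the commit message for i8 Stride=3 VF=2.
# The Ryzen figures are upper bounds ("<=") in the original message.
load_readings = {"intel": 3.0, "ryzen": 1.5}
store_readings = {"intel": 4.0, "ryzen": 2.0}

print(pick_cost(load_readings))
print(pick_cost(store_readings))
```

Under these assumptions the Intel figure dominates in both cases, which matches the costs of `3` and `4` chosen in the message.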
Commit: f1df2d8eaf188eec2971b12e57c821a0db5f3a36
https://github.com/llvm/llvm-project/commit/f1df2d8eaf188eec2971b12e57c821a0db5f3a36
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-02 (Sat, 02 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i8 Stride=3 VF=4 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/obWz3PrfK - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=1.5`
So pick cost of `3`.
For store we have:
https://godbolt.org/z/orjPshn3h - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110958
Commit: d1460c88a6d8739920f86383ff7d17be3dc517f6
https://github.com/llvm/llvm-project/commit/d1460c88a6d8739920f86383ff7d17be3dc517f6
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-02 (Sat, 02 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i8 Stride=3 VF=8 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/1jeocxj55 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.
For store we have:
https://godbolt.org/z/fr7xfa3K5 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `6`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110960
Commit: 448c939839992188000841bfaa6fcc6990e0fa2b
https://github.com/llvm/llvm-project/commit/448c939839992188000841bfaa6fcc6990e0fa2b
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-02 (Sat, 02 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i8 Stride=3 VF=32 interleaving costs
For VF=16, costs are correct.
For VF=32, load cost is divergent.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/qKjevqf4W - for intels `Block RThroughput: <=14.0`; for ryzens, `Block RThroughput: <=4.5`
So pick cost of `14`.
For store we have:
https://godbolt.org/z/xTssTq319 - for intels `Block RThroughput: =13.0`; for ryzens, `Block RThroughput: <=5.5`
So pick cost of `13`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110961
Commit: 935b9693aea14343aee3eced905590056c6579dc
https://github.com/llvm/llvm-project/commit/935b9693aea14343aee3eced905590056c6579dc
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-02 (Sat, 02 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll
Log Message:
-----------
[X86][Costmodel] Load/store i8 Stride=4 VF=2 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/KP6nn36zs - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.
For store we have:
https://godbolt.org/z/ov95zhrq6 - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110966
Commit: ae08362cb8e60864a0505af47189d6a996cfb5d9
https://github.com/llvm/llvm-project/commit/ae08362cb8e60864a0505af47189d6a996cfb5d9
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-02 (Sat, 02 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll
Log Message:
-----------
[X86][Costmodel] Load/store i8 Stride=4 VF=4 interleaving costs
While we already model this tuple, the store cost is divergent from reality, so fix it.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/1n4bPh7Tn - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.
For store we have:
https://godbolt.org/z/r8K9sveqo - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110968
Commit: 74e4a0e327579bfc3b00f6af0c9fd408c5843e8b
https://github.com/llvm/llvm-project/commit/74e4a0e327579bfc3b00f6af0c9fd408c5843e8b
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-02 (Sat, 02 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll
Log Message:
-----------
[X86][Costmodel] Load/store i8 Stride=4 VF=8 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/v7746Wcf7 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `12`.
For store we have:
https://godbolt.org/z/aEeEohEbP - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110969
Commit: 0e71ae6da8f3142f453267d4f1668b0d6d77bec5
https://github.com/llvm/llvm-project/commit/0e71ae6da8f3142f453267d4f1668b0d6d77bec5
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-02 (Sat, 02 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll
Log Message:
-----------
[X86][Costmodel] Load/store i8 Stride=4 VF=16 interleaving costs
While we already model this tuple, the values are divergent from reality, so fix them.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/TrGW7cKsE - for intels `Block RThroughput: =24.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `24`.
For store we have:
https://godbolt.org/z/Mh7qaqEfe - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `8`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110970
Commit: acb459574afc344bcb676737496f3fa35b1f04c1
https://github.com/llvm/llvm-project/commit/acb459574afc344bcb676737496f3fa35b1f04c1
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-02 (Sat, 02 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll
Log Message:
-----------
[X86][Costmodel] Load/store i8 Stride=4 VF=32 interleaving costs
While we already model this tuple, the load cost is divergent from reality, so fix it.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/zWMhhnPYa - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=24.0`
So pick cost of `56`.
For store we have:
https://godbolt.org/z/vnqqjWx51 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `12`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110971
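For reference, the costs picked across the commits in this push can be collected into one lookup keyed by (stride, VF). This is only a sketch for reading the numbers side by side: the name `I8_INTERLEAVE_COSTS`, the tuple layout, and the helper `interleave_cost` are hypothetical and do not reflect how X86TargetTransformInfo.cpp stores its tables. Stride=3 VF=16 is left out because the messages state that its costs are correct without quoting them.

```python
# (stride, vf) -> (load_cost, store_cost), as quoted in the commit messages.
# Hypothetical layout for illustration; not the structure used in LLVM.
I8_INTERLEAVE_COSTS = {
    (3, 2): (3, 4),
    (3, 4): (3, 4),
    (3, 8): (6, 6),
    (3, 32): (14, 13),
    (4, 2): (4, 4),
    (4, 4): (4, 4),
    (4, 8): (12, 4),
    (4, 16): (24, 8),
    (4, 32): (56, 12),
}

def interleave_cost(stride, vf, is_load):
    """Return the quoted cost, or None if the messages give no value."""
    entry = I8_INTERLEAVE_COSTS.get((stride, vf))
    if entry is None:
        return None
    load_cost, store_cost = entry
    return load_cost if is_load else store_cost

print(interleave_cost(4, 32, is_load=True))
print(interleave_cost(3, 16, is_load=True))
```

Returning `None` for a missing tuple keeps the sketch from inventing a value the log does not state.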
Compare: https://github.com/llvm/llvm-project/compare/ac7031b2b2fa...acb459574afc