[all-commits] [llvm/llvm-project] 3e93fc: [X86][Costmodel] Load/store i32/f32 Stride=3 VF=2 ...
Roman Lebedev via All-commits
all-commits at lists.llvm.org
Mon Oct 4 04:41:16 PDT 2021
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 3e93fcdfc893b1cd365126876b81b32b54446d9f
https://github.com/llvm/llvm-project/commit/3e93fcdfc893b1cd365126876b81b32b54446d9f
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-04 (Mon, 04 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i32/f32 Stride=3 VF=2 interleaving costs
The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/z8qa14bs3 - for Intel, `Block RThroughput: =3.0`; for Ryzen, `Block RThroughput: =1.5`.
So pick a cost of `3`.
For store we have:
https://godbolt.org/z/GYGajoc4K - for Intel, `Block RThroughput: <=4.0`; for Ryzen, `Block RThroughput: <=2.0`.
So pick a cost of `4`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
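For context, a Stride=3 interleaved load with VF=2 reads 3 x 2 = 6 contiguous elements and splits them into three 2-element vectors; the interleaved store does the inverse. The C++ sketch below is only a scalar reference model of that data movement. The helpers `deinterleave` and `interleave` are hypothetical, and this is not the shuffle sequence llc emits, which is what the throughput figures above measure. It assumes C++17 (for `constexpr` element writes through `std::array`).

```cpp
#include <array>
#include <cstddef>

// Hypothetical reference model, not LLVM code: an interleaved load group
// with factor Stride and vectorization factor VF reads Stride * VF
// contiguous elements and splits them into Stride vectors of VF lanes.
template <typename T, std::size_t Stride, std::size_t VF>
constexpr std::array<std::array<T, VF>, Stride>
deinterleave(const std::array<T, Stride * VF> &Mem) {
  std::array<std::array<T, VF>, Stride> Vecs{};
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    for (std::size_t Member = 0; Member < Stride; ++Member)
      Vecs[Member][Lane] = Mem[Lane * Stride + Member];
  return Vecs;
}

// Inverse (interleaved store group): write Stride vectors of VF lanes
// back out as Stride * VF contiguous elements.
template <typename T, std::size_t Stride, std::size_t VF>
constexpr std::array<T, Stride * VF>
interleave(const std::array<std::array<T, VF>, Stride> &Vecs) {
  std::array<T, Stride * VF> Mem{};
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    for (std::size_t Member = 0; Member < Stride; ++Member)
      Mem[Lane * Stride + Member] = Vecs[Member][Lane];
  return Mem;
}

// Stride=3, VF=2, i32: memory {x0, y0, z0, x1, y1, z1} = {0, 1, 2, 3, 4, 5}.
constexpr std::array<int, 6> kMem = {0, 1, 2, 3, 4, 5};
constexpr auto kVecs = deinterleave<int, 3, 2>(kMem);
static_assert(kVecs[0][0] == 0 && kVecs[0][1] == 3, "member 0 = {x0, x1}");
static_assert(kVecs[1][0] == 1 && kVecs[1][1] == 4, "member 1 = {y0, y1}");
static_assert(kVecs[2][0] == 2 && kVecs[2][1] == 5, "member 2 = {z0, z1}");
```

Each member vector gathers every third element, so on AVX2 the real lowering has to be built from shuffles rather than a single load.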
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111019
Commit: a93411c3afc76a4dd4436829d07eb65f61c2188e
https://github.com/llvm/llvm-project/commit/a93411c3afc76a4dd4436829d07eb65f61c2188e
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-04 (Mon, 04 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i32/f32 Stride=3 VF=4 interleaving costs
The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/d8PdhEszo - for Intel, `Block RThroughput: =3.0`; for Ryzen, `Block RThroughput: <=3.0`.
So pick a cost of `3`.
For store we have:
https://godbolt.org/z/WojonfG5n - for Intel, `Block RThroughput: =5.0`; for Ryzen, `Block RThroughput: <=3.0`.
So pick a cost of `5`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
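The selection rule in these messages can be written as a small function: take the worst (largest) Block RThroughput across the AVX2-only sched models, using Ryzen's `<=` bound as its value. The C++ sketch below assumes that rule plus rounding up to a whole cost; the rounding is an assumption, since every Intel figure quoted here is already integral. `pickInterleavedCost` is a hypothetical helper, not an LLVM API.

```cpp
#include <algorithm>
#include <initializer_list>

// Hypothetical helper, not an LLVM API. Each argument is the llvm-mca
// "Block RThroughput" measured for one avx2-but-not-avx512 sched model
// (for Ryzen the quoted upper bound is used as-is). The result is the
// worst (largest) value, rounded up to a whole number (assumed rounding).
constexpr int pickInterleavedCost(std::initializer_list<double> RThroughputs) {
  double Worst = 0.0;
  for (double RT : RThroughputs)
    Worst = std::max(Worst, RT);
  int Whole = static_cast<int>(Worst);
  return Whole < Worst ? Whole + 1 : Whole;
}

// i32/f32, Stride=3, VF=4, as (Intel, Ryzen) pairs from the message above.
static_assert(pickInterleavedCost({3.0, 3.0}) == 3, "load cost");
static_assert(pickInterleavedCost({5.0, 3.0}) == 5, "store cost");
```

In every commit of this series the Intel figure is the larger one, so in practice the rule reduces to "use the Haswell/Broadwell/Skylake throughput".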
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111020
Commit: 198aa84973e6d5f9cdc7b241c4dc9880d63a5b5c
https://github.com/llvm/llvm-project/commit/198aa84973e6d5f9cdc7b241c4dc9880d63a5b5c
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-04 (Mon, 04 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-float.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i32/f32 Stride=3 VF=8 interleaving costs
The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/zdz5Ga6fs - for Intel, `Block RThroughput: =7.0`; for Ryzen, `Block RThroughput: <=6.0`.
So pick a cost of `7`.
For store we have:
https://godbolt.org/z/qn71513ac - for Intel, `Block RThroughput: =11.0`; for Ryzen, `Block RThroughput: <=8.0`.
So pick a cost of `11`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111021
Commit: 4ca5bc07af0685fbbe04e8731b4ab37354368c84
https://github.com/llvm/llvm-project/commit/4ca5bc07af0685fbbe04e8731b4ab37354368c84
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-04 (Mon, 04 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i32/f32 Stride=3 VF=16 interleaving costs
The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/5fqrh4qqo - for Intel, `Block RThroughput: =14.0`; for Ryzen, `Block RThroughput: <=12.0`.
So pick a cost of `14`.
For store we have:
https://godbolt.org/z/5fqrh4qqo - for Intel, `Block RThroughput: =22.0`; for Ryzen, `Block RThroughput: <=16.0`.
So pick a cost of `22`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
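Collected together, the four i32/f32 Stride=3 commits give the AVX2 costs below. The table and `lookupStride3Cost` are hypothetical C++ illustrations of the picked values, not the actual layout of `X86TargetTransformInfo.cpp`, which these messages do not show. It assumes C++17.

```cpp
#include <array>

// Hypothetical summary of the AVX2 costs picked in the i32/f32 Stride=3
// commits; the struct is illustrative, not LLVM's own cost-table type.
struct Stride3Cost {
  unsigned VF;
  int Load;
  int Store;
};

constexpr std::array<Stride3Cost, 4> I32F32Stride3Costs = {{
    {2, 3, 4},    // D111019
    {4, 3, 5},    // D111020
    {8, 7, 11},   // D111021
    {16, 14, 22}, // D111022
}};

// Returns the picked cost for the given VF, or -1 if there is no entry.
constexpr int lookupStride3Cost(unsigned VF, bool IsStore) {
  for (const Stride3Cost &E : I32F32Stride3Costs)
    if (E.VF == VF)
      return IsStore ? E.Store : E.Load;
  return -1;
}
```

The VF=16 entries are exactly double the VF=8 ones, for both load and store.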
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111022
Commit: d3bbe781ea8e6e968ad4be2eb3aa5eedb168a4a8
https://github.com/llvm/llvm-project/commit/d3bbe781ea8e6e968ad4be2eb3aa5eedb168a4a8
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-04 (Mon, 04 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i64/f64 Stride=3 VF=2 interleaving costs
The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/sz5qdKnr4 - for Intel, `Block RThroughput: =1.0`; for Ryzen, `Block RThroughput: <=1.0`.
So pick a cost of `1`.
For store we have:
https://godbolt.org/z/Kzdjff63v - for Intel, `Block RThroughput: =4.0`; for Ryzen, `Block RThroughput: <=3.0`.
So pick a cost of `4`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111025
Commit: eb9a694c1744f6a1608faf7daa79244bd1e45248
https://github.com/llvm/llvm-project/commit/eb9a694c1744f6a1608faf7daa79244bd1e45248
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-04 (Mon, 04 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i64/f64 Stride=3 VF=4 interleaving costs
This one required quite a bit of assembly surgery.
The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/Tce3osvcz - for Intel, `Block RThroughput: =5.0`; for Ryzen, `Block RThroughput: <=4.0`.
So pick a cost of `5`.
For store we have:
https://godbolt.org/z/oc3arEcnE - for Intel, `Block RThroughput: =6.0`; for Ryzen, `Block RThroughput: <=4.0`.
So pick a cost of `6`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111026
Commit: ede0611e792c90acbe528ca7895377195a1bbadf
https://github.com/llvm/llvm-project/commit/ede0611e792c90acbe528ca7895377195a1bbadf
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-04 (Mon, 04 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i64/f64 Stride=3 VF=8 interleaving costs
This one required quite a bit of assembly surgery.
The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/oYWv4cTnK - for Intel, `Block RThroughput: =10.0`; for Ryzen, `Block RThroughput: <=8.0`.
So pick a cost of `10`.
For store we have:
https://godbolt.org/z/33GMhrsG9 - for Intel, `Block RThroughput: =12.0`; for Ryzen, `Block RThroughput: <=8.0`.
So pick a cost of `12`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111027
Commit: cef0a693b6373764dc5483ef3b4523e68a812972
https://github.com/llvm/llvm-project/commit/cef0a693b6373764dc5483ef3b4523e68a812972
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-04 (Mon, 04 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i64/f64 Stride=3 VF=16 interleaving costs
This required a huge amount of assembly surgery, but I think this is about right.
The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/z11crMEcj - for Intel, `Block RThroughput: =20.0`; for Ryzen, `Block RThroughput: <=18.0`.
So we could pick a cost of `20`.
For store we have:
https://godbolt.org/z/eqT4ze3j4 - for Intel, `Block RThroughput: =24.0`; for Ryzen, `Block RThroughput: <=16.0`.
So we could pick a cost of `24`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
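Applying the same worst-of-both rule to the i64/f64 Stride=3 measurements quoted in the last four messages gives the values checked below; for every VF the Intel figure is greater than or equal to the Ryzen bound. The `Measurement` struct and the `worstLoad`/`worstStore` helpers are hypothetical C++ illustrations, and the Ryzen entries are the quoted `<=` upper bounds rather than exact values.

```cpp
#include <algorithm>
#include <array>

// Hypothetical record of the llvm-mca "Block RThroughput" figures quoted in
// the i64/f64 Stride=3 commits (Ryzen values are the quoted upper bounds).
struct Measurement {
  unsigned VF;
  double IntelLoad, RyzenLoad;
  double IntelStore, RyzenStore;
};

constexpr std::array<Measurement, 4> I64F64Stride3 = {{
    {2, 1.0, 1.0, 4.0, 3.0},     // D111025
    {4, 5.0, 4.0, 6.0, 4.0},     // D111026
    {8, 10.0, 8.0, 12.0, 8.0},   // D111027
    {16, 20.0, 18.0, 24.0, 16.0} // D111031
}};

// Worst-of-both selection; all inputs are integral, so no rounding needed.
constexpr double worstLoad(const Measurement &M) {
  return std::max(M.IntelLoad, M.RyzenLoad);
}
constexpr double worstStore(const Measurement &M) {
  return std::max(M.IntelStore, M.RyzenStore);
}

static_assert(worstLoad(I64F64Stride3[2]) == 10.0, "VF=8 load");
static_assert(worstStore(I64F64Stride3[2]) == 12.0, "VF=8 store");
```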
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111031
Compare: https://github.com/llvm/llvm-project/compare/4fc2f4979cf5...cef0a693b637