[all-commits] [llvm/llvm-project] 45caac: [X86][Costmodel] Load/store i16 Stride=4 VF=2 inte...
Roman Lebedev via All-commits
all-commits at lists.llvm.org
Mon Sep 27 12:20:30 PDT 2021
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 45caac91c4e0caf64ec933f35c4a2d86a3fa31e3
https://github.com/llvm/llvm-project/commit/45caac91c4e0caf64ec933f35c4a2d86a3fa31e3
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-09-27 (Mon, 27 Sep 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-4.ll
Log Message:
-----------
[X86][Costmodel] Load/store i16 Stride=4 VF=2 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/5EYc6r9nh - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.
For store we have:
https://godbolt.org/z/z61e5d6GE - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110536
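
The selection rule this commit message appears to follow is: take the worst-case `Block RThroughput` reported across the considered sched models and use it as the integer cost. A minimal C++ sketch of that rule is below. The helper name `pickCost` and its input layout are hypothetical and are not part of LLVM; upper bounds such as `<=3.0` are simply passed in as their bound.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper, not LLVM code: given the reciprocal throughputs
// reported for one code sequence on several sched models, return the
// worst case, rounded up to a whole number, as the cost.
int pickCost(const std::vector<double> &RThroughputs) {
  double Worst = 0.0;
  for (double R : RThroughputs)
    Worst = std::max(Worst, R);
  return static_cast<int>(std::ceil(Worst));
}
```

With the figures quoted above, `pickCost({6.0, 3.0})` gives 6 for the load and `pickCost({2.0, 1.0})` gives 2 for the store.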
Commit: df2b42d12e4b4ff18bec8460c3d6ede6b411c048
https://github.com/llvm/llvm-project/commit/df2b42d12e4b4ff18bec8460c3d6ede6b411c048
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-09-27 (Mon, 27 Sep 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-4.ll
Log Message:
-----------
[X86][Costmodel] Load/store i16 Stride=4 VF=4 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/rnsf639Wh - for intels `Block RThroughput: =17.0`; for ryzens, `Block RThroughput: <=7.5`
So pick cost of `17`.
For store we have:
https://godbolt.org/z/565KKrcY6 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =2.0`
So pick cost of `6`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110537
Commit: 5615d6a6dd3f904cc9e1a219bfaf7df8183ee765
https://github.com/llvm/llvm-project/commit/5615d6a6dd3f904cc9e1a219bfaf7df8183ee765
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-09-27 (Mon, 27 Sep 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-4.ll
Log Message:
-----------
[X86][Costmodel] Load/store i16 Stride=4 VF=8 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/dd8T5P471 - for intels `Block RThroughput: =33.0`; for ryzens, `Block RThroughput: <=14.5`
So pick cost of `33`.
For store we have:
https://godbolt.org/z/zPxcKWhn4 - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `10`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110541
Commit: ee5a050e2e548991f0369fa7ee29fb3e7aade071
https://github.com/llvm/llvm-project/commit/ee5a050e2e548991f0369fa7ee29fb3e7aade071
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-09-27 (Mon, 27 Sep 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-4.ll
Log Message:
-----------
[X86][Costmodel] Load/store i16 Stride=4 VF=16 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/Wd9cKab83 - for intels `Block RThroughput: =75.0`; for ryzens, `Block RThroughput: <=29.5`
So pick cost of `75`. (note that `# 32-byte Reload` does not affect throughput there.)
For store we have:
https://godbolt.org/z/Wd9cKab83 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `32`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110543
Commit: 2a7a768dad3a77571fae8506d84078fe4ce3d105
https://github.com/llvm/llvm-project/commit/2a7a768dad3a77571fae8506d84078fe4ce3d105
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-09-27 (Mon, 27 Sep 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-4.ll
Log Message:
-----------
[X86][Costmodel] Load/store i16 Stride=4 VF=32 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For this tuple, measuring becomes problematic since there's a lot of spilling going on,
but apparently none of these memory ops affect the worst-case estimate here.
For load we have:
https://godbolt.org/z/zP4hd8MT6 - for intels `Block RThroughput: =150.0`; for ryzens, `Block RThroughput: <=59`
So pick cost of `150`.
For store we have:
https://godbolt.org/z/vKb8zTK8E - for intels `Block RThroughput: =64.0`; for ryzens, `Block RThroughput: <=24.0`
So pick cost of `64`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110548
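
Taken together, the five commits above settle on one load cost and one store cost per vectorization factor for i16 with Stride=4. The sketch below collects those numbers into a small lookup table. The struct `InterleaveCostEntry`, the two arrays, and `lookupCost` are hypothetical names chosen for illustration; this does not reproduce the actual table layout in X86TargetTransformInfo.cpp.

```cpp
#include <cstddef>

// Hypothetical entry type: one cost per (Stride, VF) pair for i16 elements.
struct InterleaveCostEntry {
  unsigned Stride;
  unsigned VF;
  int Cost;
};

// Load costs picked in the commit messages above (i16, Stride=4).
constexpr InterleaveCostEntry LoadI16Stride4[] = {
    {4, 2, 6}, {4, 4, 17}, {4, 8, 33}, {4, 16, 75}, {4, 32, 150}};

// Store costs picked in the commit messages above (i16, Stride=4).
constexpr InterleaveCostEntry StoreI16Stride4[] = {
    {4, 2, 2}, {4, 4, 6}, {4, 8, 10}, {4, 16, 32}, {4, 32, 64}};

// Linear search over a table; returns -1 when no entry matches.
template <std::size_t N>
int lookupCost(const InterleaveCostEntry (&Table)[N], unsigned Stride,
               unsigned VF) {
  for (const InterleaveCostEntry &E : Table)
    if (E.Stride == Stride && E.VF == VF)
      return E.Cost;
  return -1;
}
```

For example, `lookupCost(LoadI16Stride4, 4, 16)` yields 75, and a VF that none of these commits covers yields -1.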
Compare: https://github.com/llvm/llvm-project/compare/18cf5b220d3f...2a7a768dad3a