[all-commits] [llvm/llvm-project] 396b95: [X86][Costmodel] Load/store i8 Stride=6 VF=2 inter...
Roman Lebedev via All-commits
all-commits at lists.llvm.org
Sun Oct 3 13:43:01 PDT 2021
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 396b95e5c9ede161b3634f7c8046188b7da8f387
https://github.com/llvm/llvm-project/commit/396b95e5c9ede161b3634f7c8046188b7da8f387
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-03 (Sun, 03 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-6.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-6.ll
Log Message:
-----------
[X86][Costmodel] Load/store i8 Stride=6 VF=2 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/jvj6jzns5 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.
For store we have:
https://godbolt.org/z/ros7eebMP - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111008
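The cost selection described in the log message above can be sketched as follows. This is only an illustration, not code from X86TargetTransformInfo.cpp: the helper names `ceilCost` and `pickCost` are hypothetical, rounding up to a whole number is an assumption, and the `<=3.0` upper bound reported for the Ryzen models is treated as exactly 3.0.

```cpp
#include <initializer_list>

// Hypothetical helper (not an LLVM API): round a non-negative
// reciprocal throughput up to a whole cost unit. Rounding up is an
// assumption; the Intel values quoted in the log are already whole.
constexpr unsigned ceilCost(double RThroughput) {
  unsigned Whole = static_cast<unsigned>(RThroughput);
  return (static_cast<double>(Whole) < RThroughput) ? Whole + 1 : Whole;
}

// Hypothetical helper: take the largest Block RThroughput reported
// across the sched models and use it as the cost.
constexpr unsigned pickCost(std::initializer_list<double> PerModel) {
  double Worst = 0.0;
  for (double V : PerModel)
    if (V > Worst)
      Worst = V;
  return ceilCost(Worst);
}

// Values quoted in the log message for i8 Stride=6 VF=2:
// {Intel models, Ryzen models}.
constexpr unsigned LoadCost = pickCost({6.0, 3.0});
constexpr unsigned StoreCost = pickCost({7.0, 3.0});
```

With these inputs the Intel numbers dominate, which matches the costs `6` and `7` chosen in the commit.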
Commit: 6fe4cce55816863bbb2ca9628d103dfa2d431616
https://github.com/llvm/llvm-project/commit/6fe4cce55816863bbb2ca9628d103dfa2d431616
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-03 (Sun, 03 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-6.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-6.ll
Log Message:
-----------
[X86][Costmodel] Load/store i8 Stride=6 VF=4 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `14`.
For store we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `9`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111010
Commit: 0b27f9c0886fcd052b4b0194c6d41376787213d4
https://github.com/llvm/llvm-project/commit/0b27f9c0886fcd052b4b0194c6d41376787213d4
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-03 (Sun, 03 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-6.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-6.ll
Log Message:
-----------
[X86][Costmodel] Load/store i8 Stride=6 VF=8 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/v98qPTTf6 - for intels `Block RThroughput: =18.0`; for ryzens, `Block RThroughput: =6.0`
So pick cost of `18`.
For store we have:
https://godbolt.org/z/rn5T9E8q6 - for intels `Block RThroughput: <=16.0`; for ryzens, `Block RThroughput: <=4.5`
So pick cost of `16`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111011
Commit: bd5ba437fd8fb42d068876c5d070c7a72ca17643
https://github.com/llvm/llvm-project/commit/bd5ba437fd8fb42d068876c5d070c7a72ca17643
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-03 (Sun, 03 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-6.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-6.ll
Log Message:
-----------
[X86][Costmodel] Load/store i8 Stride=6 VF=16 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/Gz8hhqfTM - for intels `Block RThroughput: <=43.0`; for ryzens, `Block RThroughput: <=14.0`
So pick cost of `43`.
For store we have:
https://godbolt.org/z/9vrdssYa8 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `27`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111012
Commit: a5e5883ef515abe6fc5e8565f11b1c49bb33c2e3
https://github.com/llvm/llvm-project/commit/a5e5883ef515abe6fc5e8565f11b1c49bb33c2e3
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-03 (Sun, 03 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-6.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-6.ll
Log Message:
-----------
[X86][Costmodel] Load/store i8 Stride=6 VF=32 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/c1jjKqP7b - for intels `Block RThroughput: <=82.0`; for ryzens, `Block RThroughput: <=26.0`
So pick cost of `82`.
For store we have:
https://godbolt.org/z/YM4ErY8x7 - for intels `Block RThroughput: <=90.0`; for ryzens, `Block RThroughput: <=25.5`
So pick cost of `90`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111013
Commit: 8e8fb77aa40c287067306df7ff2416122b31e33b
https://github.com/llvm/llvm-project/commit/8e8fb77aa40c287067306df7ff2416122b31e33b
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-03 (Sun, 03 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i16 Stride=3 VF=2 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/xnE988aej - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=2.5`
So pick cost of `5`.
For store we have:
https://godbolt.org/z/rMGT31Tnh - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111014
Commit: 04f1469cb4caeedaabc3ab0f9ae00a8576f774eb
https://github.com/llvm/llvm-project/commit/04f1469cb4caeedaabc3ab0f9ae00a8576f774eb
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-03 (Sun, 03 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i16 Stride=3 VF=4 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.
For store we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `6`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111015
Commit: 72f8a9244a64387d83a313607f94509cd2fd5fd2
https://github.com/llvm/llvm-project/commit/72f8a9244a64387d83a313607f94509cd2fd5fd2
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-03 (Sun, 03 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i16 Stride=3 VF=8 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=2.3`
So pick cost of `9`.
For store we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: <=12.0`; for ryzens, `Block RThroughput: <=3.3`
So pick cost of `12`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111016
Commit: 3cbc0a07f92b4a630a1c03a6587d52f206ec8248
https://github.com/llvm/llvm-project/commit/3cbc0a07f92b4a630a1c03a6587d52f206ec8248
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-03 (Sun, 03 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i16 Stride=3 VF=16 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: =28.0`; for ryzens, `Block RThroughput: <=8.5`
So pick cost of `28`.
For store we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `27`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111017
Commit: 67f1ee2e38e83af34b58e3873bd4ba6dec7f5c50
https://github.com/llvm/llvm-project/commit/67f1ee2e38e83af34b58e3873bd4ba6dec7f5c50
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-03 (Sun, 03 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i16 Stride=3 VF=32 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/rMaYr67hz - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=17.8`
So pick cost of `56`.
For store we have:
https://godbolt.org/z/eMsbKqnvv - for intels `Block RThroughput: <=54.0`; for ryzens, `Block RThroughput: <=15.0`
So pick cost of `54`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111018
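For reference, the costs picked across the ten commits above can be collected into one lookup table. The sketch below is a hypothetical, self-contained layout: the struct `InterleaveCostEntry`, the array `Costs`, and the function `lookupCost` are illustrative names, and the actual tables in X86TargetTransformInfo.cpp may be organized differently. Only the numeric costs come from the log messages.

```cpp
// Hypothetical table layout (not the real LLVM data structure).
struct InterleaveCostEntry {
  unsigned ElemBits;  // element width in bits (8 for i8, 16 for i16)
  unsigned Stride;    // interleave factor
  unsigned VF;        // vectorization factor
  unsigned LoadCost;  // cost picked for the interleaved load
  unsigned StoreCost; // cost picked for the interleaved store
};

// Costs as stated in the ten log messages.
constexpr InterleaveCostEntry Costs[] = {
    {8, 6, 2, 6, 7},     {8, 6, 4, 14, 9},    {8, 6, 8, 18, 16},
    {8, 6, 16, 43, 27},  {8, 6, 32, 82, 90},  {16, 3, 2, 5, 4},
    {16, 3, 4, 7, 6},    {16, 3, 8, 9, 12},   {16, 3, 16, 28, 27},
    {16, 3, 32, 56, 54},
};

// Returns the recorded cost, or -1 when the combination is not in
// the table (a convention chosen for this sketch only).
constexpr int lookupCost(unsigned ElemBits, unsigned Stride, unsigned VF,
                         bool IsLoad) {
  for (const InterleaveCostEntry &E : Costs)
    if (E.ElemBits == ElemBits && E.Stride == Stride && E.VF == VF)
      return static_cast<int>(IsLoad ? E.LoadCost : E.StoreCost);
  return -1;
}
```

For example, `lookupCost(8, 6, 32, false)` yields the i8 Stride=6 VF=32 store cost of 90, and an unlisted combination such as `lookupCost(8, 6, 64, true)` yields -1.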
Compare: https://github.com/llvm/llvm-project/compare/a944f801cacd...67f1ee2e38e8