[all-commits] [llvm/llvm-project] f44d90: [X86][Costmodel] Load/store i32/f32 Stride=2 VF=2 ...
Roman Lebedev via All-commits
all-commits at lists.llvm.org
Fri Oct 1 07:49:16 PDT 2021
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: f44d9009c25827dd9fad5bfa240f6e59335d07b8
https://github.com/llvm/llvm-project/commit/f44d9009c25827dd9fad5bfa240f6e59335d07b8
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-01 (Fri, 01 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-2.ll
M llvm/test/Transforms/LoopVectorize/X86/interleaving.ll
Log Message:
-----------
[X86][Costmodel] Load/store i32/f32 Stride=2 VF=2 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/4rY96hnGT - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0`
So pick cost of `2`.
For store we have:
https://godbolt.org/z/vbo37Y3r9 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: =0.5`
So pick cost of `1`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110753
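The selection rule in the message above can be read as: take the per-vendor `Block RThroughput` figures and keep the worst one. A minimal sketch of that reading follows; the name `pick_cost` and the round-up step are assumptions made for illustration, not code from the patch.

```python
import math

def pick_cost(rthroughputs):
    """Return the worst-case reciprocal throughput, rounded up to an integer cost."""
    return math.ceil(max(rthroughputs.values()))

# Figures quoted in the commit message for i32/f32 Stride=2 VF=2.
load = {"intel": 2.0, "ryzen": 1.0}
store = {"intel": 1.0, "ryzen": 0.5}

print(pick_cost(load))   # 2
print(pick_cost(store))  # 1
```

Under this reading, an upper bound such as `<=2.0` is treated as the bound itself, which stays conservative.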
Commit: b12aeaec9aca28cbd23587dda6a3126ab0aaf1c0
https://github.com/llvm/llvm-project/commit/b12aeaec9aca28cbd23587dda6a3126ab0aaf1c0
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-01 (Fri, 01 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-2.ll
M llvm/test/Transforms/LoopVectorize/X86/interleaving.ll
Log Message:
-----------
[X86][Costmodel] Load/store i32/f32 Stride=2 VF=4 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/EM5Ean7bd - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0`
So pick cost of `2`.
For store we have:
https://godbolt.org/z/EM5Ean7bd - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `2`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110754
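`Block RThroughput` is a field of the llvm-mca summary output. A small sketch of pulling that number out of captured tool output is shown here; `block_rthroughput` is a hypothetical helper, and the sample text is a hand-written stand-in that carries only the Intel figure quoted above, not real tool output.

```python
import re

def block_rthroughput(mca_output):
    """Extract the 'Block RThroughput' value from llvm-mca summary text."""
    match = re.search(r"^Block RThroughput:\s*([0-9.]+)\s*$", mca_output, re.MULTILINE)
    if match is None:
        raise ValueError("no Block RThroughput line found")
    return float(match.group(1))

# Stand-in for captured output (assumption: only the one summary line is shown).
sample = "Block RThroughput: 2.0\n"
print(block_rthroughput(sample))
```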
Commit: 3a0643e9c2252290a9f29c2b3ceb696033af4903
https://github.com/llvm/llvm-project/commit/3a0643e9c2252290a9f29c2b3ceb696033af4903
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-01 (Fri, 01 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-2.ll
M llvm/test/Transforms/LoopVectorize/X86/interleaving.ll
Log Message:
-----------
[X86][Costmodel] Load/store i32/f32 Stride=2 VF=8 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/n8aMKeo4E - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.
For store we have:
https://godbolt.org/z/n8aMKeo4E - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: =2.0`
So pick cost of `4`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110755
Commit: 80cd8da78d027f59b54586887af4bb9c3b36a6ba
https://github.com/llvm/llvm-project/commit/80cd8da78d027f59b54586887af4bb9c3b36a6ba
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-01 (Fri, 01 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-2.ll
Log Message:
-----------
[X86][Costmodel] Load/store i32/f32 Stride=2 VF=16 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/M9eev3xe8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `8`.
For store we have:
https://godbolt.org/z/M9eev3xe8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: =4.0`
So pick cost of `8`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110756
Commit: ea76cb87ee4022d8663a7c25943478fe3f64e21a
https://github.com/llvm/llvm-project/commit/ea76cb87ee4022d8663a7c25943478fe3f64e21a
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-01 (Fri, 01 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-2.ll
Log Message:
-----------
[X86][Costmodel] Load/store i32/f32 Stride=2 VF=32 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
Here the `store` pattern starts to spill,
so accurate modelling may be problematic,
although if I drop the spilling, the measurements don't change.
For load we have:
https://godbolt.org/z/1oTTnncbx - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `16`.
For store we have:
https://godbolt.org/z/1oTTnncbx - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: =8.0`
So pick cost of `16`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110761
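Collecting the costs picked in the five i32/f32 messages above into one table makes the trend easier to see. This is a reader's summary only, not the table layout used in X86TargetTransformInfo.cpp; `I32_STRIDE2_COSTS` is a hypothetical name.

```python
# VF -> (load cost, store cost), as picked in the i32/f32 Stride=2 commit messages.
I32_STRIDE2_COSTS = {
    2: (2, 1),
    4: (2, 2),
    8: (4, 4),
    16: (8, 8),
    32: (16, 16),
}

for vf, (load, store) in I32_STRIDE2_COSTS.items():
    print(f"VF={vf:<2}  load={load:<2}  store={store}")

# From VF=4 upward both costs equal VF/2, so they double with each doubling of VF.
assert all(
    load == store == vf // 2
    for vf, (load, store) in I32_STRIDE2_COSTS.items()
    if vf >= 4
)
```

Only VF=2 breaks the pattern: its load cost is 2 and its store cost is 1.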
Commit: 612e5b05a281b867383f52e457781d1b5ba76c2d
https://github.com/llvm/llvm-project/commit/612e5b05a281b867383f52e457781d1b5ba76c2d
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-01 (Fri, 01 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-2.ll
Log Message:
-----------
[X86][Costmodel] Load/store i64/f64 Stride=2 VF=2 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/8a1cfGeMn - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0`
So pick cost of `2`.
For store we have:
https://godbolt.org/z/jMdcM47bx - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `2`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110835
Commit: 71bc31b907193c294f718046ed8ef569e3d4b9fa
https://github.com/llvm/llvm-project/commit/71bc31b907193c294f718046ed8ef569e3d4b9fa
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-01 (Fri, 01 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-2.ll
R llvm/test/Analysis/CostModel/X86/interleaved-load-store-double.ll
R llvm/test/Analysis/CostModel/X86/interleaved-load-store-i64.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-2.ll
Log Message:
-----------
[X86][Costmodel] Load/store i64/f64 Stride=2 VF=4 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/j5co1qWEW - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.
For store we have:
https://godbolt.org/z/j5co1qWEW - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `4`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110837
Commit: abd37de63ee97330f9397c4468802498b6101360
https://github.com/llvm/llvm-project/commit/abd37de63ee97330f9397c4468802498b6101360
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-01 (Fri, 01 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-2.ll
Log Message:
-----------
[X86][Costmodel] Load/store i64/f64 Stride=2 VF=8 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/PGYbYKPq8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `8`.
For store we have:
https://godbolt.org/z/PGYbYKPq8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `8`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110838
Commit: 3e260efdfc6064481396a0c3ade703a739023c77
https://github.com/llvm/llvm-project/commit/3e260efdfc6064481396a0c3ade703a739023c77
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-01 (Fri, 01 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-2.ll
Log Message:
-----------
[X86][Costmodel] Load/store i64/f64 Stride=2 VF=16 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/1WMTojvfW - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `16`.
For store we have:
https://godbolt.org/z/1WMTojvfW - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=16.0`
So pick cost of `16`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110840
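For the i64/f64 series the picked cost equals the VF for both loads and stores. A sketch of how such per-VF entries might be consulted, with a miss reported as `None` so a caller could fall back to a generic estimate; the dictionaries and `lookup_cost` are illustrative assumptions, not LLVM API.

```python
# VF -> cost for i64/f64 Stride=2, as picked in the four commit messages above.
I64_STRIDE2_LOAD = {2: 2, 4: 4, 8: 8, 16: 16}
I64_STRIDE2_STORE = {2: 2, 4: 4, 8: 8, 16: 16}

def lookup_cost(table, vf):
    """Return the tabled cost for vf, or None when the table has no entry."""
    return table.get(vf)

print(lookup_cost(I64_STRIDE2_LOAD, 8))    # 8
print(lookup_cost(I64_STRIDE2_STORE, 32))  # None: VF=32 is not covered by these commits
```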
Compare: https://github.com/llvm/llvm-project/compare/4f0a39b9b4ba...3e260efdfc60