[all-commits] [llvm/llvm-project] 644a96: [LV] Vectorize cases with larger number of RT chec...
Florian Hahn via All-commits
all-commits at lists.llvm.org
Mon Jul 4 07:12:10 PDT 2022
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 644a965c1efef68f22d9495e4cefbb599c214788
https://github.com/llvm/llvm-project/commit/644a965c1efef68f22d9495e4cefbb599c214788
Author: Florian Hahn <flo at fhahn.com>
Date: 2022-07-04 (Mon, 04 Jul 2022)
Changed paths:
M llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
M llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
M llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
M llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
M llvm/test/Transforms/LoopVectorize/AArch64/runtime-check-size-based-threshold.ll
M llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll
M llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-unroll.ll
M llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll
M llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll
M llvm/test/Transforms/LoopVectorize/X86/pointer-runtime-checks-unprofitable.ll
M llvm/test/Transforms/LoopVectorize/X86/pr23997.ll
M llvm/test/Transforms/LoopVectorize/X86/pr35432.ll
M llvm/test/Transforms/LoopVectorize/X86/pr54634.ll
M llvm/test/Transforms/LoopVectorize/X86/runtime-limit.ll
Log Message:
-----------
[LV] Vectorize cases with larger number of RT checks, execute only if profitable.
This patch replaces the tight hard cut-off for the number of runtime
checks with a more accurate cost-driven approach.
The new approach allows vectorization with a larger number of runtime
checks in general, but only executes the vector loop (and runtime checks) if
considered profitable at runtime. Profitable here means that the cost-model
indicates that the runtime check cost + vector loop cost < scalar loop cost.
To do that, LV computes the minimum trip count for which runtime check cost
+ vector-loop-cost < scalar loop cost.
Note that there is still a hard cut-off to avoid excessive compile-time/code-size
increases, but it is much larger than the original limit.
The performance impact on standard test-suites like SPEC2006/SPEC2006/MultiSource
is mostly neutral, but the new approach can give substantial gains in cases where
we failed to vectorize before due to the over-aggressive cut-offs.
On AArch64 with -O3, I didn't observe any regressions outside the noise level (<0.4%)
and there are the following execution time improvements. Both `IRSmk` and `srad` are relatively short running, but the changes are far above the noise level for them on my benchmark system.
```
CFP2006/447.dealII/447.dealII -1.9%
CINT2017rate/525.x264_r/525.x264_r -2.2%
ASC_Sequoia/IRSmk/IRSmk -9.2%
Rodinia/srad/srad -36.1%
```
`size` regressions on AArch64 with -O3 are
```
MultiSource/Applications/hbd/hbd 90256.00 106768.00 18.3%
MultiSourc...ks/ASCI_Purple/SMG2000/smg2000 240676.00 257268.00 6.9%
MultiSourc...enchmarks/mafft/pairlocalalign 472603.00 489131.00 3.5%
External/S...2017rate/525.x264_r/525.x264_r 613831.00 630343.00 2.7%
External/S...NT2006/464.h264ref/464.h264ref 818920.00 835448.00 2.0%
External/S...te/538.imagick_r/538.imagick_r 1994730.00 2027754.00 1.7%
MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4 1236471.00 1253015.00 1.3%
MultiSource/Applications/oggenc/oggenc 2108147.00 2124675.00 0.8%
External/S.../CFP2006/447.dealII/447.dealII 4742999.00 4759559.00 0.3%
External/S...rate/510.parest_r/510.parest_r 14206377.00 14239433.00 0.2%
```
Reviewed By: lebedev.ri, ebrevnov, dmgreen
Differential Revision: https://reviews.llvm.org/D109368
More information about the All-commits
mailing list