[llvm] [LV] Vectorize Epilogues for loops with small VF but high IC (PR #108190)
Julian Nagele via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 24 08:13:17 PDT 2024
juliannagele wrote:
Ah, I see, thanks for clarifying! I tried both options (max epilogue trip count and average, i.e., / 2) on some microbenchmarks, https://github.com/llvm/llvm-test-suite/pull/165, and it looks like avg is never meaningfully better than max and even results in some regressions against the baseline, up to about 18% on the uint8_t reduction benchmarks. The baseline is the current state without this PR, and the values are runtimes, so lower is better:
| | name | Baseline | MaxTC | AvgTC | MaxTCvsBase | AvgTCvsBase | AvgTCvsMaxTC |
|---:|:---------------------------------------------|-----------:|---------:|---------:|:----------------|:----------------|:-----------------|
| 0 | benchAutoVecForuint8_tForLoopTC65 | 2.33047 | 2.32827 | 2.32767 | -0.09% | -0.12% | -0.03% |
| 1 | benchReductionAutoVecForuint8_tForLoopTC65 | 6.3755 | 6.37788 | 6.3816 | 0.04% | 0.10% | 0.06% |
| 2 | benchAutoVecForuint8_tForLoopTC80 | 2.72384 | 2.72446 | 2.72267 | 0.02% | -0.04% | -0.07% |
| 3 | benchReductionAutoVecForuint8_tForLoopTC80 | 7.4568 | 7.45126 | 8.78556 | -0.07% | 17.82% | 17.91% |
| 4 | benchAutoVecForuint8_tForLoopTC96 | 3.72276 | 3.73023 | 3.72467 | 0.20% | 0.05% | -0.15% |
| 5 | benchReductionAutoVecForuint8_tForLoopTC96 | 7.77124 | 7.75197 | 7.7557 | -0.25% | -0.20% | 0.05% |
| 6 | benchAutoVecForuint8_tForLoopTC104 | 4.01091 | 4.01233 | 4.01352 | 0.04% | 0.07% | 0.03% |
| 7 | benchReductionAutoVecForuint8_tForLoopTC104 | 8.8079 | 8.82872 | 9.31565 | 0.24% | 5.76% | 5.52% |
| 8 | benchAutoVecForuint8_tForLoopTC127 | 7.45248 | 7.45135 | 7.45874 | -0.02% | 0.08% | 0.10% |
| 9 | benchReductionAutoVecForuint8_tForLoopTC127 | 10.9005 | 10.9019 | 12.7518 | 0.01% | 16.98% | 16.97% |
| 10 | benchAutoVecForuint16_tForLoopTC65 | 3.06875 | 3.07387 | 3.07374 | 0.17% | 0.16% | -0.00% |
| 11 | benchReductionAutoVecForuint16_tForLoopTC65 | 5.83245 | 5.87596 | 5.88597 | 0.75% | 0.92% | 0.17% |
| 12 | benchAutoVecForuint16_tForLoopTC80 | 7.16715 | 4.01275 | 4.0102 | -44.01% | -44.05% | -0.06% |
| 13 | benchReductionAutoVecForuint16_tForLoopTC80 | 9.74116 | 7.16212 | 7.16729 | -26.48% | -26.42% | 0.07% |
| 14 | benchAutoVecForuint16_tForLoopTC96 | 3.68478 | 3.68489 | 3.67922 | 0.00% | -0.15% | -0.15% |
| 15 | benchReductionAutoVecForuint16_tForLoopTC96 | 7.16122 | 7.21163 | 7.21582 | 0.70% | 0.76% | 0.06% |
| 16 | benchAutoVecForuint16_tForLoopTC104 | 5.95103 | 4.38464 | 4.38048 | -26.32% | -26.39% | -0.09% |
| 17 | benchReductionAutoVecForuint16_tForLoopTC104 | 9.17912 | 8.31162 | 8.30985 | -9.45% | -9.47% | -0.02% |
| 18 | benchAutoVecForuint16_tForLoopTC127 | 12.5022 | 7.43596 | 7.43987 | -40.52% | -40.49% | 0.05% |
| 19 | benchReductionAutoVecForuint16_tForLoopTC127 | 15.7736 | 10.0411 | 10.0305 | -36.34% | -36.41% | -0.11% |
| 20 | benchAutoVecForuint32_tForLoopTC65 | 4.73704 | 4.70811 | 4.71278 | -0.61% | -0.51% | 0.10% |
| 21 | benchReductionAutoVecForuint32_tForLoopTC65 | 3.70915 | 3.7259 | 3.72295 | 0.45% | 0.37% | -0.08% |
| 22 | benchAutoVecForuint32_tForLoopTC80 | 5.35488 | 5.3557 | 5.35402 | 0.02% | -0.02% | -0.03% |
| 23 | benchReductionAutoVecForuint32_tForLoopTC80 | 4.14778 | 4.15393 | 4.15491 | 0.15% | 0.17% | 0.02% |
| 24 | benchAutoVecForuint32_tForLoopTC96 | 6.16981 | 6.17033 | 6.17167 | 0.01% | 0.03% | 0.02% |
| 25 | benchReductionAutoVecForuint32_tForLoopTC96 | 4.68322 | 4.69684 | 4.69914 | 0.29% | 0.34% | 0.05% |
| 26 | benchAutoVecForuint32_tForLoopTC104 | 8.49511 | 7.0049 | 7.40087 | -17.54% | -12.88% | 5.65% |
| 27 | benchReductionAutoVecForuint32_tForLoopTC104 | 6.69038 | 5.72978 | 6.0188 | -14.36% | -10.04% | 5.04% |
| 28 | benchAutoVecForuint32_tForLoopTC127 | 11.3578 | 8.94057 | 9.2603 | -21.28% | -18.47% | 3.58% |
| 29 | benchReductionAutoVecForuint32_tForLoopTC127 | 9.17041 | 7.44843 | 7.45032 | -18.78% | -18.76% | 0.03% |
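
For reference, the three comparison columns appear to be plain relative runtime differences. Here is a minimal sketch of that arithmetic, assuming each column is `(new - old) / old` expressed in percent; the helper name `pct_change` is made up for illustration and is not part of the benchmark tooling. It uses the runtimes from row 3 (benchReductionAutoVecForuint8_tForLoopTC80):

```python
def pct_change(new: float, old: float) -> float:
    """Relative change of `new` versus `old` in percent (negative means faster)."""
    return (new - old) / old * 100.0


# Runtimes copied from row 3 of the table above.
baseline, max_tc, avg_tc = 7.4568, 7.45126, 8.78556

# Assumed definitions of the three derived columns.
max_vs_base = pct_change(max_tc, baseline)  # MaxTCvsBase
avg_vs_base = pct_change(avg_tc, baseline)  # AvgTCvsBase
avg_vs_max = pct_change(avg_tc, max_tc)     # AvgTCvsMaxTC

for label, value in [
    ("MaxTCvsBase", max_vs_base),
    ("AvgTCvsBase", avg_vs_base),
    ("AvgTCvsMaxTC", avg_vs_max),
]:
    print(f"{label}: {value:.2f}%")
```

Under that assumption, the sketch reproduces the row's -0.07%, 17.82% and 17.91% entries, so a positive value in the last column means the avg variant is slower than the max variant.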
https://github.com/llvm/llvm-project/pull/108190