[llvm] [LV] Vectorize Epilogues for loops with small VF but high IC (PR #108190)

Julian Nagele via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 24 08:13:17 PDT 2024


juliannagele wrote:

Ah, I see, thanks for clarifying! I tried both options (max epilogue trip count and average, i.e., / 2) on some microbenchmarks, https://github.com/llvm/llvm-test-suite/pull/165. It looks like avg is never meaningfully better than max and is clearly worse in several cases, even regressing against the baseline in some (baseline is the current state without this PR; values are runtimes, so lower is better):

|    | name                                         |   Baseline |    MaxTC |    AvgTC | MaxTCvsBase   | AvgTCvsBase   | AvgTCvsMaxTC   |
|---:|:---------------------------------------------|-----------:|---------:|---------:|:----------------|:----------------|:-----------------|
|  0 | benchAutoVecForuint8_tForLoopTC65            |    2.33047 |  2.32827 |  2.32767 | -0.09%          | -0.12%          | -0.03%           |
|  1 | benchReductionAutoVecForuint8_tForLoopTC65   |    6.3755  |  6.37788 |  6.3816  | 0.04%           | 0.10%           | 0.06%            |
|  2 | benchAutoVecForuint8_tForLoopTC80            |    2.72384 |  2.72446 |  2.72267 | 0.02%           | -0.04%          | -0.07%           |
|  3 | benchReductionAutoVecForuint8_tForLoopTC80   |    7.4568  |  7.45126 |  8.78556 | -0.07%          | 17.82%          | 17.91%           |
|  4 | benchAutoVecForuint8_tForLoopTC96            |    3.72276 |  3.73023 |  3.72467 | 0.20%           | 0.05%           | -0.15%           |
|  5 | benchReductionAutoVecForuint8_tForLoopTC96   |    7.77124 |  7.75197 |  7.7557  | -0.25%          | -0.20%          | 0.05%            |
|  6 | benchAutoVecForuint8_tForLoopTC104           |    4.01091 |  4.01233 |  4.01352 | 0.04%           | 0.07%           | 0.03%            |
|  7 | benchReductionAutoVecForuint8_tForLoopTC104  |    8.8079  |  8.82872 |  9.31565 | 0.24%           | 5.76%           | 5.52%            |
|  8 | benchAutoVecForuint8_tForLoopTC127           |    7.45248 |  7.45135 |  7.45874 | -0.02%          | 0.08%           | 0.10%            |
|  9 | benchReductionAutoVecForuint8_tForLoopTC127  |   10.9005  | 10.9019  | 12.7518  | 0.01%           | 16.98%          | 16.97%           |
| 10 | benchAutoVecForuint16_tForLoopTC65           |    3.06875 |  3.07387 |  3.07374 | 0.17%           | 0.16%           | -0.00%           |
| 11 | benchReductionAutoVecForuint16_tForLoopTC65  |    5.83245 |  5.87596 |  5.88597 | 0.75%           | 0.92%           | 0.17%            |
| 12 | benchAutoVecForuint16_tForLoopTC80           |    7.16715 |  4.01275 |  4.0102  | -44.01%         | -44.05%         | -0.06%           |
| 13 | benchReductionAutoVecForuint16_tForLoopTC80  |    9.74116 |  7.16212 |  7.16729 | -26.48%         | -26.42%         | 0.07%            |
| 14 | benchAutoVecForuint16_tForLoopTC96           |    3.68478 |  3.68489 |  3.67922 | 0.00%           | -0.15%          | -0.15%           |
| 15 | benchReductionAutoVecForuint16_tForLoopTC96  |    7.16122 |  7.21163 |  7.21582 | 0.70%           | 0.76%           | 0.06%            |
| 16 | benchAutoVecForuint16_tForLoopTC104          |    5.95103 |  4.38464 |  4.38048 | -26.32%         | -26.39%         | -0.09%           |
| 17 | benchReductionAutoVecForuint16_tForLoopTC104 |    9.17912 |  8.31162 |  8.30985 | -9.45%          | -9.47%          | -0.02%           |
| 18 | benchAutoVecForuint16_tForLoopTC127          |   12.5022  |  7.43596 |  7.43987 | -40.52%         | -40.49%         | 0.05%            |
| 19 | benchReductionAutoVecForuint16_tForLoopTC127 |   15.7736  | 10.0411  | 10.0305  | -36.34%         | -36.41%         | -0.11%           |
| 20 | benchAutoVecForuint32_tForLoopTC65           |    4.73704 |  4.70811 |  4.71278 | -0.61%          | -0.51%          | 0.10%            |
| 21 | benchReductionAutoVecForuint32_tForLoopTC65  |    3.70915 |  3.7259  |  3.72295 | 0.45%           | 0.37%           | -0.08%           |
| 22 | benchAutoVecForuint32_tForLoopTC80           |    5.35488 |  5.3557  |  5.35402 | 0.02%           | -0.02%          | -0.03%           |
| 23 | benchReductionAutoVecForuint32_tForLoopTC80  |    4.14778 |  4.15393 |  4.15491 | 0.15%           | 0.17%           | 0.02%            |
| 24 | benchAutoVecForuint32_tForLoopTC96           |    6.16981 |  6.17033 |  6.17167 | 0.01%           | 0.03%           | 0.02%            |
| 25 | benchReductionAutoVecForuint32_tForLoopTC96  |    4.68322 |  4.69684 |  4.69914 | 0.29%           | 0.34%           | 0.05%            |
| 26 | benchAutoVecForuint32_tForLoopTC104          |    8.49511 |  7.0049  |  7.40087 | -17.54%         | -12.88%         | 5.65%            |
| 27 | benchReductionAutoVecForuint32_tForLoopTC104 |    6.69038 |  5.72978 |  6.0188  | -14.36%         | -10.04%         | 5.04%            |
| 28 | benchAutoVecForuint32_tForLoopTC127          |   11.3578  |  8.94057 |  9.2603  | -21.28%         | -18.47%         | 3.58%            |
| 29 | benchReductionAutoVecForuint32_tForLoopTC127 |    9.17041 |  7.44843 |  7.45032 | -18.78%         | -18.76%         | 0.03%            |
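
For reference, the three percentage columns appear to be plain relative runtime changes, `(new / base - 1) * 100`, so negative values mean faster. The sketch below recomputes them for row 3 from the runtimes in the table; the helper name `relative_change` is made up for illustration and is not taken from the benchmark scripts.

```python
def relative_change(new: float, base: float) -> float:
    """Percent change of `new` relative to `base`; negative means faster."""
    return (new / base - 1.0) * 100.0


# Runtimes copied from row 3 (benchReductionAutoVecForuint8_tForLoopTC80).
baseline, max_tc, avg_tc = 7.4568, 7.45126, 8.78556

max_vs_base = relative_change(max_tc, baseline)  # MaxTCvsBase column
avg_vs_base = relative_change(avg_tc, baseline)  # AvgTCvsBase column
avg_vs_max = relative_change(avg_tc, max_tc)     # AvgTCvsMaxTC column

print(f"{max_vs_base:.2f}% {avg_vs_base:.2f}% {avg_vs_max:.2f}%")
```

For this row the recomputed values agree with the table (-0.07%, 17.82%, 17.91%), which is the avg-vs-baseline regression mentioned above.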

https://github.com/llvm/llvm-project/pull/108190

