[PATCH] D118566: [LoopVectorizer] Don't perform interleaving of predicated scalar loops

Tue Feb 1 00:23:33 PST 2022

dmgreen added a comment.

In D118566#3283274 <https://reviews.llvm.org/D118566#3283274>, @sdesmalen wrote:

>> This prevents it from making a bit of a mess, that is worse than the original and better left for the unroller to unroll if beneficial
>
> Can you expand a little bit on why this becomes a bit of a mess? The original scalar loop has control-flow for predication as well, so I guess interleaving would just duplicate such control flow for the second scalar iteration. Is the code generated by the LV less efficient or are we missing any folds/simplification? Or is there a fundamental reason this can never be an improvement? I could imagine a scenario where most of the loop body would benefit from interleaving, but one statement in the loop doesn't because of predication, it would still be beneficial to interleave.

It is less efficient in all the benchmarks I've run. It won't come up very often: we usually either choose to vectorize or choose not to interleave. Interleaving is generally only done for smallish loops. When the vectorizer is forced to make serialized predicated blocks (and possibly add SCEV checks, as in the testcase), it's hard to see how the interleaved code could be much better than the original scalar loop. The patch gives a 50-60% improvement in the places it helps.

Whatever happens, it is best to leave it for the unroller to unroll with its own profitability heuristics (which in this case, it likely will not).
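
As a rough illustration of the shape being discussed (a hand-written sketch, not the testcase from the patch and not actual LoopVectorizer output; the function names and the guarded division are made up for the example), here is a predicated scalar loop and what interleaving it by 2 at VF=1 amounts to. Each interleaved copy keeps its own guard, so the control flow is simply duplicated:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical predicated scalar loop: the division may only execute
// when the guard holds, so it cannot be speculated.
void scalarLoop(std::vector<int> &a, const std::vector<int> &b) {
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (b[i] != 0)
      a[i] = a[i] / b[i];
  }
}

// Hand-written approximation of interleaving by 2 with VF=1.
// Both copies of the body stay behind their own branch, so the loop
// gains code size and branches but no real overlap of work.
void interleavedByTwo(std::vector<int> &a, const std::vector<int> &b) {
  const std::size_t n = a.size();
  std::size_t i = 0;
  for (; i + 1 < n; i += 2) {
    if (b[i] != 0)           // predicated block for iteration i
      a[i] = a[i] / b[i];
    if (b[i + 1] != 0)       // predicated block for iteration i + 1
      a[i + 1] = a[i + 1] / b[i + 1];
  }
  for (; i < n; ++i) {       // scalar remainder loop
    if (b[i] != 0)
      a[i] = a[i] / b[i];
  }
}
```

Both functions compute the same result; the second is only meant to show why the unroller, with its own cost model, is the more natural place to make this decision.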

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118566/new/

https://reviews.llvm.org/D118566