[PATCH] D118566: [LoopVectorizer] Don't perform interleaving of predicated scalar loops

Wed Feb 2 08:02:45 PST 2022

sdesmalen accepted this revision.
sdesmalen added a comment.
This revision is now accepted and ready to land.

In D118566#3286623 <https://reviews.llvm.org/D118566#3286623>, @dmgreen wrote:

> In D118566#3283274 <https://reviews.llvm.org/D118566#3283274>, @sdesmalen wrote:
>
>>> This prevents it from making a bit of a mess, that is worse than the original and better left for the unroller to unroll if beneficial
>>
>> Can you expand a little bit on why this becomes a bit of a mess? The original scalar loop has control flow for predication as well, so I guess interleaving would just duplicate that control flow for the second scalar iteration. Is the code generated by the LV less efficient, or are we missing some folds/simplifications? Or is there a fundamental reason this can never be an improvement? I could imagine a scenario where most of the loop body would benefit from interleaving but one statement doesn't because of predication; in that case it would still be beneficial to interleave.
>
> It is less efficient in all the benchmarks I've run. It won't come up very often: we usually either choose to vectorize or choose not to interleave. Interleaving is generally only done for smallish loops. When the vectorizer is forced to create serialized predicated blocks (and possibly add SCEV checks, as in the test case), it's hard to see how the code could be much better than it is now. The patch gives a 50-60% improvement in the places where it helps.
>
> Whatever happens, it is best to leave it for the unroller to unroll with its own profitability heuristics (which in this case, it likely will not).

Okay, I think I see what you mean now. Because the block is predicated, the LV will try to predicate each //operation// that needs predication (e.g. every load/store), so that we end up executing NumPredicatedOps * UF branch instructions.
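To make the branch arithmetic concrete, here is a minimal C++ sketch of that counting model. The struct and function names are hypothetical and not part of LLVM; the sketch only assumes, as described above, that with VF=1 each predicated operation gets its own guarded block and that interleaving replicates those blocks UF times.

```cpp
#include <cassert>

// Hypothetical counting model; these names are illustrative, not LLVM APIs.
struct BranchCounts {
  unsigned PerGeneratedIteration; // branches in one iteration of the LV loop
  unsigned PerOriginalIteration;  // same cost spread over the UF original iterations
};

// With VF = 1, the LV predicates each operation that needs it (e.g. every
// load/store) separately, and interleaving by UF replicates those guarded
// blocks UF times.
BranchCounts interleavedPredicatedBranches(unsigned NumPredicatedOps,
                                           unsigned UF) {
  assert(UF >= 1 && "interleave count must be at least 1");
  unsigned PerIter = NumPredicatedOps * UF;
  // Each generated iteration covers UF original iterations.
  return {PerIter, PerIter / UF};
}
```

For example, a body with three predicated operations interleaved with UF=2 executes 6 conditional branches per generated iteration, i.e. 3 per original iteration. A scalar loop that guards the same conditional block with a single branch (an assumption about the original code) executes only 1.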

I guess the SCEVChecks would be something we could add to the cost model in a separate patch, e.g. if the LV has to generate SCEVChecks at all, don't bother setting UF>1 when VF=1.
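A rough sketch of how the rule in this patch and the suggested SCEV-check follow-up might look as an interleave-count guard. This is hypothetical C++ that does not mirror the real LoopVectorizationCostModel interface; `InterleaveQuery`, its fields, and `selectInterleaveCountSketch` are illustrative names only.

```cpp
#include <cassert>

// Hypothetical inputs to an interleave-count decision (not an LLVM type).
struct InterleaveQuery {
  unsigned VF;          // chosen vectorization factor (1 = scalar)
  bool BodyIsPredicated; // the LV must predicate operations in the body
  bool NeedsSCEVChecks;  // runtime SCEV checks must be emitted
};

// Returns the interleave count (UF) to use, given the count that some other
// heuristic would otherwise pick.
unsigned selectInterleaveCountSketch(const InterleaveQuery &Q,
                                     unsigned ProposedUF) {
  assert(ProposedUF >= 1 && "interleave count must be at least 1");
  if (Q.VF == 1) {
    // Rule from this patch: don't interleave a predicated scalar loop;
    // leave any unrolling to the loop unroller's own heuristics.
    if (Q.BodyIsPredicated)
      return 1;
    // Possible follow-up discussed in the review: also avoid UF > 1 for
    // scalar loops that need SCEV checks.
    if (Q.NeedsSCEVChecks)
      return 1;
  }
  return ProposedUF;
}
```

Vectorized loops (VF > 1) are left untouched here, since the discussion is only about interleaving scalar loops.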


https://reviews.llvm.org/D118566