[PATCH] D88819: [LV] Support for Remainder loop vectorization

Thu Oct 15 09:39:54 PDT 2020

fhahn added a comment.

In D88819#2332490 <https://reviews.llvm.org/D88819#2332490>, @bmahjour wrote:

> I also agree with @mivnay's summary above and the general approach of just running ILV again on the remainder loop with the available vplan. If we mark the epilogue loop and put it back in the worklist, it'll be harder/uglier to then modify the CFG to make it more optimal. I do, however, think that the implementation can be improved (see my note below). Please also note that the SCEV and runtime checks **cannot** be avoided by marking them "noalias" (or similar tricks) because if the iteration count of the loop is small enough to bypass the main vector loop and large enough to execute the vector epilogue, then the runtime checks need to be executed for the epilogue loop. The only way to avoid the redundant runtime checks is to generate the smaller trip count check first, as illustrated in the CFG I've posted above.

Right, the approach I suggested should work for the case where we only execute the epilogue if we also execute the main vector loop. AFAIK the runtime checks are currently independent of the VF, and I think the SCEV checks are as well (I'm less sure about those), but the minimum iteration check is not.

But setting things up as in the suggested CFG is going to be a bit more tricky and might not turn out to be much simpler in the end. I might give it a try to see if it's feasible.
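
For reference, the ordering being discussed (smaller trip count check first, then the runtime checks once, then the main-loop trip count check) can be modelled roughly as below. This is only an illustrative sketch, not code from the patch: `LoopPlan`, `dispatch`, `mainStep` (standing in for main VF * UF) and `epilogueStep` (standing in for the epilogue VF) are hypothetical names, both steps are assumed to be non-zero, and the SCEV and memory checks are collapsed into a single boolean.

```cpp
#include <cstdint>

// Hypothetical bookkeeping for illustration only; none of this is LLVM API.
struct LoopPlan {
  std::uint64_t mainVectorIters = 0;     // scalar iterations covered by the main vector loop
  std::uint64_t epilogueVectorIters = 0; // scalar iterations covered by the vector epilogue
  std::uint64_t scalarIters = 0;         // iterations left to the scalar loop
  unsigned runtimeChecksExecuted = 0;    // how many times the runtime checks ran
};

// Models which loops execute for a given trip count.
// Assumes mainStep > 0 and epilogueStep > 0.
LoopPlan dispatch(std::uint64_t tripCount, std::uint64_t mainStep,
                  std::uint64_t epilogueStep, bool runtimeChecksPass) {
  LoopPlan plan;

  // 1. Smaller trip count check first: too few iterations even for the
  //    vector epilogue, so go straight to the scalar loop without paying
  //    for any runtime checks.
  if (tripCount < epilogueStep) {
    plan.scalarIters = tripCount;
    return plan;
  }

  // 2. Runtime checks execute exactly once and guard both vector loops.
  plan.runtimeChecksExecuted = 1;
  if (!runtimeChecksPass) {
    plan.scalarIters = tripCount;
    return plan;
  }

  std::uint64_t remaining = tripCount;

  // 3. Main-loop trip count check: bypass the main vector loop if there
  //    are not enough iterations for one full main vector step.
  if (remaining >= mainStep) {
    plan.mainVectorIters = remaining - remaining % mainStep;
    remaining %= mainStep;
  }

  // 4. Vector epilogue handles what is left, if there is at least one
  //    full epilogue step.
  if (remaining >= epilogueStep) {
    plan.epilogueVectorIters = remaining - remaining % epilogueStep;
    remaining %= epilogueStep;
  }

  // 5. Scalar remainder.
  plan.scalarIters = remaining;
  return plan;
}
```

With `mainStep = 16` and `epilogueStep = 4`, a trip count of 10 bypasses the main vector loop but still reaches the vector epilogue, so the runtime checks have to run for it; that is the case bmahjour describes. A trip count of 3 fails the first check and never reaches the runtime checks at all.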

>> This optimization is disabled for -Osize. Redundant runtime check blocks can only be avoided when epilog vector loop trip count checks are done first. But it looks like a code size vs. performance trade-off.
>
> Code size can also have an impact on performance. It's also much harder to model the cost of code-size increase than it is to model the cost of compare and branches (required for the trip count checks), so I'd strongly suggest we go with the alternative CFG which avoids generating the redundant runtime checks.

If we have implementations for both, we could just evaluate which one's better on a large set of benchmarks?

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D88819