[PATCH] D88819: [LV] Support for Remainder loop vectorization

Thu Oct 15 08:30:12 PDT 2020

bmahjour added a comment.

> This approach doesn't work when most of the trip counts are always good for the original vector loop. In fact, it even performs one additional trip count check when both the vector loop and the epilog vector loop are executed. For example, if the original VF=16 and UF=2, and the epilog VF=8 and UF=1, a trip count as small as 40 requires 3 trip count checks, whereas it is 2 in the current implementation.

The relative cost of the extra trip count check is greater when the trip count is small enough to bypass the vector code. Conversely, the relative cost is lower (in proportion to the actual computation of the loop) when the vector code is executed. It therefore makes more sense to optimize the case where the cost of the extra trip count check matters the most. If we take your example above and consider the case where the trip count is smaller than 8, then the number of trip count checks would be 1 (in the CFG I posted), compared to 2 in this patch.
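To make the arithmetic in both examples concrete, here is a minimal, simplified sketch that counts only the minimum-iteration checks guarding the vector loops under the two orderings being discussed. It assumes that each check is one compare-and-branch, that the main vector loop consumes multiples of VF*UF = 32 and the epilog vector loop multiples of 8, and that the checks guarding the scalar remainder are shared by both CFGs and therefore ignored. The function names `checks_main_first` and `checks_epilogue_first` are hypothetical and do not correspond to anything in the patch.

```python
# Hypothetical, simplified model of the trip count checks guarding the vector
# loops. Checks guarding the scalar remainder are ignored (both CFGs have them).

def checks_main_first(tc: int, main_step: int, epi_step: int) -> int:
    """Ordering in this patch: compare against the main vector step first."""
    checks = 1      # tc >= main_step ?  (enter the main vector loop)
    checks += 1     # remaining >= epi_step ?  (enter the epilog vector loop)
    return checks   # always 2, whichever path is taken


def checks_epilogue_first(tc: int, main_step: int, epi_step: int) -> int:
    """Alternative CFG: compare against the smaller epilog step first."""
    checks = 1                  # tc >= epi_step ?
    if tc < epi_step:
        return checks           # too small for any vector code: go scalar
    checks += 1                 # tc >= main_step ?
    if tc >= main_step:
        checks += 1             # (tc % main_step) >= epi_step ?
    return checks


if __name__ == "__main__":
    MAIN_STEP = 16 * 2  # original VF=16, UF=2
    EPI_STEP = 8 * 1    # epilog VF=8, UF=1
    for tc in (5, 20, 40):
        print(tc, checks_main_first(tc, MAIN_STEP, EPI_STEP),
              checks_epilogue_first(tc, MAIN_STEP, EPI_STEP))
    # prints:
    # 5 2 1
    # 20 2 2
    # 40 2 3
```

Under this model, the alternative CFG pays one extra check only when the main vector loop runs (where it is amortized over at least 32 iterations) and saves one check exactly when the trip count is too small for any vector code.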

> This optimization is disabled for -Osize. Redundant runtime check blocks can only be avoided when epilog vector loop trip count checks are done first. But it looks like a code-size vs. performance trade-off.

Code size can also have an impact on performance. It's also much harder to model the cost of a code-size increase than it is to model the cost of the compares and branches required for the trip count checks, so I'd strongly suggest we go with the alternative CFG, which avoids generating the redundant runtime checks.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88819/new/

https://reviews.llvm.org/D88819