[PATCH] D97357: Inductive unrolling pass

Thu Apr 8 08:40:21 PDT 2021

reames added a comment.

I'm still wrapping my head around this, so I'm just sharing some initial thoughts. Don't take anything here as more than me thinking aloud.

The sub-piece of this which doesn't require a prologue loop looks like a natural extension to the existing loop unrolling. I think - but you should experimentally confirm - that existing runtime unrolling + indvars should prune all but one of the exits when the exiting lane is statically known. The missing piece would be updating the cost model. I'd had another idea for a change in the same area; you could look at my https://reviews.llvm.org/D91481 for ideas on how to structure that.
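A small simulation can show why all but one exit copy becomes dead. It assumes, for illustration only, that the exit test `i == TC` sits at the top of the body and that unrolling keeps one copy of that test per lane. `simulateExitingLane` and `lanesThatExit` are hypothetical helpers, not LLVM APIs. Under this model the exit always fires in lane TC mod UF, so fixing that remainder fixes the exiting lane.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Simulates a loop whose exit test "i == TC" is at the top of the body,
// unrolled by UF with one copy of the exit test per lane. Returns the lane
// (0..UF-1) whose copy of the exit test actually fires.
unsigned simulateExitingLane(uint64_t TC, unsigned UF) {
  assert(UF > 0 && "unroll factor must be positive");
  for (uint64_t Base = 0;; Base += UF) {
    for (unsigned Lane = 0; Lane < UF; ++Lane) {
      if (Base + Lane == TC)
        return Lane; // this lane's copy of the exit is taken
      // ... body(Base + Lane) would run here ...
    }
  }
}

// Collects every lane whose exit fires for some trip count in [C, MaxTC)
// with TC % UF == C % UF. A single-element result means the exit copies in
// all other lanes are dead and could be pruned.
std::set<unsigned> lanesThatExit(unsigned UF, unsigned C, uint64_t MaxTC) {
  std::set<unsigned> Lanes;
  for (uint64_t TC = C; TC < MaxTC; TC += UF)
    Lanes.insert(simulateExitingLane(TC, UF));
  return Lanes;
}
```

For example, with UF = 4 every trip count of the form 4k + 3 exits in lane 3, so the exit copies in lanes 0 through 2 never fire.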

That same sub-piece is also interesting for the loop vectorizer. If we know the exiting lane is zero, and that no control-dependent, potentially faulting operations are above said exit, we can widen the exit condition (as a simple scalar check of lane 0) and vectorize the loop (if the other exits meet the current conditions).
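A sketch of the lane-0 case, assuming a single exit `i == TC` tested at the top of the body and TC % VF == 0. The function names, and the element-wise inner loop standing in for one wide operation, are hypothetical. Because the exit can only fire on lane 0, one scalar comparison per vector iteration replaces VF lane-wise comparisons.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: exit test at the top of the body, then the body
// increments A[I]. The exit fires when I == TC.
void scalarLoop(std::vector<int> &A, size_t TC) {
  for (size_t I = 0;; ++I) {
    if (I == TC)
      break;
    A[I] += 1;
  }
}

// "Vectorized" with factor VF. Only valid when the exiting lane is known to
// be lane 0 (TC % VF == 0) and nothing potentially faulting precedes the
// exit: the exit is evaluated once per vector iteration, on lane 0 only.
void vectorLoopLane0Exit(std::vector<int> &A, size_t TC, size_t VF) {
  assert(VF > 0 && TC % VF == 0 && "exiting lane must be lane 0");
  for (size_t Base = 0;; Base += VF) {
    if (Base == TC) // scalar check of lane 0
      break;
    for (size_t Lane = 0; Lane < VF; ++Lane) // stands in for one wide op
      A[Base + Lane] += 1;
  }
}
```

Both functions leave the array in the same state whenever the lane-0 precondition holds.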

Continuing to think about the vectorizer, a generalization of that may also be useful, in that we can cheaply form a predicate mask for the whole loop. If we know lane X exits (and said exit dominates the latch), then we can widen loads before said exit with a predicate mask which disables all lanes from X+1 onward. We can do the same thing for multiple analyzable exits (though we instead fold together the exit conditions), but this is interesting in allowing one unanalyzable exit in the set.
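A sketch of the predicate-mask idea, with a hypothetical fixed VF of 4 and `std::array<bool, VF>` standing in for a vector mask. Folding two exits into one mask by taking the earlier exiting lane is one plausible reading of "fold together the exit conditions", not a confirmed design; all names here are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr size_t VF = 4; // hypothetical vectorization factor

// Mask for the vector iteration in which lane X takes the exit: lanes 0..X
// still execute code that sits before the exit, lanes X+1 onward are off.
std::array<bool, VF> exitMask(size_t X) {
  assert(X < VF && "exiting lane must be a valid lane");
  std::array<bool, VF> M{};
  for (size_t L = 0; L <= X; ++L)
    M[L] = true;
  return M;
}

// With two analyzable exits, the loop leaves at the earlier exiting lane,
// so every lane after it is disabled.
std::array<bool, VF> foldedExitMask(size_t X1, size_t X2) {
  return exitMask(X1 < X2 ? X1 : X2);
}

// Masked "wide" load: disabled lanes never touch memory and yield 0, so the
// load cannot run past the end of Data.
std::array<int, VF> maskedLoad(const std::vector<int> &Data, size_t Base,
                               const std::array<bool, VF> &M) {
  std::array<int, VF> R{};
  for (size_t L = 0; L < VF; ++L)
    if (M[L])
      R[L] = Data.at(Base + L);
  return R;
}
```

With six elements and the exit in lane 1 of the vector iteration starting at index 4, only elements 4 and 5 are read.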

Out of the above, the unrolling support seems easy to motivate, and not terribly invasive. The case of the trip count being known to satisfy TC mod UF == C for some constant C doesn't seem uncommon, and the costing checks should be cheap.
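The arithmetic behind "TC mod UF == C for some C": for a trip count of the form A*n + B with n unknown, the remainder is a constant precisely when UF divides A. This sketch assumes UF is a power of two, so the identity also survives 64-bit unsigned wrap-around. `knownRemainder` is a hypothetical helper, not a description of how ScalarEvolution folds `urem`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// For a (hypothetical) trip count TC = A*n + B with n unknown, returns
// TC mod UF if it is the same for every n, and std::nullopt otherwise.
// UF must be a power of two, so reducing modulo 2^64 first doesn't change
// the remainder modulo UF.
std::optional<uint64_t> knownRemainder(uint64_t A, uint64_t B, uint64_t UF) {
  assert(UF != 0 && (UF & (UF - 1)) == 0 && "UF must be a power of two");
  if (A % UF != 0)
    return std::nullopt; // n = 0 and n = 1 give different remainders
  return B % UF;         // A*n contributes nothing modulo UF
}
```

For instance, TC = 8n + 3 with UF = 4 always leaves remainder 3, while TC = 6n + 3 alternates between 3 and 1.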

The more general transformation - which uses a prologue loop to ensure C == 0 for an otherwise arbitrary TC - seems trickier. I really like the power, but we don't have other cases where the unroller needs a preloop right now. (I don't think?) If we could find other cases which motivate a preloop and the costing thereof, having this in the unroller (as an alternative to peeling) seems reasonable. There are a bunch of code-structure pieces - we really need this well abstracted, generic, and tested - but we can return to the details later.
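A sketch of the preloop variant. It assumes, for illustration only, that the exit test `i == TC` sits at the top of the body and that the unrolled loop keeps one copy of it per lane. A scalar prologue runs TC % UF iterations, after which the unrolled loop's exit always fires in lane 0. `runWithPreloop` and `Trace` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Records which original iterations ran, in order, and which lane of the
// unrolled loop took the exit.
struct Trace {
  std::vector<uint64_t> Iters;
  unsigned ExitLane;
};

Trace runWithPreloop(uint64_t TC, unsigned UF) {
  assert(UF > 0 && "unroll factor must be positive");
  Trace T{{}, 0};
  uint64_t I = 0;
  uint64_t Pre = TC % UF;  // preloop trip count: rotates the exit to lane 0
  for (; I < Pre; ++I)     // scalar prologue
    T.Iters.push_back(I);
  for (uint64_t Base = I;; Base += UF) {
    for (unsigned Lane = 0; Lane < UF; ++Lane) {
      if (Base + Lane == TC) { // one exit copy per lane
        T.ExitLane = Lane;
        return T;
      }
      T.Iters.push_back(Base + Lane);
    }
  }
}
```

The prologue-plus-unrolled loop executes the same iterations 0..TC-1 as the original, and the remaining count is a multiple of UF, so C == 0 regardless of TC.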

I'm also hesitant about how to handle the case of multiple exits which have computable exit lanes C1 and C2, but where C1 != C2. (E.g., exit 0 might exit on lane 1, and exit 1 on lane 2.) Note that C1 and C2 in this case aren't known until runtime, so we can't prove the second exit is dead. Figuring out something for the multiple-exit case seems worthwhile, but needs more thought.

It seems like we might be able to adjust the trip count of the preloop to the minimum of the required rotations for the two exits, but that needs a bit more thought to confirm there isn't some corner case there.

Again, interesting, but with a few additional moving parts. I'd probably start with the costing changes for the first sub-case in the unroller, get that worked through and enabled, and then return to the second case.

Mostly for my own sanity later, I believe the costing code for this should end up as simple as:
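const SCEV *EC = SE->getExitCount(L, ExitingBlock);
if (!isa<SCEVCouldNotCompute>(EC)) {
  // The exit is taken after EC backedges, so TC = EC + 1 (this can wrap if
  // EC is the maximum value of its type).
  const SCEV *TC = SE->getAddExpr(EC, SE->getOne(EC->getType()));
  const SCEV *Rem = SE->getURemExpr(TC, SE->getConstant(EC->getType(), UF));
  if (isa<SCEVConstant>(Rem)) {
    // Exiting lane is statically known: UF-1 of the UF exit copies are dead,
    // so discount the cost of this exit by (UF-1)/UF.
  }
}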
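A worked version of the discount, using hypothetical unit costs: naively, unrolling by UF pays for UF copies of both the body and the exit, and discounting the exit by (UF-1)/UF leaves the cost of exactly one exit copy. The function names are illustrative, not part of LLVM's unrolling cost model.

```cpp
#include <cassert>

// Naive estimate: unrolling by UF duplicates both the body and the exit.
unsigned naiveUnrolledCost(unsigned BodyCost, unsigned ExitCost, unsigned UF) {
  return UF * (BodyCost + ExitCost);
}

// With a statically known exiting lane, UF-1 exit copies are pruned, so the
// exit's total cost is discounted by (UF-1)/UF, leaving one copy.
unsigned discountedUnrolledCost(unsigned BodyCost, unsigned ExitCost,
                                unsigned UF) {
  assert(UF > 0 && "unroll factor must be positive");
  unsigned ExitTotal = UF * ExitCost;
  unsigned Discount = ExitTotal * (UF - 1) / UF; // exactly (UF-1) * ExitCost
  return UF * BodyCost + (ExitTotal - Discount);
}
```

For BodyCost = 10, ExitCost = 3, and UF = 4, this gives 52 naively and 43 with the discount.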


https://reviews.llvm.org/D97357