[PATCH] D89566: [LV] Epilogue Vectorization with Optimal Control Flow

Wed Nov 25 13:34:06 PST 2020

fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1024
+  /// *epilogue loop* strategy (ie the second pass of vplan execution).
+  BasicBlock *createEpilogueVectorizedLoopSkeleton() final override;
+
----------------
bmahjour wrote:
> fhahn wrote:
> > IIUC we specialize LV so we can override just `createEpilogueVectorizedLoopSkeleton` and `emitMinimumVectorEpilogueIterCountCheck` (which is only used by `createEpilogueVectorizedLoopSkeleton`.
> > 
> > Could we do a more targeted specialization, e.g. by extracting the skeleton creation code into something like `LoopSekletonCreator` and have a regular and epilogue version of that? Or combine it with `EpilogueLoopVectorizationInfo`?
> Currently only the skeleton code is specialized, but future enhancements will likely necessitate more specialization (eg to support widened induction vars and live-out phis). We could create a `LoopSkeletonCreator`, and modify the `LoopVectorizationPlanner` interfaces to work with it, but I think extending from the `InnerLoopVectorizer` fits better with the current design. Note that `InnerLoopUnroller` also takes a similar approach and extends `InnerLoopVectorizer` while only overriding a couple of functions.
> Currently only the skeleton code is specialized, but future enhancements will likely necessitate more specialization (eg to support widened induction vars and live-out phis). 

I thought a bit on how to support those cases. IIUC the problem here is mostly setting up the right incoming values for the PHIs in the epilogue loop. Given that this is related to codegen, I *think* we should be able to represent all the required info in the VPlan, i.e. we should be able to adjust the VPlan of the epilogue vector loop to use the correct start values naturally.

> We could create a LoopSkeletonCreator, and modify the LoopVectorizationPlanner interfaces to work with it, but I think extending from the InnerLoopVectorizer fits better with the current design. Note that InnerLoopUnroller also takes a similar approach and extends InnerLoopVectorizer while only overriding a couple of functions.

My concern here is that this approach invites more modifications in the sub-classes, instead of modeling the required information directly in VPlan. I think this will lead to more work down the road, as we move more pieces into VPlan. I don't think `InnerLoopUnroller` is an ideal example here, because IIRC it was created before VPlan. I think working on modeling the incoming values in VPlan would be better aligned with the long term goals and lead to a more modular solution overall.

To illustrate this approach, I went ahead and tried to sketch support for widened induction using VPlan: D92132. It is a bit rough around the edges, but should give an idea of the required pieces. I'm working on a similar patch for reductions as well and  we should be able to handle first-order recurrences in a similar fashion.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89566/new/

https://reviews.llvm.org/D89566