[PATCH] D89566: [LV] Epilogue Vectorization with Optimal Control Flow

Tue Oct 27 13:29:44 PDT 2020

bmahjour added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:175
+
+static cl::opt<bool> EnableEpilogueVectorization(
+    "enable-epilogue-vectorization", cl::init(false), cl::Hidden,
----------------
xbolva00 wrote:
> mivnay wrote:
> > Why not enable it by default?
> +1
The reason I'm reluctant to enable it by default is because the heuristic for choosing when to vectorize the epilogue (and by what VF) is quite simplistic and tuned to a specific benchmark, without consideration for the cost of extra branches, code size increase, extra register spills, etc. 
My thinking is we can move towards the goal of enabling it by default in smaller steps, by first implementing the transformation (this patch), then improving the cost-model along with performance tuning (future work). What do you think?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:907
+  ElementCount EpilogueVF = ElementCount::getFixed(0);
+  unsigned EpilogueUF = 0;
+  BasicBlock *MainLoopIterationCountCheck = nullptr;
----------------
mivnay wrote:
> High UF might negate the benefit of EpilogVectorization. I think keeping `UF = 1` is a good idea unless there are multiple levels of epilog loop vectorization.
I think so too, but I also thought it would be a good thing to make the UF configurable in case the need arises in the future (eg. with increasing vector widths and scalable vector types). In this patch, the only EpilogueLoopVectorizationInfo object is created with an epilogue UF of 1.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7320
+template <typename ILVType>
+void LoopVectorizationPlanner::executePlanForEpilogueVectorization(
+    ILVType &ILV, DominatorTree *DT) {
----------------
mivnay wrote:
> This function and other core functions like `createEpilogueVectorizedLoopSkeleton`, `createInductionResumeValues`, etc contains lot of redundant code. Can it be improved?
Right, there is a bit of code duplication in exchange for decoupling and separation of concerns. There is very little code reuse opportunity from `createEpilogueVectorizedLoopSkeleton()`, as it already makes calls to common code like `emitMinimumIterationCountCheck`, `createInductionVariable`, `completeLoopSkeleton`, etc. and the rest of the code is inherently different between the `IMLAEMainLoop` and `IMLAEEpilogueLoop`.

I agree this function can be avoided and I can think of a way to reuse code in `createInductionResumeValues`. I'll post an update soon.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89566/new/

https://reviews.llvm.org/D89566