[PATCH] D42447: [LV][VPlan] Detect outer loops for explicit vectorization.

Wed Mar 28 09:14:10 PDT 2018

fhahn added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:8642
+
+     // TODO: It will be set to true when we introduce codegen support.
+     return false;
----------------
rengolin wrote:
> hsaito wrote:
> > dcaballe wrote:
> > > rengolin wrote:
> > > > I'm really uncomfortable with all these temporary code blocks that don't do anything...
> > > > 
> > > > They're really just hijacking the existing infrastructure without implementing as a VPlan.
> > > > 
> > > > I really thought the whole point of VPlans was that you wouldn't need to hack-it-up like we used to in the old vectoriser...
> > > This is the entrance to the VPlan-native vectorization path. It's not doing anything yet because we are trying to follow an incremental approach by releasing relatively small patches that are easy to digest. This code will be functional (generating vector code) soon.
> > > 
> > > The code block is temporary as long as both vectorization paths co-exist but the final goal is to converge into a single one. This approach will allow us to incrementally and easily extend all the current inner loop vectorization functionality to support outer loops and, most importantly, doing so without destabilizing inner loop vectorization. We are really concerned about the latter and we think that this approach is a reasonable trade-off between safety and temporary code blocks.
> > > 
> > > If you want to discuss this further, I would recommend moving the discussion to the RFC thread so that everybody is aware of it: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html
> > > 
> > I'm working on the "converge into a single one" side. At this point, I'm taking care of the groundwork of moving the right things to the right places so that I don't have to include those "almost NFC" things as part of "expand VPlan's participation into innermost loop vectorization". Thank you for helping me do that with your reviews. We need to be able to build VPlan for the innermost loop vectorization right after Legal, for example, before we can remove the diverged code path at the beginning. In the meantime, the outer loop vectorization patch series will help people realize how much innermost loop vectorization and outer loop vectorization have in common and, more importantly, help people think about how to write code that works for both.
> > That's as much as I want to write about the approach we are taking, within this patch review. The rest of the discussions should happen on the above mentioned RFC. Thanks.
> Ok, as above, just remove the comments and add a two-line comment summarising it.
I am also slightly worried that people will come along and see this code and think that cost modelling and planning already works for outer loops, as it is used in the VPlan native path. But I think the comment makes it clear now.

I wonder whether it would be nicer to have a clearer separation by putting the code in separate functions rather than adding even more code to those already huge functions.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1661
+  bool canVectorizeLoopNestCFG(Loop *Lp);
+  /// Helper function to canVectorizeLoopNestCFG that returns true if the
+  /// pre-header, exiting and latch blocks of \p Lp (non-recursive) are
----------------
Maybe add a newline to separate the 2 functions. Not sure if calling it out as a helper function is necessary. In a way, most functions here are helper functions :)

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5119
+// related to the loop latch because they don't affect the loop uniformity.
+static bool isUniformLoop(Loop *Lp, Loop *OuterLp) {
+  assert(Lp->getLoopLatch() && "Expected loop with a single latch.");
----------------
Work done here is potentially done multiple times for each loop, right? E.g. for deep loop nests, this will be called multiple times for the same Lp, but with different outer loops. 

Only a few checks here depend on the outer loop and I think ideally we would not check the same things again and again. For now those redundant checks are quite simple, but I think we should keep that issue in mind once we introduce more complex checks.
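
To illustrate the concern, here is a minimal sketch of one way to avoid repeating the outer-loop-independent work. It is not code from the patch: `ToyLoop`, `UniformityChecker`, `localChecks` and `outerChecks` are hypothetical names, and the checks themselves are placeholders. The only point is that results which depend on Lp alone are cached, while the part that depends on the outer loop is evaluated per query.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for llvm::Loop; it carries only what the sketch needs.
struct ToyLoop {
  bool HasSingleLatch;
  bool HasCanonicalIV;
  ToyLoop *Parent; // Enclosing loop, or nullptr for a top-level loop.
};

class UniformityChecker {
  // Results of the checks that depend on Lp only, keyed by loop.
  std::unordered_map<const ToyLoop *, bool> LocalCache;
  int LocalEvaluations = 0;

  // Placeholder for the outer-loop-independent checks; computed once per loop.
  bool localChecks(const ToyLoop *Lp) {
    auto It = LocalCache.find(Lp);
    if (It != LocalCache.end())
      return It->second;
    ++LocalEvaluations;
    bool Ok = Lp->HasSingleLatch && Lp->HasCanonicalIV;
    LocalCache.emplace(Lp, Ok);
    return Ok;
  }

  // Placeholder for the checks that do depend on the outer loop.
  // Here it only verifies that Lp is OuterLp or is nested inside it.
  static bool outerChecks(const ToyLoop *Lp, const ToyLoop *OuterLp) {
    for (const ToyLoop *P = Lp; P; P = P->Parent)
      if (P == OuterLp)
        return true;
    return false;
  }

public:
  bool isUniform(const ToyLoop *Lp, const ToyLoop *OuterLp) {
    return localChecks(Lp) && outerChecks(Lp, OuterLp);
  }

  // Number of times the local checks were actually computed.
  int localEvaluations() const { return LocalEvaluations; }
};
```

Querying the same inner loop against two different outer loops then computes the local checks only once; whether a cache like this is worth it for the real checks is a separate question.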

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5128
+  // 1.
+  PHINode *IV = Lp->getCanonicalInductionVariable();
+  if (!IV) {
----------------
I think the use of getCanonicalInductionVariable is discouraged. I think it would be better to detect induction variables using SCEV, as done in LoopVectorizationLegality.
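
As a rough illustration of why the canonical form is restrictive: a canonical induction variable has to start at 0 and step by 1, whereas a recurrence-based check can accept other starts and strides. The sketch below is a toy model, not LLVM code; `ToyAddRec`, `isCanonicalIV`, `isAffineIV` and `valueAt` are hypothetical names, and the affine test is a simplified assumption about what a SCEV-style check would accept.

```cpp
#include <cassert>

// Hypothetical model of a loop-header phi as an affine recurrence
// {Start,+,Step}: Start on loop entry, plus Step on every back edge.
struct ToyAddRec {
  long Start;
  long Step;
};

// Strict shape required of a canonical IV: starts at 0, steps by 1.
inline bool isCanonicalIV(const ToyAddRec &R) {
  return R.Start == 0 && R.Step == 1;
}

// Looser test in the spirit of a recurrence-based check (an assumption of
// this sketch): any recurrence with a nonzero constant step qualifies.
inline bool isAffineIV(const ToyAddRec &R) { return R.Step != 0; }

// Value of the recurrence at 0-based iteration K.
inline long valueAt(const ToyAddRec &R, long K) {
  return R.Start + R.Step * K;
}
```

A loop such as `for (i = 1; i < n; i += 2)` corresponds to `ToyAddRec{1, 2}`: it fails the canonical test but passes the affine one.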

================
Comment at: test/Transforms/LoopVectorize/explicit_outer_detection.ll:222
+
+attributes #0 = { norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
+
----------------
Attributes are not needed here and in the tests below, as no cost modelling is done so far.

https://reviews.llvm.org/D42447