[PATCH] D42447: [LV][VPlan] Detect outer loops for explicit vectorization.

Fri Apr 13 05:21:18 PDT 2018

dcaballe added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:8642
+
+     // TODO: It will be set to true when we introduce codegen support.
+     return false;
----------------
fhahn wrote:
> dcaballe wrote:
> > fhahn wrote:
> > > rengolin wrote:
> > > > hsaito wrote:
> > > > > dcaballe wrote:
> > > > > > rengolin wrote:
> > > > > > > I'm really uncomfortable with all these temporary code blocks that don't do anything...
> > > > > > > 
> > > > > > > They're really just hijacking the existing infrastructure without implementing as a VPlan.
> > > > > > > 
> > > > > > > I really thought the whole point of VPlans was that you wouldn't need to hack-it-up like we used to in the old vectoriser...
> > > > > > This is the entrance to the VPlan-native vectorization path. It's not doing anything yet because we are trying to follow an incremental approach by releasing relatively small patches that are easy to digest. This code will be functional (generating vector code) soon.
> > > > > > 
> > > > > > The code block is temporary as long as both vectorization paths co-exist but the final goal is to converge into a single one. This approach will allow us to incrementally and easily extend all the current inner loop vectorization functionality to support outer loops and, most importantly, doing so without destabilizing inner loop vectorization. We are really concerned about the latter and we think that this approach is a reasonable trade-off between safety and temporary code blocks.
> > > > > > 
> > > > > > If you want to discuss this further, I would recommend to move the discussion to the RFC thread so that everybody is aware of it: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html
> > > > > > 
> > > > > I'm working on the "converge into a singe one" side. At this point, I'm taking care of the ground work of moving the right things to the right places such that I don't have to include those "almost NFC" things as part of "expand VPlan's participation into innermost loop vectorization". Thank you for helping me do that with your reviews. We need to be able to build VPlan for the innermost loop vectorization right after Legal, for example, before we can remove the diverged code path at the beginning. In the meantime, the outer loop vectorization patch series will help people realize how much common things are there between innermost loop vectorization and outer loop vectorization, and more importantly, help people think how to write code that can work in both ways. 
> > > > > That's as much as I want to write about the approach we are taking, within this patch review. The rest of the discussions should happen on the above mentioned RFC. Thanks.
> > > > Ok, as above, just remove the comments and add a two-line comment summarising it.
> > > I am also slightly worried that people will come along and see this code and think that cost modelling and planning already works for outer loops, as it is used in the VPlan native path. But I think the comment makes it clear now.
> > > 
> > > I am not sure if it would be clearer/nicer to have clearer separation by having the code in separate functions rather than adding even more code to those already huge functions.
> > > I am not sure if it would be clearer/nicer to have clearer separation by having the code in separate functions rather than adding even more code to those already huge functions.
> > 
> > Agreed, I could move these code to a separate function.
> > 
> > >rather than adding even more code to those already huge functions.
> > 
> > Are you talking about only this function or also some other ones?
> Mostly this function and LoopVectorizationPlanner::plan. Otherwise it is already nicely separated.
Ok, thanks!. I doubted if the new 'processLoopInVPlanNativePath' should be a member function of LoopVectorizerPass (same as 'processLoop') and avoid passing most of the parameters. I decided not to do it just not to modify the public header file. Please, let me know what you think. 

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5119
+// related to the loop latch because they don't affect the loop uniformity.
+static bool isUniformLoop(Loop *Lp, Loop *OuterLp) {
+  assert(Lp->getLoopLatch() && "Expected loop with a single latch.");
----------------
fhahn wrote:
> dcaballe wrote:
> > fhahn wrote:
> > > Work done here is potentially done multiple times for each loop, right? E.g. for deep loop nests, this will be called multiple times for the same Lp, but with different outer loops. 
> > > 
> > > Only a few checks here depend on the outer loop and I think ideally we would not check the same things again and again. For now those redundant checks are quite simple, but I think we should keep that issue in mind once we introduce more complex checks.
> > Good point. `OuterLp` will be fixed, at least in the short term while we only support explicit vectorization. Given that we are introducing support for divergent inner loops in the patch series #4, it's more likely that we don't need this function (or at least this function as is) before we introduce the engine to evaluate different outer loops.
> > 
> > In any case, the proper inner loop uniformity check will depend on the outer loop we are vectorizing. Some of these "extra" checks are very specific for the patch series #1, where the supported loops are very limited. They will be progressively removed, leaving only the `OuterLp` dependent checks.
> > 
> > For these reasons, IMO, it makes sense to keep all these checks and the documentation together. I think it's easier to understand which inner loops are currently supported. I could add a comment explaining you concerns. However, I could try to split them if you think this is not enough.
> > 
> > Please, let me know what you think.
> Yes, I think for now it is fine, but we should definitely keep that in mind, for future checks.
> 
> Another related question came to mind: How are we going to deal with nested loops where different nests can be vectorized, e.g. say we have nested loops with 3 levels and both outer-most loops can be vectorized? If we decide to vectorize the outermost loop, wouldn't we have to skip handling the other outer loop? Or have links between the VPlans to decide which level is best to vectorize?
> Yes, I think for now it is fine, but we should definitely keep that in mind, for future checks.

Ok, I added a comment explaining the situation. I also tried to find a better place for these checks, at least to indicate it in the comment, but I couldn't find a good place at this point. We don't have the infrastructure to evaluate multiple outer loops of the same loop nest. Hopefully, we comment is clarifying enough.

> If we decide to vectorize the outermost loop, wouldn't we have to skip handling the other outer loop? 

Yes, if the outermost loop is finally vectorized, the outer and the inner should be marked also as vectorized (I introduced a TODO suggesting this in an earlier version of this patch, but we decided to remove it to avoid confusion and keep the code cleaner). Or we could follow any other approach that leads to the same behavior: skipping them.

>Or have links between the VPlans to decide which level is best to vectorize?

The idea is to have an initial H-CFG modeling the input IR of the whole loop nest (starting form the outermost vectorizable loop) and use it as a starting point to evaluate all the candidate loops for vectorization. Does this answer your questions?

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5128
+  // 1.
+  PHINode *IV = Lp->getCanonicalInductionVariable();
+  if (!IV) {
----------------
fhahn wrote:
> dcaballe wrote:
> > fhahn wrote:
> > > I think the use of getCanonicalInductionVariable is discouraged. I think it would be better to detect induction variables using SCEV, as done LoopVectorizeLegality.
> > Could you please elaborate a bit more? Why is it discouraged? I can't find any comments in the source code.
> > 
> > We are trying to introduce some restrictive but simple checks. If the answer is that this interface is discouraged because it may not detect some IVs that that are canonical, that would be perfectly fine. I'm also looking at the `LoopVectorizationLegality::addInductionPhi`. Isn't this function doing something similar to `getCanonicalInductionVariable` to detect the primary induction but using `InductionDescriptor`?
> Yes, for the very simple checks it works, but as you said we potentially miss IVs that we could support. I suppose we could use getCanonicalInductionVariable if it keeps things simple for now, but we definitely should relax that soonish.
Then it's perfectly fine. Let's do it incrementally. It's the whole purpose of the approach. We may even want to support non-canonical IVs sooner than later and coming up with something more complicated for the current check might not be worth it.

https://reviews.llvm.org/D42447