[PATCH] D79783: [LoopVectorize] Fallback to a scalar epilogue when TP fails

Mon May 18 13:33:54 PDT 2020

Ayal added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:1233

-bool LoopVectorizationLegality::prepareToFoldTailByMasking() {
+bool LoopVectorizationLegality::prepareToFoldTailByMasking(bool ReportFailure) {

----------------
Pierre-vh wrote:
> Ayal wrote:
> > Pierre-vh wrote:
> > > Ayal wrote:
> > > > Rather than if/how to report failure, probably better to clone this method into a `canFoldTailByMasking()` (as it was originally called iirc) const method, w/o having any side-effects that may later need to be abandoned, and a `prepareToFoldTailByMasking()` which need not return any value.
> > > I've been thinking about this, but I'm not exactly sure it's the right thing to do. There will probably be a small amount of code duplication and we'd be iterating over the loop's instruction twice.
> > > 
> > > I'm not a fan of iterating over the loop's instructions more than needed, especially in this case as I think it can be avoided.
> > > 
> > > Here is another idea: What if `prepareToFoldTailByMasking ` becomes const, and instead of directly inserting elements into `MaskedOp` and `ConditionalAssumes`, it works on a temporary set.
> > > 
> > > The simplest way to implement this would be to take sets by reference, like this:
> > > `bool prepareToFoldTailByMasking(SmallPtrSetImpl<Instruction*> &MaskedOps, SmallPtrSetImpl<Instruction*> ConditionalAssumes, bool ReportFailure = true)`. Then the caller controls the sets, and can either give them to `LoopVectorizationLegal`, or discard them.
> > > 
> > > Another idea would be to move this logic in another object, for instance, something like `TailFoldingLegal`, which would:
> > >   - Check the loop, like `prepareToFoldTailByMasking` currently does, and store the info into its own sets
> > >   - Have methods to either abandon tail-folding by masking, or apply it (= transfer the contents of its sets into `LoopVectorizationLegal`'s sets)
> > > 
> > > Then it could be used like this:
> > > ```
> > > TailFoldingLegal TFL;
> > > if tail folding is needed:
> > >   TFL.checkLoop(TheLoop)
> > > 
> > > // etc.
> > > 
> > > if TFL.canFoldTailByMasking() && we need a tail:
> > >   TFL.applyTailFoldingByMasking(Legal);
> > > ```
> > > 
> > > What do you think ? Do you prefer a simpler approach, even if it means iterating over the loop's instruction many times, or something a bit more complete like what I suggested? 
> > >  
> > > Note that I don't know the cost of iterating over the loop. Maybe it's really cheap and I shouldn't worry about it (then just ignore everything I suggested above). In any case, I'll just implement your favorite solution, as you know much more about this topic than I do, so please tell me what you prefer.
> > >  
> > > Thank you very much for your help.
> > Agreed, would be better to avoid repeated instruction scans.
> > 
> > In order for prepareToFoldTail() to be an optional preference, it should cause no side-effects when returning false; i.e., it should add to MaksedOps and ConditionalAssumes iff returning true. This mostly affects blockCanBePredicated() - it would need to have MaskedOps and ConditionalAssumes passed-in as arguments. Sounds reasonable?
> > In order for prepareToFoldTail() to be an optional preference, it should cause no side-effects when returning false; i.e., it should add to MaksedOps and ConditionalAssumes iff returning true. This mostly affects blockCanBePredicated() - it would need to have MaskedOps and ConditionalAssumes passed-in as arguments. Sounds reasonable?
> 
> This is a good idea, but there is one issue left: when `ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate` and the loop has no tail (`TC > 0 && TC % MaxVF == 0`), we have to abandon tail-folding (Line 5029 in LoopVectorize.cpp), but if we do things that way, there's no way to abandon tail-folding at that stage. There are 2 solutions that I can think of:
> 
> - Refactor `computeMaxVF` so it calculates the MaxVF earlier. That way, we can tell if the loops has a tail or not before calling `prepareToFoldTailByMasking` (= we only call it if there's a tail). Unfortunately, I tried moving the call to `computeFeasibleMaxVF` earlier in the function, but it triggered the assertion at line 5000 (in `if (!useMaskedInterleavedAccesses(TTI))`), so it doesn't look like a trivial fix.
> - Make `prepareToFoldTailByMasking` return some kind of result object (that'd contain both `MaskedOp`/`ConditionalAssumes` sets), which can either be discarded or applied to `LoopVectorizationLegal`. This is similar to what I suggested in my previous comment, just a bit simplified. 
>     - As I also suggested earlier, we can do without the result object - just pass-in the sets to `prepareToFoldTailByMasking` and add new methods to `LoopVectorizationLegal` to add elements to the `MaskedOp`/`ConditionalAssumes` sets. This is the simplest fix, but maybe not the cleanest.
> 
> Either solution is fine for me, but I'll need some guidance on how to refactor `computeMaxVF` properly if you prefer the first solution.
Right. Instead of introducing an early call to prepareToFoldTail(), before knowing MaxVF and if there's a tail to fold or else the preparation should be undone, how about waiting until the original call to prepareToFoldTail() fails. Then, before bailing-out, check if the status is CM_ScalarEpilogueNotNeededUsePredicate and if so return MaxVF instead?

Note BTW that D80085 should hopefully land soon.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79783/new/

https://reviews.llvm.org/D79783