[PATCH] D79783: [LoopVectorize] Fallback to a scalar epilogue when TP fails

Fri May 15 05:13:22 PDT 2020

Pierre-vh added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:1233

-bool LoopVectorizationLegality::prepareToFoldTailByMasking() {
+bool LoopVectorizationLegality::prepareToFoldTailByMasking(bool ReportFailure) {

----------------
Ayal wrote:
> Rather than if/how to report failure, probably better to clone this method into a `canFoldTailByMasking()` (as it was originally called iirc) const method, w/o having any side-effects that may later need to be abandoned, and a `prepareToFoldTailByMasking()` which need not return any value.
I've been thinking about this, but I'm not exactly sure it's the right thing to do. There will probably be a small amount of code duplication and we'd be iterating over the loop's instruction twice.

I'm not a fan of iterating over the loop's instructions more than needed, especially in this case as I think it can be avoided.

Here is another idea: What if `prepareToFoldTailByMasking ` becomes const, and instead of directly inserting elements into `MaskedOp` and `ConditionalAssumes`, it works on a temporary set.

The simplest way to implement this would be to take sets by reference, like this:
`bool prepareToFoldTailByMasking(SmallPtrSetImpl<Instruction*> &MaskedOps, SmallPtrSetImpl<Instruction*> ConditionalAssumes, bool ReportFailure = true)`. Then the caller controls the sets, and can either give them to `LoopVectorizationLegal`, or discard them.

Another idea would be to move this logic in another object, for instance, something like `TailFoldingLegal`, which would:
  - Check the loop, like `prepareToFoldTailByMasking` currently does, and store the info into its own sets
  - Have methods to either abandon tail-folding by masking, or apply it (= transfer the contents of its sets into `LoopVectorizationLegal`'s sets)

Then it could be used like this:
```
TailFoldingLegal TFL;
if tail folding is needed:
  TFL.checkLoop(TheLoop)

// etc.

if TFL.canFoldTailByMasking() && we need a tail:
  TFL.applyTailFoldingByMasking(Legal);
```

What do you think ? Do you prefer a simpler approach, even if it means iterating over the loop's instruction many times, or something a bit more complete like what I suggested? 

Note that I don't know the cost of iterating over the loop. Maybe it's really cheap and I shouldn't worry about it (then just ignore everything I suggested above). In any case, I'll just implement your favorite solution, as you know much more about this topic than I do, so please tell me what you prefer.

Thank you very much for your help.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79783/new/

https://reviews.llvm.org/D79783