[PATCH] D142015: [LV] Plan with and without FoldTailByMasking

Sat Apr 29 12:31:50 PDT 2023

dmgreen added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1550

   /// Returns true if a scalar epilogue is not allowed due to optsize or a
   /// loop hint annotation.
----------------
david-arm wrote:
> This is not your fault, but whilst you are here could you update the comment to reflect the function behaviour? It looks like the comment probably got out of sync with the function at some point. :)
> 
> Perhaps something like:
> 
>   /// Returns true if a scalar epilogue is allowed. It may return false if:
>   ///   1. We are optimising for code size
>   ///   2. There is a loop hint annotation
>   ///   etc.
> 
> 
Yeah this is a bit of a weird one nowadays. I had tried to remove it, but I think it makes more sense to keep around for specifying when a scalar epilogue cannot be used.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1554
+    return ScalarEpilogueStatus == CM_ScalarEpilogueAllowed ||
+           (!FoldTailByMasking &&
+            ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate);
----------------
david-arm wrote:
> Do you know the scenarios in which we aren't folding the tail by masking, but a scalar epilogue is not needed? I think CM_ScalarEpilogueNotNeededUsePredicate is only set for these cases:
> 
> 1. The user has supplied a hint.
> 2. The user has set the prefer-predicate-over-epilogue flag.
> 3. The TTI hook preferPredicateOverEpilogue has returned true.
> 
> but I'd expect that for at least 3) we've set FoldTailByMasking  to true?
For 3 it would be both with and without FoldTailByMasking. D145925 adds a way for the target to control the SEL more directly.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5076
   case CM_ScalarEpilogueNotAllowedUsePredicate:
-    [[fallthrough]];
+    LLVM_DEBUG(dbgs() << "LV: vector predicate hint/switch found.\n"
+                      << "LV: Not allowing scalar epilogue, creating "
----------------
david-arm wrote:
> In this case it looks like we're going to create two sets of plans - one with tail-folding and one without - and in both cases we're actually going to tail-fold anyway. Is that right? Not saying we shouldn't do that, but just trying to understand how this works.
This falls though in both cases. In the code below we may find that the the MaxVF is a multiple of the known tripcount, and if so return plans with TailFold=false. Else it will not create TailFold=false plans, just using the TailFold=true.
There are a lot of edge cases.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5085
+    if (!FoldTailByMasking)
+      return computeFeasibleMaxVF(TC, UserVF, false);
+    LLVM_DEBUG(dbgs() << "LV: vector predicate hint/switch found.\n"
----------------
david-arm wrote:
> Don't we have to also set `ScalarEpilogueStatus = CM_ScalarEpilogueAllowed` here?
I don't believe so. The ScalarEpilogueStatus tell us at a high level whether we should be predicating. FoldTailByMasking then controls whether the individual vplans are predicated. So it felt better to keep the original ScalarEpilogueStatus inplace to refer back to, in case we would like to differentiate between CM_ScalarEpilogueAllowed and CM_ScalarEpilogueNotNeededUsePredicate cases.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5476
+  const VectorizationFactor ScalarCost(ElementCount::getFixed(1),
+                                       VPlans[0]->foldTailByMasking(),
+                                       ExpectedCost, ExpectedCost);
----------------
david-arm wrote:
> This should always be false for VF=1, right?
Not in all cases. Cases that are always predicated (like CM_ScalarEpilogueNotAllowedUsePredicate or CM_ScalarEpilogueNotAllowedLowTripLoop) will have VF=1 with tail folding still. They won't have any unpredicated vplans.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7617
+                  [VF, FoldTailByMasking](const VPlanPtr &Plan) {
+                    return Plan->hasVF(VF) &&
+                           Plan->foldTailByMasking() == FoldTailByMasking;
----------------
david-arm wrote:
> Is it worth changing `hasVF` to take the `FoldTailByMasking` as a second argument so you can just write:
> 
>   assert(count_if(VPlans,
>                   [VF, FoldTailByMasking](const VPlanPtr &Plan) {
>                     return Plan->hasVF(VF, FoldTailByMasking);
>                   }) == 1 &&
>          "Best VF has not a single VPlan.");
It feels like a separate parameter to me, that would want to be checked separately. I can change it if you think its better but there are places that call hasVF on it's own.

================
Comment at: llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll:2915
 ; UNROLL-NO-VF-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; UNROLL-NO-VF:       vector.body:
+; UNROLL-NO-VF-NEXT:    [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
----------------
david-arm wrote:
> For what it's worth this new version looks better, but do you know why? Is it because we no longer allow tail-folding for a VF of 1? I assume it was trying to tail-fold before due to the low trip count.
This was because:
```
// Don't use a predicated vector body if the user has forced a vectorization
// factor of 1.
  if (UserVF.isScalar())
    SEL = CM_ScalarEpilogueAllowed;
```
It overrides the "tiny trip count" SEL that was applying previously. I think that for VF=1 it makes a lot sense to not predicated the body, but I've removed that from this patch to keep the diff down. We can re-add it in the future if needed.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142015/new/

https://reviews.llvm.org/D142015