[PATCH] D142015: [LV] Plan with and without FoldTailByMasking

Fri Apr 21 07:29:49 PDT 2023

david-arm added a comment.

Hi @dmgreen, this patch looks a lot smaller and tidier than last time - thanks a lot! I've just got a few more comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:195
+
   /// Cost of the loop with that width.
   InstructionCost Cost;
----------------
Perhaps this comment should now be something like:

  /// Cost of the loop with that width and vectorization style.

What do you think?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:302

   /// Plan how to best vectorize, return the best VF and its cost, or
   /// std::nullopt if vectorization and interleaving should be avoided up front.
----------------
I think the comment needs updating now because it no longer returns a VF.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1550

   /// Returns true if a scalar epilogue is not allowed due to optsize or a
   /// loop hint annotation.
----------------
This is not your fault, but whilst you are here could you update the comment to reflect the function behaviour? It looks like the comment probably got out of sync with the function at some point. :)

Perhaps something like:

  /// Returns true if a scalar epilogue is allowed. It may return false if:
  ///   1. We are optimising for code size
  ///   2. There is a loop hint annotation
  ///   etc.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1554
+    return ScalarEpilogueStatus == CM_ScalarEpilogueAllowed ||
+           (!FoldTailByMasking &&
+            ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate);
----------------
Do you know the scenarios in which we aren't folding the tail by masking, but a scalar epilogue is not needed? I think CM_ScalarEpilogueNotNeededUsePredicate is only set for these cases:

1. The user has supplied a hint.
2. The user has set the prefer-predicate-over-epilogue flag.
3. The TTI hook preferPredicateOverEpilogue has returned true.

but I'd expect that for at least 3) we've set FoldTailByMasking  to true?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5076
   case CM_ScalarEpilogueNotAllowedUsePredicate:
-    [[fallthrough]];
+    LLVM_DEBUG(dbgs() << "LV: vector predicate hint/switch found.\n"
+                      << "LV: Not allowing scalar epilogue, creating "
----------------
In this case it looks like we're going to create two sets of plans - one with tail-folding and one without - and in both cases we're actually going to tail-fold anyway. Is that right? Not saying we shouldn't do that, but just trying to understand how this works.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5085
+    if (!FoldTailByMasking)
+      return computeFeasibleMaxVF(TC, UserVF, false);
+    LLVM_DEBUG(dbgs() << "LV: vector predicate hint/switch found.\n"
----------------
Don't we have to also set `ScalarEpilogueStatus = CM_ScalarEpilogueAllowed` here?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5476
+  const VectorizationFactor ScalarCost(ElementCount::getFixed(1),
+                                       VPlans[0]->foldTailByMasking(),
+                                       ExpectedCost, ExpectedCost);
----------------
This should always be false for VF=1, right?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7556

-    return {VF, 0 /*Cost*/, 0 /* ScalarCost */};
+    return {VF, false, 0 /*Cost*/, 0 /* ScalarCost */};
   }
----------------
Perhaps worth adding a `/* ... */` comment for the new argument too? Same for any other similar places in the file.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7617
+                  [VF, FoldTailByMasking](const VPlanPtr &Plan) {
+                    return Plan->hasVF(VF) &&
+                           Plan->foldTailByMasking() == FoldTailByMasking;
----------------
Is it worth changing `hasVF` to take the `FoldTailByMasking` as a second argument so you can just write:

  assert(count_if(VPlans,
                  [VF, FoldTailByMasking](const VPlanPtr &Plan) {
                    return Plan->hasVF(VF, FoldTailByMasking);
                  }) == 1 &&
         "Best VF has not a single VPlan.");

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8703
+  auto MaxVFPlusOne = MaxVF.getWithIncrement(1);
+  for (ElementCount VF = MinVF; ElementCount::isKnownLT(VF, MaxVFPlusOne);
+       VF *= 2) {
----------------
Wouldn't it be simpler to just write:

  for (ElementCount VF = MinVF; ElementCount::isKnownLE(VF, MaxVF);

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/maximize-bandwidth-invalidate.ll:17
 ; COST: LV: Found an estimated cost of 3000000 for VF 16 For instruction:   %0 = load
-; COST: LV: Selecting VF: 1.
+; COST: LV: Selecting Tail folded VF: 1.

----------------
This seems odd. Perhaps I'm mistakend, but I thought with your patch we wouldn't decide to tail-fold with a VF of 1?

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll:805
+; CHECK-NEXT:    [[TMP12:%.*]] = getelementptr i32, ptr [[TMP10]], i32 0
+; CHECK-NEXT:    store <vscale x 4 x i32> [[BROADCAST_SPLAT]], ptr [[TMP12]], align 4
+; CHECK-NEXT:    [[TMP13:%.*]] = call i64 @llvm.vscale.i64()
----------------
I was confused at first, but actually this looks like an improvement - nice! I think before we were still using an interleave count of 1 as if we were tail-folding, but now we've fallen back on the non-tail-folding plan that uses interleaving.

================
Comment at: llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll:2915
 ; UNROLL-NO-VF-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; UNROLL-NO-VF:       vector.body:
+; UNROLL-NO-VF-NEXT:    [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
----------------
For what it's worth this new version looks better, but do you know why? Is it because we no longer allow tail-folding for a VF of 1? I assume it was trying to tail-fold before due to the low trip count.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142015/new/

https://reviews.llvm.org/D142015