[PATCH] D142015: [LV] Plan with and without FoldTailByMasking

Thu Apr 6 09:30:48 PDT 2023

dmgreen added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5289

-std::optional<unsigned> LoopVectorizationCostModel::getVScaleForTuning() const {
+/// Convenience function that returns the value of vscale_range iff
+/// vscale_range.min == vscale_range.max or otherwise returns the value
----------------
david-arm wrote:
> Some changes like this don't really need to be in this patch, right?
Yeah sure, I can give that a try. It seems a bit odd on it's own to be honest, like a patch that makes things worse on its own and feels like we change it just so that we change it again later. Creating needless busywork. But I have for the moment moved it out of this patch.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5293
+static std::optional<unsigned>
+getVScaleForTuning(const Loop *L, const TargetTransformInfo &TTI) {
+  Function *TheFunction = L->getHeader()->getParent();
----------------
david-arm wrote:
> Can't we just pass in `TheFunction` instead?
I think I chose Loop because the LoopVectorizationPlanner doesn't store the Function, so would need pass L->getHeader()->getParent() as argument and it simplified the interface a little. Happy to change it though.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5321
+    // cost.
+    auto RTCostA =
+        A.Width.getFixedValue()
----------------
I also have pulled this part out into D147720, as a functional change that can be done separately.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5349
+  if (A.FoldTailByMasking && !B.FoldTailByMasking)
+    return (CostA * EstimatedWidthB) <= (CostB * EstimatedWidthA);
+
----------------
david-arm wrote:
> Again, I'm a bit concerned about the effect this will have when the active lane mask call is not costed into the loop. I'd prefer for now to be conservative and opt for the non-predicated scheme until we've accounted for the IV costs.
For MVE in the past, when we returned true for preferPredicateOverEpilogue, we would get a tail-predicated loop (or not vectorize). Now that we get both tail-folded and non-tail-folded loops to cost against one another, we need to pick the tail-folded version on a tie to not be worse than before. The scores between the two are often the same, so to be closer to the old codegen the conservative option is to chose FoldTail on a tie.

I believe that the only target that returns true from preferPredicateOverEpilogue at the moment is MVE. SVE can be adjusted later if needed, but my current thinking was to use UsePredicatedEpilogue from D145925 as a first step to get the benefits of epilog vectorization whilst hopefully not messing anything else up. That was the plan for how to treat SVE conservatively, and we can expand things if needed in the future. It doesn't really make a lot of sense to add up disparate throughput costs and expect them to mean anything, but we can perhaps come up with something if we need and if not we can always just force unpredicated body + predicated epilog. Let me know what you think.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5749
 unsigned
-LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
-                                                  InstructionCost LoopCost) {
+LoopVectorizationCostModel::selectInterleaveCount(VectorizationFactor VF) {
   // -- The interleave heuristics --
----------------
david-arm wrote:
> Again, it looks like the change in prototype doesn't need to be part of this patch and might be a useful tidy-up NFC patch. 
Thanks Yeah - That is a good idea. I can actually just remove this from the current version of the patch, I believe.

================
Comment at: llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll:973
 ; CHECK-LABEL: LV: Checking a loop in 'merge_with_dead_gep_between_regions'
-; CHECK:      VPlan 'Initial VPlan for VF={2},UF>=1' {
+; CHECK:      VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
 ; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
----------------
david-arm wrote:
> These debug output changes look useful by themselves outside of this patch - not sure if it's possible to pass in the `FoldTailByMasking` flag in a separate patch?
Hmm. It would involve pulling out VPlan->FoldTailByMasking into a separate patch. That feels like it is the core of this patch, to be honest. Pulling it out just for some debug messages that are usually present elsewhere feels like a bit of an odd patch on it's own. When you look at the whole debug output there is already parts explaining whether the CostModel is FoldTailByMasking.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142015/new/

https://reviews.llvm.org/D142015