[llvm] [LV]Initial support for safe distance in predicated DataWithEVL vectorization mode. (PR #102897)

Tue Sep 10 16:08:22 PDT 2024

================
@@ -4071,15 +4093,25 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
     InterleaveInfo.invalidateGroupsRequiringScalarEpilogue();
   }
 
-  FixedScalableVFPair MaxFactors = computeFeasibleMaxVF(MaxTC, UserVF, true);
+  // If we don't know the precise trip count, or if the trip count that we
+  // found modulo the vectorization factor is not zero, try to fold the tail
+  // by masking.
+  // FIXME: look for a smaller MaxVF that does divide TC rather than masking.
+  setTailFoldingStyles(UserIC);
+  FixedScalableVFPair MaxFactors =
+      computeFeasibleMaxVF(MaxTC, UserVF, foldTailByMasking());
----------------
ayalz wrote:

Trying to reason about MaxVF, setting tail, max dependence distance, and EVL:

MaxVF: the current mechanism of computeMaxVF() should be simplified rather than further complicated, let alone for the sake of EVL/safe-dependence-distance support. This method determines two upper bounds, fixed and scalable, on the VF ranges for which VPlans are built, to avoid spending time on unprofitable and illegal ones. How many VPlans should be built in the case of EVL, and for what range of (scalable) VFs ending with MaxVF?
Would it suffice to consider a single VPlan for a single VF, the one corresponding to the vector length computed dynamically from the original trip count and the max safe distance, regardless of any MaxVF, fixed or scalable? (LMULs other than 1 may play a role, but one that conceptually corresponds to UF, treated as a compile-time constant.) Some scalable VF should be used as the static type, accommodating the trip count (if known) and the max dependence distance. But eventually all excess lanes will be masked out in every iteration, i.e., MaxVF may exceed the max dependence distance in the case of EVL, but not in the other cases (fixed or scalable).
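
To make the "vector length computed dynamically" point concrete, here is a minimal standalone sketch in plain C++ (not LLVM code). It emulates an EVL loop over `A[I + Dist] = A[I] + 1`, where each vector iteration picks its active lane count as the minimum of the remaining trip count, the max safe distance, and the hardware limit. The names `computeEVL`, `runScalar`, `runEVL`, and `VLMax` are hypothetical, chosen only for illustration, and the sketch assumes the max safe distance is expressed in elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: active lanes for one vector iteration, bounded by the
// remaining trip count, the max safe dependence distance (in elements), and
// the widest vector length the static type allows.
std::size_t computeEVL(std::size_t Remaining, std::size_t MaxSafeDistance,
                       std::size_t VLMax) {
  return std::min({Remaining, MaxSafeDistance, VLMax});
}

// Scalar reference loop with a dependence of distance Dist.
std::vector<int> runScalar(std::vector<int> A, std::size_t Dist) {
  for (std::size_t I = 0; I + Dist < A.size(); ++I)
    A[I + Dist] = A[I] + 1;
  return A;
}

// Emulated EVL loop: all active lanes are loaded before any lane is stored,
// as a vector load followed by a vector store would behave.
std::vector<int> runEVL(std::vector<int> A, std::size_t Dist,
                        std::size_t VLMax) {
  assert(Dist >= 1 && VLMax >= 1 && "EVL must make forward progress");
  if (A.size() <= Dist)
    return A;
  const std::size_t TC = A.size() - Dist; // original trip count
  std::vector<int> Lanes(VLMax);
  for (std::size_t I = 0; I < TC;) {
    const std::size_t EVL = computeEVL(TC - I, Dist, VLMax);
    for (std::size_t L = 0; L < EVL; ++L) // "vector load + add"
      Lanes[L] = A[I + L] + 1;
    for (std::size_t L = 0; L < EVL; ++L) // "vector store"
      A[I + Dist + L] = Lanes[L];
    I += EVL; // lanes beyond EVL were never touched
  }
  return A;
}
```

Here `VLMax` (8, say) may exceed `Dist` (3, say) and the result still matches the scalar loop, because the lanes beyond EVL stay inactive in every iteration. That is the sense in which MaxVF need not be capped by the dependence distance under EVL.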

MaxVF and tail folding (yes/no/style): computeMaxVF() uses computeFeasibleMaxVF(), which in turn uses getMaximizedVFForTarget(). The latter depends on whether the tail is folded or not, in order to decide whether to limit MaxVF by the original trip count, and so computeMaxVF() ends up being responsible for setting the tail-folding style. Speculating a folded tail should produce greater (or equal) MaxVFs.
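
A rough model of that last claim, as a standalone C++ sketch: the function below is a hypothetical simplification loosely following the trip-count clamping described above, and the exact conditions in getMaximizedVFForTarget() may differ. `sketchMaxVF`, `floorPow2`, and `isPow2` are illustrative names, and a `MaxTripCount` of 0 is assumed to mean "unknown".

```cpp
#include <cassert>

// Largest power of two that is <= X, for X >= 1.
unsigned floorPow2(unsigned X) {
  assert(X >= 1);
  unsigned P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

bool isPow2(unsigned X) { return X != 0 && (X & (X - 1)) == 0; }

// Hypothetical simplified model (not the actual LLVM logic): without tail
// folding, a small known trip count clamps MaxVF to a power of two that does
// not exceed it; with tail folding, the clamp only applies when the trip
// count is itself a power of two, since excess lanes are masked otherwise.
unsigned sketchMaxVF(unsigned MaxTripCount, unsigned WidestVF, bool FoldTail) {
  const bool Known = MaxTripCount != 0; // assumption: 0 means unknown
  if (Known && MaxTripCount <= WidestVF &&
      (!FoldTail || isPow2(MaxTripCount)))
    return floorPow2(MaxTripCount);
  return WidestVF;
}
```

Under this model, a trip count of 5 with a widest VF of 8 yields MaxVF 4 without tail folding and MaxVF 8 with it, so the folded-tail result is never the smaller of the two.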

https://github.com/llvm/llvm-project/pull/102897