[llvm] [LV, VP]VP intrinsics support for the Loop Vectorizer (PR #76172)

Fri Jan 26 08:36:33 PST 2024

================
@@ -4690,6 +4712,39 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
   // FIXME: look for a smaller MaxVF that does divide TC rather than masking.
   if (Legal->prepareToFoldTailByMasking()) {
     CanFoldTailByMasking = true;
+    if (getTailFoldingStyle() == TailFoldingStyle::None)
+      return MaxFactors;
+
+    if (UserIC > 1) {
+      LLVM_DEBUG(dbgs() << "LV: Preference for VP intrinsics indicated. Will "
+                           "not generate VP intrinsics since interleave count "
+                           "specified is greater than 1.\n");
+      return MaxFactors;
+    }
+
+    if (MaxFactors.ScalableVF.isVector()) {
+      assert(MaxFactors.ScalableVF.isScalable() &&
+             "Expected scalable vector factor.");
+      // FIXME: use actual opcode/data type for analysis here.
+      PreferEVL = getTailFoldingStyle() == TailFoldingStyle::DataWithEVL &&
+                  TTI.hasActiveVectorLength(0, nullptr, Align());
+#if !NDEBUG
+      if (getTailFoldingStyle() == TailFoldingStyle::DataWithEVL) {
+        if (PreferEVL)
+          dbgs() << "LV: Preference for VP intrinsics indicated. Will "
+                    "try to generate VP Intrinsics.\n";
+        else
+          dbgs() << "LV: Preference for VP intrinsics indicated. Will "
+                    "not try to generate VP Intrinsics since the target "
+                    "does not support vector length predication.\n";
+      }
+#endif // !NDEBUG
+
+      // Tail folded loop using VP intrinsics restricts the VF to be scalable.
----------------
fhahn wrote:

Might be better to explain why. IIUC, the only target that supports EVL effectively supports it only with scalable vectors?
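
For reference, the gating in the hunk above can be read as the following standalone sketch. It is only a model of the quoted lines, not the patch itself: `FoldStyle`, `Query`, and `prefersEVL` are hypothetical names, the boolean fields stand in for the real `getTailFoldingStyle()`, `MaxFactors.ScalableVF.isVector()`, and `TTI.hasActiveVectorLength(...)` queries, and it assumes `PreferEVL` starts out false.

```cpp
// Hypothetical model of the EVL gating in the quoted hunk; not LLVM code.
enum class FoldStyle { None, Data, DataWithEVL };

struct Query {
  FoldStyle Style;     // stands in for getTailFoldingStyle()
  unsigned UserIC;     // user-specified interleave count
  bool HasScalableVF;  // stands in for MaxFactors.ScalableVF.isVector()
  bool TargetHasEVL;   // stands in for TTI.hasActiveVectorLength(...)
};

// Returns true only when every condition checked in the hunk holds.
// Assumes PreferEVL is false unless the hunk sets it.
bool prefersEVL(const Query &Q) {
  if (Q.Style == FoldStyle::None)
    return false; // early return before any EVL decision is made
  if (Q.UserIC > 1)
    return false; // interleaving requested, VP intrinsics are skipped
  if (!Q.HasScalableVF)
    return false; // EVL preference is only evaluated for a scalable VF
  return Q.Style == FoldStyle::DataWithEVL && Q.TargetHasEVL;
}
```

In this model, a scalable VF is a precondition for the preference rather than a consequence of it, which is the restriction the comment on the last quoted line would need to justify.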

https://github.com/llvm/llvm-project/pull/76172