[llvm] [LV][EVL] Support in-loop reduction using tail folding with EVL. (PR #90184)

Mon Jul 15 01:15:03 PDT 2024

================
@@ -1429,11 +1429,20 @@ bool VPlanTransforms::tryAddExplicitVectorLength(VPlan &Plan) {
   // The transform updates all users of inductions to work based on EVL, instead
   // of the VF directly. At the moment, widened inductions cannot be updated, so
   // bail out if the plan contains any.
-  if (any_of(Header->phis(), [](VPRecipeBase &Phi) {
-        return (isa<VPWidenIntOrFpInductionRecipe>(&Phi) ||
-                isa<VPWidenPointerInductionRecipe>(&Phi));
-      }))
+  bool ContainsWidenInductions = any_of(Header->phis(), [](VPRecipeBase &Phi) {
+    return isa<VPWidenIntOrFpInductionRecipe, VPWidenPointerInductionRecipe>(
+        &Phi);
+  });
+  // FIXME: Remove this once we can transform (select header_mask, true_value,
+  // false_value) into vp.merge.
+  bool ContainsOutloopReductions =
+      any_of(Header->phis(), [&](VPRecipeBase &Phi) {
+        auto *R = dyn_cast<VPReductionPHIRecipe>(&Phi);
+        return R && !R->isInLoop();
+      });
+  if (ContainsWidenInductions || ContainsOutloopReductions)
     return false;
----------------
Mel-Chen wrote:

Yes, out-loop reduction for EVL vectorization is feasible, but it is not supported yet. The reason is that the EVL of the second-to-last iteration might be smaller than VF*UF. When the reduction results of the third-to-last and second-to-last iterations are merged, the lanes beyond EVL (lane EVL + 1 through lane VF*UF) would then be poison.

We will create a patch to address this issue. That patch will transform `select(HeaderMask, LHS, RHS)` into `vp.merge`. Once this is done, we can remove the restriction on out-loop reduction.
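
To illustrate the problem, here is a minimal standalone C++ sketch. It is not LLVM code: `Lane`, `vpAdd`, `vpMerge`, `reduceAdd` and `outLoopSum` are hypothetical names that only model the lane semantics, with poison represented as an empty `std::optional`. It assumes UF = 1 and a caller-supplied EVL sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model: a lane holds a defined int, or poison (std::nullopt).
using Lane = std::optional<int>;
using Vec = std::vector<Lane>;

// Models a VP add: result lanes at index >= EVL are poison.
Vec vpAdd(const Vec &A, const Vec &B, std::size_t EVL) {
  Vec R(A.size(), std::nullopt);
  for (std::size_t I = 0; I < EVL; ++I)
    if (A[I] && B[I])
      R[I] = *A[I] + *B[I];
  return R;
}

// Models vp.merge with an all-true mask: lanes below EVL come from OnTrue,
// lanes at index >= EVL are taken from OnFalse.
Vec vpMerge(const Vec &OnTrue, const Vec &OnFalse, std::size_t EVL) {
  Vec R = OnFalse;
  for (std::size_t I = 0; I < EVL; ++I)
    R[I] = OnTrue[I];
  return R;
}

// Horizontal add performed after the loop; poison if any lane is poison.
Lane reduceAdd(const Vec &V) {
  int Sum = 0;
  for (const Lane &L : V) {
    if (!L)
      return std::nullopt;
    Sum += *L;
  }
  return Sum;
}

// Out-of-loop add reduction over Data, one vector iteration per entry of EVLs.
// UseMerge selects whether the accumulator update goes through vpMerge.
Lane outLoopSum(const std::vector<int> &Data,
                const std::vector<std::size_t> &EVLs, std::size_t VF,
                bool UseMerge) {
  Vec Acc(VF, Lane(0));
  std::size_t Pos = 0;
  for (std::size_t EVL : EVLs) {
    assert(EVL <= VF && Pos + EVL <= Data.size());
    Vec Loaded(VF, std::nullopt); // models a VP load: only EVL lanes defined
    for (std::size_t I = 0; I < EVL; ++I)
      Loaded[I] = Data[Pos + I];
    Vec Next = vpAdd(Acc, Loaded, EVL);
    Acc = UseMerge ? vpMerge(Next, Acc, EVL) : Next;
    Pos += EVL;
  }
  return reduceAdd(Acc);
}
```

With VF = 4, ten elements 1..10 and EVLs of 4, 3, 3, the second iteration leaves the last lane of the plain accumulator poison, so the final reduction is poison. With the merge, that lane keeps its value from the first iteration and the sum is 55.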

https://github.com/llvm/llvm-project/pull/90184