[llvm] [LV][EVL] Emit vp.merge intrinsic to enable out-loop reduction in EVL vectorization. (PR #101641)

Tue Oct 29 06:16:47 PDT 2024

================
@@ -1392,7 +1395,19 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
                                                   TypeInfo.inferScalarType(Sel),
                                                   false, false, false);
               })
-
+              .Case<VPInstruction>([&](VPInstruction *VPI) -> VPRecipeBase * {
+                VPValue *LHS, *RHS;
+                if (!match(VPI, m_Select(m_Specific(HeaderMask), m_VPValue(LHS),
----------------
Mel-Chen wrote:

I understand your concern. However, I don’t think we need to perform a check specifically for reductions. As long as the `VPInstruction::select` matches the form `select(HeaderMask, LHS, RHS)`, it is correct to convert it to `vp.merge(all-true, LHS, RHS, EVL)`, whether or not it’s a predicated reduction select.

The difference between `vp.merge` and `vp.select` is in the result lanes at positions greater than or equal to EVL: `vp.select` leaves them undefined, whereas `vp.merge` sets them to `RHS` (the on-false value). Thus, `VPWidenSelectRecipe` and `VPInstruction::select` **without** a header mask condition can be converted to `vp.select`, because we don't care about the results in inactive lanes, while `VPInstruction::select` **with** a header mask condition must be converted to `vp.merge`, since the lanes at or beyond EVL must still hold `RHS`.
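
To make the lane semantics concrete, here is a rough scalar model of the two intrinsics as described above. It is only a sketch: `Lane`, `modelVPSelect`, and `modelVPMerge` are hypothetical names made up for illustration, not LLVM APIs, and `std::nullopt` stands in for an undefined lane.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical scalar model; std::nullopt represents an undefined lane.
using Lane = std::optional<int>;

// Models vp.select: lanes below EVL pick OnTrue/OnFalse by the condition,
// lanes at or beyond EVL are left undefined.
std::vector<Lane> modelVPSelect(const std::vector<bool> &Cond,
                                const std::vector<int> &OnTrue,
                                const std::vector<int> &OnFalse,
                                std::size_t EVL) {
  assert(Cond.size() == OnTrue.size() && Cond.size() == OnFalse.size());
  std::vector<Lane> Result(Cond.size());
  for (std::size_t I = 0; I < Cond.size(); ++I)
    if (I < EVL)
      Result[I] = Cond[I] ? OnTrue[I] : OnFalse[I];
  return Result;
}

// Models vp.merge: lanes below EVL behave like a select, lanes at or
// beyond EVL always take the on-false value.
std::vector<Lane> modelVPMerge(const std::vector<bool> &Cond,
                               const std::vector<int> &OnTrue,
                               const std::vector<int> &OnFalse,
                               std::size_t EVL) {
  assert(Cond.size() == OnTrue.size() && Cond.size() == OnFalse.size());
  std::vector<Lane> Result(Cond.size());
  for (std::size_t I = 0; I < Cond.size(); ++I)
    Result[I] = (I < EVL && Cond[I]) ? OnTrue[I] : OnFalse[I];
  return Result;
}
```

With an all-true condition, four lanes, and EVL = 2, the merge model keeps `RHS` in lanes 2 and 3, while the select model leaves those lanes undefined. That is the property the `select(HeaderMask, LHS, RHS)` conversion relies on.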

https://github.com/llvm/llvm-project/pull/101641