[llvm] [LV] Use VPReductionRecipe for partial reductions (PR #144908)

Fri Jun 20 05:22:04 PDT 2025

================
@@ -2408,23 +2414,32 @@ class VPInterleaveRecipe : public VPRecipeBase {
   Instruction *getInsertPos() const { return IG->getInsertPos(); }
 };
 
-/// A recipe to represent inloop reduction operations, performing a reduction on
-/// a vector operand into a scalar value, and adding the result to a chain.
-/// The Operands are {ChainOp, VecOp, [Condition]}.
+/// A recipe to represent inloop, ordered or partial reduction operations. It
+/// performs a reduction on a vector operand into a scalar (vector in the case
+/// of a partial reduction) value, and adds the result to a chain. The Operands
+/// are {ChainOp, VecOp, [Condition]}.
 class VPReductionRecipe : public VPRecipeWithIRFlags {
   /// The recurrence kind for the reduction in question.
   RecurKind RdxKind;
   bool IsOrdered;
   /// Whether the reduction is conditional.
   bool IsConditional = false;
+  /// The scaling factor, relative to the VF, that this recipe's output is
+  /// divided by.
+  /// For outer-loop reductions this is equal to 1.
+  /// For in-loop reductions this is equal to 0, to specify that this is equal
+  /// to the VF (which may not be known yet). For partial-reductions this is
+  /// equal to another scalar value.
+  ElementCount VFScaleFactor;
----------------
sdesmalen-arm wrote:

I'm not sure there's currently a compelling enough use case for it, but there's no reason why we shouldn't be able to support a partial reduction of e.g. `<vscale x 16 x i32>` to `<4 x i32>`, which would require a VFScaleFactor of `vscale x 4`. On AArch64, `addqv` is an example of an instruction that does this: it partially reduces e.g. a `<vscale x 4 x i32>` to a `<4 x i32>`.
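
To make the arithmetic concrete, here is a minimal sketch of how such a scale factor could be derived from the input and output element counts. `Count`, `FactorResult` and `scaleFactor` are hypothetical stand-ins written for this illustration; they are not LLVM's `ElementCount` API and not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an element count: MinVal elements,
// multiplied by the runtime vscale when Scalable is true.
struct Count {
  uint64_t MinVal;
  bool Scalable;
  constexpr bool operator==(const Count &O) const {
    return MinVal == O.MinVal && Scalable == O.Scalable;
  }
};

// Result of the division; Valid is false when no exact factor exists.
struct FactorResult {
  bool Valid;
  Count Factor;
};

// Computes F such that In / F == Out (assumption: exact division only).
constexpr FactorResult scaleFactor(Count In, Count Out) {
  if (Out.MinVal == 0 || In.MinVal % Out.MinVal != 0)
    return {false, {0, false}};
  uint64_t Q = In.MinVal / Out.MinVal;
  if (In.Scalable == Out.Scalable)
    return {true, {Q, false}}; // vscale cancels (or is absent): fixed factor
  if (In.Scalable)
    return {true, {Q, true}};  // scalable -> fixed: factor carries vscale
  return {false, {0, false}};  // fixed -> scalable: not a reduction
}

// <vscale x 16 x i32> -> <4 x i32> needs a factor of vscale x 4.
static_assert(scaleFactor(Count{16, true}, Count{4, false}).Factor ==
                  Count{4, true},
              "scalable-to-fixed partial reduction");
// addqv-style: <vscale x 4 x i32> -> <4 x i32> needs a factor of vscale x 1.
static_assert(scaleFactor(Count{4, true}, Count{4, false}).Factor ==
                  Count{1, true},
              "addqv-style reduction");
```

In this sketch a reduction that keeps the same scalability on both sides (e.g. `<vscale x 16 x i32>` to `<vscale x 4 x i32>`) yields a fixed factor of 4, while a scalable-to-fixed reduction can only be described by a factor that is itself scalable.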

https://github.com/llvm/llvm-project/pull/144908