[llvm] [LoopVectorizer] Add support for partial reductions (PR #92418)

Tue Oct 15 06:44:11 PDT 2024

================
@@ -6871,6 +6974,18 @@ void LoopVectorizationCostModel::collectValuesToIgnore() {
     const SmallVectorImpl<Instruction *> &Casts = IndDes.getCastInsts();
     VecValuesToIgnore.insert(Casts.begin(), Casts.end());
   }
+
+  // Ignore any values that we know will be flattened
+  for (auto It : getPartialReductionChains()) {
+    PartialReductionChain Chain = It.second;
+    SmallVector<Value *> PartialReductionValues{Chain.Reduction, Chain.BinOp,
+                                                Chain.ExtendA, Chain.ExtendB,
+                                                Chain.Accumulator};
+    ValuesToIgnore.insert(PartialReductionValues.begin(),
+                          PartialReductionValues.end());
+    VecValuesToIgnore.insert(PartialReductionValues.begin(),
----------------
huntergr-arm wrote:

I believe so. I have https://github.com/llvm/llvm-project/pull/109671 to enable max bandwidth, which does allow us to move the code to VPlan recipes. There's some concern that doing so could cause performance regressions, but I wasn't able to find any when running SPEC CPU 2017, LNT, or a few HPC kernels. So I'm tempted to commit that and fix up any regressions we do find later.

I think the recipe-based transformation for LV can proceed independently of that. It's useful for NEON in its current form, and will be for SVE once we enable max bandwidth there.

https://github.com/llvm/llvm-project/pull/92418