[llvm] Add LoopVectorizer support for `llvm.vector.partial.reduce.fadd` (PR #163975)

Fri Dec 12 05:28:11 PST 2025

================
@@ -5879,6 +5887,13 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
       return Cost;
   }
 
+  // f16 -> f32 is natively supported for fdot
+  if (Opcode == Instruction::FAdd && (ST->hasSME2() || ST->hasSVE2p1())) {
+    if (AccumLT.second.getScalarType() == MVT::f32 &&
+        InputLT.second.getScalarType() == MVT::f16)
----------------
sdesmalen-arm wrote:

I just realised that this also needs a check that the vector types are 'full' vectors, i.e.
`&& AccumLT.second.getVectorMinNumElements() == 4 && InputLT.second.getVectorMinNumElements() == 8`

and rather than falling back to `return Cost + 2`, we should return a higher cost (e.g. `return Cost + 20`), because for FP types we don't promote the types but instead fall back to expanding the partial reduce, which is more expensive.
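The two requested changes (the full-vector check and the higher FP fallback cost) can be sketched as a small standalone model of the FAdd branch. This is a sketch under stated assumptions, not the PR's code: `VecTy`, `partialFAddCost` and `HasSME2OrSVE2p1` are hypothetical names rather than LLVM APIs, `BaseCost` stands in for `Cost`, and the native-case result (`BaseCost` unchanged) is assumed because the hunk above is cut off before its return statement. The penalty of 20 is the reviewer's example value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a legalized vector type; not an LLVM class.
struct VecTy {
  unsigned MinNumElts; // minimum element count of the scalable vector
  unsigned EltBits;    // scalar element width in bits
  bool IsFloat;        // true for floating-point element types
};

// Hypothetical model of the FAdd branch of getPartialReductionCost after
// applying the review feedback. BaseCost stands in for `Cost`.
int64_t partialFAddCost(const VecTy &Accum, const VecTy &Input,
                        bool HasSME2OrSVE2p1, int64_t BaseCost) {
  // This sketch only models floating-point partial reductions.
  assert(Accum.IsFloat && Input.IsFloat && "FAdd path only");

  // 'Full' vectors: 4 x f32 and 8 x f16 each have a 128-bit minimum size,
  // i.e. a whole scalable register, the shape fdot (f16 -> f32) works on.
  bool IsFullF16ToF32 = Accum.EltBits == 32 && Accum.MinNumElts == 4 &&
                        Input.EltBits == 16 && Input.MinNumElts == 8;
  if (HasSME2OrSVE2p1 && IsFullF16ToF32)
    return BaseCost; // assumed native cost; the hunk above is truncated

  // FP types are not promoted; the partial reduce is expanded instead,
  // so charge the higher penalty suggested in the review (20, not 2).
  return BaseCost + 20;
}
```

With this model, an `nxv8f16 -> nxv4f32` reduction on an SME2/SVE2p1 target keeps the base cost, while a half-width `nxv4f16 -> nxv2f32` reduction, or any f16 -> f32 reduction without those features, takes the expansion penalty.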

https://github.com/llvm/llvm-project/pull/163975