[llvm] Add LoopVectorizer support for `llvm.vector.partial.reduce.fadd` (PR #163975)
Sander de Smalen via llvm-commits
llvm-commits at lists.llvm.org
Fri Dec 12 05:28:11 PST 2025
================
@@ -5879,6 +5887,13 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
return Cost;
}
+ // f16 -> f32 is natively supported for fdot
+ if (Opcode == Instruction::FAdd && (ST->hasSME2() || ST->hasSVE2p1())) {
+ if (AccumLT.second.getScalarType() == MVT::f32 &&
+ InputLT.second.getScalarType() == MVT::f16)
----------------
sdesmalen-arm wrote:
I just realised that this also needs a check that the vector types are 'full' vectors, i.e.
`&& AccumLT.second.getVectorMinNumElements() == 4 && InputLT.second.getVectorMinNumElements() == 8`
and rather than falling back to `return Cost + 2`, we should return a higher cost (e.g. `return Cost + 20`), because for FP types we don't promote the types but instead fall back to expanding the partial reduce, which is more expensive.
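
To illustrate the shape of the check, here is a minimal standalone sketch. It is not the actual patch: `ScalarTy`, `VecTy`, `isNativeFDot` and `expandedCost` are hypothetical stand-ins for the `MVT` queries and the cost logic in `getPartialReductionCost`, and `HasFDot` is assumed to stand for `ST->hasSME2() || ST->hasSVE2p1()`. The cost returned on the native path is not shown in the quoted hunk, so the sketch leaves it out.

```cpp
// Standalone model of the suggested check; these types are hypothetical
// stand-ins, not LLVM's MVT or TargetTransformInfo classes.
enum class ScalarTy { F16, F32 };

struct VecTy {
  ScalarTy Scalar;
  unsigned MinNumElts; // models the value getVectorMinNumElements() reports
};

// True only when fdot can be used directly: f16 inputs, an f32 accumulator,
// and both operands are 'full' vectors (8 x f16 reduced into 4 x f32).
constexpr bool isNativeFDot(bool HasFDot, VecTy Accum, VecTy Input) {
  return HasFDot && Accum.Scalar == ScalarTy::F32 &&
         Input.Scalar == ScalarTy::F16 && Accum.MinNumElts == 4 &&
         Input.MinNumElts == 8;
}

// Penalty for the expansion path; 20 is the example value from the review.
constexpr int expandedCost(int Cost) { return Cost + 20; }

// Compile-time checks (C++17): a full vector is native, a half-width one is not.
static_assert(isNativeFDot(true, {ScalarTy::F32, 4}, {ScalarTy::F16, 8}));
static_assert(!isNativeFDot(true, {ScalarTy::F32, 2}, {ScalarTy::F16, 4}));
static_assert(expandedCost(1) == 21);
```

The point of the extra element-count condition is that a narrower type with the same scalar types would otherwise be costed as if it mapped onto fdot, when it would in fact take the more expensive expansion path.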
https://github.com/llvm/llvm-project/pull/163975