[llvm] [AArch64][LV] Reduce cost of scaled reduction extends (PR #134074)
David Sherwood via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 3 05:42:51 PDT 2025
================
@@ -1757,6 +1766,43 @@ void VPWidenCastRecipe::execute(VPTransformState &State) {
setFlags(CastOp);
}
+// Detects whether the extension should be folded away into a combined
+// target instruction, and therefore given a cost of 0.
+// Handles patterns similar to the following:
+// * partial_reduce(ext, phi)
+// * partial_reduce(mul(ext, ext), phi)
+// * partial_reduce(sub(0, mul(ext, ext)), phi)
+static bool isScaledReductionExtension(const VPWidenCastRecipe *Extend) {
+ unsigned Opcode = Extend->getOpcode();
+ if (Opcode != Instruction::SExt && Opcode != Instruction::ZExt)
+ return false;
+
+ // Check that all users are either a partial reduction, or a multiply
+ // (and possibly subtract) used by a partial reduction.
+ return all_of(Extend->users(), [](const VPUser *U) {
+ // Look through a (possible) multiply.
+ if (const VPWidenRecipe *I = dyn_cast_if_present<VPWidenRecipe>(U)) {
----------------
david-arm wrote:
Hmm, whilst this may be true for AArch64, I wonder if it's correct in general to assume that a partial reduction by definition folds a mul into a udot? It's my understanding that at the IR level partial reductions are far more abstract than just a udot or sdot: we're simply partially reducing a set of values into a smaller set. It's quite conceivable that a target has support for this that doesn't involve muls, e.g. an instruction that sums up each group of 4 bytes of an input and accumulates into a 32-bit result. In that case the mul is not free. At the moment this does look like we're taking an AArch64 cost model and using it in a general way for everyone.
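
To make the distinction concrete, here is a rough scalar sketch of the two kinds of target support described above. The function names and the fixed 16-byte/4-lane shapes are purely illustrative assumptions, not LLVM or AArch64 APIs. The first models a udot-style operation where the multiply is part of the instruction; the second models a hypothetical sum-only partial reduction, where any mul feeding it would still need a separate instruction.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar model of a udot-style partial reduction: each 32-bit
// accumulator lane adds the dot product of four zero-extended byte pairs, so
// the extends and the multiply are folded into the one operation.
std::array<uint32_t, 4> dotPartialReduce(std::array<uint32_t, 4> Acc,
                                         const std::array<uint8_t, 16> &A,
                                         const std::array<uint8_t, 16> &B) {
  for (unsigned Lane = 0; Lane < 4; ++Lane)
    for (unsigned I = 0; I < 4; ++I)
      Acc[Lane] += uint32_t(A[4 * Lane + I]) * uint32_t(B[4 * Lane + I]);
  return Acc;
}

// Hypothetical scalar model of a sum-only partial reduction: each 32-bit
// accumulator lane adds four zero-extended bytes. No multiply is involved,
// so a mul on the input would not be folded away on such a target.
std::array<uint32_t, 4> sumPartialReduce(std::array<uint32_t, 4> Acc,
                                         const std::array<uint8_t, 16> &A) {
  for (unsigned Lane = 0; Lane < 4; ++Lane)
    for (unsigned I = 0; I < 4; ++I)
      Acc[Lane] += uint32_t(A[4 * Lane + I]);
  return Acc;
}
```

Both shapes reduce 16 inputs into 4 accumulator lanes, which is all the IR-level partial reduction promises; only the first makes the multiply free.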
https://github.com/llvm/llvm-project/pull/134074