[llvm] [AArch64] Improve codegen for some fixed-width partial reductions (PR #126529)

David Sherwood via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 13 01:30:15 PST 2025


================
@@ -16866,9 +16866,14 @@ bool AArch64TargetLowering::optimizeExtendOrTruncateConversion(
     // mul(zext(i8), sext) can be transformed into smull(zext, sext) which
     // performs one extend implicitly. If DstWidth is at most 4 * SrcWidth, at
     // most one extra extend step is needed and using tbl is not profitable.
+    // Similarly, bail out if partial_reduce(acc, zext(i8)) can be lowered to a
+    // udot instruction.
     if (SrcWidth * 4 <= DstWidth && I->hasOneUser()) {
       auto *SingleUser = cast<Instruction>(*I->user_begin());
-      if (match(SingleUser, m_c_Mul(m_Specific(I), m_SExt(m_Value()))))
+      if (match(SingleUser, m_c_Mul(m_Specific(I), m_SExt(m_Value()))) ||
+          (isa<IntrinsicInst>(SingleUser) &&
+           !shouldExpandPartialReductionIntrinsic(
----------------
david-arm wrote:

Yep, that's a good suggestion. I've added an assert to `shouldExpandPartialReductionIntrinsic` for now, as I'd prefer not to change the hook in this patch if possible. Also, I can see a possible argument in future for the hook needing the instruction to provide context.

There was also a problem with my previous version: the extended value could have been used as the accumulator, whereas in this latest version I'm explicitly checking that it's the second argument (the input being reduced).
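
For readers following the thread, here is a rough standalone sketch of why the zext can be left alone when it feeds the second argument of the partial reduction. It assumes the udot lowering works by multiplying the i8 input by a vector of ones, so the widening happens inside the instruction. The names `partialReduceZext`, `udotModel` and `onesVector` are hypothetical and are not LLVM APIs. The lane grouping (four adjacent bytes per i32 lane) follows udot and is only one valid grouping for a partial reduction.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Acc = std::array<uint32_t, 4>;   // models a <4 x i32> accumulator
using Bytes = std::array<uint8_t, 16>; // models a <16 x i8> input

// Hypothetical reference model of partial_reduce(acc, zext(x)): each i32 lane
// accumulates four adjacent zero-extended bytes.
Acc partialReduceZext(const Acc &acc, const Bytes &x) {
  Acc out = acc;
  for (std::size_t i = 0; i < x.size(); ++i)
    out[i / 4] += static_cast<uint32_t>(x[i]);
  return out;
}

// Hypothetical model of a 128-bit udot: each i32 lane accumulates the sum of
// four unsigned byte products, widening the bytes implicitly.
Acc udotModel(const Acc &acc, const Bytes &a, const Bytes &b) {
  Acc out = acc;
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i / 4] += static_cast<uint32_t>(a[i]) * static_cast<uint32_t>(b[i]);
  return out;
}

// Splat of 1 used as the second udot operand (an assumption of this sketch).
Bytes onesVector() {
  Bytes ones;
  ones.fill(1);
  return ones;
}
```

In this model `udotModel(acc, x, onesVector())` matches `partialReduceZext(acc, x)` lane for lane, which is only meaningful when the extended value is the reduced input. If it were the accumulator operand, there would be no implicit extend to rely on.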

https://github.com/llvm/llvm-project/pull/126529

