[llvm] [AArch64] Improve codegen for some fixed-width partial reductions (PR #126529)
David Sherwood via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 13 01:30:15 PST 2025
================
@@ -16866,9 +16866,14 @@ bool AArch64TargetLowering::optimizeExtendOrTruncateConversion(
// mul(zext(i8), sext) can be transformed into smull(zext, sext) which
// performs one extend implicitly. If DstWidth is at most 4 * SrcWidth, at
// most one extra extend step is needed and using tbl is not profitable.
+ // Similarly, bail out if partial_reduce(acc, zext(i8)) can be lowered to a
+ // udot instruction.
if (SrcWidth * 4 <= DstWidth && I->hasOneUser()) {
auto *SingleUser = cast<Instruction>(*I->user_begin());
- if (match(SingleUser, m_c_Mul(m_Specific(I), m_SExt(m_Value()))))
+ if (match(SingleUser, m_c_Mul(m_Specific(I), m_SExt(m_Value()))) ||
+ (isa<IntrinsicInst>(SingleUser) &&
+ !shouldExpandPartialReductionIntrinsic(
----------------
david-arm wrote:
Yep, that's a good suggestion. I've added an assert to `shouldExpandPartialReductionIntrinsic` for now, as I'd prefer not to change the hook in this patch if possible. I can also see a possible argument in the future for the hook needing the instruction to provide context.
There was also a problem with my previous version anyway: the extended value could have been used as the accumulator, whereas this latest version explicitly checks that it is the second argument.
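To illustrate the operand-position check described above, here is a minimal sketch that does not use the real LLVM classes. `ToyValue` and `extendFeedsReductionInput` are hypothetical names invented for this example, and the two-operand layout `(acc, input)` is assumed from the `partial_reduce(acc, zext(i8))` comment in the diff. It is not the code from the patch.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an IR value. This is NOT llvm::Value; it only models
// the kind of node and the operands it uses.
struct ToyValue {
  enum Kind { Other, ZExt, PartialReduceAdd };
  Kind kind = Other;
  std::vector<const ToyValue *> operands;
};

// Hypothetical helper: returns true only when `ext` is the second operand
// (the input vector) of a partial-reduction node and is not also used as
// the first operand (the accumulator).
bool extendFeedsReductionInput(const ToyValue &ext, const ToyValue &user) {
  if (user.kind != ToyValue::PartialReduceAdd || user.operands.size() != 2)
    return false;
  return user.operands[1] == &ext && user.operands[0] != &ext;
}
```

With this shape, a reduction whose accumulator is the extended value is rejected, which is the case the earlier version of the check did not rule out.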
https://github.com/llvm/llvm-project/pull/126529