[llvm] [AArch64] Improve codegen for some fixed-width partial reductions (PR #126529)
David Sherwood via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 10 07:29:13 PST 2025
================
@@ -16866,9 +16866,14 @@ bool AArch64TargetLowering::optimizeExtendOrTruncateConversion(
// mul(zext(i8), sext) can be transformed into smull(zext, sext) which
// performs one extend implicitly. If DstWidth is at most 4 * SrcWidth, at
// most one extra extend step is needed and using tbl is not profitable.
+ // Similarly, bail out if partial_reduce(acc, zext(i8)) can be lowered to a
+ // udot instruction.
if (SrcWidth * 4 <= DstWidth && I->hasOneUser()) {
auto *SingleUser = cast<Instruction>(*I->user_begin());
- if (match(SingleUser, m_c_Mul(m_Specific(I), m_SExt(m_Value()))))
+ if (match(SingleUser, m_c_Mul(m_Specific(I), m_SExt(m_Value()))) ||
+ (isa<IntrinsicInst>(SingleUser) &&
+ !shouldExpandPartialReductionIntrinsic(
----------------
david-arm wrote:
Currently `shouldExpandPartialReductionIntrinsic` does not check whether the target actually supports the udot/sdot instructions, but the loop vectoriser should not be generating partial reduction intrinsic calls in that case.
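
To make the intended bail-out easier to follow, here is a minimal standalone sketch of the decision the hunk above appears to encode. It is not LLVM code: `UserKind`, `skipTblLowering` and the `expandsPartialReduction` flag are hypothetical stand-ins for the pattern match on the single user and for the result of `shouldExpandPartialReductionIntrinsic`. It also assumes the intrinsic user really is a partial reduction, and that the expansion query does not consult target features, as noted above.

```cpp
#include <cassert>

// Hypothetical classification of the single user of the zext.
enum class UserKind { MulWithSExt, PartialReduce, Other };

// Hypothetical model: returns true when the tbl-based lowering of the
// extend should be skipped because a cheaper instruction covers it.
// expandsPartialReduction stands in for the result of
// shouldExpandPartialReductionIntrinsic, which (per the comment above)
// is assumed not to check for udot/sdot support itself.
bool skipTblLowering(unsigned srcWidth, unsigned dstWidth, bool hasOneUser,
                     UserKind user, bool expandsPartialReduction) {
  // Same guard as the diff: SrcWidth * 4 <= DstWidth and a single user.
  if (srcWidth * 4 > dstWidth || !hasOneUser)
    return false;
  // mul(zext, sext) can become smull, which extends implicitly.
  if (user == UserKind::MulWithSExt)
    return true;
  // partial_reduce(acc, zext(i8)) that is not expanded can become udot.
  return user == UserKind::PartialReduce && !expandsPartialReduction;
}

// Small usage checks for the i8 -> i32 case discussed in the review.
inline void exampleChecks() {
  assert(skipTblLowering(8, 32, true, UserKind::PartialReduce, false));
  assert(!skipTblLowering(8, 32, true, UserKind::PartialReduce, true));
}
```

In this model the only protection against a target without dot-product support is that no partial reduction user reaches the check at all, which is the reliance on the loop vectoriser described above.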
https://github.com/llvm/llvm-project/pull/126529