[llvm] [AArch64] Generalize bfdotq_lane patterns to work for f32/i32 duplanes (PR #171146)
Paul Walker via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 10 08:46:22 PST 2025
================
@@ -9054,6 +9038,39 @@ class BF16ToSinglePrecision<string asm>
}
} // End of let mayStore = 0, mayLoad = 0, hasSideEffects = 0
+multiclass BaseSIMDThreeSameVectorBF16DotI<bit Q, bit U, string asm,
+ string dst_kind, string lhs_kind,
+ string rhs_kind,
+ RegisterOperand RegType,
+ ValueType AccumType,
+ ValueType InputType> {
+ let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in {
+ def NAME : BaseSIMDIndexedTied<Q, U, 0b0, 0b01, 0b1111, RegType, RegType, V128, VectorIndexS,
+ asm, "", dst_kind, lhs_kind, rhs_kind, []>
+ {
+ bits<2> idx;
+ let Inst{21} = idx{0}; // L
+ let Inst{11} = idx{1}; // H
+ }
+ }
+
+ foreach DupTypes = [VTPair<AccumType, v4f32>,
+ VTPair<ChangeElementTypeToInteger<AccumType>.VT, v4i32>] in {
+ def : Pat<(AccumType (int_aarch64_neon_bfdot
+ (AccumType RegType:$Rd), (InputType RegType:$Rn),
+ (InputType (bitconvert
+ (DupTypes.VT0 (AArch64duplane32 (DupTypes.VT1 V128:$Rm), VectorIndexS:$Idx)))))),
+ (!cast<Instruction>(NAME) $Rd, $Rn, $Rm, VectorIndexS:$Idx)>;
----------------
paulwalker-arm wrote:
This doesn't look correct for big-endian targets, where mixed-element-size bitconverts are not no-ops.
The removed pattern matched a pair of matching bitconverts on either side of the AArch64duplane32, which is OK because they're effectively back-to-back and so collapse to a no-op.
I appreciate some of this is existing code, but if we're increasing the applicability of the pattern then I'd rather not make things worse. I'm wondering if there's a post-legalisation combine you can add to replace `bitconvert->duplane->bitconvert` with `nvcast->duplane->nvcast`, so that the above can work for both cases by matching a single nvcast?
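
To illustrate the concern, here is a rough Python model, not LLVM code. It assumes a bitconvert behaves like a store followed by a load of the other type (as the LangRef describes `bitcast`), and that an nvcast leaves the register bits untouched with lane 0 in the least significant bits. All helper names (`bitcast`, `nvcast`, `duplane`) are made up for this sketch.

```python
# Simplified model of vector casts; lanes are plain integers (raw element bits).

def to_memory(lanes, bits, big_endian):
    """Store lanes in lane order, each element in the target's byte order."""
    order = "big" if big_endian else "little"
    return b"".join(lane.to_bytes(bits // 8, order) for lane in lanes)

def from_memory(data, bits, big_endian):
    """Load elements of the given width from memory in the target's byte order."""
    order = "big" if big_endian else "little"
    step = bits // 8
    return [int.from_bytes(data[i:i + step], order) for i in range(0, len(data), step)]

def bitcast(lanes, src_bits, dst_bits, big_endian):
    """Assumed bitconvert semantics: store as the source type, reload as the destination type."""
    return from_memory(to_memory(lanes, src_bits, big_endian), dst_bits, big_endian)

def nvcast(lanes, src_bits, dst_bits):
    """Assumed nvcast semantics: reinterpret the register bits unchanged.
    With lane 0 in the low bits this coincides with a little-endian bitcast."""
    return bitcast(lanes, src_bits, dst_bits, big_endian=False)

def duplane(lanes, idx):
    """Broadcast one lane to every lane."""
    return [lanes[idx]] * len(lanes)

# $Rm viewed as eight 16-bit elements (arbitrary raw bit patterns), lane index 1.
rm16 = [0x1000 + i for i in range(8)]
idx = 1

# What the instruction wants: the idx-th 32-bit chunk of the register, i.e. a bf16 pair.
natural = nvcast(duplane(nvcast(rm16, 16, 32), idx), 32, 16)

# Removed pattern: matching bitconverts on both sides of the duplane (big-endian).
paired_be = bitcast(duplane(bitcast(rm16, 16, 32, True), idx), 32, 16, True)

# Generalised pattern: $Rm is already a 32-bit-element vector, one bitconvert after.
rm32 = nvcast(rm16, 16, 32)  # same register bits, viewed as four 32-bit lanes
single_be = bitcast(duplane(rm32, idx), 32, 16, True)
single_le = bitcast(duplane(rm32, idx), 32, 16, False)

print([hex(x) for x in natural])
print([hex(x) for x in paired_be])
print([hex(x) for x in single_be])
```

In this model the paired big-endian bitconverts agree with the nvcast form, while the single big-endian bitconvert swaps the two halves of each pair; on little-endian the single bitconvert agrees as well.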
https://github.com/llvm/llvm-project/pull/171146