[llvm] [AArch64] Generalize bfdotq_lane patterns to work for f32/i32 duplanes (PR #171146)

Wed Dec 10 08:46:22 PST 2025

================
@@ -9054,6 +9038,39 @@ class BF16ToSinglePrecision<string asm>
 }
 } // End of let mayStore = 0, mayLoad = 0, hasSideEffects = 0
 
+multiclass BaseSIMDThreeSameVectorBF16DotI<bit Q, bit U, string asm,
+                                           string dst_kind, string lhs_kind,
+                                           string rhs_kind,
+                                           RegisterOperand RegType,
+                                           ValueType AccumType,
+                                           ValueType InputType> {
+  let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in {
+    def NAME : BaseSIMDIndexedTied<Q, U, 0b0, 0b01, 0b1111, RegType, RegType, V128, VectorIndexS,
+                                   asm, "", dst_kind, lhs_kind, rhs_kind, []>
+    {
+      bits<2> idx;
+      let Inst{21} = idx{0};  // L
+      let Inst{11} = idx{1};  // H
+    }
+  }
+
+  foreach DupTypes = [VTPair<AccumType, v4f32>,
+                      VTPair<ChangeElementTypeToInteger<AccumType>.VT, v4i32>] in {
+    def : Pat<(AccumType (int_aarch64_neon_bfdot
+                (AccumType RegType:$Rd), (InputType RegType:$Rn),
+                (InputType (bitconvert
+                  (DupTypes.VT0 (AArch64duplane32 (DupTypes.VT1 V128:$Rm), VectorIndexS:$Idx)))))),
+              (!cast<Instruction>(NAME) $Rd, $Rn, $Rm, VectorIndexS:$Idx)>;
----------------
paulwalker-arm wrote:

This doesn't look correct for big-endian targets, where mixed-element-size bitconverts are not no-ops.

The removed pattern matched a pair of matching bitconverts, one on either side of the AArch64duplane32, which is OK because they're effectively back to back and so collapse to a no-op.

I appreciate some of this is existing code, but if we're increasing the applicability of the pattern then I'd rather not make things worse. I'm wondering if there's a post-legalisation combine you can add to replace `bitconvert->duplane->bitconvert` with `nvcast->duplane->nvcast`, so that the above can work for both cases by matching a single nvcast?
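
For illustration only, here is a minimal Python sketch of the big-endian concern. It is not LLVM code. It assumes an IR bitcast behaves like a store of the source type followed by a load of the destination type, and that an nvcast-style cast leaves the register bits untouched with lane 0 in the least-significant bits. The helper names `bitcast_via_memory`, `reinterpret_register` and `dup_lane` are hypothetical.

```python
import struct

def bitcast_via_memory(lanes, src, dst, big_endian):
    """Model a bitcast as: store the lanes as `src`, reload the bytes as `dst`."""
    order = ">" if big_endian else "<"
    raw = struct.pack(order + src * len(lanes), *lanes)
    count = len(raw) // struct.calcsize(order + dst)
    return list(struct.unpack(order + dst * count, raw))

def reinterpret_register(lanes, src, dst):
    """Model an nvcast-style cast: same register bits, lane 0 in the low bits.
    Assumption: this is equivalent to a little-endian memory round trip."""
    return bitcast_via_memory(lanes, src, dst, big_endian=False)

def dup_lane(lanes, idx):
    """Broadcast one lane to every lane (the duplane operation)."""
    return [lanes[idx]] * len(lanes)

# "I" is a 32-bit lane, "H" is a 16-bit lane (standing in for bf16 storage).
v4i32 = [0x11112222, 0x33334444, 0x55556666, 0x77778888]
v8i16 = reinterpret_register(v4i32, "I", "H")  # same register, viewed as 8 x 16-bit

# What an indexed instruction sees: it reads the duplicated 32-bit lane
# straight from the register as a pair of 16-bit lanes.
in_register = reinterpret_register(dup_lane(v4i32, 1), "I", "H")

# Little-endian: a mixed-element-size bitcast matches the register view.
le_single = bitcast_via_memory(dup_lane(v4i32, 1), "I", "H", big_endian=False)
assert le_single == in_register

# Big-endian: a single mixed-element-size bitcast swaps the 16-bit halves
# of each 32-bit lane, so it no longer matches the register view.
be_single = bitcast_via_memory(dup_lane(v4i32, 1), "I", "H", big_endian=True)
assert be_single != in_register

# Big-endian: matching bitcasts on both sides of the duplane cancel out,
# which is why the removed pattern was safe.
be_wide = bitcast_via_memory(v8i16, "H", "I", big_endian=True)
be_pair = bitcast_via_memory(dup_lane(be_wide, 1), "I", "H", big_endian=True)
assert be_pair == in_register
```

Under these assumptions, only the case with a single bitcast on a big-endian target disagrees with the register view, which is the case the generalized pattern would newly match.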

https://github.com/llvm/llvm-project/pull/171146