[llvm] [SelectionDAG] Improve type legalisation for PARTIAL_REDUCE_MLA (PR #130935)

Wed Apr 23 07:22:59 PDT 2025

================
@@ -3220,8 +3220,26 @@ void DAGTypeLegalizer::SplitVecRes_VP_REVERSE(SDNode *N, SDValue &Lo,
 void DAGTypeLegalizer::SplitVecRes_PARTIAL_REDUCE_MLA(SDNode *N, SDValue &Lo,
                                                       SDValue &Hi) {
   SDLoc DL(N);
-  SDValue Expanded = TLI.expandPartialReduceMLA(N, DAG);
-  std::tie(Lo, Hi) = DAG.SplitVector(Expanded, DL);
+  SDValue Acc = N->getOperand(0);
+  SDValue Input1 = N->getOperand(1);
+
+  // If the node has not gone through the DAG combine, then do not attempt to
+  // legalise, just expand.
+  if (!TLI.isPartialReduceMLALegal(Acc.getValueType(), Input1.getValueType())) {
+    SDValue Expanded = TLI.expandPartialReduceMLA(N, DAG);
+    std::tie(Lo, Hi) = DAG.SplitVector(Expanded, DL);
+    return;
+  }
+
+  SDValue AccLo, AccHi, Input1Lo, Input1Hi, Input2Lo, Input2Hi;
----------------
sdesmalen-arm wrote:

If the input types don't need splitting, then instead of actually splitting the vector result *and* input operands it would be simpler (and more efficient) to reduce into the lower/higher half of the accumulator.

e.g.
```
nxv4i64 partial.reduce.mla(nxv4i64 %acc, nxv8i64 mul(zext(nxv8i16 %a), zext(nxv8i16 %b)))
=>
nxv4i64 insert.subvector(nxv4i64 %acc,
  nxv2i64 partial.reduce.mla(nxv2i64 extract.subvector(nxv4i64 %acc, 0),
                             nxv8i64 mul(zext(nxv8i16 %a), zext(nxv8i16 %b))), 0)
```
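
To illustrate why this should be safe, here is a small standalone model in plain C++ (not SelectionDAG code). It assumes that only the sum over all result lanes of a partial reduction is meaningful, and that which accumulator lane each input lane lands in is unspecified. The names `partialReduceAdd`, `reduceIntoLowHalf` and `total` are hypothetical helpers made up for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

using Vec = std::vector<int64_t>;

// Toy model of a partial reduction: every input lane is folded into some
// accumulator lane. The lane assignment (I % Acc.size()) is an arbitrary
// choice made for this sketch.
Vec partialReduceAdd(Vec Acc, const Vec &In) {
  for (std::size_t I = 0; I < In.size(); ++I)
    Acc[I % Acc.size()] += In[I];
  return Acc;
}

// Suggested form: reduce the whole input into the low half of the
// accumulator and leave the high half untouched.
Vec reduceIntoLowHalf(Vec Acc, const Vec &In) {
  std::size_t Half = Acc.size() / 2;
  Vec Lo(Acc.begin(), Acc.begin() + Half);      // extract.subvector(acc, 0)
  Lo = partialReduceAdd(Lo, In);                // narrower partial reduction
  std::copy(Lo.begin(), Lo.end(), Acc.begin()); // insert.subvector(acc, lo, 0)
  return Acc;
}

// Sum over all lanes, i.e. the value a final full reduction would produce.
int64_t total(const Vec &V) {
  return std::accumulate(V.begin(), V.end(), int64_t{0});
}
```

For any accumulator with an even number of lanes, `total(partialReduceAdd(Acc, In))` and `total(reduceIntoLowHalf(Acc, In))` agree, and only the accumulator is narrowed; the input is consumed whole.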

https://github.com/llvm/llvm-project/pull/130935