[llvm] [AArch64][NEON][SVE] Lower i8 to i64 partial reduction to a dot product (PR #110220)

Fri Sep 27 05:56:48 PDT 2024

================
@@ -21942,6 +21944,20 @@ SDValue tryLowerPartialReductionToDot(SDNode *N,
   else
     Opcode = AArch64ISD::UDOT;
 
+  // Partial reduction lowering for (nx)v16i8 to (nx)v4i64 requires an i32 dot
+  // product followed by a zero / sign extension
+  if ((ReducedType == MVT::nxv4i64 && MulSrcType == MVT::nxv16i8) ||
+      (ReducedType == MVT::v4i64 && MulSrcType == MVT::v16i8)) {
+    EVT ReducedTypeHalved = (ReducedType.isScalableVector()) ? MVT::nxv4i32 : MVT::v4i32;
+
+    auto Doti32 =
+        DAG.getNode(Opcode, DL, ReducedTypeHalved,
+                    DAG.getConstant(0, DL, ReducedTypeHalved), A, B);
+    auto Extended = DAG.getSExtOrTrunc(Doti32, DL, ReducedType);
+    return DAG.getNode(ISD::ADD, DL, NarrowOp.getValueType(),
+                                  {NarrowOp, Extended});
----------------
SamTebbs33 wrote:

I think we can make this slightly cleaner by dropping the braces around the `NarrowOp` and `Extended` operands. `getNode` has an overload that takes the two operands directly, so they don't need to be wrapped in a list.
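
To illustrate the two call shapes, here is a minimal standalone sketch. `MockDAG`, `Value`, `Node`, `buildWithList` and `buildWithPair` are hypothetical stand-ins invented for this example, not LLVM's `SelectionDAG` types; the real `getNode` also takes a `DL` and a value type, which are left out here. The only point is that the braced-list call and the two-operand call build the same node.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not LLVM types.
struct Value {
  std::string Name;
};

struct Node {
  std::string Opcode;
  std::vector<std::string> Operands;
};

struct MockDAG {
  // List form: operands are passed as a braced list.
  Node getNode(const std::string &Opcode, std::initializer_list<Value> Ops) {
    Node N{Opcode, {}};
    for (const Value &V : Ops)
      N.Operands.push_back(V.Name);
    return N;
  }

  // Two-operand form: operands are passed directly, no braces needed.
  Node getNode(const std::string &Opcode, const Value &N1, const Value &N2) {
    return getNode(Opcode, {N1, N2});
  }
};

// Call shape currently in the patch: operands wrapped in braces.
Node buildWithList(MockDAG &DAG, const Value &A, const Value &B) {
  return DAG.getNode("ADD", {A, B});
}

// Suggested call shape: operands passed as separate arguments.
Node buildWithPair(MockDAG &DAG, const Value &A, const Value &B) {
  return DAG.getNode("ADD", A, B);
}
```

Applied to the patch, the same change would amount to passing `NarrowOp, Extended` as the last two arguments instead of `{NarrowOp, Extended}`.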

https://github.com/llvm/llvm-project/pull/110220