[llvm] [AArch64] Lower partial add reduction to udot or svdot (PR #101010)

Thu Aug 15 04:14:16 PDT 2024

================
@@ -1971,6 +1971,48 @@ bool AArch64TargetLowering::shouldExpandGetActiveLaneMask(EVT ResVT,
   return false;
 }
 
+bool AArch64TargetLowering::shouldExpandPartialReductionIntrinsic(
+    const IntrinsicInst *I) const {
+
+  VectorType *RetTy = dyn_cast<VectorType>(I->getType());
+  if (!RetTy || !RetTy->isScalableTy())
+    return true;
+
+  Value *InputA;
+  Value *InputB;
+  if (match(I,
+            m_Intrinsic<Intrinsic::experimental_vector_partial_reduce_add>(
+                m_Value(), m_OneUse(m_Mul(m_ZExtOrSExt(m_Value(InputA)),
+                                          m_ZExtOrSExt(m_Value(InputB))))))) {
+    VectorType *InputAType = dyn_cast<VectorType>(InputA->getType());
+    VectorType *InputBType = dyn_cast<VectorType>(InputB->getType());
+    if (!InputAType || !InputBType)
+      return true;
----------------
paulwalker-arm wrote:

I don't think this function needs to be this complex.  Is it not possible to base the decision purely on the result type (e.g. legal scalable vectors)?

I ask because you're putting significant effort into matching DOT instructions, which is not unreasonable given the PR's title, but the true intent of this patch is to enable better code generation for the partial reduction intrinsic, of which DOT is just one possible destination.

For me, the complexity can stay in the target-specific DAG combine, which will evolve over time, with the only cost being perhaps a duplication of the common lowering code.  This duplication is easily solved by moving it into a dedicated SelectionDAG function that can be called by both the builder and the target-specific DAG combine.
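
To make the suggestion concrete, here is a rough standalone model of the split, not LLVM code. `ModelVectorType`, `shouldExpandByResultType` and `expandPartialReduceAdd` are hypothetical names, the "legal" test (a 128-bit minimum size) is an assumption standing in for whatever the target considers legal, and the chunk-wise add is only one valid way to expand the intrinsic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the result type of the partial reduction.
struct ModelVectorType {
  bool Scalable;        // true for <vscale x N x iM>
  unsigned MinElements; // N
  unsigned ElementBits; // M
};

// Decision based purely on the result type: keep the intrinsic only for
// scalable vectors of an assumed legal size (128-bit minimum), expand the rest.
bool shouldExpandByResultType(const ModelVectorType &RetTy) {
  const bool Legal =
      RetTy.Scalable && RetTy.MinElements * RetTy.ElementBits == 128;
  return !Legal;
}

// Shared generic expansion: add each accumulator-sized chunk of the wide
// input into the accumulator. In the model, both the builder and the
// target-specific combine's fallback path would call this one helper.
std::vector<int32_t> expandPartialReduceAdd(std::vector<int32_t> Acc,
                                            const std::vector<int32_t> &Input) {
  const std::size_t Stride = Acc.size();
  assert(Stride != 0 && Input.size() % Stride == 0 &&
         "input length must be a multiple of the accumulator length");
  for (std::size_t I = 0; I < Input.size(); ++I)
    Acc[I % Stride] += Input[I]; // lane I of the input folds into lane I % Stride
  return Acc;
}
```

With this shape, the mul/extend pattern matching for DOT lives only in the combine, and anything the combine does not recognise falls back to the shared helper instead of a second copy of the expansion.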

https://github.com/llvm/llvm-project/pull/101010