[llvm] [IR][LangRef] Add partial reduction add intrinsic (PR #94499)

Wed Jun 12 10:33:11 PDT 2024

================
@@ -7914,6 +7914,28 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
     setValue(&I, Trunc);
     return;
   }
+  case Intrinsic::experimental_vector_partial_reduce_add: {
+    auto DL = getCurSDLoc();
+    auto ReducedTy = EVT::getEVT(I.getType());
+    auto OpNode = getValue(I.getOperand(1));
+    auto FullTy = OpNode.getValueType();
+
+    auto Accumulator = getValue(I.getOperand(0));
+    unsigned ScaleFactor = FullTy.getVectorMinNumElements() / ReducedTy.getVectorMinNumElements();
+
+    for(unsigned i = 0; i < ScaleFactor; i++) {
----------------
paulwalker-arm wrote:

The problem with the "having a separate binop" approach is that it constrains optimisation/code generation because that binop requires a very specific ordering for how elements are combined, which is the very problem the partial reduction is solving.

I think folk are stuck in a "how can we use dot instructions" mindset, whilst I'm trying to push for "what is the loosest way reductions can be represented in IR".  To this point, the current suggested langref text for the intrinsic is still too strict because it gives the impression there's a defined order for how the second operand's elements are combined with the first, where there shouldn't be.

https://github.com/llvm/llvm-project/pull/94499