[llvm] [IR][LangRef] Add partial reduction add intrinsic (PR #94499)

Wed Jun 12 08:11:34 PDT 2024

================
@@ -7914,6 +7914,28 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
     setValue(&I, Trunc);
     return;
   }
+  case Intrinsic::experimental_vector_partial_reduce_add: {
+    auto DL = getCurSDLoc();
+    auto ReducedTy = EVT::getEVT(I.getType());
+    auto OpNode = getValue(I.getOperand(1));
+    auto FullTy = OpNode.getValueType();
+
+    auto Accumulator = getValue(I.getOperand(0));
+    unsigned ScaleFactor = FullTy.getVectorMinNumElements() / ReducedTy.getVectorMinNumElements();
+
+    for(unsigned i = 0; i < ScaleFactor; i++) {
+      auto SourceIndex = DAG.getVectorIdxConstant(i * ScaleFactor, DL);
+      auto TargetIndex = DAG.getVectorIdxConstant(i, DL);
+      auto ExistingValue = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ReducedTy.getScalarType(), {Accumulator, TargetIndex});
+      auto N = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ReducedTy, {OpNode, SourceIndex});
----------------
huntergr-arm wrote:

This seems to assume that each subvector will be the same size as the smaller vector type? It works for the case we're interested in (e.g. <vscale x 16 x i32> to <vscale x 4 x i32>), but would fail if the larger type were <vscale x 8 x i32> -- you'd want to extract <vscale x 2 x i32> and reduce that. (We might never create such a partial reduction, but I think it should work correctly if we did).

https://github.com/llvm/llvm-project/pull/94499