[llvm] [IR][LangRef] Add partial reduction add intrinsic (PR #94499)
Graham Hunter via llvm-commits
llvm-commits at lists.llvm.org
Wed Jun 12 08:11:34 PDT 2024
================
@@ -7914,6 +7914,28 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
setValue(&I, Trunc);
return;
}
+ case Intrinsic::experimental_vector_partial_reduce_add: {
+ auto DL = getCurSDLoc();
+ auto ReducedTy = EVT::getEVT(I.getType());
+ auto OpNode = getValue(I.getOperand(1));
+ auto FullTy = OpNode.getValueType();
+
+ auto Accumulator = getValue(I.getOperand(0));
+ unsigned ScaleFactor = FullTy.getVectorMinNumElements() / ReducedTy.getVectorMinNumElements();
+
+ for(unsigned i = 0; i < ScaleFactor; i++) {
+ auto SourceIndex = DAG.getVectorIdxConstant(i * ScaleFactor, DL);
+ auto TargetIndex = DAG.getVectorIdxConstant(i, DL);
+ auto ExistingValue = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ReducedTy.getScalarType(), {Accumulator, TargetIndex});
+ auto N = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ReducedTy, {OpNode, SourceIndex});
----------------
huntergr-arm wrote:
This seems to assume that each extracted subvector has the same type as the smaller (result) vector type? It works for the case we're interested in (e.g. <vscale x 16 x i32> to <vscale x 4 x i32>), where the scale factor and the result element count are both 4, but it would fail if the larger type were <vscale x 8 x i32>: there you'd want to extract <vscale x 2 x i32> subvectors and reduce each of those. (We might never create such a partial reduction, but I think it should work correctly if we did.)
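To make the lane arithmetic concrete, here is a minimal standalone sketch, not the SelectionDAG code from the patch. It assumes fixed-size `std::vector`s standing in for the minimum element counts (vscale is ignored), and it assumes adjacent input lanes are grouped into one result lane, as the loop under review does. The helper name `partialReduceAdd` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of LLVM: models only the lane arithmetic.
// Each result lane accumulates one contiguous subvector of the input whose
// length is Input.size() / Acc.size(), NOT Acc.size().
std::vector<int32_t> partialReduceAdd(std::vector<int32_t> Acc,
                                      const std::vector<int32_t> &Input) {
  // The wide input must be a whole multiple of the accumulator width.
  assert(!Acc.empty() && Input.size() % Acc.size() == 0);
  const std::size_t SubvectorLen = Input.size() / Acc.size();

  // Iterate over result lanes; the subvector for lane i starts at
  // i * SubvectorLen in the wide input.
  for (std::size_t Lane = 0; Lane < Acc.size(); ++Lane)
    for (std::size_t J = 0; J < SubvectorLen; ++J)
      Acc[Lane] += Input[Lane * SubvectorLen + J];
  return Acc;
}
```

With an 8-element input and a 4-element accumulator, the subvector length is 2 and all four result lanes are written. With 16 into 4, the subvector length and the result lane count are both 4, which is why sizing the subvector by the result type only happens to work in that case.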
https://github.com/llvm/llvm-project/pull/94499