[Mlir-commits] [mlir] [MLIR][XeGPU] Add support for cross-subgroup reduction from wg to sg (PR #170936)

Mon Dec 8 08:24:59 PST 2025

================
@@ -1152,64 +1152,232 @@ struct WgToSgVectorShapeCastOp
   }
 };
 
-/// Pattern for lowering vector.multi_reduction op to subgroup level.
-/// Current limitation: the sg_layout in the reduced dimension being 1
-/// so that reduction is local to subgroup & no cross-subgroup communication is
-/// needed.
-/// TODO: Add cases to handle more general situations which require SLM access.
+// This pattern transforms vector.multi_dim_reduction ops to work at subgroup
+// level.
 struct WgToSgMultiDimReductionOp
     : public OpConversionPattern<vector::MultiDimReductionOp> {
   using OpConversionPattern<vector::MultiDimReductionOp>::OpConversionPattern;
 
   LogicalResult
   matchAndRewrite(vector::MultiDimReductionOp op, OneToNOpAdaptor adaptor,
                   ConversionPatternRewriter &rewriter) const override {
+    Location loc = op.getLoc();
+
     VectorType srcType = op.getSourceVectorType();
     VectorType dstType = dyn_cast<VectorType>(op.getResult().getType());
     if (!dstType)
       return failure();
 
-    auto srcShape = srcType.getShape();
+    auto originalSrcShape = srcType.getShape();
     xegpu::DistributeLayoutAttr layout =
         xegpu::getDistributeLayoutAttr(op.getResult());
+
     if (!layout || !layout.isForWorkgroup())
       return failure();
 
     auto reductionDims = llvm::to_vector(op.getReductionDims());
+    if (reductionDims.size() != 1)
+      return rewriter.notifyMatchFailure(
+          op, "Only single dimension reduction is supported");
----------------
akroviakov wrote:

But then we face a problem. If there is a 2D test case, then we have to rewrite it as two 1D reductions first. From what I see, this pattern naturally supports intra-sg reduction or further handles cross-sg results. 

If we were to consider the 2D case, the pattern already has most of the components for the hardcoded logic: do intra-sg reduction _and_ then cross-sg via SLM. We do not care how "2D" is to be represented at lower levels.
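
To make the two-step structure concrete, here is a minimal host-side model, not the pattern's actual code. The function name `reduceAcrossSubgroups` and the use of a plain `std::vector` to stand in for SLM are assumptions made purely for illustration, and addition is assumed as the reduction kind.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Illustrative model only; names are hypothetical and not taken from the PR.
// Each inner vector stands for the data owned by one subgroup, regardless of
// whether that data was originally 1D or 2D.
float reduceAcrossSubgroups(const std::vector<std::vector<float>> &sgData) {
  // Step 1: intra-sg reduction, producing one partial result per subgroup.
  // `slm` stands in for shared local memory.
  std::vector<float> slm(sgData.size());
  for (std::size_t sg = 0; sg < sgData.size(); ++sg)
    slm[sg] = std::accumulate(sgData[sg].begin(), sgData[sg].end(), 0.0f);

  // Step 2: cross-sg reduction over the partials stored in "SLM".
  return std::accumulate(slm.begin(), slm.end(), 0.0f);
}
```

In this model the dimensionality of the per-subgroup data never appears in step 2, which is the sense in which the WG-level logic does not need to know how "2D" is represented further down.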

When we go lower and start to actually care how sg-local 2D reduction is executed, we have to do two 1D reductions. We decide on the order based on the layout (we first reduce the dimension that does not require shuffles, if any).
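
A rough sketch of that ordering decision, assuming the lane layout is available as one lane count per dimension. The helper `orderReductionDims` and the `laneLayout` parameter are hypothetical names for illustration, not existing XeGPU utilities.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the PR: orders the reduction dimensions so
// that dimensions with a single lane (no cross-lane shuffles needed) come
// first. `laneLayout[d]` is assumed to be the number of lanes along dim d.
std::vector<int64_t>
orderReductionDims(std::vector<int64_t> reductionDims,
                   const std::vector<int64_t> &laneLayout) {
  // stable_partition keeps the original relative order within each group.
  std::stable_partition(reductionDims.begin(), reductionDims.end(),
                        [&](int64_t d) { return laneLayout[d] == 1; });
  return reductionDims;
}
```

For example, with `laneLayout = {16, 1}` and reduction dims `{0, 1}`, dim 1 would be reduced first because it is local to each lane. This choice depends entirely on `laneLayout` being known.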

However, if we are forced to split 2D reduction into two 1D reductions, we lose the ability to reason about the better order, because we do not require lane layout at WG level and cannot use it when splitting.

Please correct me if I missed something.

https://github.com/llvm/llvm-project/pull/170936