[Mlir-commits] [mlir] [MLIR][XeGPU] Lowering 2-Dimensional Reductions of N-D Tensors into Chained 1-D Reductions (PR #186034)

Wed Mar 18 19:16:29 PDT 2026

================
@@ -444,33 +440,27 @@ class MultiRed2dOpPattern
     auto loc = reductionOp.getLoc();
     auto acc = reductionOp.getAcc();
 
-    // The first reduction's dist attribute does not have the cross lane dim.
-    auto resSliceLayoutAttr = cast<xegpu::SliceAttr>(resLayout);
-    SmallVector<int64_t> dropDims{crossLaneDim};
-    auto intraLaneRedResLayout = resSliceLayoutAttr.dropSliceDims(dropDims);
-
     SmallVector<int64_t> accShape(sourceVecType.getShape());
     accShape.erase(accShape.begin() + intraLaneDim);
-    if (acc) {
-      acc = vector::BroadcastOp::create(
-          rewriter, loc,
-          VectorType::get(accShape, sourceVecType.getElementType()), acc);
-      xegpu::setDistributeLayoutAttr(
-          llvm::dyn_cast<OpResult>(acc),
-          cast<xegpu::DistributeLayoutAttr>(intraLaneRedResLayout));
-    }
+    Type eTy = sourceVecType.getElementType();
+    Attribute eVal;
+    if (eTy.isFloat())
+      eVal = FloatAttr::get(eTy, 0.0);
----------------
Jianhui-Li wrote:

Thanks. There is a helper function in workgroupdistribution named createAccumulator(), which can be reused for this purpose.
We can't reuse the existing acc here. This is the bug this PR tries to fix: the optimization splits the reduction (say 2x2) into two steps. The first step generates an intermediate result (2x1), which the last step then reduces to 1x1. If we add the accumulator to the intermediate result, the accumulator has shape 2x1 and is itself reduced in the last step, so we effectively add the accumulator twice.
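
To make the double counting concrete, here is a minimal scalar sketch of the 2x2 case in plain C++. It is not the XeGPU lowering itself: the names `Mat2x2`, `reduceRows`, `reduceFinal`, `chainedBuggy` and `chainedFixed` are hypothetical, the reduction is assumed to be an integer add, and applying the original acc once in the final step is an assumption about where it ends up.

```cpp
#include <array>

// Hypothetical 2x2 source vector; all names here are illustrative only.
using Mat2x2 = std::array<std::array<int, 2>, 2>;

// Step 1: reduce each row, producing the 2x1 intermediate result.
// `init` seeds every partial sum (the broadcast accumulator).
std::array<int, 2> reduceRows(const Mat2x2 &src, int init) {
  std::array<int, 2> partial{};
  for (int r = 0; r < 2; ++r)
    partial[r] = init + src[r][0] + src[r][1];
  return partial;
}

// Step 2: reduce the 2x1 intermediate down to a single value.
int reduceFinal(const std::array<int, 2> &partial, int init) {
  return init + partial[0] + partial[1];
}

// Old behavior: acc is broadcast into the 2x1 intermediate, so the
// second step sums it once per row, i.e. twice in total.
int chainedBuggy(const Mat2x2 &src, int acc) {
  return reduceFinal(reduceRows(src, acc), /*init=*/0);
}

// Sketch of the fix: seed step 1 with the neutral element (0 for add)
// and apply the original acc exactly once (assumed: in the final step).
int chainedFixed(const Mat2x2 &src, int acc) {
  return reduceFinal(reduceRows(src, /*init=*/0), acc);
}
```

With the source {{1, 2}, {3, 4}} and acc = 10, the unsplit reduction gives 10 + 10 = 20. `chainedBuggy` returns 30 because acc is counted once per row, while `chainedFixed` returns 20.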

https://github.com/llvm/llvm-project/pull/186034