[Mlir-commits] [mlir] [MLIR][XeGPU] Add support for cross-subgroup reduction from wg to sg (PR #170936)

Thu Dec 11 11:14:22 PST 2025

charithaintc wrote:

> If we allowed lane level to have 2D vectors, then SG level would only decide on how to merge lane level results, and lane level would decide how to best handle its local 2d vector, but we do not allow that, so we decompose at the SG level once we get lane layout to reason about decomposition.

This is because SG reduction requires SG-level data shuffling. It is not a lane-local operation and must be exposed at SG to WI distribution. So WG to SG reduction lowering is quite different from SG to WI reduction lowering.
Plus, it is usually much better to break a reduction down into single-dim reductions so your implementation is simpler (without necessarily losing performance). You can look at how broadcast is lowered to LLVM (complex broadcasts are rewritten in terms of simple ones).
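
To illustrate the decomposition idea, here is a minimal sketch in plain Python (not MLIR or XeGPU code). The helper names `reduce_dim` and `reduce_2d_to_scalar` are hypothetical and only model the arithmetic: a 2D -> scalar reduction is rewritten as two single-dim reductions. It assumes the combining op is associative and commutative, as integer addition is; with floating point the changed order can alter rounding.

```python
import functools
import operator


def reduce_dim(matrix, dim, op):
    """Reduce a 2D list of lists along one dimension.

    dim == 0 collapses the rows (one result per column);
    dim == 1 collapses the columns (one result per row).
    """
    if dim == 0:
        return [functools.reduce(op, col) for col in zip(*matrix)]
    return [functools.reduce(op, row) for row in matrix]


def reduce_2d_to_scalar(matrix, op):
    """2D -> scalar reduction expressed as two single-dim reductions."""
    # Step 1: single-dim reduction along dim 1 gives one partial per row.
    partials = reduce_dim(matrix, 1, op)
    # Step 2: single-dim reduction over the remaining 1D vector of partials.
    return functools.reduce(op, partials)


data = [[1, 2, 3],
        [4, 5, 6]]
total = reduce_2d_to_scalar(data, operator.add)
```

Each step only ever handles a 1D reduction, which is the property that keeps each lowering pattern simple.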

But if Xe4 supports 2D -> scalar reductions directly, we should consider other approaches.

https://github.com/llvm/llvm-project/pull/170936