[Mlir-commits] [mlir] [MLIR][XeGPU] Add support for cross-subgroup reduction from wg to sg (PR #170936)

Mon Dec 15 02:19:30 PST 2025

akroviakov wrote:

> Why do we need to rewrite it as multiple reduction instructions, each doing one reduce dim at a time? WG to SG can do the multi-dim reduce locally first and then do the cross-subgroup reduction using SLM?

Exactly. 

> This is because SG reduction requires SG-level data shuffling (faster than an SLM trip). This is not a lane-local operation and must be exposed at SG to WI distribution.

An SLM trip was never mentioned for SG-to-WI. Data shuffling is required for 2D reductions in lane-level code. My previous response tried to demonstrate that the lowest level should care only about how best to produce its own result, and each higher level should care only about how best to merge the lower-level results. A 2D WG reduction is then (SLM x (shuffle x (lane result))).
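To make the (SLM x (shuffle x (lane result))) nesting concrete, here is a minimal Python sketch of a sum reduction. It is only an illustration, not the PR's lowering: all function names are hypothetical, the "shuffle" is modeled as a butterfly (XOR) exchange over a list, and "SLM" is modeled as a plain shared list.

```python
def lane_reduce(values):
    """Lane-local step: each work-item reduces the elements it owns."""
    return sum(values)


def subgroup_reduce(lane_results):
    """Subgroup step: merge per-lane partials with a butterfly (XOR) exchange.

    Assumes the number of lanes is a power of two (hypothetical model of a
    subgroup shuffle; no real hardware intrinsic is used here).
    """
    vals = list(lane_results)
    n = len(vals)
    offset = n // 2
    while offset >= 1:
        # Every lane adds the value currently held by lane (i XOR offset).
        vals = [vals[i] + vals[i ^ offset] for i in range(n)]
        offset //= 2
    return vals[0]  # after the last step every lane holds the same total


def workgroup_reduce(sg_results):
    """Workgroup step: each subgroup stores its partial in a shared buffer
    (a stand-in for SLM), then the partials are read back and merged."""
    slm = list(sg_results)
    return sum(slm)


def hierarchical_sum(tile):
    """tile[sg][lane] is the list of elements owned by one lane."""
    return workgroup_reduce(
        [subgroup_reduce([lane_reduce(lane) for lane in sg]) for sg in tile]
    )
```

Each level only knows how to merge the results of the level below it, so the lane-level code is free to choose how it produces its own partial result.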

> So WG to SG reduction lowering is much different from SG to WI reduction lowering. If everything goes through SLM, I think your approach makes sense.

Each hierarchy level can only expand the IR with the logic that merges lower-level results, ultimately simplifying the core reduction where necessary (at some point 2D -> 1D).

https://github.com/llvm/llvm-project/pull/170936