[Mlir-commits] [mlir] [MLIR][XeGPU] Allow some nd ops to have argument shapes mismatch for … (PR #120566)

Charitha Saumya llvmlistbot at llvm.org
Thu Jan 9 11:08:33 PST 2025


charithaintc wrote:

> I would not rush for option 4 either. If I understand correctly, the subview op needs to compute the offsets from the lane id, and then create the subview with sizes, offsets, strides. I like the approach of keeping the original tensor descriptor as a whole, but it requires adding a subview op that doesn't look like other XeGPU ops (which map to concrete hardware operations). I don't know what benefit it can bring at this point other than the IR appearing less "confusing", which is a debatable point.
> 
> To me, whether the IR is "confusing" actually depends on how the IR is lowered or optimized. My view is actually the reverse. The type mismatch doesn't bother me that much. But if the IR doesn't model the hardware behavior, say it introduces per-lane offsets/sizes computation which we don't need during the lowering, it causes a different type of confusion that bothers me more. The passes on XeGPU are mostly target-specific, so people like to match the IR with the hardware behavior - each lane takes the whole shape and implicitly reads back its own data fragment, instead of computing its own offsets/sizes. If a transformation/optimization needs to know the data fragment distribution, it can refer to `sg_map`, which was designed to explicitly describe the data distribution.
> 
> I suggest we first go with option 2. When it becomes clear that we really need a subview op, we can revisit it.

Agreed. +1 for option 2. XeGPU is primarily designed to faithfully represent HW details in block Load/Store. So in my view, approach 1 violates this philosophy without any clearly apparent benefit.
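For concreteness, my understanding of option 2 is that the tensor descriptor keeps the full subgroup-level shape while the distributed result type is allowed to differ, with `sg_map` describing the per-lane fragment. A hypothetical sketch (shapes and attribute values are illustrative, not taken from the patch):

```mlir
// Subgroup-level descriptor keeps the full 8x16 shape; sg_map records the
// distribution: 16 lanes along dim 1, each owning 1x1 element chunks.
#map = #xegpu.sg_map<wi_layout = [1, 16], wi_data = [1, 1]>
%tdesc = xegpu.create_nd_tdesc %src[0, 0]
    : memref<8x16xf16> -> !xegpu.tensor_desc<8x16xf16, #map>
// After SIMT distribution each lane loads its own 8x1 fragment, so the
// result type intentionally "mismatches" the descriptor shape.
%frag = xegpu.load_nd %tdesc
    : !xegpu.tensor_desc<8x16xf16, #map> -> vector<8x1xf16>
```

Each lane implicitly reads back its own fragment from the whole-shape descriptor, with no per-lane offset/size arithmetic in the IR.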


> I don't think this is necessary; you just need to restore the parameters, such as the size of the descriptor, which can be done using the `sg_map`. The assignment of data chunks to lane ids is implicit. Anyway, this is neither what I'm proposing nor is it related to the patch.

This argument is not clear to me. Are you saying that `sg_map` must be present even after the SIMT distribution? That seems like an unnecessary requirement, because after SIMT distribution we don't really need `sg_map`. Also, what if a user wants to generate XeGPU SIMT directly? I may have misunderstood, but this needs some clarification.


https://github.com/llvm/llvm-project/pull/120566

