[Mlir-commits] [mlir] [MLIR][XeGPU] Xegpu distribution patterns for load_nd, store_nd, and create_nd_tdesc. (PR #112945)
Han-Chung Wang
llvmlistbot at llvm.org
Tue Oct 29 10:34:01 PDT 2024
hanhanW wrote:
> > If I read the code correctly, most of vector dialect changes are about moving some utils to `VectorDistribute.cpp`. This part looks okay to me, just a nit about function comments.
>
> Yup, most of it is trivial code move.
>
> > The other part that I don't follow is the change in `VectorOps.cpp`. Can you elaborate a little more about why?
>
> This is to aid sinking xegpu ops through the `yield` op of `warp_execute_on_lane_0`. Ops in xegpu take `xegpu.tensor_desc` as an argument. This type roughly describes a physical memory tile to be processed by a subgroup, so, to move to SIMT, each logical thread should own some portion of this tile (just like with vectors). Hence, I'm distributing the `xegpu.tensor_desc` type in a similar way to vector. Here's a pseudo-IR example:
>
> ```mlir
> %res = vector.warp_execute_on_lane_0(%laneid) -> (!xegpu.tensor_desc<24x2xf16>) {
>   ...
>   %desc = xegpu.create_nd_tdesc %somesrc[0, 0] : memref<24x32xf16> -> !xegpu.tensor_desc<24x32xf16>
>   vector.yield %desc : !xegpu.tensor_desc<24x32xf16> // original type for the whole subgroup, not distributed
> }
>
> xegpu.someop %res : !xegpu.tensor_desc<24x2xf16> // the type is distributed
> ```
>
> A tensor descriptor that is a result of the `warp_execute_on_lane_0` is yielded with the type for the whole tile, but outside of the region it should have the distributed type. Currently this is not allowed, since the validation checks the type against `VectorType` and fails. So I allow any shaped type instead.
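>
> For context, here is a rough sketch of what that relaxation in `VectorOps.cpp` amounts to (an illustration of the idea, not the literal diff; I'm assuming the existing `verifyDistributedType` helper used by the `warp_execute_on_lane_0` verifier):
>
> ```cpp
> // Sketch, not the exact patch: the distributed-type check previously
> // required both sides to be VectorType; accepting any ShapedType lets
> // !xegpu.tensor_desc values cross the warp_execute_on_lane_0 boundary.
> static LogicalResult verifyDistributedType(Type expanded, Type distributed,
>                                            int64_t warpSize, Operation *op) {
>   // Identical types mean the value is not distributed at all.
>   if (expanded == distributed)
>     return success();
>   // Before: dyn_cast<VectorType>(...); after: any shaped type qualifies.
>   auto expandedType = dyn_cast<ShapedType>(expanded);
>   auto distributedType = dyn_cast<ShapedType>(distributed);
>   if (!expandedType || !distributedType)
>     return op->emitOpError("expected shaped types for distribution");
>   // The shape/ratio checks against warpSize stay as they are (elided).
>   return success();
> }
> ```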
I see, thanks for the details! I'm okay with the change; I think we also need to update the documentation:
https://github.com/llvm/llvm-project/blob/d661aea4c5668fc9b06f4b26d9fb072b1a6d7ff4/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td#L2994
https://github.com/llvm/llvm-project/pull/112945