[Mlir-commits] [mlir] [MLIR][XeGPU][VectorToXeGPU] Lower vector.load/store/transfer_read/transfer_write to new offsets syntax (PR #162095)
    Jianhui Li 
    llvmlistbot at llvm.org
       
    Tue Oct 28 20:31:54 PDT 2025
    
    
  
Jianhui-Li wrote:
> Looking at the older `update_nd_offset` semantics, the update is also restricted to the descriptor's rank. I'd expect the new offset syntax to essentially fold `update_nd_offset` into `load_nd` while leaving descriptor creation as is. If you want to fully avoid offsets at `create_nd_tdesc`, then an extra `memref.subview` could be added to create the initial tile pointer.
> 
> Is there any clear benefit in relaxing load offset semantics?
Actually, this is a great question. The original motivation is that users want to express a 2D block in an nD tensor, with the higher dimensions (n >= 2) flattened into the 2nd dimension. After that, the user only works with the flattened 2D tensor and moves the 2D block within it.
So I think the right solution is what you proposed: as we move the offsets to load_nd, the XeGPU user must explicitly insert a memref.subview to flatten the nD tensor to 2D and then create a 2D tensor descriptor.
Allowing 3D+ offsets creates an issue during lowering to XeVM, even if we allow them at IR creation time. The hardware tensor descriptor only tracks the stride of the innermost dimension, so it only supports 2D block loads. Allowing 3D+ offsets means we would need to either compute the flattened 2D offsets inside the K-loop or expand the tensor descriptor to track more than one stride. Either option increases complexity and may have a negative performance impact. Instead, I think the right balance between "ease of use" and "performance" is to ask the user to do the flattening upfront.
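
Roughly, the user-side IR would look like the sketch below. The shapes, the collapse grouping, and the exact assembly syntax are only illustrative; the flattening step is shown with memref.collapse_shape, which yields the contiguous 2D view that the 2D descriptor needs.

```mlir
func.func @tile_load(%src: memref<16x512x128xf16>, %row: index, %col: index)
    -> vector<8x16xf16> {
  // Flatten the outer dimensions into one so XeGPU only ever sees a
  // contiguous 2D view (hypothetical shapes and grouping).
  %flat = memref.collapse_shape %src [[0, 1], [2]]
      : memref<16x512x128xf16> into memref<8192x128xf16>

  // Create the 2D tensor descriptor once, with no offsets.
  %tdesc = xegpu.create_nd_tdesc %flat
      : memref<8192x128xf16> -> !xegpu.tensor_desc<8x16xf16>

  // Move the 2D block within the flattened view by passing 2D offsets
  // at load time (the new offsets syntax).
  %tile = xegpu.load_nd %tdesc[%row, %col]
      : !xegpu.tensor_desc<8x16xf16> -> vector<8x16xf16>
  return %tile : vector<8x16xf16>
}
```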
  
See the discussion here: https://github.com/llvm/llvm-project/pull/164701
  
https://github.com/llvm/llvm-project/pull/162095
    
    