[Mlir-commits] [mlir] [MLIR][XeGPU] Matrix load/store subgroup distribution (PR #165008)

Artem Kroviakov llvmlistbot at llvm.org
Wed Oct 29 02:22:50 PDT 2025


================
@@ -562,6 +562,8 @@ class LoadStoreMatrixToXeVMPattern : public OpConversionPattern<OpType> {
     VectorType valOrResVecTy = dyn_cast<VectorType>(data.getType());
     if (!valOrResVecTy)
       valOrResVecTy = VectorType::get(1, data.getType());
+    if (valOrResVecTy.getShape().size() != 1)
----------------
akroviakov wrote:

Added verification. However, 

> change the vector<2x16xf32> to <16x2xf32>

I understand the logical reasoning for this in the matrix ops case, but the current distribution does not allow it, given the "correct" lane layout that the block load requires.

We have 
```cpp
  for (auto [i, dim] : llvm::enumerate(originalType.getShape())) {
    if (i < distributionStart)
      continue;
    // Check if the dimension can be distributed evenly.
    if (dim % effectiveLaneLayout[i - distributionStart] != 0)
      return failure();
    distributedShape[i] = dim / effectiveLaneLayout[i - distributionStart];
  }
```
Meaning that, given `lane_layout = [1, 16], lane_data = [1, 1]` and a `16x2` data shape, we get
```
shape[0] % layout[0] = 16 % 1 = 0 // good
shape[1] % layout[1] = 2 % 16 = 2 // fail
```

We can change the layout to be `[16, 1]`, which would allow the pattern to complete and the distributed code to still be correct, since the lane layout is not used in further coordinate calculations. But `[16, 1]` may be harder for users to reason about by simply looking at the xevm block load description and the sg-level `subgroup_block_io` matrix op.

https://github.com/llvm/llvm-project/pull/165008
