[Mlir-commits] [mlir] [MLIR][XeGPU] Matrix load/store subgroup distribution (PR #165008)
Artem Kroviakov
llvmlistbot at llvm.org
Wed Oct 29 02:22:50 PDT 2025
================
@@ -562,6 +562,8 @@ class LoadStoreMatrixToXeVMPattern : public OpConversionPattern<OpType> {
VectorType valOrResVecTy = dyn_cast<VectorType>(data.getType());
if (!valOrResVecTy)
valOrResVecTy = VectorType::get(1, data.getType());
+ if (valOrResVecTy.getShape().size() != 1)
----------------
akroviakov wrote:
Added verification. However,
> change the vector<2x16xf32> to <16x2xf32>
I understand the logical reasoning for this in the matrix ops case, but the current distribution does not allow it, given the "correct" lane layout that the block load requires.
We have
```cpp
for (auto [i, dim] : llvm::enumerate(originalType.getShape())) {
if (i < distributionStart)
continue;
// Check if the dimension can be distributed evenly.
if (dim % effectiveLaneLayout[i - distributionStart] != 0)
return failure();
distributedShape[i] = dim / effectiveLaneLayout[i - distributionStart];
}
```
This means that, given `lane_layout = [1, 16], lane_data = [1, 1]` and a `16x2` data shape, we get
```
shape[0] % layout[0] = 16 % 1 = 0 // good
shape[1] % layout[1] = 2 % 16 = 2 // fail
```
We can change the layout to `[16, 1]`, which would allow the pattern to complete while keeping the distributed code correct, since the lane layout is not used in further coordinate calculations. However, `[16, 1]` may be harder for users to reason about by simply looking at the xevm block load description and the sg-level `subgroup_block_io` matrix op.
https://github.com/llvm/llvm-project/pull/165008