[Mlir-commits] [mlir] [mlir][vector-to-gpu]: Extend MMA Lowerings (PR #176785)

Mon Jan 26 03:01:43 PST 2026

================
@@ -130,7 +147,13 @@ static std::optional<int64_t> getStaticallyKnownRowStride(ShapedType type) {
   if (failed(memrefType.getStridesAndOffset(strides, offset)) ||
       strides.back() != 1)
     return std::nullopt;
-  int64_t stride = strides[strides.size() - 2];
+
+  int stridePostion = strides.size() - 2;
+  if (!permutationMap.isPermutation()) {
+    if (auto outerResult = dyn_cast<AffineDimExpr>(permutationMap.getResult(0)))
----------------
FranklandJack wrote:

That's a great question.

The permutation map here has two results because we are loading a 2D cooperative matrix, but the source memref can have arbitrary rank. For example, we could have `(d0, d1, ..., dn) -> (dn-1, dn)` for a load where the tile covers the two fastest-moving dimensions, or `(d0, d1, ..., dn) -> (dn, dn-1)` where the tile still covers the two fastest-moving dimensions but is transposed. In these examples result-0 would be `dn-1` in the first case and `dn` in the second.

This patch adds support for permutation maps like `(d0, d1, ..., dn) -> (dx, dn)` where `0 <= x < n`, and we use the stride of dimension `x` to make sure we stride the matrix load appropriately. So in this case it is the 0th result that will be `dx`. I guess in general it's possible for result-0 to be something other than a plain dimension expression, e.g. `(d0, d1, ..., dn) -> (dx % 4, dn)`, in which case this lowering isn't going to work, so that is why we have the `dyn_cast<AffineDimExpr>` check here.
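
To make the stride selection concrete, here is a minimal standalone sketch of the idea, not the code in the patch. It does not use MLIR: `MapResult` and `pickRowStride` are hypothetical names, and `MapResult` is only a stand-in for what `dyn_cast<AffineDimExpr>` on result-0 would tell us. It covers just the `(dx, dn)` form and ignores dynamic strides and the transposed case.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for result-0 of the permutation map.
// isPlainDim is true for a bare `dx` and false for e.g. `dx % 4`.
struct MapResult {
  bool isPlainDim;
  unsigned position; // the index x, meaningful only when isPlainDim is true
};

// Returns the stride to use between rows of the loaded tile, or nullopt
// when this simplified lowering would have to bail out.
std::optional<int64_t> pickRowStride(const std::vector<int64_t> &strides,
                                     const MapResult &outerResult) {
  // The innermost dimension must be contiguous, as in the existing check.
  if (strides.size() < 2 || strides.back() != 1)
    return std::nullopt;
  // Result-0 is not a bare dimension (e.g. `dx % 4`): no single stride fits.
  if (!outerResult.isPlainDim)
    return std::nullopt;
  // Guard against a dimension index outside the memref rank.
  if (outerResult.position >= strides.size())
    return std::nullopt;
  // Use the stride of dimension x rather than always taking the
  // second-to-last stride.
  return strides[outerResult.position];
}
```

For a row-major `4x8x16` memref the strides are `{128, 16, 1}`, so under these assumptions `(d0, d1, d2) -> (d1, d2)` picks 16, matching the old second-to-last-stride behaviour, while `(d0, d1, d2) -> (d0, d2)` picks 128.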

Hopefully that makes some sense; let me know if not and I can try to expand on it.

https://github.com/llvm/llvm-project/pull/176785