[Mlir-commits] [mlir] [mlir][vector-to-gpu]: Extend MMA Lowerings (PR #176785)

Mon Feb 2 02:16:14 PST 2026

FranklandJack wrote:

> > I have a feeling this might be fixing the wrong problem. I'm not super familiar with the maths but I'm guessing that usually some kind of loop reordering would be possible. Jack mentioned that for some hardware non-contiguous loads are not supported so that would be another reason not to land this.
> 
> It makes sense to me. If loop reordering is possible in these use cases, it would be better to do that. It would also benefit performance on the hardware.

I'm not sure what this has to do with loop reordering. We have a `vector.transfer_read` operation here with a strided minor identity map, and this patch adds support for lowering it correctly to a `gpu.subgroup_mma_load_matrix` operation. It seems like we are solving different problems, so I'd argue it's still useful to have this functionality upstream.

https://github.com/llvm/llvm-project/pull/176785