rengolin wrote: The code looks good to me. It's a GPU dialect like the others: it has load/store, MMA/SIMD ops, sub-group descriptors, etc. @adam-smnk https://github.com/llvm/llvm-project/pull/78483