[Mlir-commits] [mlir] [nvgpu][mlir] Fix wgmma store offset (PR #154581)

Wed Aug 20 10:35:16 PDT 2025

llvmbot wrote:




@llvm/pr-subscribers-mlir-gpu

Author: lonely eagle (linuxlonelyeagle)

<details>
<summary>Changes</summary>

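When M is greater than 64 in wgmma, the accumulator type consists of multiple inner struct types, each covering 64 rows of the result. Computing the row offset into the destination memref from the number of elements in each inner struct is incorrect, because that count is the size of a per-thread fragment, not a number of rows. The offset should instead advance by kWgmmaSizeM rows per inner struct; for fp16, this offset is 64.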

---
Full diff: https://github.com/llvm/llvm-project/pull/154581.diff


1 File Affected:

- (modified) mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp (+1-2) 


``````````diff

diff --git a/mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp b/mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
index c6c5ab356f256..fffcb2aedafee 100644
--- a/mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
+++ b/mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
@@ -1623,11 +1623,10 @@ struct NVGPUWarpgroupMmaStoreOpLowering
     Value matriDValue = adaptor.getMatrixD();
     auto stype = cast<LLVM::LLVMStructType>(matriDValue.getType());
     for (auto [idx, matrixD] : llvm::enumerate(stype.getBody())) {
-      auto structType = cast<LLVM::LLVMStructType>(matrixD);
       Value innerStructValue =
           LLVM::ExtractValueOp::create(b, matriDValue, idx);
       storeFragmentedMatrix(b, innerStructValue, op.getDstMemref(), offset);
-      offset += structType.getBody().size();
+      offset += kWgmmaSizeM;
     }
     rewriter.eraseOp(op);
     return success();

``````````

</details>


https://github.com/llvm/llvm-project/pull/154581