[Mlir-commits] [mlir] [mlir][nvgpu] Add `nvgpu.tma.async.store` (PR #77811)

Guray Ozen llvmlistbot at llvm.org
Fri Jan 12 03:36:12 PST 2024


================
@@ -661,6 +661,28 @@ def NVGPU_TmaAsyncLoadOp : NVGPU_Op<"tma.async.load", [AttrSizedOperandSegments]
 
 }
 
+def NVGPU_TmaAsyncStoreOp : NVGPU_Op<"tma.async.store", [AttrSizedOperandSegments]> {
+  let summary = "TMA asynchronous store";
+  let description = [{
+    The Op stores a tile memory region from shared memory to global memory
+    using Tensor Memory Access (TMA).
+
+    `$tensorMapDescriptor` is the tensor map descriptor, which holds
+    information about the tile shape. The descriptor is created by
+    `nvgpu.tma.create.descriptor`.
+  }];
+  let arguments = (ins  Arg<AnyMemRef, "", [MemReadAt<0, FullEffect>]>:$src,
+                        NVGPU_TensorMapDescriptor:$tensorMapDescriptor,
+                        Variadic<Index>:$coordinates, 
+                        Optional<I1>:$predicate);
+  let assemblyFormat = [{
+      $src `to` $tensorMapDescriptor `[` $coordinates `]`
+      (`,` `predicate` `=` $predicate^)?
+      attr-dict `:` type($src)
+      `->` type($tensorMapDescriptor)
+  }];
+  let hasVerifier = 1;
+}
----------------
grypp wrote:

Valid point. I suggest using these OPs directly from the `NVVM` dialect.

Why? The `NVGPU` dialect acts as a bridge between high-level dialects (such as `memref` and `vector`) and the `NVVM` dialect. The `cp.async.bulk.commit.group` op is straightforward to use, and introducing it to the `NVGPU` dialect would only result in duplication.

If we need higher-level abstractions (like those we have for operations such as `mbarrier.group` or `nvgpu.tma.descriptor`), then we can introduce specific Ops in the `NVGPU` dialect.

What do you think?
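
For reference, a usage of the op as laid out by the `assemblyFormat` above might look like the following sketch. The memref shape, element type, shared-memory address space, and descriptor attributes are hypothetical, chosen only to illustrate the syntax:

```mlir
// Hypothetical sketch: store a 128x64 f16 tile from shared memory (address
// space 3) to global memory at coordinates (%x, %y), via a TMA descriptor.
nvgpu.tma.async.store %src to %desc[%x, %y]
    : memref<128x64xf16, 3>
    -> !nvgpu.tensormap.descriptor<tensor = memref<128x64xf16, 3>,
         swizzle = none, l2promo = none, oob = zero, interleave = none>
```

The optional `%predicate` operand would be appended as `, predicate = %p`, matching the `(`,` `predicate` `=` $predicate^)?` clause in the format.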

https://github.com/llvm/llvm-project/pull/77811

