[Mlir-commits] [mlir] [mlir][nvgpu] Add `nvgpu.tma.async.store` (PR #77811)
Guray Ozen
llvmlistbot at llvm.org
Fri Jan 12 03:36:12 PST 2024
================
@@ -661,6 +661,28 @@ def NVGPU_TmaAsyncLoadOp : NVGPU_Op<"tma.async.load", [AttrSizedOperandSegments]
}
+def NVGPU_TmaAsyncStoreOp : NVGPU_Op<"tma.async.store", [AttrSizedOperandSegments]> {
+ let summary = "TMA asynchronous store";
+ let description = [{
+    The Op stores a tile memory region from shared memory to global memory
+    asynchronously using the Tensor Memory Access (TMA) unit.
+
+    `$tensorMapDescriptor` is a tensor map descriptor that carries information
+    about the tile shape. The descriptor is created by
+    `nvgpu.tma.create.descriptor`.
+  }];
+ let arguments = (ins Arg<AnyMemRef, "", [MemReadAt<0, FullEffect>]>:$src,
+ NVGPU_TensorMapDescriptor:$tensorMapDescriptor,
+ Variadic<Index>:$coordinates,
+ Optional<I1>:$predicate);
+ let assemblyFormat = [{
+ $src `to` $tensorMapDescriptor `[` $coordinates `]`
+ (`,` `predicate` `=` $predicate^)?
+ attr-dict `:` type($src)
+ `->` type($tensorMapDescriptor)
+ }];
+ let hasVerifier = 1;
+}
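For reference, a usage sketch matching the assembly format above (the memref type, descriptor parameters, and SSA names here are illustrative assumptions, not taken from the PR):

```mlir
// Hypothetical example: %src is a tile in shared memory (memory space 3),
// %desc was produced by nvgpu.tma.create.descriptor, %c0/%c1 are index
// coordinates into the global tensor, and %p is an optional i1 predicate.
nvgpu.tma.async.store %src to %desc[%c0, %c1], predicate = %p
    : memref<128x64xf16, 3>
    -> !nvgpu.tensormap.descriptor<tensor = memref<128x64xf16, 3>,
         swizzle = none, l2promo = none, oob = zero, interleave = none>
```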
----------------
grypp wrote:
Valid point. I suggest using these Ops directly from the `NVVM` dialect.
Why? The `NVGPU` dialect acts as a bridge between high-level dialects (such as `memref` and `vector`) and the `NVVM` dialect. The `cp.async.bulk.commit.group` Op is straightforward to use, and introducing it into the NVGPU dialect would only duplicate it.
If we need higher-level abstractions (as we have for operations like `mbarrier.group` or `nvgpu.tma.descriptor`), then we can introduce specific Ops in the NVGPU dialect.
What do you think?
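As a sketch of what using the NVVM dialect directly would look like (op names assumed from the NVVM dialect; the wait-group count is illustrative):

```mlir
// Commit the outstanding bulk async copies into a group, then wait until
// at most 0 groups remain pending -- no NVGPU-level wrapper needed.
nvvm.cp.async.bulk.commit.group
nvvm.cp.async.bulk.wait_group 0
```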
https://github.com/llvm/llvm-project/pull/77811