[Mlir-commits] [mlir] [mlir][rocdl] Add GlobalLoadAsyncToLDS operation (PR #165374)

Fri Oct 31 02:15:53 PDT 2025

================
@@ -692,6 +692,39 @@ def ROCDL_GlobalLoadLDSOp :
   }];
 }
 
+//===---------------------------------------------------------------------===//
+// Async load to LDS intrinsic (available in GFX1250)
+//===---------------------------------------------------------------------===//
+
+class ROCDL_GlobalLoadAsyncToLDSOp<string mnemonic> :
+  ROCDL_IntrOp<mnemonic, [], [], [], 0, 0, 1, 0, [2, 3], ["offset", "aux"]> {
+  dag args = (ins Arg<ROCDLGlobalBuffer, "", [MemRead]>:$globalPtr,
+                 Arg<ROCDLBufferLDS, "", [MemWrite]>:$ldsPtr,
+                 I32Attr:$offset,
+                 I32Attr:$aux);
+  let arguments = !con(args, baseArgs);
+  let assemblyFormat = [{
+    $globalPtr `,`  $ldsPtr `,` $offset `,` $aux
+    attr-dict `:` type($globalPtr)
+  }];
+  let description = [{
+    Loads data asynchronously from a global memory pointer to a local data
+    store (LDS) pointer.
+
+    Available on gfx1250+.
+  }];
+  let extraClassDefinition = [{
+    ::llvm::SmallVector<::mlir::Value> $cppClass::getAccessedOperands() {
+      return {getGlobalPtr(), getLdsPtr()};
+    }
+  }];
+}
+
+def ROCDL_GlobalLoadAsyncToLDSB8Op : ROCDL_GlobalLoadAsyncToLDSOp<"global.load.async.to.lds.b8">;
+def ROCDL_GlobalLoadAsyncToLDSB32Op : ROCDL_GlobalLoadAsyncToLDSOp<"global.load.async.to.lds.b32">;
+def ROCDL_GlobalLoadAsyncToLDSB64Op : ROCDL_GlobalLoadAsyncToLDSOp<"global.load.async.to.lds.b64">;
+def ROCDL_GlobalLoadAsyncToLDSB128Op : ROCDL_GlobalLoadAsyncToLDSOp<"global.load.async.to.lds.b128">;
----------------
pabloantoniom wrote:

Wow, that's some cool tablegen black magic I didn't know about!

I have used `foreach` as you suggested, but I changed it slightly, the iterator should be `bits`, not `bytes`, and the description was not showing the right string so I also fixed that.

I have checked the generated documentation and it correctly shows the 4 ops with the right description. What I don't like is that they are not generated in order, (first goes b128, then b32, then b64, then b8).

https://github.com/llvm/llvm-project/pull/165374