[llvm] [NVPTX] Add TMA Bulk Copy Intrinsics (PR #138679)

Artem Belevich via llvm-commits llvm-commits at lists.llvm.org
Thu May 8 11:59:00 PDT 2025


================
@@ -616,6 +616,7 @@ Syntax:
 .. code-block:: llvm
 
   declare void @llvm.nvvm.cp.async.bulk.shared.cta.to.global(ptr addrspace(1) %dst, ptr addrspace(3) %src, i32 %size, i64 %ch, i1 %flag_ch)
+  declare void @llvm.nvvm.cp.async.bulk.shared.cta.to.global.bytemask(..., i32 %size, i16 %mask, i64 %ch, i1 %flag_ch)
----------------
Artem-B wrote:

I'd make the mask the last argument to match the instruction's argument order. `flag_ch` makes it impossible to match the instruction arguments exactly, but putting the mask last would at least be a bit less surprising.
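
As a rough sketch of that ordering (not necessarily the final signature; the leading `...` is the same documentation shorthand used in the diff, so this is not parseable IR as written), the declaration would become:

```llvm
; Sketch only: %mask moved to the end, after the cache-hint operand and its flag.
declare void @llvm.nvvm.cp.async.bulk.shared.cta.to.global.bytemask(..., i32 %size, i64 %ch, i1 %flag_ch, i16 %mask)
```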

I wish we had a convenient way to overload intrinsics on the number of arguments, so we wouldn't need an explicit flag. Oh, well.


https://github.com/llvm/llvm-project/pull/138679

