[llvm] [NVPTX] Add TMA Bulk Copy Intrinsics (PR #138679)

Artem Belevich via llvm-commits llvm-commits at lists.llvm.org
Tue May 13 11:51:53 PDT 2025


================
@@ -680,9 +681,12 @@ The '``@llvm.nvvm.cp.async.bulk.shared.cta.to.global``' intrinsic
 corresponds to the ``cp.async.bulk.global.shared::cta.*`` set of PTX
 instructions. These instructions initiate an asynchronous copy from
 shared::cta to global memory. The 32-bit operand ``%size`` specifies
-the amount of memory to be copied and it must be a multiple of 16.
+the amount of memory to be copied (in bytes) and it must be a multiple
+of 16. For the ``.bytemask`` variant, the 16-bit wide mask operand
+specifies whether the i-th byte of each 16-byte wide chunk of source
+data is copied to the destination.
 
-* The last argument to these intrinsics is a boolean flag
+* The [N-1]th argument to these intrinsics is a boolean flag
----------------
Artem-B wrote:

Perhaps we should drop the reference to the flag position, and just use the name.

https://github.com/llvm/llvm-project/pull/138679

