[llvm] [NVPTX] Add TMA Bulk Copy Intrinsics (PR #138679)
Artem Belevich via llvm-commits
llvm-commits at lists.llvm.org
Tue May 13 11:51:53 PDT 2025
================
@@ -680,9 +681,12 @@ The '``@llvm.nvvm.cp.async.bulk.shared.cta.to.global``' intrinsic
corresponds to the ``cp.async.bulk.global.shared::cta.*`` set of PTX
instructions. These instructions initiate an asynchronous copy from
shared::cta to global memory. The 32-bit operand ``%size`` specifies
-the amount of memory to be copied and it must be a multiple of 16.
+the amount of memory to be copied (in bytes) and it must be a multiple
+of 16. For the ``.bytemask`` variant, the 16-bit wide mask operand
+specifies whether the i-th byte of each 16-byte wide chunk of source
+data is copied to the destination.
-* The last argument to these intrinsics is a boolean flag
+* The [N-1]th argument to these intrinsics is a boolean flag
----------------
Artem-B wrote:
Perhaps we should drop the reference to the flag position, and just use the name.
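
For anyone reading the hunk above, the ``.bytemask`` wording can be modeled on the host with a minimal sketch. This is an illustration only, not the intrinsic's implementation: the function name `masked_bulk_copy` is hypothetical, and the sketch assumes that bit i of the mask (least significant bit first) controls byte i of each 16-byte chunk and that a set bit means "copy".

```python
def masked_bulk_copy(src: bytes, dst: bytearray, size: int, mask: int) -> None:
    """Host-side model of the byte-mask copy semantics (illustrative only)."""
    # The documentation requires the size, in bytes, to be a multiple of 16.
    if size % 16 != 0:
        raise ValueError("size must be a multiple of 16 bytes")
    # The mask operand is 16 bits wide.
    if not 0 <= mask <= 0xFFFF:
        raise ValueError("mask must fit in 16 bits")
    for offset in range(size):
        i = offset % 16  # byte position within its 16-byte chunk
        # Assumption: bit i set means byte i of every chunk is copied.
        if (mask >> i) & 1:
            dst[offset] = src[offset]


# Usage: copy only the first two bytes of each 16-byte chunk.
src = bytes(range(32))
dst = bytearray(32)
masked_bulk_copy(src, dst, 32, 0x0003)
```

The same mask is applied to every chunk, so bytes 0-1 and 16-17 of `dst` are written and the rest are left untouched.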
https://github.com/llvm/llvm-project/pull/138679