[llvm] [NVPTX] Add TMA Bulk Copy Intrinsics (PR #138679)
Artem Belevich via llvm-commits
llvm-commits at lists.llvm.org
Thu May 8 11:59:00 PDT 2025
================
@@ -616,6 +616,7 @@ Syntax:
.. code-block:: llvm
declare void @llvm.nvvm.cp.async.bulk.shared.cta.to.global(ptr addrspace(1) %dst, ptr addrspace(3) %src, i32 %size, i64 %ch, i1 %flag_ch)
+ declare void @llvm.nvvm.cp.async.bulk.shared.cta.to.global.bytemask(..., i32 %size, i16 %mask, i64 %ch, i1 %flag_ch)
----------------
Artem-B wrote:
I'd make the mask the last argument, to match the instruction's argument order. The `flag_ch` makes it impossible to match the instruction args exactly, but this would at least be a bit less surprising.
I wish we had a convenient way to have overloaded intrinsics with different numbers of arguments, so we wouldn't need an explicit flag. Oh, well.
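
For illustration only, here is one possible reading of the reordering suggested above. Placing `%mask` after `%flag_ch` is an assumption; the comment does not say where the mask should sit relative to the flag, and the elided leading arguments are left as in the diff.

```llvm
; Sketch, not the final signature: %mask moved to the end so it follows
; the cache-hint pair (%ch, %flag_ch), closer to the instruction's order.
declare void @llvm.nvvm.cp.async.bulk.shared.cta.to.global.bytemask(..., i32 %size, i64 %ch, i1 %flag_ch, i16 %mask)
```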
https://github.com/llvm/llvm-project/pull/138679