[llvm] [NVPTX] Add NVPTX intrinsics for TMA copies (PR #95289)
Durgadoss R via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 13 12:53:46 PDT 2024
================
@@ -1448,6 +1448,26 @@ defm int_nvvm_cp_async_ca_shared_global_8 : CP_ASYNC_SHARED_GLOBAL<"8", "ca">;
defm int_nvvm_cp_async_ca_shared_global_16 : CP_ASYNC_SHARED_GLOBAL<"16", "ca">;
defm int_nvvm_cp_async_cg_shared_global_16 : CP_ASYNC_SHARED_GLOBAL<"16", "cg">;
+// TODO(apaszke): Multicast TMA loads
----------------
durga4github wrote:
[Only for the TMA intrinsics part]:
The TMA intrinsic changes look good to me.
We have the intrinsics for cp.async.bulk.tensor implemented, with tests (including the multicast, cache-hint, and im2col variants). I can submit the intrinsic changes by early next week, if we all agree and that timeline is acceptable to everyone. @apaszke, @jlebar, please let me know what you think.
https://github.com/llvm/llvm-project/pull/95289