[PATCH] D100394: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions
Artem Belevich via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 13 12:22:19 PDT 2021
tra added inline comments.
================
Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:733-736
+TARGET_BUILTIN(__nvvm_cp_async_mbarrier_arrive_b64, "vv*", "", AND(SM_80,PTX70))
+TARGET_BUILTIN(__nvvm_cp_async_mbarrier_arrive_shared_b64, "vv*", "", AND(SM_80,PTX70))
+TARGET_BUILTIN(__nvvm_cp_async_mbarrier_arrive_noinc_b64, "vv*", "", AND(SM_80,PTX70))
+TARGET_BUILTIN(__nvvm_cp_async_mbarrier_arrive_noinc_shared_b64, "vv*", "", AND(SM_80,PTX70))
----------------
For `cp.async.mbarrier` instructions to work we do need to have `mbarrier.init`to init the barrier object and other mbarrier ops to use them inpractice. Perhaps these should be added if/when all mbarrier instructions are added.
Also, mbarrier object has additional requirements for the pointer (aligned by 8, in __shared__ space): https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-size-alignment
So, technically they all should use `v*3`, but I don't think it will work well in practice -- a lot of things assume that we start with all the pointers bein in generic AS. Nevertheless, we do want to have some sort of safeguards for these builtins.
Perhaps it would make sense to add a custom type checker and only allow references to `__shared__` variables.
================
Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:744
+TARGET_BUILTIN(__nvvm_cp_async_commit_group, "v", "", AND(SM_80,PTX70))
+TARGET_BUILTIN(__nvvm_cp_async_wait_group, "vi", "", AND(SM_80,PTX70))
+TARGET_BUILTIN(__nvvm_cp_async_wait_all, "v", "", AND(SM_80,PTX70))
----------------
I think this should be "vIi" as the instruction only accepts an integer constant as an argument.
================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:1072-1087
+def int_nvvm_cp_async_ca_shared_global_4 :
+ GCCBuiltin<"__nvvm_cp_async_ca_shared_global_4">,
+ Intrinsic<[],[llvm_i8ptr_ty, llvm_i8ptr_ty],[],
+ "llvm.nvvm.cp.async.ca.shared.global.4">;
+def int_nvvm_cp_async_ca_shared_global_8 :
+ GCCBuiltin<"__nvvm_cp_async_ca_shared_global_8">,
+ Intrinsic<[],[llvm_i8ptr_ty, llvm_i8ptr_ty],[],
----------------
These all should probably have `IntrArgMemOnly` attribute and, possibly `WriteOnly<0>,ReadOnly<1>` and, maybe `NoAlias` on both arguments, too, because src/dest are in different nonoverlapping address spaces.
Also, the PTX spec is not clear on whether `cp.async` expects to see the pointer arguments in generic AS, or do they need to be converted to shared/global ones first. Normally, the instructions with `.shared` or `.global` in the name expect specific address space. If that's the case here, then we may need to use qualified pointer types here, too.
================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:1093-1095
+def int_nvvm_cp_async_wait_group :
+ GCCBuiltin<"__nvvm_cp_async_wait_group">,
+ Intrinsic<[],[llvm_i32_ty],[]>;
----------------
This should have `ImmArg` as the argument must be an immediate value.
================
Comment at: llvm/lib/Target/NVPTX/NVPTXIntrinsics.td:353-356
+def CP_ASYNC_WAIT_GROUP :
+ NVPTXInst<(outs), (ins Int32Regs:$n), "cp.async.wait_group $n;",
+ [(int_nvvm_cp_async_wait_group Int32Regs:$n)]>,
+ Requires<[hasPTX70, hasSM80]>;
----------------
This does not look right. If I read the PTX spec correctly, the argument can't be a register.
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-wait-group
> Operand N is an integer constant.
================
Comment at: llvm/test/CodeGen/NVPTX/async-copy.ll:3
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_80 | FileCheck %s -check-prefix=x64
+
+declare void @llvm.nvvm.cp.async.mbarrier.arrive.b64(i64* %a)
----------------
No tests for `cp.async.wait*` and `cp.async.commit_group`
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100394/new/
https://reviews.llvm.org/D100394
More information about the llvm-commits
mailing list