[PATCH] D100394: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions
Artem Belevich via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 16 10:55:19 PDT 2021
tra added a subscriber: wash.
tra added a comment.
Overall the patch looks good. We may still need to tweak intrinsic properties later, but this is a good starting point.
I'm not familiar enough with the new instructions, so my suggestions are based on just reading the PTX spec and there's a good change I didn't get it all right.
If someone from NVIDIA is watching, now would be a good chance to chime in.
@wash, @jholewinski - any comments on how these instructions should be handled?
================
Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:470
+
+TARGET_BUILTIN(__nvvm_mbarrier_init_b64, "vWi*i", "", AND(SM_80,PTX70))
+TARGET_BUILTIN(__nvvm_mbarrier_init_shared_b64, "vWi*3i", "", AND(SM_80,PTX70))
----------------
I think _b64 is redundant for the `mbarrier` instructions -- that's the only type they accept.
================
Comment at: clang/test/CodeGen/builtins-nvptx.c:682
+
+__device__ void nvvm_mbarrier(long long* addr, __attribute__((address_space(3))) long long* sharedAddr, int count, long long state) {
+ #if __CUDA_ARCH__ >= 800
----------------
I'd add `CHECK-LABEL: <function_name>` here and in other functions.
================
Comment at: clang/test/CodeGen/builtins-nvptx.c:718
+ __nvvm_mbarrier_pending_count_b64(state);
+// // CHECK_PTX70_SM80: call i32 @llvm.nvvm.mbarrier.pending.count.b64
+ #endif
----------------
Extra `//`
================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:34-39
+def llvm_i8ptr_ty : LLVMPointerType<llvm_i8_ty>; // i8*
+def llvm_globali8ptr_ty : LLVMQualPointerType<llvm_i8_ty, 1>; // (global)i8*
+def llvm_sharedi8ptr_ty : LLVMQualPointerType<llvm_i8_ty, 3>; // (shared)i8*
+def llvm_i64ptr_ty : LLVMPointerType<llvm_i64_ty>; // i64*
def llvm_anyi64ptr_ty : LLVMAnyPointerType<llvm_i64_ty>; // (space)i64*
+def llvm_sharedi64ptr_ty : LLVMQualPointerType<llvm_i64_ty, 3>; // (shared)i64*
----------------
`llvm_globali8ptr_ty` -> `llvm_global_i8ptr_ty` would make it a bit easier to read.
================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:1112-1122
+def int_nvvm_mbarrier_init_b64 : GCCBuiltin<"__nvvm_mbarrier_init_b64">,
+ Intrinsic<[],[llvm_i64ptr_ty, llvm_i32_ty],[IntrConvergent]>;
+def int_nvvm_mbarrier_init_shared_b64 :
+ GCCBuiltin<"__nvvm_mbarrier_init_shared_b64">,
+ Intrinsic<[],[llvm_sharedi64ptr_ty, llvm_i32_ty],[IntrConvergent]>;
+
+def int_nvvm_mbarrier_inval_b64 : GCCBuiltin<"__nvvm_mbarrier_inval_b64">,
----------------
These are probably safe to mark as `IntrWriteMem, IntrArgMemOnly, WriteOnly<ArgIndex<0>>, NoCapture<ArgIndex<0>>`
================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:1156-1158
+def int_nvvm_mbarrier_pending_count_b64 :
+ GCCBuiltin<"__nvvm_mbarrier_pending_count_b64">,
+ Intrinsic<[llvm_i32_ty],[llvm_i64_ty],[IntrConvergent]>;
----------------
This one can be `IntrNoMem`.
================
Comment at: llvm/test/CodeGen/NVPTX/async-copy.ll:6-8
+define void @asyncwaitgroup() {
+ ; CHECK_PTX32: cp.async.wait_group 8;
+ ; CHECK_PTX64: cp.async.wait_group 8;
----------------
I'd recommend adding a common check label (le'ts say `ALL` and running the tests with `--check-prefixes=ALL,CHECK_PTX64`.
This would allow you to use `ALL` for things that do not change. In this case `; ALL: cp.async.wait_group 8;`
It would also be great to add `ALL-LABEL: <function_name>` for each function to limit the range FileCheck operates for.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100394/new/
https://reviews.llvm.org/D100394
More information about the llvm-commits
mailing list