[PATCH] D100394: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions

Artem Belevich via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 16 10:55:19 PDT 2021


tra added a subscriber: wash.
tra added a comment.

Overall the patch looks good. We may still need to tweak intrinsic properties later, but this is a good starting point.

I'm not familiar enough with the new instructions, so my suggestions are based on just reading the PTX spec and there's a good change I didn't get it all right. 
If someone from NVIDIA is watching, now would be a good chance to chime in.
@wash, @jholewinski  - any comments on how these instructions should be handled?



================
Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:470
+
+TARGET_BUILTIN(__nvvm_mbarrier_init_b64, "vWi*i", "", AND(SM_80,PTX70))
+TARGET_BUILTIN(__nvvm_mbarrier_init_shared_b64, "vWi*3i", "", AND(SM_80,PTX70))
----------------
I think _b64 is redundant for the `mbarrier` instructions -- that's the only type they accept.


================
Comment at: clang/test/CodeGen/builtins-nvptx.c:682
+
+__device__ void nvvm_mbarrier(long long* addr, __attribute__((address_space(3))) long long* sharedAddr, int count, long long state) {
+  #if __CUDA_ARCH__ >= 800
----------------
I'd add `CHECK-LABEL: <function_name>` here and in other functions.


================
Comment at: clang/test/CodeGen/builtins-nvptx.c:718
+  __nvvm_mbarrier_pending_count_b64(state);
+//  // CHECK_PTX70_SM80: call i32 @llvm.nvvm.mbarrier.pending.count.b64
+  #endif
----------------
Extra `//`


================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:34-39
+def llvm_i8ptr_ty         : LLVMPointerType<llvm_i8_ty>;         // i8*
+def llvm_globali8ptr_ty   : LLVMQualPointerType<llvm_i8_ty, 1>;  // (global)i8*
+def llvm_sharedi8ptr_ty   : LLVMQualPointerType<llvm_i8_ty, 3>;  // (shared)i8*
+def llvm_i64ptr_ty        : LLVMPointerType<llvm_i64_ty>;        // i64*
 def llvm_anyi64ptr_ty     : LLVMAnyPointerType<llvm_i64_ty>;     // (space)i64*
+def llvm_sharedi64ptr_ty  : LLVMQualPointerType<llvm_i64_ty, 3>; // (shared)i64*
----------------
`llvm_globali8ptr_ty` -> `llvm_global_i8ptr_ty` would make it a bit easier to read.


================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:1112-1122
+def int_nvvm_mbarrier_init_b64 : GCCBuiltin<"__nvvm_mbarrier_init_b64">,
+    Intrinsic<[],[llvm_i64ptr_ty, llvm_i32_ty],[IntrConvergent]>;
+def int_nvvm_mbarrier_init_shared_b64 :
+    GCCBuiltin<"__nvvm_mbarrier_init_shared_b64">,
+    Intrinsic<[],[llvm_sharedi64ptr_ty, llvm_i32_ty],[IntrConvergent]>;
+
+def int_nvvm_mbarrier_inval_b64 : GCCBuiltin<"__nvvm_mbarrier_inval_b64">,
----------------
These are probably safe to mark as `IntrWriteMem, IntrArgMemOnly, WriteOnly<ArgIndex<0>>, NoCapture<ArgIndex<0>>`


================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:1156-1158
+def int_nvvm_mbarrier_pending_count_b64 :
+    GCCBuiltin<"__nvvm_mbarrier_pending_count_b64">,
+    Intrinsic<[llvm_i32_ty],[llvm_i64_ty],[IntrConvergent]>;
----------------
This one can be `IntrNoMem`.


================
Comment at: llvm/test/CodeGen/NVPTX/async-copy.ll:6-8
+define void @asyncwaitgroup() {
+  ; CHECK_PTX32: cp.async.wait_group 8;
+  ; CHECK_PTX64: cp.async.wait_group 8;
----------------
I'd recommend adding a common check label (le'ts say `ALL` and running the tests with `--check-prefixes=ALL,CHECK_PTX64`.

This would allow you to use `ALL` for things that do not change. In this case `; ALL: cp.async.wait_group 8;`
It would also be great to add `ALL-LABEL: <function_name>` for each function to limit the range FileCheck operates for.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100394/new/

https://reviews.llvm.org/D100394



More information about the llvm-commits mailing list