[PATCH] D100394: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions

Stuart Adams via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 20 10:10:30 PDT 2021


nyalloc marked 13 inline comments as done.
nyalloc added inline comments.


================
Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:733-736
+TARGET_BUILTIN(__nvvm_cp_async_mbarrier_arrive_b64, "vv*", "", AND(SM_80,PTX70))
+TARGET_BUILTIN(__nvvm_cp_async_mbarrier_arrive_shared_b64, "vv*", "", AND(SM_80,PTX70))
+TARGET_BUILTIN(__nvvm_cp_async_mbarrier_arrive_noinc_b64, "vv*", "", AND(SM_80,PTX70))
+TARGET_BUILTIN(__nvvm_cp_async_mbarrier_arrive_noinc_shared_b64, "vv*", "", AND(SM_80,PTX70))
----------------
tra wrote:
> For `cp.async.mbarrier` instructions to work we do need to have `mbarrier.init`to init the barrier object and other mbarrier ops to use them inpractice. Perhaps these should be added if/when all mbarrier instructions are added.
> 
> Also, mbarrier object has additional requirements for the pointer (aligned by 8, in __shared__ space): https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-size-alignment
> 
> So, technically they all should use `v*3`, but I don't think it will work well in practice -- a lot of things assume that we start with all the pointers bein in generic AS. Nevertheless, we do want to have some sort of safeguards for these builtins.
> 
> Perhaps it would make sense to add a custom type checker and only allow references to `__shared__` variables. 
Well spotted, I'll add the mbarrier intrinsics and builtins to this patch. They were originally going to be put up separately but I'll add them in to this seen as they are related.


================
Comment at: llvm/lib/Target/NVPTX/NVPTXIntrinsics.td:415
+           (ins Int32Regs:$addr, Int32Regs:$count),
+           !strconcat("mbarrier.arrive.noComplete", AddrSpace,
+                      ".b64 $state, [$addr], $count;"),
----------------
tra wrote:
> Does ptxas accept all-lower-case `nocomplete`?
> 
> The `no*C*omplete` stands out as a sore thumb. Capital letters are used in few LLVM intrinsics, so it's not a showstopper, but I think lower case everywhere makes more sense. WDYT?
> 
I 100% agree. For some reason the PTX decide to use camel case here. They also throw some snake case in other names. We can change the LLVM intrinsics to use a more consistent naming scheme, but it will come at the cost that it's no longer a clean mapping of names to the PTX.


================
Comment at: llvm/test/CodeGen/NVPTX/mbarrier.ll:7
+
+define void @barrierinit(i64* %a, i32 %b) {
+; CHECK_PTX32: mbarrier.init.b64 [%r{{[0-9]+}}], %r{{[0-9]+}};
----------------
tra wrote:
> Same here. Please add `-LABEL` checks for all functions.
Will do!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100394/new/

https://reviews.llvm.org/D100394



More information about the llvm-commits mailing list