[PATCH] D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions

Artem Belevich via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 8 10:34:27 PDT 2021


tra added inline comments.


================
Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:460-468
+TARGET_BUILTIN(__nvvm_redux_sync_add_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_min_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_max_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_add_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_min_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_max_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_and_b32, "iii", "", SM_80)
----------------
Instead of creating one builtin per integer variant, can we use a more generic builtin `__nvvm_redux_sync_add_i`, similar to how we handle `__nvvm_atom_add_gen_i` ?



================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:4103
+// redux.sync.add.u32 dst, src, membermask;
+def int_nvvm_redux_sync_add_u32 : GCCBuiltin<"__nvvm_redux_sync_add_u32">,
+  Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],
----------------
This could also be consolidated into an overloaded intrinsic operating on `llvm_anyint_ty`


================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:4105
+  Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],
+            [IntrConvergent, IntrNoMem]>;
+
----------------
Similar to `shfl`, these intrinsics probably need `IntrInaccessibleMemOnly` as they exchange data with other threads.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100124/new/

https://reviews.llvm.org/D100124



More information about the llvm-commits mailing list