[PATCH] D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions

Thu Apr 22 02:06:44 PDT 2021

steffenlarsen added a comment.

> Do you know if any existing code already uses the __nvvm_* builtins for cp.async? In other words, does nvcc provide them already or is it something we're free to name as we wish? I do not see any relevant intrinsics mentioned in NVVM IR spec: https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html and I don't think NVCC's builtins are publicly documented anywhere.

I don't know of any yet. We will be using these in the relatively near future, but we can still change them no problem. However, the intrinsic and builtin naming for NVVM and NVPTX seems a bit inconsistent so it may be a long discussion (or maybe not.)

================
Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:460-468
+TARGET_BUILTIN(__nvvm_redux_sync_add_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_min_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_max_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_add_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_min_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_max_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_and_b32, "iii", "", SM_80)
----------------
tra wrote:
> steffenlarsen wrote:
> > tra wrote:
> > > steffenlarsen wrote:
> > > > tra wrote:
> > > > > steffenlarsen wrote:
> > > > > > tra wrote:
> > > > > > > Instead of creating one builtin per integer variant, can we use a more generic builtin `__nvvm_redux_sync_add_i`, similar to how we handle `__nvvm_atom_add_gen_i` ?
> > > > > > > 
> > > > > > What gives me pause is that a for atomic minimum there are both `__nvvm_atom_min_gen_i` and `__nvvm_atom_min_gen_ui` to distinguish between signed and unsigned. What makes the difference?
> > > > > > 
> > > > > > That noted, I'll happily rename the builtins to be more in line with the other builtins. `__nvvm_redux_sync_*_i` and `__nvvm_redux_sync_*_ui` maybe?
> > > > > > What gives me pause is that a for atomic minimum there are both __nvvm_atom_min_gen_i and __nvvm_atom_min_gen_ui to distinguish between signed and unsigned. What makes the difference?
> > > > > 
> > > > > Good point. We do not need unsigned variant for `add`.  We do need explicit signed and unsigned variants ad LLVM IR integer types do not take signedness into account, and the underlying min/max instructions do.  Maybe, rename min_i/min_ui -> min/umin as LLVM does with atomics? 
> > > > > 
> > > > > We may skip the `_i` suffix on logical ops as they only apply to integers anyways.
> > > > > 
> > > > Sorry, I completely missed your responses.
> > > > 
> > > > > Maybe, rename min_i/min_ui -> min/umin as LLVM does with atomics?
> > > > 
> > > > Sounds good to me. Would there also be umax and uadd?
> > > > 
> > > > > We may skip the _i suffix on logical ops as they only apply to integers anyways.
> > > > 
> > > > Absolutely. I'll make that happen! 
> > > > Would there also be umax and uadd?
> > > 
> > > You will need `umax`, but there's no need for `uadd` as 2-complement addition is the same for signed/unsigned.
> > > 
> > > E.g `umax(0xffffffff, 1) -> 0xffffffff`, `max(-1,1) -> 1`, give different answers, but `uadd(0xffffffff, 1) -> 0` and `add(-1,1) -> 0`.
> > Ah, of course. Though I do wonder as to the motivation of having signed and unsigned add variants in PTX. I'll drop the unsigned variant.
> It's for uniformity sake, I guess. All arithmetic ops in PTX operate on sXX/uXX arguments, though not all of them have to.
> 
I bet you're right. Thanks for the help. 😄 

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100124/new/

https://reviews.llvm.org/D100124