[PATCH] D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions
Artem Belevich via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Apr 20 10:52:31 PDT 2021
tra added inline comments.
================
Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:460-468
+TARGET_BUILTIN(__nvvm_redux_sync_add_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_min_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_max_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_add_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_min_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_max_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_and_b32, "iii", "", SM_80)
----------------
steffenlarsen wrote:
> tra wrote:
> > steffenlarsen wrote:
> > > tra wrote:
> > > > Instead of creating one builtin per integer variant, can we use a more generic builtin `__nvvm_redux_sync_add_i`, similar to how we handle `__nvvm_atom_add_gen_i` ?
> > > >
> > > What gives me pause is that a for atomic minimum there are both `__nvvm_atom_min_gen_i` and `__nvvm_atom_min_gen_ui` to distinguish between signed and unsigned. What makes the difference?
> > >
> > > That noted, I'll happily rename the builtins to be more in line with the other builtins. `__nvvm_redux_sync_*_i` and `__nvvm_redux_sync_*_ui` maybe?
> > > What gives me pause is that a for atomic minimum there are both __nvvm_atom_min_gen_i and __nvvm_atom_min_gen_ui to distinguish between signed and unsigned. What makes the difference?
> >
> > Good point. We do not need unsigned variant for `add`. We do need explicit signed and unsigned variants ad LLVM IR integer types do not take signedness into account, and the underlying min/max instructions do. Maybe, rename min_i/min_ui -> min/umin as LLVM does with atomics?
> >
> > We may skip the `_i` suffix on logical ops as they only apply to integers anyways.
> >
> Sorry, I completely missed your responses.
>
> > Maybe, rename min_i/min_ui -> min/umin as LLVM does with atomics?
>
> Sounds good to me. Would there also be umax and uadd?
>
> > We may skip the _i suffix on logical ops as they only apply to integers anyways.
>
> Absolutely. I'll make that happen!
> Would there also be umax and uadd?
You will need `umax`, but there's no need for `uadd` as 2-complement addition is the same for signed/unsigned.
E.g `umax(0xffffffff, 1) -> 0xffffffff`, `max(-1,1) -> 1`, give different answers, but `uadd(0xffffffff, 1) -> 0` and `add(-1,1) -> 0`.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100124/new/
https://reviews.llvm.org/D100124
More information about the cfe-commits
mailing list