[Openmp-commits] [PATCH] D95060: [libomptarget][devicertl][nfc] Remove some cuda intrinsics, simplify

Wed Jan 20 10:53:42 PST 2021

JonChesterfield added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/common/src/reduction.cu:187

+INLINE static uint32_t kmpcMin(uint32_t x, uint32_t y) {
+  return x < y ? x : y;
----------------
tianshilei1992 wrote:
> JonChesterfield wrote:
> > tianshilei1992 wrote:
> > > JonChesterfield wrote:
> > > > Also considered 'min', but expect it to collide with one of the included headers, and 'min_uint32_t' but couldn't see a pretty way to write that inTheNamingConvention.
> > > > 
> > > > INLINE static is a slightly strange construct, chosen to match the functions above.
> > > The semantics of `min(x, y)` is same as `x < y ? x : y` but the implementation could be very different. For some data type (especially integers) we don't even need a branch here. I don't think it's good for performance to just use branch. If we know the type is `uint32_t`, then we can use bitwise operation here.
> > This will codegen as selec (https://godbolt.org/z/8fxPo7)
> > 
> > ```
> > define dso_local i64 @min(i64 %0, i64 %1) local_unnamed_addr #0 {
> >   %3 = icmp ult i64 %0, %1
> >   %4 = select i1 %3, i64 %0, i64 %1
> >   ret i64 %4
> > }
> > ```
> > 
> > which the backend will have an easier time with than decoding the equivalent bit shuffling.
> > 
> Okay. If this can solve the problem, `__kmpc_impl_min` can be removed as well.
Yep, __kmpc_impl_min is gone.

I've kept kmpcMin as a local function in preference to open coding x<y?x:y at each call site because it's shorter and fractionally easier to read than checking whether the argument order is min or max

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95060/new/

https://reviews.llvm.org/D95060