[Openmp-commits] [PATCH] D95327: [OpenMP][NVPTX] Rewrite CUDA intrinsics with NVVM intrinsics

Jon Chesterfield via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Jan 25 02:33:32 PST 2021


JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

The cuda_intrinsics header would need to be substantially refactored before it could be included from OpenMP. That doesn't presently seem worthwhile for four straightforward functions.



================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:82
                                           int32_t Width) {
 #if CUDA_VERSION >= 9000
+  return __nvvm_shfl_sync_down_i32(Mask, Var, Delta,
----------------
The expression `((WARPSIZE - Width) << 8) | 0x1f` occurs on both branches; maybe assign it to a local variable before the `#if`.
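A rough sketch of what that hoisting could look like. The names `packShflDownOperand` and `shflDownSketch` are hypothetical and not part of the patch. The value `WARPSIZE = 32` and the parameter types of `Mask`, `Var` and `Delta` are assumptions. The device part is guarded so that it only builds when clang compiles the file as CUDA, where the `__nvvm_shfl_*` builtins are available.

```cpp
#include <cstdint>

// Assumption: 32 lanes per warp, as on current NVPTX targets.
constexpr int32_t WARPSIZE = 32;

// Hypothetical helper (not in the patch): computes the packed last operand
// that both shuffle-down builtins take, so the expression is written once.
constexpr int32_t packShflDownOperand(int32_t Width) {
  return ((WARPSIZE - Width) << 8) | 0x1f;
}

// Compile-time sanity checks of the arithmetic.
static_assert(packShflDownOperand(32) == 0x1f, "full-warp shuffle");
static_assert(packShflDownOperand(16) == ((16 << 8) | 0x1f), "half-warp shuffle");

#if defined(__CUDA__) && defined(__clang__)
// Only built under clang's CUDA mode. The signature is an assumption
// modelled on the quoted diff context.
__device__ int32_t shflDownSketch(uint32_t Mask, int32_t Var, uint32_t Delta,
                                  int32_t Width) {
  // Hoisted ahead of the #if so both branches share one definition.
  const int32_t Packed = packShflDownOperand(Width);
#if CUDA_VERSION >= 9000
  return __nvvm_shfl_sync_down_i32(Mask, Var, Delta, Packed);
#else
  return __nvvm_shfl_down_i32(Var, Delta, Packed);
#endif
}
#endif
```

A plain local `int32_t` declared before the `#if` would work just as well as the helper; the helper is used here only so the arithmetic can be checked on the host.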


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95327/new/

https://reviews.llvm.org/D95327


