[Openmp-commits] [PATCH] D95327: [OpenMP][NVPTX] Rewrite CUDA intrinsics with NVVM intrinsics
Jon Chesterfield via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Mon Jan 25 02:33:32 PST 2021
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.
The cuda_intrinsics header would need to be substantially refactored to support being included from OpenMP. That doesn't presently seem worthwhile for four straightforward functions.
================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:82
int32_t Width) {
#if CUDA_VERSION >= 9000
+ return __nvvm_shfl_sync_down_i32(Mask, Var, Delta,
----------------
The expression `((WARPSIZE - Width) << 8) | 0x1f` occurs on both branches. Maybe assign it to a local variable before the #if.
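A rough sketch of what that hoisting could look like, written so it compiles on the host without a CUDA toolchain. The names `shfl_down_sketch`, `fake_shfl_sync_down`, `fake_shfl_down` and `Packed` are hypothetical and not taken from the patch. The two `fake_` functions are stand-ins for the NVVM shuffle builtins and simply return the packed operand, and `WARPSIZE` is assumed to be 32 here, whereas the real value comes from the device runtime headers.

```cpp
#include <cstdint>

// Assumption: 32 lanes per warp. The device runtime defines its own WARPSIZE.
#ifndef WARPSIZE
#define WARPSIZE 32
#endif

// Hypothetical host-side stand-ins for the sync and non-sync shuffle builtins.
// They only echo the packed operand so the hoisted value can be inspected.
inline int32_t fake_shfl_sync_down(uint32_t Mask, int32_t Var, uint32_t Delta,
                                   int32_t Packed) {
  (void)Mask; (void)Var; (void)Delta;
  return Packed;
}

inline int32_t fake_shfl_down(int32_t Var, uint32_t Delta, int32_t Packed) {
  (void)Var; (void)Delta;
  return Packed;
}

inline int32_t shfl_down_sketch(uint32_t Mask, int32_t Var, uint32_t Delta,
                                int32_t Width) {
  // Computed once, before the #if, so both branches share the same value.
  const int32_t Packed = ((WARPSIZE - Width) << 8) | 0x1f;
#if defined(CUDA_VERSION) && CUDA_VERSION >= 9000
  return fake_shfl_sync_down(Mask, Var, Delta, Packed);
#else
  (void)Mask; // the non-sync form takes no mask
  return fake_shfl_down(Var, Delta, Packed);
#endif
}
```

With the assumed `WARPSIZE` of 32, a full-width shuffle (`Width == 32`) yields `0x1f`, and `Width == 16` yields `0x101f`.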
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D95327/new/
https://reviews.llvm.org/D95327