[Openmp-commits] [PATCH] D94731: [libomptarget][nvptx] Call builtins instead of cuda
Jon Chesterfield via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Thu Jan 14 17:08:26 PST 2021
JonChesterfield added a comment.
Note that this is incremental (on the basis that it's already hard enough to review). Cuda.h is still used for CUDA_VERSION here, but also for the atomic functions and a few libc prototypes. I have the library compiling without it locally.
The complicated parts that I'd prefer not to reimplement here are the shuffles and the CUDA_VERSION condition. The shuffles are in `__clang_cuda_intrinsics.h`, which includes `crt/sm_70_rt.hpp` from cuda-dev. Some derived macros that we could use instead of CUDA_VERSION are in `__clang_cuda_runtime_wrapper.h`, which includes lots of pieces of cuda-dev.
`__clang_cuda_device_functions.h` looks standalone. It provides one-line definitions like `__DEVICE__ void __threadfence(void) { __nvvm_membar_gl(); }`. We could use that, though we don't gain much and we would break if it changed to depend on a cuda header.
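For illustration, calling the builtin directly rather than going through the wrapper header could look like the sketch below. The name `threadfence_sketch` is hypothetical and is not taken from the patch. The host branch is only a stand-in so the snippet builds off-device, and it assumes clang defines `__NVPTX__` when targeting nvptx.

```cpp
#if !defined(__NVPTX__)
#include <atomic> // host-only stand-in, not needed on the device
#endif

// Hypothetical name for illustration; not the runtime's actual function.
static inline void threadfence_sketch() {
#if defined(__NVPTX__)
  // The same builtin that __clang_cuda_device_functions.h wraps as
  // __threadfence(), called without including any cuda header.
  __nvvm_membar_gl();
#else
  // Assumption: a sequentially consistent fence is an adequate host-side
  // placeholder so that this sketch compiles and runs off-device.
  std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}
```

The trade-off is the one described above: the direct call removes a header dependency at the cost of duplicating a one-line definition.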
I don't have a solution for an unknown CUDA_VERSION yet. I'd like to derive the branch from the architecture we're compiling for - I think all that matters here is whether the target arch has lockstep execution, which is easier to determine than the cuda library version on another machine.
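A rough sketch of what branching on the architecture might look like follows. It is not what the patch does. The helper name, the constant name, and the threshold of 700 are assumptions: the threshold reflects that Volta (sm_70) introduced independent thread scheduling, and the sketch assumes the compiler defines `__CUDA_ARCH__` for the device target.

```cpp
// Hypothetical helper: treats architectures before sm_70 as executing a warp
// in lockstep. The 700 threshold is an assumption based on Volta introducing
// independent thread scheduling.
constexpr bool has_lockstep_execution(int cuda_arch) {
  return cuda_arch < 700;
}

#if defined(__CUDA_ARCH__)
// Device compile: decide from the target architecture, not CUDA_VERSION.
constexpr bool kTargetIsLockstep = has_lockstep_execution(__CUDA_ARCH__);
#else
// Host compile: arbitrary placeholder so the sketch builds off-device.
constexpr bool kTargetIsLockstep = true;
#endif
```

The point of the design is that the predicate depends only on what is being compiled for, so it does not need to know which cuda library version is installed on another machine.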
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D94731/new/
https://reviews.llvm.org/D94731