[Openmp-commits] [PATCH] D94731: [libomptarget][nvptx] Call builtins instead of cuda

Thu Jan 14 17:08:26 PST 2021

JonChesterfield added a comment.

Note that this is incremental, on the basis that it's already hard enough to review. `cuda.h` is still used here for CUDA_VERSION, and also for the atomic functions and a few libc prototypes. I have the library compiling without it locally.

The complicated parts that I'd prefer not to reimplement here are the shuffles and the CUDA_VERSION condition. The shuffles are in `__clang_cuda_intrinsics.h`, which includes `crt/sm_70_rt.hpp` from cuda-dev. Some derived macros that we could use instead of CUDA_VERSION are in `__clang_cuda_runtime_wrapper.h`, which includes lots of pieces of cuda-dev.

`__clang_cuda_device_functions.h` looks standalone. It provides one-line definitions like `__DEVICE__ void __threadfence(void) { __nvvm_membar_gl(); }`. We could use that, though we don't gain much and we would break if it changed to depend on a cuda header.
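
For illustration, calling the builtin directly could look roughly like the sketch below. The wrapper name `example_threadfence` and the host fallback are made up for this example and are not taken from the patch; the only name from the real headers is `__nvvm_membar_gl`. It assumes clang defines `__NVPTX__` when targeting nvptx.

```cpp
#include <atomic>

// Hypothetical wrapper, for illustration only (not the name used in the patch).
#if defined(__NVPTX__)
// Device build with clang: call the builtin directly, no cuda.h needed.
static inline void example_threadfence() { __nvvm_membar_gl(); }
#else
// Assumed host fallback so the sketch compiles outside an nvptx build.
static inline void example_threadfence() {
  std::atomic_thread_fence(std::memory_order_seq_cst);
}
#endif
```

This has the same shape as the one-liner in the clang header, minus the `__DEVICE__` macro, so it does not change if that header later starts depending on a cuda header.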

I don't have a solution for an unknown CUDA_VERSION yet. I'd like to derive the branch from the architecture we're compiling for. I think all that matters here is whether the target arch has lockstep execution, which is easier to determine than the version of the cuda library on another machine.
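
A minimal sketch of what branching on the architecture might look like follows. Everything in it is an assumption for illustration: the helper name `arch_needs_sync_intrinsics` is hypothetical, the 700 threshold assumes that sm_70 and later no longer guarantee lockstep execution within a warp, and it assumes `__CUDA_ARCH__` is defined for the device compile.

```cpp
// Hypothetical helper, not part of the patch.
// Assumption: architectures from sm_70 onwards schedule threads
// independently, so lockstep execution of a warp is not guaranteed there.
constexpr bool arch_needs_sync_intrinsics(int cuda_arch) {
  return cuda_arch >= 700;
}

#if defined(__CUDA_ARCH__)
// Device build: decide from the architecture being compiled for.
constexpr bool kNeedsSyncIntrinsics =
    arch_needs_sync_intrinsics(__CUDA_ARCH__);
#else
// Host build: placeholder value so the sketch compiles anywhere.
constexpr bool kNeedsSyncIntrinsics = false;
#endif
```

The point is that the decision then depends only on the target passed to the compiler, not on which cuda headers happen to be installed on the build machine.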

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94731/new/

https://reviews.llvm.org/D94731