[Openmp-commits] [PATCH] D95313: [WIP] Move part of nvptx devicertl under clang

Sun Jan 24 10:41:09 PST 2021

JonChesterfield added inline comments.

================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:1204
+    {
+      auto *CTC = static_cast<const toolchains::CudaToolChain *>(
+          C.getSingleOffloadToolChain<Action::OFK_Cuda>());
----------------
Logic very like this could pick out a second, small devicertl bitcode library

================
Comment at: clang/lib/Headers/openmp_wrappers/__clang_openmp_devicertl_cuda_ge90.h:18
+#ifdef _OPENMP
+#define DEVICE __attribute__((used))
+#else
----------------
linkonce and linkonce_odr can both be discarded, but these symbols need to survive until devicertl is linked

================
Comment at: clang/lib/Headers/openmp_wrappers/__clang_openmp_devicertl_cuda_ge90.h:39
+  int WARPSIZE = 32;
+  int tmp = ((WARPSIZE - Width) << 8) | 0x1f;
+  return __nvvm_shfl_sync_down_i32(Mask, Var, Delta, tmp);
----------------
calling into intrinsics would remove this messy expression, if we can work out how to reliably compile parts of the cuda sdk as openmp

these functions probably can't be instantiated on the host, so when changing the devicertl build over to openmp we will also need to guard these with variant / macro

================
Comment at: openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip:49
 // Warp vote function
-DEVICE __kmpc_impl_lanemask_t __kmpc_impl_activemask() {
+EXTERN __kmpc_impl_lanemask_t __kmpc_impl_activemask() {
   return __builtin_amdgcn_read_exec();
----------------
Changing from c++ to c name mangling seems fine here. We don't consistently use one or the other in the devicertl. Using c mangling is simpler if the implementation is in a header.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95313/new/

https://reviews.llvm.org/D95313