[Openmp-commits] [PATCH] D94745: [OpenMP][WIP] Build the deviceRTLs with OpenMP instead of target dependent language

Tue Jan 19 14:04:14 PST 2021

JonChesterfield added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/common/debug.h:132

+#pragma omp declare target
 template <typename... Arguments>
----------------
Not sure fine grained pragma omp declare target is worth the noise, we can put a declare at the top of the file (before the includes) and an end declare at the end with the same semantics.

================
Comment at: openmp/libomptarget/deviceRTLs/common/omptargeti.h:16

+#pragma omp declare target
+
----------------
if we put them at the start/end of the source, don't need this noise in any of the headers

================
Comment at: openmp/libomptarget/deviceRTLs/common/target_atomic.h:22
+// __clang_cuda_device_functions.h.
+template <typename T> T atomicAdd(T *, T);
+template <typename T> T atomicInc(T *, T);
----------------
Not keen. This is common/, shouldn't be calling functions from cuda.

How about we move target_atomic.h under nvptx, and implement __kmpc_atomic_add etc there?

The amdgpu target_atomic.h would be simpler if it moved under amdgpu, as presently it implements atomicAdd in terms of another function, and could elide that layer entirely if __kmpc_atomic_add called the intrinsic.

We could also implement these in terms of clang intrinsics (i.e., identically as done by amdgpu now), and the result would then work for both platforms.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt:100
+                 -fopenmp -Xclang -fopenmp-is-device
+                 -D__CUDACC__
                  -I${devicertl_base_directory}
----------------
This is suspect - why does openmp want to claim to be cuda?

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt:103
+                 -I${devicertl_nvptx_directory}/src
+                 -I${CUDA_TOOLKIT_ROOT_DIR}/include
+                 -fdeclspec
----------------
i guess this survives until the last use of cuda.h is dropped

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:21
+
+// FIXME: Forward declaration
+extern "C" {
----------------
shouldn't these be in the cuda header above, and also in the clang-injected cuda headers?

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:27
+// FIXME: This attribute doesn't work here
+#define SHARED __attribute__((shared))
+#define ALIGN(N) __attribute__((aligned(N)))
----------------
example implementation using omp allocators at D93135 

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:79
+// cannot use #include <cmath> directly.
+uint32_t min(uint32_t, uint32_t);
+}
----------------
let's just use x < y ? x : y, as it'll codegen to the same thing or better anyway

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94745/new/

https://reviews.llvm.org/D94745