[Openmp-commits] [PATCH] D94745: [OpenMP][WIP] Build the deviceRTLs with OpenMP instead of target dependent language

Shilei Tian via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Tue Jan 19 14:14:13 PST 2021


tianshilei1992 marked 4 inline comments as done.
tianshilei1992 added inline comments.


================
Comment at: openmp/libomptarget/deviceRTLs/common/debug.h:132
 
+#pragma omp declare target
 template <typename... Arguments>
----------------
JonChesterfield wrote:
> Not sure fine-grained `#pragma omp declare target` is worth the noise; we can put a declare at the top of the file (before the includes) and an end declare at the end, with the same semantics.
`clang` previously had a bug where it crashed if a global variable appeared in the region; at the very beginning it crashed on `extern FILE *stdin`. That has since been fixed, so I guess we could do that.
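For reference, a minimal sketch of the file-wide form Jon describes (the contents are illustrative, not the actual `debug.h`):

  #pragma omp declare target

  // A file-scope global; declarations like `extern FILE *stdin` at this
  // position are what used to crash clang.
  extern int DebugLevel;

  template <typename... Arguments>
  void omptarget_debug_print(const char *Fmt, Arguments... Args);

  #pragma omp end declare target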


================
Comment at: openmp/libomptarget/deviceRTLs/common/target_atomic.h:22
+// __clang_cuda_device_functions.h.
+template <typename T> T atomicAdd(T *, T);
+template <typename T> T atomicInc(T *, T);
----------------
JonChesterfield wrote:
> Not keen. This is common/; it shouldn't be calling functions from CUDA.
> 
> How about we move target_atomic.h under nvptx, and implement __kmpc_atomic_add etc there?
> 
> The amdgpu target_atomic.h would be simpler if it moved under amdgpu, as presently it implements atomicAdd in terms of another function, and could elide that layer entirely if __kmpc_atomic_add called the intrinsic.
> 
> We could also implement these in terms of clang intrinsics (i.e., identically as done by amdgpu now), and the result would then work for both platforms.
I'm okay with your proposal. I'd do it in a follow-up patch once this one works, because I want the changes in this patch to be minimal, get everything working, and then start optimizing. BTW, the template functions don't call CUDA here. They're just declarations; calls to them are lowered to typed variants such as `fatomicAdd`, which are eventually defined in the different implementations.
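For reference, the clang-intrinsic route Jon mentions would look roughly like the sketch below (the `__atomic` builtins cover integer types; a floating-point overload may need a CAS loop or a target intrinsic):

  // Sketch: implement the add directly so both nvptx and amdgpu can
  // share it, with no CUDA-style atomicAdd layer in between.
  template <typename T> T __kmpc_atomic_add(T *Address, T Val) {
    return __atomic_fetch_add(Address, Val, __ATOMIC_SEQ_CST);
  }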


================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt:100
+                 -fopenmp -Xclang -fopenmp-is-device
+                 -D__CUDACC__
                  -I${devicertl_base_directory}
----------------
JonChesterfield wrote:
> This is suspect - why does OpenMP want to claim to be CUDA?
To keep the change minimal. There is an include in `interface.h` wrapped in a macro check: for AMD GPU it includes a header from the AMD implementation, and for a CUDA device it includes a header from the NVPTX implementation.
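Concretely, the guard has roughly this shape (a sketch; the exact macro and header names may differ from the sources):

  // interface.h (sketch): pick the target-specific interface header.
  #ifdef __AMDGCN__
  #include "amdgcn/src/amdgcn_interface.h"
  #endif
  #ifdef __CUDACC__
  #include "nvptx/src/nvptx_interface.h"
  #endif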


================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt:103
+                 -I${devicertl_nvptx_directory}/src
+                 -I${CUDA_TOOLKIT_ROOT_DIR}/include
+                 -fdeclspec
----------------
JonChesterfield wrote:
> I guess this survives until the last use of cuda.h is dropped.
yep


================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:21
+
+// FIXME: Forward declaration
+extern "C" {
----------------
JonChesterfield wrote:
> shouldn't these be in the cuda header above, and also in the clang-injected cuda headers?
All functions that CUDA code can call are declared `__device__`. Inside `declare target` we cannot call those functions; we need them declared in an OpenMP-compatible form, so the declarations in `cuda.h` cannot be used. If it were not for the CUDA version macros, we could drop the header entirely.
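In other words, the pattern is roughly the following (the names are illustrative, not the actual declarations):

  // cuda.h declares device functions with the CUDA attribute, e.g.
  //   extern __device__ unsigned __some_intrinsic(unsigned);
  // which cannot be called from a declare target region. So we
  // redeclare a plain C symbol and let the device linker resolve it:
  extern "C" unsigned __some_intrinsic(unsigned); // hypothetical name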


================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:27
+// FIXME: This attribute doesn't work here
+#define SHARED __attribute__((shared))
+#define ALIGN(N) __attribute__((aligned(N)))
----------------
JonChesterfield wrote:
> An example implementation using omp allocators is at D93135.
yep, will do it later.
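For reference, a minimal sketch of the D93135-style replacement, using the predefined `omp_pteam_mem_alloc` allocator in place of `__attribute__((shared))` (the variable name is hypothetical):

  // Place a team-local buffer in shared memory with an OpenMP
  // allocator rather than a CUDA-style attribute.
  static int ScratchBuffer[32];
  #pragma omp allocate(ScratchBuffer) allocator(omp_pteam_mem_alloc)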


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94745/new/

https://reviews.llvm.org/D94745


