[Openmp-commits] [PATCH] D65836: Factor architecture dependent code out of loop.cu

Tue Aug 6 17:46:01 PDT 2019

JonChesterfield marked 2 inline comments as done.
JonChesterfield added a comment.

Couple of comments from me inline.

This is working from the branch at https://github.com/ROCm-Developer-Tools/llvm-project. I'm hoping to move the openmp repo incrementally towards a point where it makes few enough nvptx-specific assumptions that adding the amdgcn target only involves a different version of target_impl.h and a few lines of CMake. Currently, comparing the two src directories in our repo, six files are identical and fourteen differ. I've tried to pick a representative starting point with loop.cu.

Feedback very welcome.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:19
+
+#define FORCEINLINE __forceinline__ __device__
+
----------------
I would prefer to have declarations in this file and implementations in target_impl.cu. That works for amdgcn, but the CMake for nvptx doesn't allow these to be inlined across translation units. Keeping the force-inlined implementations in the header has the advantage that the bitcode library is unchanged. `__inline__` alone was not sufficient for that with nvptx.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:31
+
+typedef uint32_t __kmpc_impl_lanemask_t;
+
----------------
Several differences between nvptx and amdgcn follow from warp size. The wrapper around `__ffs` allows the source to call an overloaded function instead of using `#ifdef` to choose between `__ffs` and `__ffsll`. `__SHFL_SYNC` is currently defined in `omptarget-nvptx.h` and similarly needs different implementations, which are deferred to a future diff.
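
To illustrate the overload idea, here is a rough host-side sketch rather than the code in the diff. The names `impl_ffs`, `lanemask32_t`, `lanemask64_t` and `lowest_active_lane` are hypothetical, and the GCC/Clang builtins `__builtin_ffs` and `__builtin_ffsll` stand in for the device intrinsics `__ffs` and `__ffsll` so that it compiles without CUDA. It assumes a 32-bit mask for nvptx and a 64-bit mask for amdgcn.

```cpp
#include <cstdint>

// Hypothetical names throughout; none of these are taken from the patch.
// __builtin_ffs and __builtin_ffsll (GCC/Clang) stand in for the CUDA
// device intrinsics __ffs and __ffsll so the sketch compiles on the host.

// One overload per mask width. Both return the 1-based index of the
// lowest set bit, or 0 when no bit is set.
constexpr int impl_ffs(uint32_t mask) {
  return __builtin_ffs(static_cast<int>(mask));
}
constexpr int impl_ffs(uint64_t mask) {
  return __builtin_ffsll(static_cast<long long>(mask));
}

// Assumption: each target would pick its mask type once, in its own
// target header.
using lanemask32_t = uint32_t; // 32-lane warp (nvptx)
using lanemask64_t = uint64_t; // 64-lane wavefront (amdgcn)

// Shared code is written once against the mask type, with no #ifdef.
template <typename LaneMask>
constexpr int lowest_active_lane(LaneMask active) {
  return impl_ffs(active) - 1; // 0-based lane id, -1 if none active
}
```

With this shape, overload resolution on the mask type selects the 32-bit or 64-bit intrinsic, so common code such as `lowest_active_lane` does not need to know the warp size.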

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65836/new/

https://reviews.llvm.org/D65836