[Openmp-commits] [PATCH] D65836: Factor architecture dependent code out of loop.cu
Jon Chesterfield via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Tue Aug 6 17:46:01 PDT 2019
JonChesterfield marked 2 inline comments as done.
JonChesterfield added a comment.
Couple of comments from me inline.
This is working from the branch at https://github.com/ROCm-Developer-Tools/llvm-project. I'm hoping to move the openmp repo incrementally towards a point where it makes few enough nvptx-specific assumptions that adding the amdgcn target only involves a different version of target_impl.h and a few lines of CMake. Currently our repo has six identical files, fourteen different between the src directories. I've tried to pick a representative starting point with loop.cu.
Feedback very welcome.
================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:19
+
+#define FORCEINLINE __forceinline__ __device__
+
----------------
I would prefer to have declarations in this file and implementations in target_impl.cu. That works for amdgcn but the CMake for nvptx doesn't allow these to be inlined across translation units. This has the advantage that the bitcode library is unchanged. `__inline__` was not sufficient for that with nvptx.
================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:31
+
+typedef uint32_t __kmpc_impl_lanemask_t;
+
----------------
Several differences between nvptx and amdgcn follow from warp size. The wrapper around `__ffs` allows the source to call an overloaded function instead of `#ifdef` between `__ffs` and `__ffsll`. `__SHFL_SYNC` is currently defined in `omptarget-nvptx.h`and similarly needs different implementations, deferred for a future diff.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D65836/new/
https://reviews.llvm.org/D65836
More information about the Openmp-commits
mailing list