[Openmp-commits] [PATCH] D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs.
Jonas Hahnfeld via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Wed Nov 8 08:36:07 PST 2017
Hahnfeld added a subscriber: arpith-jacob.
Hahnfeld added a comment.
Comments on the build system
================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:27
+# gcc.
+if(CUDA_HOST_COMPILER MATCHES "(clang)|(.*/clang)$")
+
----------------
`MATCHES` evaluates a regex, so `clang` should be enough
================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:36
+ endif()
+ set(CUDA_HOST_COMPILER "${LIBOMPTARGET_NVPTX_ALTERNATE_GCC_HOST_COMPILER}")
+endif()
----------------
I think we need a `FORCE` here as above
================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:63
+
+ # Get all the compute capabilities the user requested or use SM_35 by default.
+ if(LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY)
----------------
Clang defaults to `sm_30`, should we be compatible here?
================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:73-76
+ # Activate RTL message dumps if requested by the user.
+ if(LIBOMPTARGET_NVPTX_DEBUG)
+ set(CUDA_DEBUG -DOMPTARGET_NVPTX_DEBUG=-1 -g --ptxas-options=-v)
+ endif()
----------------
Not used elsewhere and not documented to the user, remove?
================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:120
+ endif()
+ set(LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER_FROM_TREE clang)
+ else()
----------------
This means that this runtime library is rebuilt whenever the compiler in the tree changes which takes quite some time. Can we maybe only compile the bitcode library if CMAKE_C_COMPILER is Clang?
================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:139-146
+ if(NOT LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER)
+ libomptarget_say("Cannot find a CUDA compiler capable of emitting LLVM bitcode.")
+ libomptarget_say("Please configure with flag -DLIBOMPTARGET_NVPTX_CUDA_COMPILER")
+ endif()
+ if(NOT LIBOMPTARGET_NVPTX_SELECTED_BC_LINKER)
+ libomptarget_say("Cannot find a linker capable of linking LLVM bitcode objects.")
+ libomptarget_say("Please configure with flag -DLIBOMPTARGET_NVPTX_BC_LINKER")
----------------
These can only happen in the `else()` cases above, move it there
================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:151-158
+ # Decide which ptx version to use. We use 5.0 for CUDA 8 or 4.2 for older versions
+ if(CUDA_VERSION_MAJOR LESS 8)
+ set(CUDA_PTX_VERSION ptx42)
+ elseif(CUDA_VERSION_MAJOR LESS 9)
+ set(CUDA_PTX_VERSION ptx50)
+ else()
+ set(CUDA_PTX_VERSION ptx60)
----------------
Why do we need a newer ptx version, what's wrong with the default? And anyway, we are building LLVM bitcode, that shouldn't involve ptx at all, right?
================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:179
+ # This is currently broken
+ set(CUDA_INCLUDES -I/usr/include/powerpc64le-linux-gnu)
+
----------------
No
================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:184-186
+ foreach(sm ${nvptx_sm_list})
+ set(CUDA_ARCH ${CUDA_ARCH} --cuda-gpu-arch=sm_${sm})
+ endforeach()
----------------
Do multiple GPU architectures work? I remember that Clang complained about a single output file for multiple architectures...
================
Comment at: libomptarget/deviceRTLs/nvptx/docs/ReductionDesign.txt:2
+
+**Design document for OpenMP reductions on the GPU**
+
----------------
Do we want to have this in here or should this be bundled with the Clang documentation as the compiler has to generate most of this? @arpith-jacob
Repository:
rL LLVM
https://reviews.llvm.org/D14254
More information about the Openmp-commits
mailing list