[Openmp-commits] [PATCH] D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs.

Wed Nov 8 08:36:07 PST 2017

Hahnfeld added a subscriber: arpith-jacob.
Hahnfeld added a comment.

Comments on the build system

================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:27
+# gcc.
+if(CUDA_HOST_COMPILER MATCHES "(clang)|(.*/clang)$")
+
----------------
`MATCHES` evaluates a regex, so `clang` should be enough

================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:36
+  endif()
+  set(CUDA_HOST_COMPILER "${LIBOMPTARGET_NVPTX_ALTERNATE_GCC_HOST_COMPILER}")
+endif()
----------------
I think we need a `FORCE` here as above

================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:63
+
+  # Get all the compute capabilities the user requested or use SM_35 by default.
+  if(LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY)
----------------
Clang defaults to `sm_30`, should we be compatible here?

================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:73-76
+  # Activate RTL message dumps if requested by the user.
+  if(LIBOMPTARGET_NVPTX_DEBUG)
+    set(CUDA_DEBUG -DOMPTARGET_NVPTX_DEBUG=-1 -g --ptxas-options=-v)
+  endif()
----------------
Not used elsewhere and not documented to the user, remove?

================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:120
+      endif()
+      set(LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER_FROM_TREE clang)
+    else()
----------------
This means that this runtime library is rebuilt whenever the compiler in the tree changes which takes quite some time. Can we maybe only compile the bitcode library if CMAKE_C_COMPILER is Clang?

================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:139-146
+    if(NOT LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER)
+      libomptarget_say("Cannot find a CUDA compiler capable of emitting LLVM bitcode.")
+      libomptarget_say("Please configure with flag -DLIBOMPTARGET_NVPTX_CUDA_COMPILER")
+    endif()
+    if(NOT LIBOMPTARGET_NVPTX_SELECTED_BC_LINKER)
+      libomptarget_say("Cannot find a linker capable of linking LLVM bitcode objects.")
+      libomptarget_say("Please configure with flag -DLIBOMPTARGET_NVPTX_BC_LINKER")
----------------
These can only happen in the `else()` cases above, move it there

================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:151-158
+      # Decide which ptx version to use. We use 5.0 for CUDA 8 or 4.2 for older versions
+      if(CUDA_VERSION_MAJOR LESS 8)
+	set(CUDA_PTX_VERSION ptx42)
+      elseif(CUDA_VERSION_MAJOR LESS 9)
+	set(CUDA_PTX_VERSION ptx50)
+      else()
+        set(CUDA_PTX_VERSION ptx60)
----------------
Why do we need a newer ptx version, what's wrong with the default? And anyway, we are building LLVM bitcode, that shouldn't involve ptx at all, right?

================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:179
+      # This is currently broken
+      set(CUDA_INCLUDES -I/usr/include/powerpc64le-linux-gnu)
+
----------------
No

================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:184-186
+        foreach(sm ${nvptx_sm_list})
+          set(CUDA_ARCH ${CUDA_ARCH} --cuda-gpu-arch=sm_${sm})
+        endforeach()
----------------
Do multiple GPU architectures work? I remember that Clang complained about a single output file for multiple architectures...

================
Comment at: libomptarget/deviceRTLs/nvptx/docs/ReductionDesign.txt:2
+
+**Design document for OpenMP reductions on the GPU** 
+
----------------
Do we want to have this in here or should this be bundled with the Clang documentation as the compiler has to generate most of this? @arpith-jacob 

Repository:
  rL LLVM

https://reviews.llvm.org/D14254