[Openmp-commits] [PATCH] D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs.
Jonas Hahnfeld via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Thu Dec 7 14:25:22 PST 2017
Hahnfeld added a comment.
Some comments, still mostly about the build system for the `bclib`
================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:73-76
+ # Activate RTL message dumps if requested by the user.
+ if(LIBOMPTARGET_NVPTX_DEBUG)
+ set(CUDA_DEBUG -DOMPTARGET_NVPTX_DEBUG=-1 -g --ptxas-options=-v)
+ endif()
----------------
grokos wrote:
> Hahnfeld wrote:
> > Not used elsewhere and not documented to the user, remove?
> I think we should keep this one. I added the LIBOMPTARGET_NVPTX_DEBUG flag to the list of NVPTX Cmake options in `Build_With_Cmake.txt`.
Needs also to be defined as `CACHE` variable.
================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:63-67
+ if(LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY)
+ set(CUDA_ARCH ${CUDA_ARCH} -gencode arch=compute_${LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY},code=sm_${LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY})
+ else()
+ set(CUDA_ARCH -arch sm_35)
+ endif()
----------------
1. No need for different cases if we don't support building multiple architectures for now.
2. `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY` needs to be defined as `CACHE` variable, with a default of `35` in this case.
================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:104-105
+ # Trace we require something from the tree.
+ set(LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER_FROM_TREE "")
+ set(LIBOMPTARGET_NVPTX_SELECTED_BC_LINKER_FROM_TREE "")
+
----------------
Remove, as discussed we should not depend on in-tree components being built.
================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:109-111
+ elseif(NOT ${LIBOMPTARGET_STANDALONE_BUILD} AND
+ EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/../../../../../tools/clang/)
+ if(${CMAKE_C_COMPILER_ID} STREQUAL "Clang")
----------------
This does not depend on a Clang checkout anymore, remove
================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:116-122
+ else()
+ find_program(LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER clang++)
+ if(NOT LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER)
+ libomptarget_say("Cannot find a CUDA compiler capable of emitting LLVM bitcode.")
+ libomptarget_say("Please configure with flag -DLIBOMPTARGET_NVPTX_CUDA_COMPILER")
+ endif()
+ endif()
----------------
(Probably this case will go then too...)
================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:124-139
+ if (NOT LIBOMPTARGET_NVPTX_BC_LINKER STREQUAL "")
+ set(LIBOMPTARGET_NVPTX_SELECTED_BC_LINKER ${LIBOMPTARGET_NVPTX_BC_LINKER})
+ elseif(NOT ${LIBOMPTARGET_STANDALONE_BUILD})
+ if(MSVC)
+ set(LIBOMPTARGET_NVPTX_SELECTED_BC_LINKER ${LLVM_TOOLS_BINARY_DIR}/llvm-link.exe)
+ else()
+ set(LIBOMPTARGET_NVPTX_SELECTED_BC_LINKER ${LLVM_TOOLS_BINARY_DIR}/llvm-link)
----------------
1. Do not depend on in-tree components.
2. I think you should be able to do the "link" step with Clang as well, can you please test this?
================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:171-175
+ if(LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY)
+ set(CUDA_ARCH ${CUDA_ARCH} --cuda-gpu-arch=sm_${LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY})
+ else()
+ set(CUDA_ARCH --cuda-gpu-arch=sm_35)
+ endif()
----------------
No need for different cases if we don't support multiple architectures.
================
Comment at: libomptarget/deviceRTLs/nvptx/CMakeLists.txt:200
+ -o ${CMAKE_CURRENT_BINARY_DIR}/libomptarget-nvptx.bc ${bc_files}
+ DEPENDS ${bc_files} ${LIBOMPTARGET_NVPTX_SELECTED_BC_LINKER_FROM_TREE}
+ COMMENT "Linking LLVM bitcode libomptarget-nvptx.bc"
----------------
Should not depend on `LIBOMPTARGET_NVPTX_SELECTED_BC_LINKER_FROM_TREE` as discussed
================
Comment at: libomptarget/deviceRTLs/nvptx/src/counter_group.h:17-20
+#include <stdlib.h>
+#include <stdio.h>
+
+#include <cuda.h>
----------------
grokos wrote:
> Hahnfeld wrote:
> > Needed?
> >
> > This file should probably include `option.h` which defines `Counter`
> Right, `cuda.h` is not needed. I've included `option.h`
`stdlib.h` and `stdio.h` are still there...
================
Comment at: libomptarget/deviceRTLs/nvptx/src/libcall.cu:17
+
+#define TICK ((double) 1.0 / 745000000.0)
+
----------------
grokos wrote:
> Hahnfeld wrote:
> > grokos wrote:
> > > Hahnfeld wrote:
> > > > grokos wrote:
> > > > > This is where the hard-coded definition of the GPU clock frequency has been moved. I'll try to find a viable solution to make the library find the clock frequency dynamically.
> > > > Yes, this doesn't sound like a good idea to have that hard-coded...
> > > Getting the clock frequency in device code cannot be done. We can only query it on the host.
> > >
> > > I tried having a device global variable TICK and set it via a call to `cuModuleGetGlobal(..., "TICK")` from the CUDA plugin (the plugin can query the frequency via `cudaGetDeviceProperties`). This solution did not work because `libomptarget-nvptx.a` is a static library so the clock frequency should be set at compile time. We cannot use dynamic libraries (yet?) because the CUDA toolchain does not support dynamic linking.
> > >
> > > Eventually I implemented `omp_get_wtime()' using the `%globaltimer` register. That's the only viable option. If the register gets removed in the future, there's nothing we can do.
> > >
> > > `omp_get_wtick()` is probably not needed. No one will ever query the time between clock ticks from within device code... I left the function there so that the linker can find it but it prints a message that this functionality is not implemented.
> > >
> > > I leave this issue open for further discussion anyway.
> > That's not a justification, I really doubt that anyone will call `omp_get_wtime()` either.
> >
> > Why not just return 1 nanosecond (the resolution of `%globaltimer`) for `omp_get_wtick` as I proposed?
> OK, I got your point. It's better than having nothing. I've changed the code to return 1ns.
`TICK` can go away then (hopefully)
================
Comment at: libomptarget/deviceRTLs/nvptx/src/libcall.cu:29
+ asm("mov.u64 %0, %%globaltimer;" : "=l"(nsecs));
+ double rc = (double) nsecs / 1E9;
+ PRINT(LD_IO, "call omp_get_wtime() returns %g\n", rc);
----------------
Please reuse `TIMER_PRECISION`
================
Comment at: libomptarget/deviceRTLs/nvptx/src/libcall.cu:401-402
+EXTERN int omp_test_lock(omp_lock_t *lock) {
+ // int atomicCAS(int* address, int compare, int val);
+ // (old == compare ? val : old)
+ int compare = UNSET;
----------------
What's this comment about?
Repository:
rL LLVM
https://reviews.llvm.org/D14254
More information about the Openmp-commits
mailing list