[Openmp-dev] Target construct not offloading to GPU

Fri Oct 5 07:49:20 PDT 2018

Both LLVM and clang complain about that flag:
CMake Warning:
   Manually-specified variables were not used by the project:

     LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES

It seems that developers disabled this flag some time ago 
(https://github.com/clang-ykt/clang/issues/11).
I haven't found any other way to generate the runtime libraries, though.
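
For reference, CMake prints this warning whenever a -D variable is not read by any CMakeLists.txt in the tree being configured, so it would be expected if the OpenMP runtime sources are not part of that build. A minimal sketch of configuring the OpenMP project on its own, with the newly built clang as the compiler, might look like the following. Both paths are placeholders (not taken from this thread), and whether these options alone are sufficient is an assumption.

```shell
#!/bin/sh
# Sketch: configure the OpenMP runtime project separately, compiled with the
# newly built clang, passing the compute capabilities suggested in the reply.
# Both paths are hypothetical placeholders; adjust them to the local setup.
CLANG_PREFIX="/path/to/clang/install"
OPENMP_SRC="/path/to/openmp-7.0.0.src"
CAPABILITIES="35,60,70"

CONFIGURE_CMD="cmake ${OPENMP_SRC} \
    -DCMAKE_C_COMPILER=${CLANG_PREFIX}/bin/clang \
    -DCMAKE_CXX_COMPILER=${CLANG_PREFIX}/bin/clang++ \
    -DCMAKE_INSTALL_PREFIX=${CLANG_PREFIX} \
    -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=${CAPABILITIES}"

# Dry run by default: only print the command. Set RUN=1 to invoke cmake.
if [ "${RUN:-0}" = "1" ]; then
    eval "${CONFIGURE_CMD}"
else
    echo "${CONFIGURE_CMD}"
fi
```

Run from an empty build directory; the script only prints the configure line unless RUN=1 is set.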

Best,
-Cristobal

On 10/05/2018 04:42 PM, Jonas Hahnfeld wrote:
> Yes, now you need to pass 
> LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=35,60,70 (or whatever you like 
> to have) to get the runtime libraries.
>
> On 2018-10-05 16:33, Cristobal Ortega wrote:
>> I compiled clang with the following line:
>> cmake .. -DCMAKE_C_COMPILER=${HOST_GCC}/bin/gcc
>> -DCMAKE_CXX_COMPILER=${HOST_GCC}/bin/g++
>> -DGCC_INSTALL_PREFIX=${HOST_GCC}
>> -DCMAKE_CXX_LINK_FLAGS="-L${HOST_GCC}/lib64
>> -Wl,-rpath,${HOST_GCC}/lib64"
>> -DCMAKE_INSTALL_PREFIX=/gpfs/projects/bsc18/bsc18833/pkg/clang/7.0.0
>> -DGCC_INSTALL_PREFIX=${HOST_GCC}
>>
>> Indeed, output with verbose confirms that clang is trying to compile
>> with march=sm_35 (output is attached).
>> Also, trying to compile the program with
>> "-Xopenmp-target -march=sm_70"
>> fails with
>> "clang-7: error: nvlink command failed with exit code 255 (use -v to
>> see invocation)" because of several undefined references (details in
>> the attached file).
>>
>> So, I'm trying to re-compile clang with
>> CLANG_OPENMP_NVPTX_DEFAULT_ARCH but, still, clang is not generating
>> the library 'libomptarget-nvptx-sm_70.bc'. Therefore, compilation
>> doesn't complete.
>> Where should this library be? I have one bc file in
>> "clang_src/test/Driver/Inputs/libomptarget/" but it's for sm_20
>> (libomptarget-nvptx-sm_20.bc).
>
> That's an empty file for testing. To get Bitcode libraries, you need 
> to compile the OpenMP project using Clang.
> (I've started putting together step-by-step instructions how to build 
> LLVM/Clang 7.0 for OpenMP offloading, I'll send a link to the mailing 
> list once ready.)
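
A quick way to check whether the bitcode libraries were actually generated could be something along these lines. The directory is a placeholder, and the file name pattern follows the libomptarget-nvptx-sm_70.bc name mentioned in this thread. The non-empty test is there because the file under test/Driver/Inputs is only an empty stub.

```shell
#!/bin/sh
# has_nvptx_bclib DIR ARCH
# Succeeds only if DIR/libomptarget-nvptx-ARCH.bc exists and is non-empty.
has_nvptx_bclib() {
    [ -s "$1/libomptarget-nvptx-$2.bc" ]
}

# Hypothetical location; the actual library directory depends on the install.
LIB_DIR="${LIB_DIR:-/path/to/clang/install/lib}"

for arch in sm_35 sm_60 sm_70; do
    if has_nvptx_bclib "$LIB_DIR" "$arch"; then
        echo "$arch: found"
    else
        echo "$arch: missing"
    fi
done
```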
>
>> This is how I'm trying to compile clang:
>> cmake .. -DCMAKE_C_COMPILER=${HOST_GCC}/bin/gcc
>> -DCMAKE_CXX_COMPILER=${HOST_GCC}/bin/g++
>> -DGCC_INSTALL_PREFIX=${HOST_GCC} -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=70
>
> Nit: Should this be -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_70?
>
>> Yet, in the compilation process, clang complains about the missing
>> library for sm_70.
>>
>> Do I need to pass some flag to LLVM too?
>>
>> Best,
>> -Cristobal
>>
>>
>>
>> On 10/05/2018 03:22 PM, Jonas Hahnfeld wrote:
>>> Hi,
>>>
>>> how did you build your compiler? If you didn't specify 
>>> CLANG_OPENMP_NVPTX_DEFAULT_ARCH, Clang will default to sm_35, which 
>>> doesn't run on Volta (sm_70).
>>> Can you post the output of
>>>> clang -v  -o openmp_offload openmp_offload.c -O3 -fopenmp=libomp
>>>> -fopenmp-targets="nvptx64-nvidia-cuda"
>>>
>>> If it's indeed compiling for sm_35, can you try adding 
>>> -Xopenmp-target -march=sm_70?
>>>
>>> Regards,
>>> Jonas
>>>
>>> On 2018-10-05 15:09, Cristobal Ortega via Openmp-dev wrote:
>>>> Hello,
>>>>
>>>> I've been trying to compile a program (source code is attached) that
>>>> offloads to an NVIDIA V100 GPU with LLVM 7.0 and clang 7.0.
>>>>
>>>> It seems that the program is successfully compiled, yet nvprof reports
>>>> that "no kernels were profiled".
>>>> The application seems to be running on the CPU (the "top" command
>>>> reports high CPU usage).
>>>>
>>>> Compilation line that I used:
>>>> clang -v  -o openmp_offload openmp_offload.c -O3 -fopenmp=libomp
>>>> -fopenmp-targets="nvptx64-nvidia-cuda"
>>>>
>>>> Output after executing the binary:
>>>> ==74802== NVPROF is profiling process 74802, command: ./openmp_offload 10 10 10000 1
>>>> Number of processors:     160
>>>> Number of devices:        4
>>>> Default device:           0
>>>> Is initial device:        1
>>>> ==74802== Profiling application: ./openmp_offload 10 10 10000 1
>>>> ==74802== Profiling result:
>>>> No kernels were profiled.
>>>>             Type  Time(%)      Time     Calls       Avg       Min       Max  Name
>>>>       API calls:   99.99%  311.50ms         1  311.50ms  311.50ms  311.50ms  cuCtxCreate
>>>>                     0.00%  11.462us         4  2.8650us  1.1450us  6.2010us  cuDeviceGetPCIBusId
>>>>                     0.00%  5.4850us         5  1.0970us     387ns  3.7770us  cuDeviceGet
>>>>                     0.00%  4.8070us        12     400ns     232ns  1.0350us  cuDeviceGetAttribute
>>>>                     0.00%  1.4360us         3     478ns     384ns     640ns  cuDeviceGetCount
>>>>
>>>>
>>>>
>>>> When compiled with GCC, the application does offload to the GPU.
>>>>
>>>> clang information:
>>>> $ clang -v
>>>> Version 6
>>>> Version >= 90 selected
>>>> libdevice.10.bc exists
>>>> clang version 7.0.0 (tags/RELEASE_700/final)
>>>> Target: powerpc64le-unknown-linux-gnu
>>>> Thread model: posix
>>>> InstalledDir: /gpfs/projects/bsc18/bsc18833/pkg/clang/7.0.0/bin
>>>> Found candidate GCC installation:
>>>> /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0
>>>> Selected GCC installation:
>>>> /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0
>>>> Candidate multilib: .;@m64
>>>> Selected multilib: .;@m64
>>>> Found CUDA installation: /usr/local/cuda-9.2, version 9.2
>>>>
>>>>
>>>> Hopefully somebody has an idea on what's going on here.
>>>> If you need any more information to find the issue, let me know.
>>>> Thank you.
>>>>
>>>> Best,
>>>> -Cristobal
>>>>
>>>>
>>>> http://bsc.es/disclaimer
>>>> _______________________________________________
>>>> Openmp-dev mailing list
>>>> Openmp-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev