[Openmp-dev] Target construct not offloading to GPU

Fri Oct 5 07:42:23 PDT 2018

Yes, now you also need to pass
-DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=35,60,70 (or whichever compute
capabilities you need) to CMake when building the OpenMP project to get
the runtime libraries.
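Once the device runtime is built for sm_70, one quick way to confirm that a
target region really runs on the GPU (rather than on the host, as the
"Is initial device: 1" line in the original output seems to indicate) is to
call omp_is_initial_device() from inside the region. The sketch below is a
minimal example under that assumption: target_ran_on_device() and
report_offload() are hypothetical helper names, not part of any library, and
only the standard OpenMP 4.x routines from omp.h are used. The #ifdef guards
let it also build without OpenMP, in which case everything runs on the host.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if a target region executed on an
   offload device, 0 if it ran on the host (for example because no usable
   device image was found and the runtime fell back to the host). */
int target_ran_on_device(void) {
    int on_host = 1; /* without OpenMP, everything runs on the host */
#ifdef _OPENMP
    /* tofrom copies the flag to the device and the result back. */
    #pragma omp target map(tofrom: on_host)
    {
        on_host = omp_is_initial_device();
    }
#endif
    return on_host ? 0 : 1;
}

/* Hypothetical helper: prints a short diagnostic similar to the
   "Number of devices / Default device" lines in the original report. */
void report_offload(void) {
#ifdef _OPENMP
    printf("Number of devices: %d\n", omp_get_num_devices());
    printf("Default device:    %d\n", omp_get_default_device());
#endif
    printf("Target region ran on %s\n",
           target_ran_on_device() ? "a device" : "the host");
}
```

Calling report_offload() from main() and building with the flags discussed
in this thread (clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda
-Xopenmp-target -march=sm_70) would be expected to print "a device" once the
sm_70 bitcode library is available; if it prints "the host", the binary is
still falling back to the CPU.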

On 2018-10-05 16:33, Cristobal Ortega wrote:
> I compiled clang with the following line:
> cmake .. -DCMAKE_C_COMPILER=${HOST_GCC}/bin/gcc \
>   -DCMAKE_CXX_COMPILER=${HOST_GCC}/bin/g++ \
>   -DGCC_INSTALL_PREFIX=${HOST_GCC} \
>   -DCMAKE_CXX_LINK_FLAGS="-L${HOST_GCC}/lib64 -Wl,-rpath,${HOST_GCC}/lib64" \
>   -DCMAKE_INSTALL_PREFIX=/gpfs/projects/bsc18/bsc18833/pkg/clang/7.0.0 \
>   -DGCC_INSTALL_PREFIX=${HOST_GCC}
> 
> Indeed, the verbose output confirms that clang is compiling for
> -march=sm_35 (output is attached).
> Also, compiling the program with "-Xopenmp-target -march=sm_70"
> fails with
> "clang-7: error: nvlink command failed with exit code 255 (use -v to
> see invocation)" because of several undefined references (details in
> the attached file).
> 
> So I'm trying to recompile clang with
> CLANG_OPENMP_NVPTX_DEFAULT_ARCH, but clang still doesn't generate
> the library 'libomptarget-nvptx-sm_70.bc', so compilation doesn't
> complete.
> Where should this library be? I have one .bc file in
> "clang_src/test/Driver/Inputs/libomptarget/", but it's for sm_20
> (libomptarget-nvptx-sm_20.bc).

That's an empty file for testing. To get the bitcode libraries, you need
to compile the OpenMP project using Clang.
(I've started putting together step-by-step instructions on how to build
LLVM/Clang 7.0 for OpenMP offloading; I'll send a link to the mailing
list once they're ready.)

> This is how I'm trying to compile clang:
> cmake .. -DCMAKE_C_COMPILER=${HOST_GCC}/bin/gcc \
>   -DCMAKE_CXX_COMPILER=${HOST_GCC}/bin/g++ \
>   -DGCC_INSTALL_PREFIX=${HOST_GCC} -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=70

Nit: Should this be -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_70?

> Yet, when compiling the program, clang complains about the missing
> library for sm_70.
> 
> Do I need to pass some flag to LLVM too?
> 
> Best,
> -Cristobal
> 
> 
> 
> On 10/05/2018 03:22 PM, Jonas Hahnfeld wrote:
>> Hi,
>> 
>> how did you build your compiler? If you didn't specify
>> CLANG_OPENMP_NVPTX_DEFAULT_ARCH, Clang defaults to sm_35, which
>> doesn't run on Volta (sm_70).
>> Can you post the output of
>>> clang -v -o openmp_offload openmp_offload.c -O3 -fopenmp=libomp \
>>>   -fopenmp-targets="nvptx64-nvidia-cuda"
>> 
>> If it's indeed compiling for sm_35, can you try adding -Xopenmp-target 
>> -march=sm_70?
>> 
>> Regards,
>> Jonas
>> 
>> On 2018-10-05 15:09, Cristobal Ortega via Openmp-dev wrote:
>>> Hello,
>>> 
>>> I've been trying to compile a program (source code is attached) that
>>> offloads to an NVIDIA V100 GPU with LLVM 7.0 and clang 7.0.
>>> 
>>> The program compiles successfully, yet nvprof reports "No kernels
>>> were profiled."
>>> The application appears to be running on the CPU instead ("top"
>>> reports high CPU usage).
>>> 
>>> Compilation line that I used:
>>> clang -v -o openmp_offload openmp_offload.c -O3 -fopenmp=libomp \
>>>   -fopenmp-targets="nvptx64-nvidia-cuda"
>>> 
>>> Output after executing the binary:
>>> ==74802== NVPROF is profiling process 74802, command: ./openmp_offload 10 10 10000 1
>>> Number of processors:     160
>>> Number of devices:        4
>>> Default device:           0
>>> Is initial device:        1
>>> ==74802== Profiling application: ./openmp_offload 10 10 10000 1
>>> ==74802== Profiling result:
>>> No kernels were profiled.
>>>             Type  Time(%)      Time     Calls       Avg       Min       Max  Name
>>>       API calls:   99.99%  311.50ms         1  311.50ms  311.50ms  311.50ms  cuCtxCreate
>>>                     0.00%  11.462us         4  2.8650us  1.1450us  6.2010us  cuDeviceGetPCIBusId
>>>                     0.00%  5.4850us         5  1.0970us     387ns  3.7770us  cuDeviceGet
>>>                     0.00%  4.8070us        12     400ns     232ns  1.0350us  cuDeviceGetAttribute
>>>                     0.00%  1.4360us         3     478ns     384ns     640ns  cuDeviceGetCount
>>> 
>>> 
>>> 
>>> When compiled with GCC, the application does offload to the GPU.
>>> 
>>> clang information:
>>> $ clang -v
>>> Version 6
>>> Version >= 90 selected
>>> libdevice.10.bc exists
>>> clang version 7.0.0 (tags/RELEASE_700/final)
>>> Target: powerpc64le-unknown-linux-gnu
>>> Thread model: posix
>>> InstalledDir: /gpfs/projects/bsc18/bsc18833/pkg/clang/7.0.0/bin
>>> Found candidate GCC installation:
>>> /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0
>>> Selected GCC installation:
>>> /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0
>>> Candidate multilib: .;@m64
>>> Selected multilib: .;@m64
>>> Found CUDA installation: /usr/local/cuda-9.2, version 9.2
>>> 
>>> 
>>> Hopefully somebody has an idea of what's going on here.
>>> If you need any more information to find the issue, let me know.
>>> Thank you.
>>> 
>>> Best,
>>> -Cristobal
>>> 
>>> 
>>> http://bsc.es/disclaimer
>>> _______________________________________________
>>> Openmp-dev mailing list
>>> Openmp-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev