[Openmp-dev] Target construct not offloading to GPU

Jonas Hahnfeld via Openmp-dev openmp-dev at lists.llvm.org
Fri Oct 5 06:22:42 PDT 2018


Hi,

How did you build your compiler? If you didn't specify 
CLANG_OPENMP_NVPTX_DEFAULT_ARCH when configuring the build, Clang 
defaults to sm_35, and code compiled for sm_35 doesn't run on Volta 
(sm_70).
Can you post the output of
> clang -v  -o openmp_offload openmp_offload.c -O3 -fopenmp=libomp
> -fopenmp-targets="nvptx64-nvidia-cuda"

If it's indeed compiling for sm_35, can you try adding -Xopenmp-target 
-march=sm_70?
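
To confirm where a target region actually executes, independent of 
nvprof, a small check along these lines may help. This is only a sketch: 
the file name check_offload.c and the function names are made up for 
illustration and are not taken from your attached source, and the build 
line in the comment simply combines your command with the flag mentioned 
above. It assumes the runtime quietly falls back to the host when it 
cannot use a device image, which would match what you are seeing.

```c
/* check_offload.c (hypothetical file name)
 * Possible build line, combining the command from this thread with the
 * suggested flag:
 *   clang -O3 -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \
 *         -Xopenmp-target -march=sm_70 -o check_offload check_offload.c
 */
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Fallbacks so the file still compiles without OpenMP support. */
static int omp_is_initial_device(void) { return 1; }
static int omp_get_num_devices(void) { return 0; }
#endif

/* Returns 1 if the target region executed on the host, 0 if on a device. */
int target_ran_on_host(void)
{
    int on_host = 1;
#pragma omp target map(from: on_host)
    {
        on_host = omp_is_initial_device();
    }
    return on_host;
}

/* Prints a short report; returns the same value as target_ran_on_host(). */
int report_offload(void)
{
    int on_host = target_ran_on_host();
    printf("Number of devices: %d\n", omp_get_num_devices());
    printf("Target region ran on the %s\n", on_host ? "host" : "device");
    return on_host;
}
```

Calling report_offload() from main() and seeing "host" even though 
devices are reported would suggest the fallback described above.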

Regards,
Jonas

On 2018-10-05 15:09, Cristobal Ortega via Openmp-dev wrote:
> Hello,
> 
> I've been trying to compile a program (source code is attached) that
> offloads to an NVIDIA V100 GPU with LLVM 7.0 and clang 7.0.
> 
> It seems that the program is successfully compiled, yet nvprof reports
> that "no kernels were profiled".
> The application seems to be running on the CPU (the "top" command
> reports high CPU usage).
> 
> Compilation line that I used:
> clang -v  -o openmp_offload openmp_offload.c -O3 -fopenmp=libomp
> -fopenmp-targets="nvptx64-nvidia-cuda"
> 
> Output after executing the binary:
> ==74802== NVPROF is profiling process 74802, command: ./openmp_offload 10 10 10000 1
> Number of processors:     160
> Number of devices:        4
> Default device:           0
> Is initial device:        1
> ==74802== Profiling application: ./openmp_offload 10 10 10000 1
> ==74802== Profiling result:
> No kernels were profiled.
>             Type  Time(%)      Time     Calls       Avg       Min       Max  Name
>       API calls:   99.99%  311.50ms         1  311.50ms  311.50ms  311.50ms  cuCtxCreate
>                     0.00%  11.462us         4  2.8650us  1.1450us  6.2010us  cuDeviceGetPCIBusId
>                     0.00%  5.4850us         5  1.0970us     387ns  3.7770us  cuDeviceGet
>                     0.00%  4.8070us        12     400ns     232ns  1.0350us  cuDeviceGetAttribute
>                     0.00%  1.4360us         3     478ns     384ns     640ns  cuDeviceGetCount
> 
> 
> 
> When compiled with GCC, the application does offload to the GPU.
> 
> clang information:
> $ clang -v
> Version 6
> Version >= 90 selected
> libdevice.10.bc exists
> clang version 7.0.0 (tags/RELEASE_700/final)
> Target: powerpc64le-unknown-linux-gnu
> Thread model: posix
> InstalledDir: /gpfs/projects/bsc18/bsc18833/pkg/clang/7.0.0/bin
> Found candidate GCC installation:
> /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0
> Selected GCC installation:
> /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0
> Candidate multilib: .;@m64
> Selected multilib: .;@m64
> Found CUDA installation: /usr/local/cuda-9.2, version 9.2
> 
> 
> Hopefully somebody has an idea on what's going on here.
> If you need any more information to find the issue, let me know.
> Thank you.
> 
> Best,
> -Cristobal
> 
> 

