[Openmp-dev] Target construct not offloading to GPU
Cristobal Ortega via Openmp-dev
openmp-dev at lists.llvm.org
Fri Oct 5 06:09:00 PDT 2018
I've been trying to compile a program (source code is attached) that
offloads to an NVIDIA V100 GPU with LLVM 7.0 and clang 7.0.
The program appears to compile successfully, yet nvprof reports
"No kernels were profiled".
The application seems to be running on the CPU (the "top" command
reports high CPU usage).
Compilation line that I used:
clang -v -o openmp_offload openmp_offload.c -O3 -fopenmp=libomp
Output after executing the binary:
==74802== NVPROF is profiling process 74802, command: ./openmp_offload 10 10 10000 1
Number of processors: 160
Number of devices: 4
Default device: 0
Is initial device: 1
==74802== Profiling application: ./openmp_offload 10 10 10000 1
==74802== Profiling result:
No kernels were profiled.
       Type  Time(%)      Time  Calls       Avg       Min  Max  Name
 API calls:   99.99%  311.50ms      1  311.50ms  311.50ms
               0.00%  11.462us      4  2.8650us  1.1450us
               0.00%  5.4850us      5  1.0970us     387ns
               0.00%  4.8070us     12     400ns     232ns
               0.00%  1.4360us      3     478ns     384ns
When compiled with GCC, the application does offload to the GPU.
$ clang -v
Version >= 90 selected
clang version 7.0.0 (tags/RELEASE_700/final)
Thread model: posix
Found candidate GCC installation:
Selected GCC installation:
Candidate multilib: .;@m64
Selected multilib: .;@m64
Found CUDA installation: /usr/local/cuda-9.2, version 9.2
Hopefully somebody has an idea on what's going on here.
If you need any more information to find the issue, let me know.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 1761 bytes
Desc: not available