[Openmp-dev] Target construct not offloading to GPU

Cristobal Ortega via Openmp-dev openmp-dev at lists.llvm.org
Fri Oct 5 07:33:18 PDT 2018


I compiled clang with the following line:
cmake .. -DCMAKE_C_COMPILER=${HOST_GCC}/bin/gcc \
  -DCMAKE_CXX_COMPILER=${HOST_GCC}/bin/g++ \
  -DGCC_INSTALL_PREFIX=${HOST_GCC} \
  -DCMAKE_CXX_LINK_FLAGS="-L${HOST_GCC}/lib64 -Wl,-rpath,${HOST_GCC}/lib64" \
  -DCMAKE_INSTALL_PREFIX=/gpfs/projects/bsc18/bsc18833/pkg/clang/7.0.0 \
  -DGCC_INSTALL_PREFIX=${HOST_GCC}

Indeed, the verbose output confirms that clang is compiling for
-march=sm_35 (output is attached).
Also, compiling the program with
"-Xopenmp-target -march=sm_70"
fails with
"clang-7: error: nvlink command failed with exit code 255 (use -v to see 
invocation)" because of several undefined references (details in the 
attached file).
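
One way to double-check which GPU architecture the driver picked is to pull the -target-cpu value out of the device-side -cc1 line of the "clang -v" output. The sketch below assumes the device invocation contains "-triple nvptx64..." and "-target-cpu sm_NN", as it does in the attached log; the function name offload_arch is hypothetical and not part of clang.

```shell
# Hypothetical helper: print the sm_NN architecture(s) used for the NVPTX
# device compile. Reads saved `clang -v` output on stdin.
offload_arch() {
  # Keep only the device-side cc1 invocation, then extract its -target-cpu.
  grep -e '-triple nvptx64' \
    | sed -n 's/.* -target-cpu \(sm_[0-9][0-9]*\).*/\1/p' \
    | sort -u
}

# Usage (clang writes its -v output to stderr):
#   clang -v ... -fopenmp-targets=nvptx64-nvidia-cuda 2>&1 | offload_arch
```

If this prints sm_35 on a Volta machine, the default architecture is still in effect and -Xopenmp-target -march=sm_70 (or a rebuilt default) is needed.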

So I'm trying to recompile clang with CLANG_OPENMP_NVPTX_DEFAULT_ARCH,
but clang still does not generate the library
'libomptarget-nvptx-sm_70.bc'. Therefore, compilation doesn't complete.
Where should this library be? I have one bc file in 
"clang_src/test/Driver/Inputs/libomptarget/" but it's for sm_20 
(libomptarget-nvptx-sm_20.bc).
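
The attached verbose output shows the driver walking the clang lib directory and each LIBRARY_PATH entry while searching for the file. A small sketch that repeats that search by hand, to see whether any of those directories actually holds the bitcode library. The name find_bclib is hypothetical, and the directory list is supplied by the caller rather than obtained from clang.

```shell
# Hypothetical helper: look for libomptarget-nvptx-<arch>.bc in a
# colon-separated list of directories (same format as LIBRARY_PATH).
# Prints the first match and returns 0, or returns 1 if nothing is found.
find_bclib() {
  arch=$1
  dirs=$2
  old_ifs=$IFS
  IFS=:
  for d in $dirs; do
    if [ -f "$d/libomptarget-nvptx-$arch.bc" ]; then
      IFS=$old_ifs
      printf '%s\n' "$d/libomptarget-nvptx-$arch.bc"
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

# Usage:
#   find_bclib sm_70 "/home/user/pkg/clang/7.0.0/lib:$LIBRARY_PATH" \
#     || echo "libomptarget-nvptx-sm_70.bc not found"
```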

This is how I'm trying to compile clang:
cmake .. -DCMAKE_C_COMPILER=${HOST_GCC}/bin/gcc \
  -DCMAKE_CXX_COMPILER=${HOST_GCC}/bin/g++ \
  -DGCC_INSTALL_PREFIX=${HOST_GCC} \
  -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=70
Yet, in the compilation process, clang complains about the missing 
library for sm_70.

Do I need to pass some flag to LLVM too?
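
For reference, the libomptarget-nvptx bitcode library is produced by the openmp project's NVPTX device runtime build rather than by clang itself. The sketch below prints the configure options that appear relevant for a given compute capability. The option names and the "sm_NN" value format are assumptions to verify against the clang and openmp CMake files of the release being built, and nvptx_cmake_flags is a hypothetical helper name.

```shell
# Sketch only: print CMake options for one compute capability.
# Option names are assumptions; check them against the CMake files and
# README of the clang/openmp sources actually being built.
nvptx_cmake_flags() {
  cc=$1   # compute capability without the "sm_" prefix, e.g. 70
  printf '%s\n' \
    "-DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_$cc" \
    "-DLIBOMPTARGET_NVPTX_ENABLE_BCLIB=ON" \
    "-DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=$cc"
}

# Usage:
#   cmake .. $(nvptx_cmake_flags 70) <other options>
```

The first option belongs to the clang configuration and the other two to the openmp (libomptarget) configuration, so they may need to go to different cmake invocations when the projects are built separately.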

Best,
-Cristobal



On 10/05/2018 03:22 PM, Jonas Hahnfeld wrote:
> Hi,
>
> How did you build your compiler? If you didn't specify
> CLANG_OPENMP_NVPTX_DEFAULT_ARCH, Clang will default to sm_35, which
> doesn't run on Volta (sm_70).
> Can you post the output of
>> clang -v  -o openmp_offload openmp_offload.c -O3 -fopenmp=libomp
>> -fopenmp-targets="nvptx64-nvidia-cuda"
>
> If it's indeed compiling for sm_35, can you try adding -Xopenmp-target 
> -march=sm_70?
>
> Regards,
> Jonas
>
> On 2018-10-05 15:09, Cristobal Ortega via Openmp-dev wrote:
>> Hello,
>>
>> I've been trying to compile a program (source code is attached) that
>> offloads to an NVIDIA V100 GPU with LLVM 7.0 and clang 7.0.
>>
>> It seems that the program is successfully compiled, yet nvprof reports
>> that "no kernels were profiled".
>> The application seems to be running on the CPU (the "top" command
>> reports high CPU usage).
>>
>> Compilation line that I used:
>> clang -v  -o openmp_offload openmp_offload.c -O3 -fopenmp=libomp
>> -fopenmp-targets="nvptx64-nvidia-cuda"
>>
>> Output after executing the binary:
>> ==74802== NVPROF is profiling process 74802, command: ./openmp_offload 10 10 10000 1
>> Number of processors:     160
>> Number of devices:        4
>> Default device:           0
>> Is initial device:        1
>> ==74802== Profiling application: ./openmp_offload 10 10 10000 1
>> ==74802== Profiling result:
>> No kernels were profiled.
>>             Type  Time(%)      Time     Calls       Avg       Min       Max  Name
>>       API calls:   99.99%  311.50ms         1  311.50ms  311.50ms  311.50ms  cuCtxCreate
>>                     0.00%  11.462us         4  2.8650us  1.1450us  6.2010us  cuDeviceGetPCIBusId
>>                     0.00%  5.4850us         5  1.0970us     387ns  3.7770us  cuDeviceGet
>>                     0.00%  4.8070us        12     400ns     232ns  1.0350us  cuDeviceGetAttribute
>>                     0.00%  1.4360us         3     478ns     384ns     640ns  cuDeviceGetCount
>>
>>
>>
>> When compiled with GCC, the application does the offloading to the GPU.
>>
>> clang information:
>> $ clang -v
>> Version 6
>> Version >= 90 selected
>> libdevice.10.bc exists
>> clang version 7.0.0 (tags/RELEASE_700/final)
>> Target: powerpc64le-unknown-linux-gnu
>> Thread model: posix
>> InstalledDir: /gpfs/projects/bsc18/bsc18833/pkg/clang/7.0.0/bin
>> Found candidate GCC installation:
>> /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0
>> Selected GCC installation:
>> /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0
>> Candidate multilib: .;@m64
>> Selected multilib: .;@m64
>> Found CUDA installation: /usr/local/cuda-9.2, version 9.2
>>
>>
>> Hopefully somebody has an idea on what's going on here.
>> If you need any more information to find the issue, let me know.
>> Thank you.
>>
>> Best,
>> -Cristobal
>>
>>
>> http://bsc.es/disclaimer
>> _______________________________________________
>> Openmp-dev mailing list
>> Openmp-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev



-------------- next part --------------
$ clang -v    -o openmp_offload openmp_offload.c -O3 -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_70
Version 6
Version >= 90 selected
libdevice.10.bc exists
clang version 7.0.0 (tags/RELEASE_700/final)
Target: powerpc64le-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/user/pkg/clang/7.0.0/bin
Found candidate GCC installation: /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0
Selected GCC installation: /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0
Candidate multilib: .;@m64
Selected multilib: .;@m64
Found CUDA installation: /usr/local/cuda-9.2, version 9.2
Version 6
Version >= 90 selected
libdevice.10.bc exists
Searching for libomptarget-nvptx-sm_70.bc
GpuArch: sm_70
LibraryPath: /home/user/pkg/clang/7.0.0/lib
LibraryPath: /usr/local/cuda-9.2/extras/CUPTI/lib64
LibraryPath: /usr/local/cuda-9.2/nvvm/lib64
LibraryPath: /usr/local/cuda-9.2/lib64
LibraryPath: /usr/lib64/nvidia/xorg
LibraryPath: /usr/lib64/nvidia
LibraryPath: /usr/local/cuda-9.2/nvvm/libdevice
LibraryPath: /gpfs/apps/POWER9/LLVM/7.0.0/GCC/lib
clang-7: warning: No library 'libomptarget-nvptx-sm_70.bc' found in the default clang lib directory or in LIBRARY_PATH. Expect degraded performance due to no inlining of runtime functions on target devices. [-Wopenmp-target]
 "/home/user/pkg/clang/7.0.0/bin/clang-7" -cc1 -triple powerpc64le-unknown-linux-gnu -emit-llvm-bc -emit-llvm-uselists -disable-free -main-file-name openmp_offload.c -mrelocation-model pic -pic-level 2 -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -fuse-init-array -target-cpu ppc64le -mfloat-abi hard -target-abi elfv2 -dwarf-column-info -debugger-tuning=gdb -momit-leaf-frame-pointer -v -resource-dir /home/user/pkg/clang/7.0.0/lib/clang/7.0.0 -c-isystem /usr/local/cuda-9.2/extras/CUPTI/include -c-isystem /usr/local/cuda-9.2/nvvm/include -c-isystem /usr/local/cuda-9.2/include -c-isystem /home/user/pkg/clang/7.0.0/include -c-isystem /home/user/pkg/llvm/7.0.0/include -c-isystem /home/user/pkg/gcc/8.2.0/include -cxx-isystem /usr/local/cuda-9.2/extras/CUPTI/include -cxx-isystem /usr/local/cuda-9.2/nvvm/include -cxx-isystem /usr/local/cuda-9.2/include -cxx-isystem /home/user/pkg/clang/7.0.0/include -cxx-isystem /home/user/pkg/llvm/7.0.0/include -cxx-isystem /home/user/pkg/gcc/8.2.0/include -internal-isystem /usr/local/include -internal-isystem /home/user/pkg/clang/7.0.0/lib/clang/7.0.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /usr/local/include -internal-isystem /home/user/pkg/clang/7.0.0/lib/clang/7.0.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -O3 -fdebug-compilation-dir /home/user/matmul_cpu_gpu -ferror-limit 19 -fmessage-length 103 -fopenmp -fno-signed-char -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o /tmp/openmp_offload-76dc4d.bc -x c openmp_offload.c -fopenmp-targets=nvptx64-nvidia-cuda -faddrsig
clang -cc1 version 7.0.0 based upon LLVM 7.0.0 default target powerpc64le-unknown-linux-gnu
ignoring nonexistent directory "/include"
ignoring nonexistent directory "/include"
ignoring duplicate directory "/usr/local/include"
ignoring duplicate directory "/home/user/pkg/clang/7.0.0/lib/clang/7.0.0/include"
ignoring duplicate directory "/usr/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/cuda-9.2/extras/CUPTI/include
 /usr/local/cuda-9.2/nvvm/include
 /usr/local/cuda-9.2/include
 /home/user/pkg/clang/7.0.0/include
 /home/user/pkg/llvm/7.0.0/include
 /home/user/pkg/gcc/8.2.0/include
 /usr/local/include
 /home/user/pkg/clang/7.0.0/lib/clang/7.0.0/include
 /usr/include
End of search list.
 "/home/user/pkg/clang/7.0.0/bin/clang-7" -cc1 -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-linux-gnu -S -disable-free -main-file-name openmp_offload.c -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -no-integrated-as -fuse-init-array -mlink-cuda-bitcode /usr/local/cuda-9.2/nvvm/libdevice/libdevice.10.bc -target-feature +ptx61 -target-cpu sm_70 -dwarf-column-info -debugger-tuning=gdb -v -resource-dir /home/user/pkg/clang/7.0.0/lib/clang/7.0.0 -c-isystem /usr/local/cuda-9.2/extras/CUPTI/include -c-isystem /usr/local/cuda-9.2/nvvm/include -c-isystem /usr/local/cuda-9.2/include -c-isystem /home/user/pkg/clang/7.0.0/include -c-isystem /home/user/pkg/llvm/7.0.0/include -c-isystem /home/user/pkg/gcc/8.2.0/include -cxx-isystem /usr/local/cuda-9.2/extras/CUPTI/include -cxx-isystem /usr/local/cuda-9.2/nvvm/include -cxx-isystem /usr/local/cuda-9.2/include -cxx-isystem /home/user/pkg/clang/7.0.0/include -cxx-isystem /home/user/pkg/llvm/7.0.0/include -cxx-isystem /home/user/pkg/gcc/8.2.0/include -internal-isystem /usr/local/include -internal-isystem /home/user/pkg/clang/7.0.0/lib/clang/7.0.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /usr/local/include -internal-isystem /home/user/pkg/clang/7.0.0/lib/clang/7.0.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -O3 -fno-dwarf-directory-asm -fdebug-compilation-dir /home/user/matmul_cpu_gpu -ferror-limit 19 -fmessage-length 103 -fopenmp -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o /tmp/openmp_offload-19c687.s -x c openmp_offload.c -fopenmp-is-device -fopenmp-host-ir-file-path /tmp/openmp_offload-76dc4d.bc
clang -cc1 version 7.0.0 based upon LLVM 7.0.0 default target powerpc64le-unknown-linux-gnu
ignoring nonexistent directory "/include"
ignoring nonexistent directory "/include"
ignoring duplicate directory "/usr/local/include"
ignoring duplicate directory "/home/user/pkg/clang/7.0.0/lib/clang/7.0.0/include"
ignoring duplicate directory "/usr/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/cuda-9.2/extras/CUPTI/include
 /usr/local/cuda-9.2/nvvm/include
 /usr/local/cuda-9.2/include
 /home/user/pkg/clang/7.0.0/include
 /home/user/pkg/llvm/7.0.0/include
 /home/user/pkg/gcc/8.2.0/include
 /usr/local/include
 /home/user/pkg/clang/7.0.0/lib/clang/7.0.0/include
 /usr/include
End of search list.
 "/usr/local/cuda-9.2/bin/ptxas" -m64 -O3 -v --gpu-name sm_70 --output-file /tmp/openmp_offload-ffaaae.cubin /tmp/openmp_offload-19c687.s -c
ptxas info    : 96 bytes gmem
ptxas info    : Compiling entry function '__omp_offloading_33_4fd06b8_target_matmul_l18' for 'sm_70'
ptxas info    : Function properties for __omp_offloading_33_4fd06b8_target_matmul_l18
    40 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 47 registers, 416 bytes cmem[0]
 "/usr/local/cuda-9.2/bin/nvlink" -o /tmp/openmp_offload-285066.out -v -arch sm_70 -L/usr/local/cuda-9.2/extras/CUPTI/lib64 -L/usr/local/cuda-9.2/nvvm/lib64 -L/usr/local/cuda-9.2/lib64 -L/usr/lib64/nvidia/xorg -L/usr/lib64/nvidia -L/usr/local/cuda-9.2/nvvm/libdevice -L/gpfs/apps/POWER9/LLVM/7.0.0/GCC/lib -L/home/user/pkg/clang/7.0.0/lib -lomptarget-nvptx /tmp/openmp_offload-ffaaae.cubin
nvlink error   : Undefined reference to '__kmpc_spmd_kernel_init' in '/tmp/openmp_offload-ffaaae.cubin'
nvlink error   : Undefined reference to '__kmpc_data_sharing_init_stack_spmd' in '/tmp/openmp_offload-ffaaae.cubin'
nvlink error   : Undefined reference to '__kmpc_for_static_init_4' in '/tmp/openmp_offload-ffaaae.cubin'
nvlink error   : Undefined reference to '__kmpc_for_static_fini' in '/tmp/openmp_offload-ffaaae.cubin'
nvlink error   : Undefined reference to '__kmpc_global_thread_num' in '/tmp/openmp_offload-ffaaae.cubin'
nvlink error   : Undefined reference to '__kmpc_spmd_kernel_deinit' in '/tmp/openmp_offload-ffaaae.cubin'
nvlink info    : 96 bytes gmem
nvlink info    : Function properties for '__omp_offloading_33_4fd06b8_target_matmul_l18':
nvlink info    : used 47 registers, 40 stack, 0 bytes smem, 416 bytes cmem[0], 0 bytes lmem
 "/home/user/pkg/clang/7.0.0/bin/clang-7" -cc1 -triple powerpc64le-unknown-linux-gnu -emit-obj -disable-free -main-file-name openmp_offload.c -mrelocation-model pic -pic-level 2 -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -fuse-init-array -target-cpu ppc64le -mfloat-abi hard -target-abi elfv2 -dwarf-column-info -debugger-tuning=gdb -momit-leaf-frame-pointer -v -resource-dir /home/user/pkg/clang/7.0.0/lib/clang/7.0.0 -O3 -fdebug-compilation-dir /home/user/matmul_cpu_gpu -ferror-limit 19 -fmessage-length 103 -fopenmp -fno-signed-char -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o /tmp/openmp_offload-b11422.o -x ir /tmp/openmp_offload-76dc4d.bc -fopenmp-targets=nvptx64-nvidia-cuda -faddrsig
clang -cc1 version 7.0.0 based upon LLVM 7.0.0 default target powerpc64le-unknown-linux-gnu
 "/usr/bin/ld" --hash-style=gnu --no-add-needed --eh-frame-hdr -m elf64lppc -dynamic-linker /lib64/ld64.so.2 -o openmp_offload /lib/../lib64/crt1.o /lib/../lib64/crti.o /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0/crtbegin.o -L/home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0 -L/home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0/../../.. -L/home/user/pkg/clang/7.0.0/bin/../lib -L/lib -L/usr/lib /tmp/openmp_offload-b11422.o -L/usr/local/cuda-9.2/extras/CUPTI/lib64 -L/usr/local/cuda-9.2/nvvm/lib64 -L/usr/local/cuda-9.2/lib64 -L/usr/lib64/nvidia/xorg -L/usr/lib64/nvidia -L/usr/local/cuda-9.2/nvvm/libdevice -L/gpfs/apps/POWER9/LLVM/7.0.0/GCC/lib -lomp -lomptarget -lgcc --as-needed -lgcc_s --no-as-needed -lpthread -lc -lgcc --as-needed -lgcc_s --no-as-needed /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0/crtend.o /lib/../lib64/crtn.o -T /tmp/openmp_offload-da8a6c.lk
clang-7: error: nvlink command failed with exit code 255 (use -v to see invocation)


