[Openmp-dev] Target construct not offloading to GPU
Cristobal Ortega via Openmp-dev
openmp-dev at lists.llvm.org
Fri Oct 5 07:33:18 PDT 2018
I compiled clang with the following line:
cmake .. -DCMAKE_C_COMPILER=${HOST_GCC}/bin/gcc \
  -DCMAKE_CXX_COMPILER=${HOST_GCC}/bin/g++ \
  -DGCC_INSTALL_PREFIX=${HOST_GCC} \
  -DCMAKE_CXX_LINK_FLAGS="-L${HOST_GCC}/lib64 -Wl,-rpath,${HOST_GCC}/lib64" \
  -DCMAKE_INSTALL_PREFIX=/gpfs/projects/bsc18/bsc18833/pkg/clang/7.0.0
Indeed, the verbose output confirms that clang is compiling for
-march=sm_35 (output is attached).
Also, trying to compile the program with
"-Xopenmp-target -march=sm_70"
fails with
"clang-7: error: nvlink command failed with exit code 255 (use -v to see
invocation)" because of several undefined references (details in the
attached file).
So I'm trying to recompile clang with CLANG_OPENMP_NVPTX_DEFAULT_ARCH set,
but the build still does not generate the library
'libomptarget-nvptx-sm_70.bc', so compilation doesn't complete.
Where should this library be? I have one bc file in
"clang_src/test/Driver/Inputs/libomptarget/" but it's for sm_20
(libomptarget-nvptx-sm_20.bc).
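
To check where the library is (or is not) on a given installation, here is a minimal sketch that mirrors the search the driver reports in its verbose output: the clang lib directory first, then each entry of LIBRARY_PATH. The function name find_nvptx_bclib and the CLANG_LIB variable are made up for this illustration and are not part of clang. The default path is the one shown in the attached log.

```shell
#!/bin/sh
# find_nvptx_bclib ARCH DIR...
# Prints the first DIR that contains libomptarget-nvptx-ARCH.bc and returns 0;
# returns 1 if none of the given directories contains it.
find_nvptx_bclib() {
    arch=$1
    shift
    for dir in "$@"; do
        if [ -f "${dir}/libomptarget-nvptx-${arch}.bc" ]; then
            echo "${dir}"
            return 0
        fi
    done
    return 1
}

# Example: clang's lib directory followed by the entries of LIBRARY_PATH.
# CLANG_LIB is a hypothetical variable; its default is taken from the log.
CLANG_LIB=${CLANG_LIB:-/home/user/pkg/clang/7.0.0/lib}
old_ifs=$IFS
IFS=:
# LIBRARY_PATH is deliberately left unquoted so it is split on ':'.
set -- "$CLANG_LIB" ${LIBRARY_PATH:-}
IFS=$old_ifs
find_nvptx_bclib sm_70 "$@" || echo "libomptarget-nvptx-sm_70.bc not found" >&2
```

If the function prints nothing for sm_70, the bitcode library was most likely never built or installed, rather than installed in an unexpected place.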
This is how I'm trying to compile clang:
cmake .. -DCMAKE_C_COMPILER=${HOST_GCC}/bin/gcc \
  -DCMAKE_CXX_COMPILER=${HOST_GCC}/bin/g++ \
  -DGCC_INSTALL_PREFIX=${HOST_GCC} -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=70
Yet, in the compilation process, clang complains about the missing
library for sm_70.
Do I need to pass some flag to LLVM too?
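
One possible direction, sketched under assumptions rather than confirmed in this thread: the device runtime (libomptarget-nvptx) is built by the LLVM openmp project, not by clang itself, and its CMake options LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES and LIBOMPTARGET_NVPTX_ENABLE_BCLIB select which compute capabilities are built and whether the .bc library is produced. The script below only prints a candidate configure line and does not run cmake. OPENMP_SRC, CAPABILITY and the function name openmp_configure_line are hypothetical names for this illustration.

```shell
#!/bin/sh
# Sketch only: prints (does not execute) a possible configure line for the
# LLVM "openmp" project so that the NVPTX device runtime covers sm_70.
# OPENMP_SRC is a hypothetical path to the openmp sources; adjust as needed.
OPENMP_SRC=${OPENMP_SRC:-../openmp}
# Compute capability as a bare number; 70 corresponds to Volta (V100).
CAPABILITY=${CAPABILITY:-70}

openmp_configure_line() {
    # echo joins its arguments with single spaces, giving one command line.
    echo "cmake ${OPENMP_SRC}" \
        "-DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=${CAPABILITY}" \
        "-DLIBOMPTARGET_NVPTX_ENABLE_BCLIB=ON"
}

openmp_configure_line
```

Note that the clang-side option CLANG_OPENMP_NVPTX_DEFAULT_ARCH appears to expect the sm_NN spelling (for example sm_70 rather than 70), so that value may be worth double-checking as well.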
Best,
-Cristobal
On 10/05/2018 03:22 PM, Jonas Hahnfeld wrote:
> Hi,
>
> how did you build your compiler? If you didn't specify
> CLANG_OPENMP_NVPTX_DEFAULT_ARCH Clang will default to sm_35 which
> doesn't run on Volta (sm_70).
> Can you post the output of
>> clang -v -o openmp_offload openmp_offload.c -O3 -fopenmp=libomp
>> -fopenmp-targets="nvptx64-nvidia-cuda"
>
> If it's indeed compiling for sm_35, can you try adding -Xopenmp-target
> -march=sm_70?
>
> Regards,
> Jonas
>
> On 2018-10-05 15:09, Cristobal Ortega via Openmp-dev wrote:
>> Hello,
>>
>> I've been trying to compile a program (source code is attached) that
>> offloads to an NVIDIA V100 GPU with LLVM 7.0 and clang 7.0.
>>
>> The program appears to compile successfully, yet nvprof reports
>> that "no kernels were profiled".
>> The application seems to be running on the CPU (the "top" command
>> reports high CPU usage).
>>
>> Compilation line that I used:
>> clang -v -o openmp_offload openmp_offload.c -O3 -fopenmp=libomp
>> -fopenmp-targets="nvptx64-nvidia-cuda"
>>
>> Output after executing the binary:
>> ==74802== NVPROF is profiling process 74802, command: ./openmp_offload 10 10 10000 1
>> Number of processors: 160
>> Number of devices: 4
>> Default device: 0
>> Is initial device: 1
>> ==74802== Profiling application: ./openmp_offload 10 10 10000 1
>> ==74802== Profiling result:
>> No kernels were profiled.
>>        Type  Time(%)      Time  Calls       Avg       Min       Max  Name
>>  API calls:   99.99%  311.50ms      1  311.50ms  311.50ms  311.50ms  cuCtxCreate
>>                0.00%  11.462us      4  2.8650us  1.1450us  6.2010us  cuDeviceGetPCIBusId
>>                0.00%  5.4850us      5  1.0970us     387ns  3.7770us  cuDeviceGet
>>                0.00%  4.8070us     12     400ns     232ns  1.0350us  cuDeviceGetAttribute
>>                0.00%  1.4360us      3     478ns     384ns     640ns  cuDeviceGetCount
>>
>>
>>
>> When compiled with GCC, the application does the offloading to the GPU.
>>
>> clang information:
>> $ clang -v
>> Version 6
>> Version >= 90 selected
>> libdevice.10.bc exists
>> clang version 7.0.0 (tags/RELEASE_700/final)
>> Target: powerpc64le-unknown-linux-gnu
>> Thread model: posix
>> InstalledDir: /gpfs/projects/bsc18/bsc18833/pkg/clang/7.0.0/bin
>> Found candidate GCC installation:
>> /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0
>> Selected GCC installation:
>> /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0
>> Candidate multilib: .;@m64
>> Selected multilib: .;@m64
>> Found CUDA installation: /usr/local/cuda-9.2, version 9.2
>>
>>
>> Hopefully somebody has an idea on what's going on here.
>> If you need any more information to find the issue, let me know.
>> Thank you.
>>
>> Best,
>> -Cristobal
>>
>>
>> http://bsc.es/disclaimer
-------------- next part --------------
$ clang -v -o openmp_offload openmp_offload.c -O3 -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_70
Version 6
Version >= 90 selected
libdevice.10.bc exists
clang version 7.0.0 (tags/RELEASE_700/final)
Target: powerpc64le-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/user/pkg/clang/7.0.0/bin
Found candidate GCC installation: /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0
Selected GCC installation: /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0
Candidate multilib: .;@m64
Selected multilib: .;@m64
Found CUDA installation: /usr/local/cuda-9.2, version 9.2
Version 6
Version >= 90 selected
libdevice.10.bc exists
Searching for libomptarget-nvptx-sm_70.bc
GpuArch: sm_70
LibraryPath: /home/user/pkg/clang/7.0.0/lib
LibraryPath: /usr/local/cuda-9.2/extras/CUPTI/lib64
LibraryPath: /usr/local/cuda-9.2/nvvm/lib64
LibraryPath: /usr/local/cuda-9.2/lib64
LibraryPath: /usr/lib64/nvidia/xorg
LibraryPath: /usr/lib64/nvidia
LibraryPath: /usr/local/cuda-9.2/nvvm/libdevice
LibraryPath: /gpfs/apps/POWER9/LLVM/7.0.0/GCC/lib
clang-7: warning: No library 'libomptarget-nvptx-sm_70.bc' found in the default clang lib directory or in LIBRARY_PATH. Expect degraded performance due to no inlining of runtime functions on target devices. [-Wopenmp-target]
"/home/user/pkg/clang/7.0.0/bin/clang-7" -cc1 -triple powerpc64le-unknown-linux-gnu -emit-llvm-bc -emit-llvm-uselists -disable-free -main-file-name openmp_offload.c -mrelocation-model pic -pic-level 2 -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -fuse-init-array -target-cpu ppc64le -mfloat-abi hard -target-abi elfv2 -dwarf-column-info -debugger-tuning=gdb -momit-leaf-frame-pointer -v -resource-dir /home/user/pkg/clang/7.0.0/lib/clang/7.0.0 -c-isystem /usr/local/cuda-9.2/extras/CUPTI/include -c-isystem /usr/local/cuda-9.2/nvvm/include -c-isystem /usr/local/cuda-9.2/include -c-isystem /home/user/pkg/clang/7.0.0/include -c-isystem /home/user/pkg/llvm/7.0.0/include -c-isystem /home/user/pkg/gcc/8.2.0/include -cxx-isystem /usr/local/cuda-9.2/extras/CUPTI/include -cxx-isystem /usr/local/cuda-9.2/nvvm/include -cxx-isystem /usr/local/cuda-9.2/include -cxx-isystem /home/user/pkg/clang/7.0.0/include -cxx-isystem /home/user/pkg/llvm/7.0.0/include -cxx-isystem /home/user/pkg/gcc/8.2.0/include -internal-isystem /usr/local/include -internal-isystem /home/user/pkg/clang/7.0.0/lib/clang/7.0.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /usr/local/include -internal-isystem /home/user/pkg/clang/7.0.0/lib/clang/7.0.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -O3 -fdebug-compilation-dir /home/user/matmul_cpu_gpu -ferror-limit 19 -fmessage-length 103 -fopenmp -fno-signed-char -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o /tmp/openmp_offload-76dc4d.bc -x c openmp_offload.c -fopenmp-targets=nvptx64-nvidia-cuda -faddrsig
clang -cc1 version 7.0.0 based upon LLVM 7.0.0 default target powerpc64le-unknown-linux-gnu
ignoring nonexistent directory "/include"
ignoring nonexistent directory "/include"
ignoring duplicate directory "/usr/local/include"
ignoring duplicate directory "/home/user/pkg/clang/7.0.0/lib/clang/7.0.0/include"
ignoring duplicate directory "/usr/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/local/cuda-9.2/extras/CUPTI/include
/usr/local/cuda-9.2/nvvm/include
/usr/local/cuda-9.2/include
/home/user/pkg/clang/7.0.0/include
/home/user/pkg/llvm/7.0.0/include
/home/user/pkg/gcc/8.2.0/include
/usr/local/include
/home/user/pkg/clang/7.0.0/lib/clang/7.0.0/include
/usr/include
End of search list.
"/home/user/pkg/clang/7.0.0/bin/clang-7" -cc1 -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-linux-gnu -S -disable-free -main-file-name openmp_offload.c -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -no-integrated-as -fuse-init-array -mlink-cuda-bitcode /usr/local/cuda-9.2/nvvm/libdevice/libdevice.10.bc -target-feature +ptx61 -target-cpu sm_70 -dwarf-column-info -debugger-tuning=gdb -v -resource-dir /home/user/pkg/clang/7.0.0/lib/clang/7.0.0 -c-isystem /usr/local/cuda-9.2/extras/CUPTI/include -c-isystem /usr/local/cuda-9.2/nvvm/include -c-isystem /usr/local/cuda-9.2/include -c-isystem /home/user/pkg/clang/7.0.0/include -c-isystem /home/user/pkg/llvm/7.0.0/include -c-isystem /home/user/pkg/gcc/8.2.0/include -cxx-isystem /usr/local/cuda-9.2/extras/CUPTI/include -cxx-isystem /usr/local/cuda-9.2/nvvm/include -cxx-isystem /usr/local/cuda-9.2/include -cxx-isystem /home/user/pkg/clang/7.0.0/include -cxx-isystem /home/user/pkg/llvm/7.0.0/include -cxx-isystem /home/user/pkg/gcc/8.2.0/include -internal-isystem /usr/local/include -internal-isystem /home/user/pkg/clang/7.0.0/lib/clang/7.0.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /usr/local/include -internal-isystem /home/user/pkg/clang/7.0.0/lib/clang/7.0.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -O3 -fno-dwarf-directory-asm -fdebug-compilation-dir /home/user/matmul_cpu_gpu -ferror-limit 19 -fmessage-length 103 -fopenmp -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o /tmp/openmp_offload-19c687.s -x c openmp_offload.c -fopenmp-is-device -fopenmp-host-ir-file-path /tmp/openmp_offload-76dc4d.bc
clang -cc1 version 7.0.0 based upon LLVM 7.0.0 default target powerpc64le-unknown-linux-gnu
ignoring nonexistent directory "/include"
ignoring nonexistent directory "/include"
ignoring duplicate directory "/usr/local/include"
ignoring duplicate directory "/home/user/pkg/clang/7.0.0/lib/clang/7.0.0/include"
ignoring duplicate directory "/usr/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/local/cuda-9.2/extras/CUPTI/include
/usr/local/cuda-9.2/nvvm/include
/usr/local/cuda-9.2/include
/home/user/pkg/clang/7.0.0/include
/home/user/pkg/llvm/7.0.0/include
/home/user/pkg/gcc/8.2.0/include
/usr/local/include
/home/user/pkg/clang/7.0.0/lib/clang/7.0.0/include
/usr/include
End of search list.
"/usr/local/cuda-9.2/bin/ptxas" -m64 -O3 -v --gpu-name sm_70 --output-file /tmp/openmp_offload-ffaaae.cubin /tmp/openmp_offload-19c687.s -c
ptxas info : 96 bytes gmem
ptxas info : Compiling entry function '__omp_offloading_33_4fd06b8_target_matmul_l18' for 'sm_70'
ptxas info : Function properties for __omp_offloading_33_4fd06b8_target_matmul_l18
40 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 416 bytes cmem[0]
"/usr/local/cuda-9.2/bin/nvlink" -o /tmp/openmp_offload-285066.out -v -arch sm_70 -L/usr/local/cuda-9.2/extras/CUPTI/lib64 -L/usr/local/cuda-9.2/nvvm/lib64 -L/usr/local/cuda-9.2/lib64 -L/usr/lib64/nvidia/xorg -L/usr/lib64/nvidia -L/usr/local/cuda-9.2/nvvm/libdevice -L/gpfs/apps/POWER9/LLVM/7.0.0/GCC/lib -L/home/user/pkg/clang/7.0.0/lib -lomptarget-nvptx /tmp/openmp_offload-ffaaae.cubin
nvlink error : Undefined reference to '__kmpc_spmd_kernel_init' in '/tmp/openmp_offload-ffaaae.cubin'
nvlink error : Undefined reference to '__kmpc_data_sharing_init_stack_spmd' in '/tmp/openmp_offload-ffaaae.cubin'
nvlink error : Undefined reference to '__kmpc_for_static_init_4' in '/tmp/openmp_offload-ffaaae.cubin'
nvlink error : Undefined reference to '__kmpc_for_static_fini' in '/tmp/openmp_offload-ffaaae.cubin'
nvlink error : Undefined reference to '__kmpc_global_thread_num' in '/tmp/openmp_offload-ffaaae.cubin'
nvlink error : Undefined reference to '__kmpc_spmd_kernel_deinit' in '/tmp/openmp_offload-ffaaae.cubin'
nvlink info : 96 bytes gmem
nvlink info : Function properties for '__omp_offloading_33_4fd06b8_target_matmul_l18':
nvlink info : used 47 registers, 40 stack, 0 bytes smem, 416 bytes cmem[0], 0 bytes lmem
"/home/user/pkg/clang/7.0.0/bin/clang-7" -cc1 -triple powerpc64le-unknown-linux-gnu -emit-obj -disable-free -main-file-name openmp_offload.c -mrelocation-model pic -pic-level 2 -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -fuse-init-array -target-cpu ppc64le -mfloat-abi hard -target-abi elfv2 -dwarf-column-info -debugger-tuning=gdb -momit-leaf-frame-pointer -v -resource-dir /home/user/pkg/clang/7.0.0/lib/clang/7.0.0 -O3 -fdebug-compilation-dir /home/user/matmul_cpu_gpu -ferror-limit 19 -fmessage-length 103 -fopenmp -fno-signed-char -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o /tmp/openmp_offload-b11422.o -x ir /tmp/openmp_offload-76dc4d.bc -fopenmp-targets=nvptx64-nvidia-cuda -faddrsig
clang -cc1 version 7.0.0 based upon LLVM 7.0.0 default target powerpc64le-unknown-linux-gnu
"/usr/bin/ld" --hash-style=gnu --no-add-needed --eh-frame-hdr -m elf64lppc -dynamic-linker /lib64/ld64.so.2 -o openmp_offload /lib/../lib64/crt1.o /lib/../lib64/crti.o /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0/crtbegin.o -L/home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0 -L/home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0/../../.. -L/home/user/pkg/clang/7.0.0/bin/../lib -L/lib -L/usr/lib /tmp/openmp_offload-b11422.o -L/usr/local/cuda-9.2/extras/CUPTI/lib64 -L/usr/local/cuda-9.2/nvvm/lib64 -L/usr/local/cuda-9.2/lib64 -L/usr/lib64/nvidia/xorg -L/usr/lib64/nvidia -L/usr/local/cuda-9.2/nvvm/libdevice -L/gpfs/apps/POWER9/LLVM/7.0.0/GCC/lib -lomp -lomptarget -lgcc --as-needed -lgcc_s --no-as-needed -lpthread -lc -lgcc --as-needed -lgcc_s --no-as-needed /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0/crtend.o /lib/../lib64/crtn.o -T /tmp/openmp_offload-da8a6c.lk
clang-7: error: nvlink command failed with exit code 255 (use -v to see invocation)