[Openmp-dev] Compiling OpenMP code with CUDA support using libc++

Thu Mar 5 12:22:48 PST 2020

Yeah, the example code is below:

  int N = 1<<20;

  float *x = new float[N];
  float *y = new float[N];

  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  float *z = new float[N];
  int i;
  #pragma omp target map(x, y, z)
  #pragma omp parallel for
  for (i=0; i < N; i++) {
    z[i] = x[i] + y[i];
  }

I just grabbed a piece of code from
https://www.openmp.org/wp-content/uploads/openmp-examples-4.5.0.pdf for
testing. I also tested other examples in that document, but none worked.
Initially, I was working on a piece of code from a legacy project.
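
For reference, here is a minimal sketch of the same vector add with explicit array sections in the map clauses. In the snippet above, x, y and z are pointers, so map(x, y, z) maps only the pointer variables, not the N floats they point to. The function name vector_add_offload and the use of std::vector on the host are illustrative assumptions, not taken from the legacy project. The raw pointers are taken outside the target region so that no standard library code is needed on the device. Whether this alone resolves the runtime error reported in this thread is not established.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: adds two host vectors inside a target region.
// Without -fopenmp the pragmas are ignored and the loop runs on the host.
std::vector<float> vector_add_offload(const std::vector<float> &a,
                                      const std::vector<float> &b) {
  assert(a.size() == b.size());
  const int n = static_cast<int>(a.size());
  std::vector<float> c(a.size());

  // Take raw pointers on the host; only these are used in the target region.
  const float *x = a.data();
  const float *y = b.data();
  float *z = c.data();

  // Array sections tell the runtime how many elements to transfer:
  // x and y are copied to the device, z is copied back to the host.
  #pragma omp target map(to: x[0:n], y[0:n]) map(from: z[0:n])
  #pragma omp parallel for
  for (int i = 0; i < n; i++) {
    z[i] = x[i] + y[i];
  }
  return c;
}
```

Called with two vectors of 1<<20 elements filled with 1.0f and 2.0f, this corresponds to the loop in the example above.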

BTW, when I compiled the example code, I got some warnings:

clang-11: warning: Unknown CUDA version 10.2. Assuming the latest supported
version 10.1 [-Wunknown-cuda-version]
clang-11: warning: Unknown CUDA version 10.2. Assuming the latest supported
version 10.1 [-Wunknown-cuda-version]
clang-11: warning: No library 'libomptarget-nvptx-sm_35.bc' found in the
default clang lib directory or in LIBRARY_PATH. Expect degraded performance
due to no inlining of runtime functions on target devices. [-Wopenmp-target]

I am not sure if the error is caused by the newer CUDA version (I thought
10.2 should be compatible with 10.1).
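
One way to separate a build or device problem from a data-mapping problem is a target region that touches no user data at all. The sketch below reports whether such a region executed on a device. omp_get_num_devices and omp_is_initial_device are standard OpenMP runtime routines; the function names visible_devices and target_region_ran_on_device and the host-only fallbacks are illustrative assumptions. The sm_35 in the warning above suggests the device code was built for that architecture. If the GPU has a different compute capability, passing -Xopenmp-target -march=sm_XX to clang++ (with XX matching the GPU) may be worth trying, although nothing in this thread confirms that this is the cause.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#else
// Host-only fallbacks so the sketch still builds without -fopenmp.
static int omp_get_num_devices() { return 0; }
static int omp_is_initial_device() { return 1; }
#endif

// Number of offload devices the OpenMP runtime reports.
int visible_devices() {
  const int n = omp_get_num_devices();
  assert(n >= 0);
  return n;
}

// Returns true only if a trivial target region executed on a device
// rather than on the host.
bool target_region_ran_on_device() {
  int on_host = 1;
  #pragma omp target map(from: on_host)
  {
    on_host = omp_is_initial_device();
  }
  return on_host == 0;
}
```

If this probe already fails with the same "device kernel image is invalid" message, the problem is unlikely to be in the map clauses of the vector add.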

Thanks!

Gang Zhao

Alexey.Bataev <a.bataev at outlook.com> wrote on Thu, Mar 5, 2020 at 12:59 PM:

> Could you provide an example of how you map the data in the target region?
>
> -------------
> Best regards,
> Alexey Bataev
>
> On 05.03.2020 at 1:45 PM, G Zhao wrote:
>
> Thanks! I didn't notice this. The code is from a legacy project and I just
> checked all the target regions. It did use STL vectors. I just replaced all
> those vectors with arrays. Now it compiles successfully. But when I run
> it, there is another error:
>
> Libomptarget fatal error 1: failure of target construct while offloading
> is mandatory
>
> I tried it on a simple vector add example, and got the same error.
>
> Below is the debug information with LIBOMPTARGET_DEBUG=1:
>
> Libomptarget --> Loading RTLs...
> Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...
> Libomptarget --> Unable to load library 'libomptarget.rtl.ppc64.so':
> libomptarget.rtl.ppc64.so: cannot open shared object file: No such file
> or directory!
> Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
> Libomptarget --> Successfully loaded library 'libomptarget.rtl.x86_64.so'!
> Libomptarget --> Registering RTL libomptarget.rtl.x86_64.so supporting 4
> devices!
> Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'...
> Target CUDA RTL --> Start initializing CUDA
> Libomptarget --> Successfully loaded library 'libomptarget.rtl.cuda.so'!
> Libomptarget --> Registering RTL libomptarget.rtl.cuda.so supporting 1
> devices!
> Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'...
> Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so':
> libomptarget.rtl.aarch64.so: cannot open shared object file: No such file
> or directory!
> Libomptarget --> RTLs loaded!
> Libomptarget --> Image 0x000000000041ad20 is NOT compatible with RTL
> libomptarget.rtl.x86_64.so!
> Libomptarget --> Image 0x000000000041ad20 is compatible with RTL
> libomptarget.rtl.cuda.so!
> Libomptarget --> RTL 0x00000000015b3c40 has index 0!
> Libomptarget --> Registering image 0x000000000041ad20 with RTL
> libomptarget.rtl.cuda.so!
> Libomptarget --> Done registering entries!
> Libomptarget --> Call to omp_get_num_devices returning 1
> Libomptarget --> Default TARGET OFFLOAD policy is now mandatory (devices
> were found)
> Libomptarget --> Checking whether device 0 is ready.
> Libomptarget --> Is the device 0 (local ID 0) initialized? 0
> Target CUDA RTL --> Init requires flags to 1
> Target CUDA RTL --> Getting device 0
> Target CUDA RTL --> Max CUDA blocks per grid 2147483647 exceeds the hard
> team limit 65536, capping at the hard limit
> Target CUDA RTL --> Using 1024 CUDA threads per block
> Target CUDA RTL --> Max number of CUDA blocks 65536, threads 1024 & warp
> size 32
> Target CUDA RTL --> Default number of teams set according to library's
> default 128
> Target CUDA RTL --> Default number of threads set according to library's
> default 128
> Libomptarget --> Device 0 is ready to use.
> Target CUDA RTL --> Load data from image 0x000000000041ad20
> Target CUDA RTL --> Error when loading CUDA module
> Target CUDA RTL --> CUDA error is: device kernel image is invalid
> Libomptarget --> Unable to generate entries table for device id 0.
> Libomptarget --> Failed to init globals on device 0
> Libomptarget --> Failed to get device 0 ready
> Libomptarget fatal error 1: failure of target construct while offloading
> is mandatory
> Libomptarget --> Unloading target library!
> Libomptarget --> Image 0x000000000041ad20 is compatible with RTL
> 0x00000000015b3c40!
> Libomptarget --> Unregistered image 0x000000000041ad20 from RTL
> 0x00000000015b3c40!
> Libomptarget --> Done unregistering images!
> Libomptarget --> Removing translation table for descriptor
> 0x0000000000440810
> Libomptarget --> Done unregistering library!
> Libomptarget --> Deinit target library!
>
> Any hints about this?
>
> Regards,
> Gang Zhao
>
> Alexey Bataev <a.bataev at hotmail.com> wrote on Thu, Mar 5, 2020 at 5:44 AM:
>
>> Neither libc++ nor libstdc++ can be linked with nvlink. NVIDIA does
>> not provide implementations of either libc++ or libstdc++. You must
>> exclude the use of the standard C++ library from target regions.
>>
>> Best regards,
>> Alexey Bataev
>>
>> > On Mar 5, 2020, at 00:25, G Zhao via Openmp-dev <openmp-dev at lists.llvm.org> wrote:
>> >
>> > 
>> > Hi,
>> >
>> > I just compiled LLVM with NVPTX enabled via
>> > -DLLVM_TARGETS_TO_BUILD="X86;NVPTX". But when I compiled my code with the
>> > command below:
>> >
>> > clang++ main.cpp -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -o a_gpu.exe
>> >
>> > I got the below error:
>> >
>> >
>> > /usr/lib64/gcc/x86_64-pc-linux-gnu/9.2.1/../../../../include/c++/9.2.1/bits/std_abs.h:75:3: error: declaration conflicts with target of using declaration already in scope
>> >   abs(float __x)
>> >
>> > I think the reason is that I am using GCC 9. I searched a bit and someone
>> > said using libc++ can address this. So I compiled libcxx and libcxxabi, and
>> > used the command below to compile my code again:
>> >
>> > clang++ -stdlib=libc++ main.cpp -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -o a_gpu.exe
>> >
>> > I got different errors:
>> >
>> > nvlink error   : Undefined reference to '_ZNKSt3__120__vector_base_commonILb1EE20__throw_length_errorEv' in '/tmp/main-42e0a6.cubin'
>> > nvlink error   : Undefined reference to 'abort' in '/tmp/main-42e0a6.cubin'
>> >
>> > I think the reason here is that nvlink doesn't know libc++ should be linked
>> > together with those cubin files. But I don't know how to solve this.
>> >
>> > Does anyone know a workaround for this?
>> >
>> > Thanks!