<div dir="ltr"><div>Yeah, the example code is below:</div><div><br></div><div>  int N = 1&lt;&lt;20;<br><br>  float *x = new float[N];<br>  float *y = new float[N];<br><br>  for (int i = 0; i &lt; N; i++) {<br>    x[i] = 1.0f;<br>    y[i] = 2.0f;<br>  }<br><br>  float *z = new float[N];<br>  int i;<br>  #pragma omp target map(x, y, z)<br>  #pragma omp parallel for<br>  for (i=0; i &lt; N; i++) {<br>    z[i] = x[i] + y[i];<br>  }</div><div><br></div><div>I just grabbed a piece of code from <a href="https://www.openmp.org/wp-content/uploads/openmp-examples-4.5.0.pdf">https://www.openmp.org/wp-content/uploads/openmp-examples-4.5.0.pdf</a> for testing. I also tested other examples in that document, but none of them worked. Initially, I was working on a piece of code from a legacy project.</div><div><br></div><div>BTW, when I compiled the example code, I got some warnings:</div><div><br></div><div>clang-11: warning: Unknown CUDA version 10.2. Assuming the latest supported version 10.1 [-Wunknown-cuda-version]<br>clang-11: warning: Unknown CUDA version 10.2. Assuming the latest supported version 10.1 [-Wunknown-cuda-version]<br>clang-11: warning: No library 'libomptarget-nvptx-sm_35.bc' found in the default clang lib directory or in LIBRARY_PATH. Expect degraded performance due to no inlining of runtime functions on target devices. [-Wopenmp-target]</div><div><br></div><div>I am not sure whether the error is caused by the newer CUDA version (I thought 10.2 should be compatible with 10.1).</div><div><br></div><div>Thanks!</div><div><br></div><div>Gang Zhao<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Alexey.Bataev &lt;<a href="mailto:a.bataev@outlook.com">a.bataev@outlook.com</a>&gt; wrote on Thu, Mar 5, 2020 at 12:59 PM:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  <div>

    <p>Could you provide an example of how you map the data in the

      target region?<br>

    </p>

    <pre cols="72">-------------

Best regards,

Alexey Bataev</pre>

    <div>On 05.03.2020 1:45 PM, G Zhao wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">

        <div>Thanks! I didn't notice this. The code is from a legacy

          project and I just checked all the target regions. It did use

          STL vectors. I just replaced all those vectors with arrays.

          Now it compiles successfully. But when I run it, there is

          another error:</div>

        <div><br>

        </div>

        <div>Libomptarget fatal error 1: failure of target construct

          while offloading is mandatory</div>

        <div><br>

        </div>

        <div>I tried it on a simple vector add example, and got the same

          error.</div>

        <div><br>

        </div>

        <div>Below is the debug information with LIBOMPTARGET_DEBUG=1:</div>

        <div><br>

        </div>

        <div>Libomptarget --> Loading RTLs...<br>

          Libomptarget --> Loading library '<a href="http://libomptarget.rtl.ppc64.so" target="_blank">libomptarget.rtl.ppc64.so</a>'...<br>

          Libomptarget --> Unable to load library '<a href="http://libomptarget.rtl.ppc64.so" target="_blank">libomptarget.rtl.ppc64.so</a>': <a href="http://libomptarget.rtl.ppc64.so" target="_blank">libomptarget.rtl.ppc64.so</a>: cannot

          open shared object file: No such file or directory!<br>

          Libomptarget --> Loading library '<a href="http://libomptarget.rtl.x86_64.so" target="_blank">libomptarget.rtl.x86_64.so</a>'...<br>

          Libomptarget --> Successfully loaded library '<a href="http://libomptarget.rtl.x86_64.so" target="_blank">libomptarget.rtl.x86_64.so</a>'!<br>

          Libomptarget --> Registering RTL <a href="http://libomptarget.rtl.x86_64.so" target="_blank">libomptarget.rtl.x86_64.so</a>

          supporting 4 devices!<br>

          Libomptarget --> Loading library '<a href="http://libomptarget.rtl.cuda.so" target="_blank">libomptarget.rtl.cuda.so</a>'...<br>

          Target CUDA RTL --> Start initializing CUDA<br>

          Libomptarget --> Successfully loaded library '<a href="http://libomptarget.rtl.cuda.so" target="_blank">libomptarget.rtl.cuda.so</a>'!<br>

          Libomptarget --> Registering RTL <a href="http://libomptarget.rtl.cuda.so" target="_blank">libomptarget.rtl.cuda.so</a>

          supporting 1 devices!<br>

          Libomptarget --> Loading library '<a href="http://libomptarget.rtl.aarch64.so" target="_blank">libomptarget.rtl.aarch64.so</a>'...<br>

          Libomptarget --> Unable to load library '<a href="http://libomptarget.rtl.aarch64.so" target="_blank">libomptarget.rtl.aarch64.so</a>': <a href="http://libomptarget.rtl.aarch64.so" target="_blank">libomptarget.rtl.aarch64.so</a>:

          cannot open shared object file: No such file or directory!<br>

          Libomptarget --> RTLs loaded!<br>

          Libomptarget --> Image 0x000000000041ad20 is NOT compatible

          with RTL <a href="http://libomptarget.rtl.x86_64.so" target="_blank">libomptarget.rtl.x86_64.so</a>!<br>

          Libomptarget --> Image 0x000000000041ad20 is compatible

          with RTL <a href="http://libomptarget.rtl.cuda.so" target="_blank">libomptarget.rtl.cuda.so</a>!<br>

          Libomptarget --> RTL 0x00000000015b3c40 has index 0!<br>

          Libomptarget --> Registering image 0x000000000041ad20 with

          RTL <a href="http://libomptarget.rtl.cuda.so" target="_blank">libomptarget.rtl.cuda.so</a>!<br>

          Libomptarget --> Done registering entries!<br>

          Libomptarget --> Call to omp_get_num_devices returning 1<br>

          Libomptarget --> Default TARGET OFFLOAD policy is now

          mandatory (devices were found)<br>

          Libomptarget --> Checking whether device 0 is ready.<br>

          Libomptarget --> Is the device 0 (local ID 0) initialized?

          0<br>

          Target CUDA RTL --> Init requires flags to 1<br>

          Target CUDA RTL --> Getting device 0<br>

          Target CUDA RTL --> Max CUDA blocks per grid 2147483647

          exceeds the hard team limit 65536, capping at the hard limit<br>

          Target CUDA RTL --> Using 1024 CUDA threads per block<br>

          Target CUDA RTL --> Max number of CUDA blocks 65536,

          threads 1024 & warp size 32<br>

          Target CUDA RTL --> Default number of teams set according

          to library's default 128<br>

          Target CUDA RTL --> Default number of threads set according

          to library's default 128<br>

          Libomptarget --> Device 0 is ready to use.<br>

          Target CUDA RTL --> Load data from image 0x000000000041ad20<br>

          Target CUDA RTL --> Error when loading CUDA module<br>

          Target CUDA RTL --> CUDA error is: device kernel image is

          invalid<br>

          Libomptarget --> Unable to generate entries table for

          device id 0.<br>

          Libomptarget --> Failed to init globals on device 0<br>

          Libomptarget --> Failed to get device 0 ready<br>

          Libomptarget fatal error 1: failure of target construct while

          offloading is mandatory<br>

          Libomptarget --> Unloading target library!<br>

          Libomptarget --> Image 0x000000000041ad20 is compatible

          with RTL 0x00000000015b3c40!<br>

          Libomptarget --> Unregistered image 0x000000000041ad20 from

          RTL 0x00000000015b3c40!<br>

          Libomptarget --> Done unregistering images!<br>

          Libomptarget --> Removing translation table for descriptor

          0x0000000000440810<br>

          Libomptarget --> Done unregistering library!<br>

          Libomptarget --> Deinit target library!<br>

        </div>

        <div><br>

        </div>

        <div>Any hints about this? <br>

        </div>

        <div><br>

        </div>

        <div>Regards,</div>

        <div>Gang Zhao<br>

        </div>

        <br>

        <div class="gmail_quote">

          <div dir="ltr" class="gmail_attr">Alexey Bataev &lt;<a href="mailto:a.bataev@hotmail.com" target="_blank">a.bataev@hotmail.com</a>&gt;

            wrote on Thu, Mar 5, 2020 at 5:44 AM:<br>

          </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Neither libc++ nor

            libstdc++ can be linked with nvlink. NVIDIA does not

            provide implementations of either libc++ or libstdc++. You

            must exclude the use of the standard C++ library from target

            regions.<br>

            <br>

            Best regards,<br>

            Alexey Bataev<br>

            <br>

            > On March 5, 2020, at 00:25, G Zhao via Openmp-dev &lt;<a href="mailto:openmp-dev@lists.llvm.org" target="_blank">openmp-dev@lists.llvm.org</a>&gt;

            wrote:<br>

            > <br>

            > <br>

            > Hi,<br>

            > <br>

            > I just compiled LLVM and enabled NVPTX with

            -DLLVM_TARGETS_TO_BUILD="X86;NVPTX". But when I compiled my

            code using the command below:<br>

            > <br>

            > clang++ main.cpp -fopenmp

            -fopenmp-targets=nvptx64-nvidia-cuda -o a_gpu.exe<br>

            > <br>

            > I got the below error:<br>

            > <br>

            >

/usr/lib64/gcc/x86_64-pc-linux-gnu/9.2.1/../../../../include/c++/9.2.1/bits/std_abs.h:75:3:

            error: declaration conflicts with target of using

            declaration already in scope<br>

            >   abs(float __x)<br>

            > <br>

            > I think the reason is that I am using GCC-9. I did a bit

            of searching and someone said using libc++ can address this. So I

            compiled libcxx and libcxxabi, and used the command below to

            compile my code again:<br>

            > <br>

            > clang++ -stdlib=libc++ main.cpp -fopenmp

            -fopenmp-targets=nvptx64-nvidia-cuda -o a_gpu.exe<br>

            > <br>

            > I got different errors:<br>

            > <br>

            > nvlink error   : Undefined reference to

            '_ZNKSt3__120__vector_base_commonILb1EE20__throw_length_errorEv'

            in '/tmp/main-42e0a6.cubin'<br>

            > nvlink error   : Undefined reference to 'abort' in

            '/tmp/main-42e0a6.cubin'<br>

            > <br>

            > I think the reason here is that nvlink doesn't know that

            libc++ should be linked together with those cubin files. But I don't

            know how to solve this.<br>

            > <br>

            > Does anyone know a workaround for this?<br>

            > <br>

            > Thanks!<br>

            > <br>

            > <br>

            > <br>

            > <br>

            > _______________________________________________<br>

            > Openmp-dev mailing list<br>

            > <a href="mailto:Openmp-dev@lists.llvm.org" target="_blank">Openmp-dev@lists.llvm.org</a><br>

            > <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev</a><br>

          </blockquote>

        </div>

      </div>

    </blockquote>

  </div>

</blockquote></div>
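<div><br></div>
<div>For reference, here is a minimal sketch of the same vector add with explicit array sections in the map clauses. As far as I understand OpenMP 4.5, map(x, y, z) on raw pointers maps only the pointer variables, not the N floats behind them. The function names vector_add and check_vector_add are hypothetical (they are not from the original post), and this only covers the data mapping; it is not claimed to fix the "device kernel image is invalid" error in the log.</div>

```cpp
#include <vector>

// Element-wise z = x + y. With -fopenmp and -fopenmp-targets=... the loop is
// offloaded; without -fopenmp the pragma is ignored and the loop runs on the host.
// NOTE: vector_add is a hypothetical name used only for this sketch.
void vector_add(const float *x, const float *y, float *z, int n) {
  // The array sections x[0:n], y[0:n], z[0:n] state how many elements to copy:
  // inputs are copied to the device, the result is copied back from it.
  #pragma omp target teams distribute parallel for \
      map(to: x[0:n], y[0:n]) map(from: z[0:n])
  for (int i = 0; i < n; i++) {
    z[i] = x[i] + y[i];
  }
}

// Host-side check. std::vector is used only outside the target region;
// the target region itself sees plain pointers.
// NOTE: check_vector_add is also a hypothetical helper.
bool check_vector_add(int n) {
  std::vector<float> x(n, 1.0f), y(n, 2.0f), z(n, 0.0f);
  vector_add(x.data(), y.data(), z.data(), n);
  for (int i = 0; i < n; i++) {
    if (z[i] != 3.0f) return false;
  }
  return true;
}
```

<div>Built with the command from the thread (clang++ main.cpp -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda), the loop should be offloaded. When built without -fopenmp, the pragma is ignored and the loop runs on the host, so check_vector_add(1 &lt;&lt; 20) is expected to return true in both cases.</div>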