<html><body><p>Hi Ahmed,<br><br>I was able to compile the CUDA GPU OMP runtime with Clang directly. I did have to remove some printf statements and asserts to get it through Clang-CUDA. Depending on your configuration, you may need some or all of these flags: --cuda-gpu-arch=sm_35 -nocudalib -DOMPTARGET_NVPTX_TEST=0 -DOMPTARGET_NVPTX_DEBUG=0 -DOMPTARGET_NVPTX_WARNING=0<br><br>Cheers,<br>Arpith<br><br><font size="2" color="#5F5F5F">From:        </font><font size="2">Ahmed ElTantawy &lt;ahmede@ece.ubc.ca&gt;</font><br><font size="2" color="#5F5F5F">To:        </font><font size="2">Arpith C Jacob/Watson/IBM@IBMUS</font><br><font size="2" color="#5F5F5F">Cc:        </font><font size="2">llvm-dev@lists.llvm.org, "Bataev, Alexey" &lt;alexey.bataev@intel.com&gt;</font><br><font size="2" color="#5F5F5F">Date:        </font><font size="2">01/21/2016 04:49 AM</font><br><font size="2" color="#5F5F5F">Subject:        </font><font size="2">Re: Executing OpenMP 4.0 code on Nvidia's GPU</font><br><font size="2" color="#5F5F5F">Sent by:        </font><font size="2">ahmed.mohammed.eltantawy@gmail.com</font><br><hr width="100%" size="2" align="left" noshade style="color:#8091A5; "><br><br><br><font size="4">Thanks Arpith. <br></font><br><font size="4">I was doing it in almost the same way but with nvcc (or </font><a href="https://github.com/apc-llc/nvcc-llvm-ir" target="_blank"><u><font size="4" color="#0000FF">apc-llc</font></u></a><font size="4">), and of course I had to make the generated LLVM IR match my version of LLVM. 
<br><br>But I would imagine it would be less messy if I could compile the CUDA GPU OMP runtime with Clang directly. I found that a patch was committed recently to enable compiling CUDA with Clang (</font><a href="http://llvm.org/docs/CompileCudaWithLLVM.html" target="_blank"><u><font size="4" color="#0000FF">http://llvm.org/docs/CompileCudaWithLLVM.html</font></u></a><font size="4">). <br><br>Do you know if there is any restriction on the CUDA version for compiling CUDA with Clang?<br></font><br><font size="4">Thanks a lot</font><br><br><font size="4">On Wed, Jan 20, 2016 at 7:07 AM, Arpith C Jacob &lt;</font><a href="mailto:acjacob@us.ibm.com" target="_blank"><u><font size="4" color="#0000FF">acjacob@us.ibm.com</font></u></a><font size="4">&gt; wrote:</font><ul><font size="4">Hi Ahmed,<br><br>I am experimenting with LTO, but as you said, it's still *very* hacky.<br><br>Here's what I did. First, compile the CUDA GPU OMP runtime with Clang (rather than nvcc) to bitcode. When I looked at Clang-CUDA a couple of weeks ago, I could only get device-side bitcode by using the temporary files generated after passing -save-temps to Clang. The OMP-GPU version of LLVM that you are using is not up to date with trunk, so I had to do a bit of massaging on the generated IR.<br><br>I then had to manually link the various device-side bitcode files, call opt, llc, and ptxas, and finally link the result with the host object file.<br><br>We don't have support for this in the driver yet, but once we move to trunk I will look into streamlining this.<br><br>Thanks,<br>Arpith<br><br></font><font size="4"><br></font><font color="#5F5F5F"><br>From: </font>Ahmed ElTantawy &lt;<a href="mailto:ahmede@ece.ubc.ca" target="_blank"><u><font color="#0000FF">ahmede@ece.ubc.ca</font></u></a>&gt;<font color="#5F5F5F"><br>To: </font>Arpith C Jacob/Watson/IBM@IBMUS<font color="#5F5F5F"><br>Cc: </font><a href="mailto:llvm-dev@lists.llvm.org" target="_blank"><u><font color="#0000FF">llvm-dev@lists.llvm.org</font></u></a>, "Bataev, Alexey" &lt;<a href="mailto:alexey.bataev@intel.com" target="_blank"><u><font color="#0000FF">alexey.bataev@intel.com</font></u></a>&gt;<font color="#5F5F5F"><br>Date: </font>01/20/2016 08:44 AM<font color="#5F5F5F"><br>Subject: </font>Re: Executing OpenMP 4.0 code on Nvidia's GPU<font color="#5F5F5F"><br>Sent by: </font><a href="mailto:ahmed.mohammed.eltantawy@gmail.com" target="_blank"><u><font color="#0000FF">ahmed.mohammed.eltantawy@gmail.com</font></u></a><br><hr width="100%" size="2" align="left" noshade><br><font size="4"><br><br></font><font size="5"><br>Hi,</font><font size="4"><br></font><font size="5"><br>I see now that the linking happens at the binary level. I was wondering whether it is possible to link to the OpenMP runtime library at the LLVM IR level (to enable LTO optimizations on the code after the library calls have been replaced). <br><br>I have done this before by linking to the bitcode of a file that contains the compiled CUDA implementation of the OpenMP runtime library. But it was a bit hacky, and offloading was not supported yet. Is there a cleaner/standard way to do this?</font><font size="4"><br></font><font size="5"><br>Thanks.</font><font size="4"><br></font><font size="5"><br>On Wed, Jan 20, 2016 at 5:09 AM, Ahmed ElTantawy &lt;</font><a href="mailto:ahmede@ece.ubc.ca" target="_blank"><u><font size="5" color="#0000FF">ahmede@ece.ubc.ca</font></u></a><font size="5">&gt; wrote:</font><ul><ul><font size="5">Hi Arpith,</font><font size="4"><br></font><font size="5"><br>That is exactly what it is :). 
<br><br>My bad, I thought I had copied the libraries to where LIBRARY_PATH points, but apparently they were copied to the wrong destination.</font><font size="4"><br></font><font size="5"><br>Thanks a lot.</font><font size="4"><br></font><font size="5"><br>On Wed, Jan 20, 2016 at 4:51 AM, Arpith C Jacob &lt;</font><a href="mailto:acjacob@us.ibm.com" target="_blank"><u><font size="5" color="#0000FF">acjacob@us.ibm.com</font></u></a><font size="5">&gt; wrote:<br>Hi Ahmed,<br><br>nvlink is unable to find the GPU OMP runtime library in its path. Does LIBRARY_PATH point to the right location? You could try passing the "-v" option to clang to get more information.<br><br>Regards,<br>Arpith</font></ul></ul><font size="4"><br></font></ul><br>
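<p>The manual device-side steps described earlier in this thread (link the device bitcode files, then run opt, llc, and ptxas) could be scripted roughly as follows. This is only a sketch under assumptions: the helper name device_pipeline, the file names, the "omprt" stem, and the -O3 level are hypothetical and not taken from the thread. The final link with the host object file is left out because the thread does not give its details. The script only prints the commands; it does not run them.</p>
<pre>
```python
import shlex


def device_pipeline(bitcode_files, arch="sm_35", stem="omprt"):
    """Build the llvm-link, opt, llc and ptxas commands as argument lists.

    All output file names are hypothetical placeholders derived from `stem`.
    """
    linked = f"{stem}.linked.bc"
    optimized = f"{stem}.opt.bc"
    ptx = f"{stem}.ptx"
    cubin = f"{stem}.cubin"
    return [
        # Merge the device-side bitcode files into a single module.
        ["llvm-link", *bitcode_files, "-o", linked],
        # Optimize the merged module (the -O3 level is an assumption).
        ["opt", "-O3", linked, "-o", optimized],
        # Lower the optimized bitcode to PTX for the chosen GPU architecture.
        ["llc", "-march=nvptx64", f"-mcpu={arch}", optimized, "-o", ptx],
        # Assemble the PTX into a device binary.
        ["ptxas", f"-arch={arch}", ptx, "-o", cubin],
    ]


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for cmd in device_pipeline(["runtime.bc", "kernel.bc"]):
        print(shlex.join(cmd))
```
</pre>
<p>Keeping each command as an argument list makes it easy to inspect the sequence first and, once the paths are right for a given setup, to pass each list to subprocess.run.</p>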

<p><BR>

</body></html>