<div dir="ltr">Hi Justin,<div><br></div><div>Thanks for your help!  I passed sm_30 as the target GPU arch and the compilation was successful.  </div><div><br></div><div>I was also curious about how the symlink solution would work, so I tried that too :p.  The compilation succeeded, but the resulting binary crashed with the complaint '<b>(8): illegal libdevice function</b>'. </div><div><br></div><div>I would appreciate being kept posted about relevant changes; my username is yuanfeng.peng.</div><div><br></div><div>Thanks again!</div><div>Yuanfeng  </div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Aug 1, 2016 at 2:33 AM, Justin Lebar <span dir="ltr"><<a href="mailto:jlebar@google.com" target="_blank">jlebar@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">OK, I see the problem.  You were right that we weren't picking up libdevice.<br>

<br>

CUDA 7.0 only ships with the following libdevice binaries (found in<br>

/path/to/cuda/nvvm/libdevice):<br>

<br>

  libdevice.compute_20.10.bc  libdevice.compute_30.10.bc<br>

libdevice.compute_35.10.bc<br>

<br>

If you ask for sm_50 with CUDA 7.0, clang can't find a matching<br>

libdevice binary, and it will apparently silently give up and try to<br>

continue compiling your program.  That's a bug that we should fix.<br>

(If you want the current behavior, you should have to ask clang not to<br>

use libdevice.)<br>
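
<br>

As a rough illustration of the lookup described above, here is a minimal sketch of a check for whether a CUDA installation ships a libdevice bitcode file for a given compute capability.  The function name has_libdevice is hypothetical, and the file naming (libdevice.compute_NN.10.bc under nvvm/libdevice) is assumed from the CUDA 7.0 listing above; other CUDA versions may lay this out differently.<br>

```shell
# Hypothetical helper (not part of clang or CUDA): report whether a CUDA
# installation ships a libdevice bitcode file for a compute capability.
# Assumes the CUDA 7.0 naming seen above:
#   <cuda>/nvvm/libdevice/libdevice.compute_NN.10.bc
has_libdevice() {
  cuda_path=$1   # root of the CUDA installation
  cc=$2          # compute capability digits, e.g. 35 or 50
  if [ -e "${cuda_path}/nvvm/libdevice/libdevice.compute_${cc}.10.bc" ]; then
    echo "found"
  else
    echo "missing"
  fi
}
```

For example, has_libdevice /path/to/cuda 50 would be expected to print "missing" for an installation that only contains the three files listed above.<br>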

<br>

I see that nvcc from CUDA 7.0 works (or at least builds without<br>

error).  I guess it uses the libdevice for compute_35.  We could do<br>

the same thing, although I am not sure how to tell whether that's safe<br>

in general.  I'll look into this as well.<br>

<br>

Anyway, if you build with CUDA 7.5, your problem should go away, because<br>

CUDA 7.5 has a libdevice binary for compute_50.  Just pass<br>

--cuda-path=/path/to/cuda-7.5.  Alternatively, you could continue<br>

building with CUDA 7.0 and pass sm_35 as your GPU arch.  clang always<br>

embeds ptx in the binaries, so the result should still run on your<br>

sm_50 card (although your machine will have to jit the ptx on<br>

startup).<br>
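
<br>

To make the two options concrete, here is a small sketch that prints (rather than runs) the corresponding clang++ command lines.  The helper name cuda_build_cmd and the install locations /usr/local/cuda-7.5 and /usr/local/cuda-7.0 are assumptions for illustration, as is taking the samples include directory from under the chosen CUDA path; the source file and the remaining flags are the ones from the original report.<br>

```shell
# Hypothetical helper: print a clang++ command line for the build discussed
# in this thread, given a CUDA install path and a GPU arch.
# The samples include directory is assumed to live under the CUDA path.
cuda_build_cmd() {
  cuda_path=$1
  gpu_arch=$2
  echo "clang++ -I../ -I${cuda_path}/samples/common/inc" \
       "--cuda-path=${cuda_path} --cuda-gpu-arch=${gpu_arch} scalarProd.cu"
}

# Option 1: build against CUDA 7.5 (assumed install location), targeting sm_50.
cuda_build_cmd /usr/local/cuda-7.5 sm_50

# Option 2: stay on CUDA 7.0 and target sm_35; the embedded PTX is then
# JIT-compiled when the binary starts on an sm_50 card.
cuda_build_cmd /usr/local/cuda-7.0 sm_35
```

Running the printed command for whichever option applies should exercise the corresponding path.<br>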

<br>

As a third alternative, you could symlink your<br>

libdevice.compute_35.10.bc to libdevice.compute_50.10.bc, and...maybe<br>

that would work?  If you do that, please let me know how it goes; I am<br>

curious.  :)<br>

<br>

Thank you very much for the bug report!  If you like I'll cc you on<br>

any relevant changes, just create an account at<br>

<a href="https://reviews.llvm.org" rel="noreferrer" target="_blank">https://reviews.llvm.org</a> (if necessary; I can't seem to find you) and<br>

let me know your username.<br>

<br>

Regards,<br>

-Justin<br>

<div class="HOEnZb"><div class="h5"><br>

On Sun, Jul 31, 2016 at 10:59 PM, Yuanfeng Peng <<a href="mailto:yuanfeng@cis.upenn.edu">yuanfeng@cis.upenn.edu</a>> wrote:<br>

> Hi Justin,<br>

><br>

> Thanks for your response!  The clang & llvm I'm using were built from source.<br>

><br>

> Below is the output of compiling with -v.  Any suggestions would be<br>

> appreciated!<br>

><br>

> clang version 3.9.0 (trunk 270145) (llvm/trunk 270133)<br>

> Target: x86_64-unknown-linux-gnu<br>

> Thread model: posix<br>

> InstalledDir: /usr/local/bin<br>

> Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8<br>

> Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8.4<br>

> Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9<br>

> Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9.3<br>

> Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8<br>

> Candidate multilib: .;@m64<br>

> Candidate multilib: 32;@m32<br>

> Candidate multilib: x32;@mx32<br>

> Selected multilib: .;@m64<br>

> Found CUDA installation: /usr/local/cuda<br>

>  "/usr/local/bin/clang-3.9" -cc1 -triple nvptx64-nvidia-cuda -aux-triple<br>

> x86_64-unknown-linux-gnu -S -disable-free -main-file-name scalarProd.cu<br>

> -mrelocation-model static -mthread-model posix -mdisable-fp-elim<br>

> -fmath-errno -no-integrated-as -fcuda-is-device -target-cpu sm_50 -v<br>

> -dwarf-column-info -debugger-tuning=gdb -resource-dir<br>

> /usr/local/bin/../lib/clang/3.9.0 -I ../ -I<br>

> /usr/local/cuda-7.0/samples/common/inc -internal-isystem<br>

> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8<br>

> -internal-isystem<br>

> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8<br>

> -internal-isystem<br>

> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8<br>

> -internal-isystem<br>

> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/backward<br>

> -internal-isystem /usr/local/include -internal-isystem<br>

> /usr/local/bin/../lib/clang/3.9.0/include -internal-externc-isystem /include<br>

> -internal-externc-isystem /usr/include -internal-isystem<br>

> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8<br>

> -internal-isystem<br>

> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8<br>

> -internal-isystem<br>

> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8<br>

> -internal-isystem<br>

> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/backward<br>

> -internal-isystem /usr/local/cuda/include -include<br>

> __clang_cuda_runtime_wrapper.h -fdeprecated-macro -fno-dwarf-directory-asm<br>

> -fdebug-compilation-dir<br>

> /mnt/wtf/workspace/cuda/gpu-race-detection/cuda-compressed-conflict-detection/scalarProd<br>

> -ferror-limit 19 -fmessage-length 144 -fobjc-runtime=gcc -fcxx-exceptions<br>

> -fexceptions -fdiagnostics-show-option -o /tmp/scalarProd-32a530.s -x cuda<br>

> scalarProd.cu<br>

> hooklib.so loading.<br>

> clang -cc1 version 3.9.0 based upon LLVM 3.9.0svn default target<br>

> x86_64-unknown-linux-gnu<br>

> ignoring nonexistent directory "/include"<br>

> ignoring duplicate directory<br>

> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8"<br>

> ignoring duplicate directory<br>

> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8"<br>

> ignoring duplicate directory<br>

> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8"<br>

> ignoring duplicate directory<br>

> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8"<br>

> ignoring duplicate directory<br>

> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/backward"<br>

> ignoring duplicate directory "/usr/local/include"<br>

> ignoring duplicate directory "/usr/local/bin/../lib/clang/3.9.0/include"<br>

> ignoring duplicate directory "/usr/include"<br>

> #include "..." search starts here:<br>

> #include <...> search starts here:<br>

>  ..<br>

>  /usr/local/cuda-7.0/samples/common/inc<br>

>  /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8<br>

><br>

> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8<br>

>  /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/backward<br>

>  /usr/local/include<br>

>  /usr/local/bin/../lib/clang/3.9.0/include<br>

>  /usr/include<br>

>  /usr/local/cuda/include<br>

> End of search list.<br>

><br>

>  "/usr/local/cuda/bin/ptxas" -m64 -O0 --gpu-name sm_50 --output-file<br>

> /tmp/scalarProd-181f7e.o /tmp/scalarProd-32a530.s<br>

> ptxas fatal   : Unresolved extern function '__nv_mul24'<br>

> clang-3.9: error: ptxas command failed with exit code 255 (use -v to see<br>

> invocation)<br>

><br>

> Thanks!<br>

> Yuanfeng<br>

><br>

> On Mon, Aug 1, 2016 at 1:04 AM, Justin Lebar <<a href="mailto:jlebar@google.com">jlebar@google.com</a>> wrote:<br>

>><br>

>> Hi, Yuanfeng.<br>

>><br>

>> What version of clang are you using?  CUDA is only known to work at<br>

>> tip of head, so you must build clang yourself from source.<br>

>><br>

>> I suspect that's your problem, but if building from source doesn't fix<br>

>> it, please attach the output of compiling with -v.<br>

>><br>

>> Regards,<br>

>> -Justin<br>

>><br>

>> On Sun, Jul 31, 2016 at 9:24 PM, Chandler Carruth <<a href="mailto:chandlerc@google.com">chandlerc@google.com</a>><br>

>> wrote:<br>

>> > Directly CC-ing some folks who may be able to help.<br>

>> ><br>

>> > On Fri, Jul 29, 2016 at 6:27 AM Yuanfeng Peng via llvm-dev<br>

>> > <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br>

>> >><br>

>> >> Hi,<br>

>> >><br>

>> >> I was trying to compile scalarProd.cu (from CUDA SDK) with the<br>

>> >> following<br>

>> >> command:<br>

>> >><br>

>> >>  clang++ -I../ -I/usr/local/cuda-7.0/samples/common/inc<br>

>> >> --cuda-gpu-arch=sm_50 scalarProd.cu<br>

>> >><br>

>> >>  but ended up with the following error:<br>

>> >><br>

>> >> ptxas fatal   : Unresolved extern function '__nv_mul24'<br>

>> >><br>

>> >> Seems to me that libdevice was not automatically linked.  I wonder what<br>

>> >> flags I need to pass to clang to have the code linked against<br>

>> >> libdevice?<br>

>> >><br>

>> >> Thanks!<br>

>> >> Yuanfeng Peng<br>

>> >> _______________________________________________<br>

>> >> LLVM Developers mailing list<br>

>> >> <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>

>> >> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

><br>

><br>

</div></div></blockquote></div><br></div>