<div dir="ltr"><div><span class="im" style="font-size:12.8000001907349px">Hi,</span></div><div><span class="im" style="font-size:12.8000001907349px"><br></span></div><span class="im" style="font-size:12.8000001907349px">On Thu, Jun 11, 2015 at 9:50 AM, latzori <span dir="ltr">&lt;<a href="mailto:luca.atzori@cern.ch" target="_blank">luca.atzori@cern.ch</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hi all,<br><br>I'm trying to take stock of the situation about clang's CUDA support.<br><br></blockquote><div><br></div></span><div style="font-size:12.8000001907349px">TL;DR version: you *can* use clang -cc1 to compile some CUDA code that does not use Nvidia's CUDA headers.</div><span class="im" style="font-size:12.8000001907349px"><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">The informations I found around the web are very fragmentary and sometimes<br>contradictory, so I decided to open this post to clarify, and perhaps help<br>others in the same situation.<br><br>*Question: is there some way to generate ASTs without touching to the<br>front-end?*<br></blockquote><div><br></div></span><div style="font-size:12.8000001907349px">CUDA code that does not use the CUDA headers should be compilable. Device-side code compiles all the way down to PTX, and host-side compilation can generate the glue needed to initialize and launch kernels. So, the answer is a qualified "yes".</div><div style="font-size:12.8000001907349px"><br></div><div style="font-size:12.8000001907349px">Here's a trivial example of device-side compilation. 
Add -ast-dump if you want to see the AST.</div><div style="font-size:12.8000001907349px"><div># echo '__attribute__((global)) void kernel(void) { }' | clang -cc1 -x cuda -fcuda-is-device -triple nvptx64-unknown-cuda -S -<br></div><div><br></div></div><div style="font-size:12.8000001907349px">The driver does not know much about CUDA yet, which is what D9509 is intended to help with. For now, though, you have to run the host and device compilations manually with cc1.<br></div><span class="im" style="font-size:12.8000001907349px"><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>These are more or less the informations I found.<br><br><br>1) Looking at the official repositories (this  mirror<br>&lt;<a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_llvm-2Dmirror_clang&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=CnzuN65ENJ1H9py9XLiRvC_UQz6u3oG6GUNn7_wosSM&m=mK5Jnl0bRci8zAuGK1k0nNcHcGWALbGiBONby--W140&s=_Ag7KPzIilRuF2FcJAnAcNEqWBxV2bgUKHm3gbxK1mg&e=" rel="noreferrer" target="_blank">https://github.com/llvm-mirror/clang</a>&gt;   for example) it seems that some<br>work is in progress, but of course far away from completion.<br></blockquote><div><br></div></span><div style="font-size:12.8000001907349px">One can hope it's not *that* far from the point where it's usable. I've been digging in that direction and am starting to see a glimmer of light at the end of the tunnel. 
I have a rough set of changes that can compile and successfully run some of the examples that come with CUDA 7.0.</div><span class="im" style="font-size:12.8000001907349px"><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>2) After a brief chat on the  official IRC Channel<br>&lt;<a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__irc.lc_oftc_clang_irctc-40-40-40&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=CnzuN65ENJ1H9py9XLiRvC_UQz6u3oG6GUNn7_wosSM&m=mK5Jnl0bRci8zAuGK1k0nNcHcGWALbGiBONby--W140&s=4w6OYh6wKnQNl355QGnBPI-Q5Gwwu8O-Hvy6EDpRNIk&e=" rel="noreferrer" target="_blank">http://irc.lc/oftc/clang/irctc@@@</a>&gt;  , some users confirmed me that<br>end-to-end compilation isn't supported, but that they internally have some<br>parser that runs quite well (?). They also linked me the  D9506<br>&lt;<a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D9506&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=CnzuN65ENJ1H9py9XLiRvC_UQz6u3oG6GUNn7_wosSM&m=mK5Jnl0bRci8zAuGK1k0nNcHcGWALbGiBONby--W140&s=TCS6k3CioNWWIU-pXvmz210STklUJPEIEaaPJdcpMms&e=" rel="noreferrer" target="_blank">http://reviews.llvm.org/D9506</a>&gt;  ,  D9507 &lt;<a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D9507&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=CnzuN65ENJ1H9py9XLiRvC_UQz6u3oG6GUNn7_wosSM&m=mK5Jnl0bRci8zAuGK1k0nNcHcGWALbGiBONby--W140&s=bqpEPC_7xSvCfMuleRH1wcE5OlIruAQrs_dICvBcqnY&e=" rel="noreferrer" target="_blank">http://reviews.llvm.org/D9507</a>&gt;<br>and  D9509 &lt;<a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D9509&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=CnzuN65ENJ1H9py9XLiRvC_UQz6u3oG6GUNn7_wosSM&m=mK5Jnl0bRci8zAuGK1k0nNcHcGWALbGiBONby--W140&s=rtX4Jc1Y_vnX7UmJaaJxkyq3isWBoHgK9cGvq2qJ3-8&e=" rel="noreferrer" target="_blank">http://reviews.llvm.org/D9509</a>&gt;   patches.<br><br></blockquote><div><br></div></span><div 
style="font-size:12.8000001907349px">Yup. End-to-end compilation is not here yet. D9509 will teach the driver to handle the CUDA compilation pipeline, but other pieces are still missing.</div><span class="im" style="font-size:12.8000001907349px"><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">3) Some projects claim that they can parse CUDA with clang. I'm referring in<br>particular to  CU2CL &lt;<a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__chrec.cs.vt.edu_cu2cl_&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=CnzuN65ENJ1H9py9XLiRvC_UQz6u3oG6GUNn7_wosSM&m=mK5Jnl0bRci8zAuGK1k0nNcHcGWALbGiBONby--W140&s=OVkS10gzXfc3vYvr-O6ODhrqS66YRvM1-CrTv6Avtq8&e=" rel="noreferrer" target="_blank">http://chrec.cs.vt.edu/cu2cl/</a>&gt;   (cited in  this<br>discussion<br>&lt;<a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__clang-2Ddevelopers.42468.n3.nabble.com_Parsing-2DCUDA-2Dfile-2Dto-2DAST-2Dtd4038287.html&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=CnzuN65ENJ1H9py9XLiRvC_UQz6u3oG6GUNn7_wosSM&m=mK5Jnl0bRci8zAuGK1k0nNcHcGWALbGiBONby--W140&s=K7xwrALRliHamceELFJxNLqK8IxGjMwmNeBZbFFiJ_4&e=" rel="noreferrer" target="_blank">http://clang-developers.42468.n3.nabble.com/Parsing-CUDA-file-to-AST-td4038287.html</a>&gt;<br>too).<br></blockquote><div><br></div></span><div style="font-size:12.8000001907349px">Syntax-wise, CUDA is pretty much C++ plus the triple-angle-bracket kernel launch syntax. The rest boils down to a few attributes and builtin variables that can be implemented or faked in an include file, so parsing a bare-bones CUDA source file is not particularly challenging. 
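<div><br></div><div>As a rough sketch of what "faked in an include file" can mean: the macro names below mirror the usual CUDA spellings, but the definitions are hypothetical stand-ins rather than Nvidia's headers, and twice() and kernel() are made-up names. It covers only the attributes; builtin variables such as threadIdx would need their own declarations, which are left out here.</div>
<pre>
// Hypothetical stand-ins for the attribute macros that Nvidia's headers
// normally provide (assumption: these expand to clang's CUDA attributes).
#define __global__ __attribute__((global))
#define __device__ __attribute__((device))
#define __host__   __attribute__((host))

// Device-only helper; callable from kernels and other device functions.
__device__ int twice(int v) { return 2 * v; }

// Kernel entry point written against the macros above instead of the
// real CUDA headers. It stores 42 through the pointer it is given.
__global__ void kernel(int *out) { *out = twice(21); }
</pre>
<div>I'd expect this to go through the same clang -cc1 device-side invocation as the one-line example, but treat it as an untested sketch.</div>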
So, yes, it is doable.</div><span class="im" style="font-size:12.8000001907349px"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>4) It is not clear to me if using some other tools, like libnvvm, is<br>possible to at least generate LLVM IR from CUDA source (and then maybe<br>compile it on x86?). (Or maybe only from PTX? Is that the so called NVVM<br>IR?)<br></blockquote><div><br></div></span><div style="font-size:12.8000001907349px">If I understand it correctly, libnvvm provides GPU-specific optimizations at the IR level. That is, the front-end (clang) would generate IR, libnvvm would optimize it, and then the back-end (LLVM) would generate PTX. As far as I can tell, it never sees CUDA source and thus can't help you.</div><span class="im" style="font-size:12.8000001907349px"><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>Any answer about any possible solution will be very appreciated!<br></blockquote><div><br></div></span><div style="font-size:12.8000001907349px">The bottom line is that if you can live without the CUDA headers, clang is somewhat usable right now.</div><div class="" style="font-size:12.8000001907349px"><div id=":1de" class="" tabindex="0"><br></div><div id=":1de" class="" tabindex="0">--Artem</div></div><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

Thanks in advance,<br>

Luca<br><br></blockquote></div>

</div></div>