<div dir="ltr"><div>Indeed, Clang currently treats device-side compilation of each CUDA file as a whole-program compilation, i.e. its result is a GPU executable.</div><div><br></div><div>I think clang itself should be able to compile code with 'extern __device__'. It's ptxas that's unhappy to see unresolved symbols, because it expects to see the whole program. Someone, somewhere, eventually needs to turn the GPU object files into a GPU executable.</div><div><br></div><div>It should be possible to make it work. </div><div>The first step is to tell ptxas to compile to a GPU object file:</div><div>* Add separate compilation support to the driver. At the very minimum, the driver should pass the appropriate flags to ptxas and warn/error if separate compilation is not supported. You may be able to get by with just "-Xcuda-ptxas -c" plus an external nvlink step that links the CUDA files into a single partially linked .o.</div><div><br></div><div>As for who does the GPU-side linking, we should perhaps consider running nvlink completely outside of clang. I.e. clang would produce GPU object files, if required, but it would be up to the user's build system to link them together with nvlink before the final link of the host executable. If that's acceptable, that's probably all you need. </div><div><br></div><div>If you want clang to do the GPU-side linking, then there are more things to do.</div><div>* Figure out who's supposed to run nvlink. The driver will need to be augmented to run it. The problem here is that there's no easy way to tell whether any of the .o files given to clang during the linking phase contain GPU executables. So it will most likely have to be controlled by a global flag that inserts another step into the compilation pipeline: that step would invoke nvlink on the object files and pass a .o with the partially linked host+GPU executable to the host linker. 
</div><div>* Figure out whether the way GPU binaries are embedded in the host .o is compatible with nvlink, and implement the missing bits if necessary.</div><div>* Figure out whether the .o (or executable) produced by nvlink is something that Clang-generated init code can still work with. Fix it if it's broken.</div><div><br></div><div>That should be about it.</div><div><br></div><div>--Artem</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Aug 16, 2017 at 3:28 PM, Jakub Beránek via llvm-dev <span dir="ltr">&lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="auto"><div dir="auto">Clang currently doesn't support CUDA separate compilation and thus extern __device__ functions and variables cannot be used.</div><div dir="auto"><br></div><div dir="auto">Could someone give me any pointers where to look or what has to be done to support this? If at all possible, I'd like to see what's missing and possibly try to tackle it.</div></div>


<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">--Artem Belevich</div></div>

</div>