[llvm-dev] CUDA separate compilation

Mon Aug 28 14:02:00 PDT 2017

Indeed, Clang currently treats device-side compilation in each CUDA file as
a whole program compilation. I.e. the result of it is a GPU executable.

I think clang should be able to compile code with 'extern __device__', and
it's ptxas that's unhappy to see unresolved symbols because it expects to
see the whole program. Someone/somewhere needs to eventually turn GPU
object file into a GPU executable.

It should be possible to make it work.
First step is to tell ptxas to compile to a GPU object file:
* Add separate compilation support in driver. At the very minimum driver
should pass appropriate flags to ptxas and warn/error if it's not
supported. You may be able to get by with just "-Xcuda-ptxas -c" + external
nvlink of CUDA files into a single partially linked .o.

As for who does GPU-side linking, we should perhaps consider running nvlink
completely outside of clang. I.e. clang will produce GPU object files, if
required, but it would be up to user's build system to link them together
with nvlink before the final linking of the host executable. If that's
acceptable, that's probably all you need.

If you want clang to do GPU-side linking, then there are more things to do.
* Figure out who's supposed to run nvlink. Driver will need to be augmented
to run it. The problem here is that there's no easy way to tell if any of
the given .o files given to clang during linking phase contain GPU
executables, so it will most likely be controlled by a global flag which
would insert another step into the compilation pipeline which will invoke
nvlink on object files and would pass .o with partially linked host+GPU
executable to the host linker.
* Figure out whether the way GPU binaries are embedded in host .o is
compatible with nvlink and implement missing bits, if necessary.
* Figure out whether .o (or executable) produced by nvlink is something
that Clang-generated init code can still work with. Fix it, if broken.

This should be about it.

--Artem

On Wed, Aug 16, 2017 at 3:28 PM, Jakub Beránek via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Clang currently doesn't support CUDA separate compilation and thus extern
> __device__ functions and variables cannot be used.
>
> Could someone give me any pointers where to look or what has to be done to
> support this? If at all possible, I'd like to see what's missing and
> possibly try to tackle it.
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>

-- 
--Artem Belevich
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170828/ea40374b/attachment.html>