[llvm-dev] CUDA separate compilation

Tue Aug 29 02:34:40 PDT 2017

Thank you for clarification, I'll check the driver and see if I can make
compiling to GPU object files + nvlink work.

Kuba Beránek

2017-08-28 23:02 GMT+02:00 Artem Belevich <tra at google.com>:

> Indeed, Clang currently treats device-side compilation in each CUDA file
> as a whole program compilation. I.e. the result of it is a GPU executable.
>
> I think clang should be able to compile code with 'extern __device__', and
> it's ptxas that's unhappy to see unresolved symbols because it expects to
> see the whole program. Someone/somewhere needs to eventually turn GPU
> object file into a GPU executable.
>
> It should be possible to make it work.
> First step is to tell ptxas to compile to a GPU object file:
> * Add separate compilation support in driver. At the very minimum driver
> should pass appropriate flags to ptxas and warn/error if it's not
> supported. You may be able to get by with just "-Xcuda-ptxas -c" + external
> nvlink of CUDA files into a single partially linked .o.
>
> As for who does GPU-side linking, we should perhaps consider running
> nvlink completely outside of clang. I.e. clang will produce GPU object
> files, if required, but it would be up to user's build system to link them
> together with nvlink before the final linking of the host executable. If
> that's acceptable, that's probably all you need.
>
> If you want clang to do GPU-side linking, then there are more things to do.
> * Figure out who's supposed to run nvlink. Driver will need to be
> augmented to run it. The problem here is that there's no easy way to tell
> if any of the given .o files given to clang during linking phase contain
> GPU executables, so it will most likely be controlled by a global flag
> which would insert another step into the compilation pipeline which will
> invoke nvlink on object files and would pass .o with partially linked
> host+GPU executable to the host linker.
> * Figure out whether the way GPU binaries are embedded in host .o is
> compatible with nvlink and implement missing bits, if necessary.
> * Figure out whether .o (or executable) produced by nvlink is something
> that Clang-generated init code can still work with. Fix it, if broken.
>
> This should be about it.
>
> --Artem
>
>
> On Wed, Aug 16, 2017 at 3:28 PM, Jakub Beránek via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Clang currently doesn't support CUDA separate compilation and thus extern
>> __device__ functions and variables cannot be used.
>>
>> Could someone give me any pointers where to look or what has to be done
>> to support this? If at all possible, I'd like to see what's missing and
>> possibly try to tackle it.
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>
>
> --
> --Artem Belevich
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170829/e8e4c2a4/attachment.html>