[llvm-dev] JIT compiling CUDA source code

Vassil Vassilev via llvm-dev llvm-dev at lists.llvm.org
Sun Nov 22 00:09:44 PST 2020


Looping in Simeon for Cling and CUDA.

On 11/22/20 2:03 AM, Stefan Gränitz via llvm-dev wrote:
> Hi Geoff
>
> It looks like clang can already do all of that: 
> https://llvm.org/docs/CompileCudaWithLLVM.html
>
> And, probably related: CUDA support has been added to Cling, and there 
> was a presentation on it at the last Dev Meeting: 
> https://www.youtube.com/watch?v=XjjZRhiFDVs
>
> Best,
> Stefan
>
> On 20/11/2020 12:09, Geoff Levner via llvm-dev wrote:
>> Thanks for that, Valentin.
>>
>> To be sure I understand what you are saying... Assume we are talking 
>> about a single .cu file containing both a C++ function and a CUDA 
>> kernel that it invokes, using <<<>>> syntax. Are you suggesting that 
>> we bypass clang altogether and use the Nvidia API to compile and 
>> install the CUDA kernel? If we do that, how will the JIT-compiled C++ 
>> function find the kernel?
>>
>> Geoff
>>
>> On Thu, Nov 19, 2020 at 6:34 PM Valentin Churavy <v.churavy at gmail.com> wrote:
>>
>>     It sounds like you are currently emitting an LLVM module?
>>     The best strategy is probably to emit a PTX module instead and
>>     then pass that to the CUDA driver. This is what we do on the
>>     Julia side in CUDA.jl.
>>
>>     Nvidia has a somewhat helpful tutorial on this at
>>     https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/vectorAdd_nvrtc/vectorAdd.cpp
>>     and
>>     https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/simpleDrvRuntime/simpleDrvRuntime.cpp
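>>
>>     In case it's useful, here is a rough sketch condensing what those
>>     two samples do (untested, error checks elided; the kernel name
>>     vecAdd and the compute_70 target are just placeholders):
>>
>>         // JIT CUDA source to PTX with NVRTC, then load and launch
>>         // the kernel through the CUDA driver API.
>>         #include <cuda.h>
>>         #include <nvrtc.h>
>>         #include <string>
>>
>>         static const char *Src = R"(
>>         extern "C" __global__ void vecAdd(const float *a, const float *b,
>>                                           float *c, int n) {
>>           int i = blockIdx.x * blockDim.x + threadIdx.x;
>>           if (i < n) c[i] = a[i] + b[i];
>>         })";
>>
>>         int main() {
>>           // CUDA C++ -> PTX, entirely in memory.
>>           nvrtcProgram Prog;
>>           nvrtcCreateProgram(&Prog, Src, "vecAdd.cu", 0, nullptr, nullptr);
>>           const char *Opts[] = {"--gpu-architecture=compute_70"};
>>           nvrtcCompileProgram(Prog, 1, Opts);
>>           size_t PtxSize;
>>           nvrtcGetPTXSize(Prog, &PtxSize);
>>           std::string Ptx(PtxSize, '\0');
>>           nvrtcGetPTX(Prog, &Ptx[0]);
>>           nvrtcDestroyProgram(&Prog);
>>
>>           // PTX -> loaded kernel, via the driver API.
>>           cuInit(0);
>>           CUdevice Dev;   cuDeviceGet(&Dev, 0);
>>           CUcontext Ctx;  cuCtxCreate(&Ctx, 0, Dev);
>>           CUmodule Mod;   cuModuleLoadData(&Mod, Ptx.c_str());
>>           CUfunction Fn;  cuModuleGetFunction(&Fn, Mod, "vecAdd");
>>
>>           // Launch over N elements; device buffers allocated here.
>>           int N = 1024;
>>           CUdeviceptr A, B, C;
>>           cuMemAlloc(&A, N * sizeof(float));
>>           cuMemAlloc(&B, N * sizeof(float));
>>           cuMemAlloc(&C, N * sizeof(float));
>>           void *Args[] = {&A, &B, &C, &N};
>>           cuLaunchKernel(Fn, (N + 255) / 256, 1, 1, 256, 1, 1,
>>                          0, nullptr, Args, nullptr);
>>           cuCtxSynchronize();
>>           return 0;
>>         }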
>>
>>     Hope that helps.
>>     -V
>>
>>
>>     On Thu, Nov 19, 2020 at 12:11 PM Geoff Levner via llvm-dev
>>     <llvm-dev at lists.llvm.org> wrote:
>>
>>         I have made a bit of progress... When compiling CUDA source
>>         code in memory, the Compilation instance returned by
>>         Driver::BuildCompilation() contains two clang Commands: one
>>         for the host and one for the CUDA device. I can execute both
>>         commands using EmitLLVMOnlyActions. I add the Module from the
>>         host compilation to my JIT as usual, but... what to do with
>>         the Module from the device compilation? If I just add it to
>>         the JIT, I get an error message like this:
>>
>>             Added modules have incompatible data layouts:
>>             e-i64:64-i128:128-v16:16-v32:32-n16:32:64 (module) vs
>>             e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128 (jit)
>>
>>         Any suggestions as to what to do with the Module containing
>>         CUDA kernel code, so that the host Module can invoke it?
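>>
>>         Following the PTX suggestion above, one direction would be to
>>         lower the device Module with the NVPTX backend instead of
>>         adding it to the JIT. A rough, untested sketch (sm_70 is a
>>         placeholder; header locations and enum names vary across LLVM
>>         releases):
>>
>>             #include "llvm/ADT/SmallString.h"
>>             #include "llvm/IR/LegacyPassManager.h"
>>             #include "llvm/IR/Module.h"
>>             #include "llvm/Support/TargetRegistry.h"
>>             #include "llvm/Support/TargetSelect.h"
>>             #include "llvm/Support/raw_ostream.h"
>>             #include "llvm/Target/TargetMachine.h"
>>
>>             // Lower a device-side llvm::Module to PTX text.
>>             std::string emitPTX(llvm::Module &M) {
>>               LLVMInitializeNVPTXTargetInfo();
>>               LLVMInitializeNVPTXTarget();
>>               LLVMInitializeNVPTXTargetMC();
>>               LLVMInitializeNVPTXAsmPrinter();
>>
>>               const std::string Triple = "nvptx64-nvidia-cuda";
>>               std::string Err;
>>               const llvm::Target *T =
>>                   llvm::TargetRegistry::lookupTarget(Triple, Err);
>>               llvm::TargetMachine *TM = T->createTargetMachine(
>>                   Triple, "sm_70", "", llvm::TargetOptions(),
>>                   llvm::Reloc::PIC_);
>>               M.setTargetTriple(Triple);
>>               M.setDataLayout(TM->createDataLayout());
>>
>>               // For NVPTX, the "assembly" the backend emits is PTX.
>>               llvm::SmallString<0> Buf;
>>               llvm::raw_svector_ostream OS(Buf);
>>               llvm::legacy::PassManager PM;
>>               TM->addPassesToEmitFile(PM, OS, nullptr,
>>                                       llvm::CGFT_AssemblyFile);
>>               PM.run(M);
>>               return Buf.str().str();
>>             }
>>
>>         The resulting PTX could then be loaded with cuModuleLoadData(),
>>         as in the samples above, with the host Module calling the
>>         kernel through the driver API rather than a direct symbol
>>         reference.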
>>
>>         Geoff
>>
>>         On Tue, Nov 17, 2020 at 6:39 PM Geoff Levner
>>         <glevner at gmail.com> wrote:
>>
>>             We have an application that allows the user to compile
>>             and execute C++ code on the fly, using Orc JIT v2, via
>>             the LLJIT class. And we would like to extend it to allow
>>             the user to provide CUDA source code as well, for GPU
>>             programming. But I am having a hard time figuring out how
>>             to do it.
>>
>>             To JIT compile C++ code, we proceed basically as follows:
>>
>>             1. call Driver::BuildCompilation(), which returns a clang
>>             Command to execute
>>             2. create a CompilerInvocation using the arguments from
>>             the Command
>>             3. create a CompilerInstance around the CompilerInvocation
>>             4. use the CompilerInstance to execute an EmitLLVMOnlyAction
>>             5. retrieve the resulting Module from the action and add
>>             it to the JIT
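>>
>>             In code, those five steps look roughly like this: a
>>             condensed, untested sketch along the lines of clang's
>>             clang-interpreter example, with error handling mostly
>>             elided (exact APIs shift between releases):
>>
>>                 #include "clang/Basic/DiagnosticOptions.h"
>>                 #include "clang/CodeGen/CodeGenAction.h"
>>                 #include "clang/Driver/Compilation.h"
>>                 #include "clang/Driver/Driver.h"
>>                 #include "clang/Frontend/CompilerInstance.h"
>>                 #include "clang/Frontend/TextDiagnosticPrinter.h"
>>                 #include "llvm/ExecutionEngine/Orc/LLJIT.h"
>>                 #include "llvm/Support/Host.h"
>>
>>                 llvm::Error addCxxSource(llvm::orc::LLJIT &JIT,
>>                                          const char *Path) {
>>                   // 1. Build a driver Compilation; take its single Command.
>>                   auto *DiagOpts = new clang::DiagnosticOptions();
>>                   clang::DiagnosticsEngine Diags(
>>                       new clang::DiagnosticIDs(), DiagOpts,
>>                       new clang::TextDiagnosticPrinter(llvm::errs(), DiagOpts));
>>                   clang::driver::Driver D(
>>                       "clang++", llvm::sys::getDefaultTargetTriple(), Diags);
>>                   std::unique_ptr<clang::driver::Compilation> C(
>>                       D.BuildCompilation({"clang++", "-fsyntax-only", Path}));
>>                   const clang::driver::Command &Cmd = *C->getJobs().begin();
>>
>>                   // 2. Build a CompilerInvocation from the cc1 arguments.
>>                   auto CI = std::make_shared<clang::CompilerInvocation>();
>>                   clang::CompilerInvocation::CreateFromArgs(
>>                       *CI, Cmd.getArguments(), Diags);
>>
>>                   // 3. Wrap the invocation in a CompilerInstance.
>>                   clang::CompilerInstance Clang;
>>                   Clang.setInvocation(std::move(CI));
>>                   Clang.createDiagnostics();
>>
>>                   // 4. Run an EmitLLVMOnlyAction to get an llvm::Module.
>>                   auto Ctx = std::make_unique<llvm::LLVMContext>();
>>                   clang::EmitLLVMOnlyAction Act(Ctx.get());
>>                   if (!Clang.ExecuteAction(Act))
>>                     return llvm::createStringError(
>>                         llvm::inconvertibleErrorCode(), "compilation failed");
>>
>>                   // 5. Hand the Module to the JIT.
>>                   return JIT.addIRModule(llvm::orc::ThreadSafeModule(
>>                       Act.takeModule(), std::move(Ctx)));
>>                 }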
>>
>>             But compiling C++ requires only a single clang command.
>>             Adding CUDA to the equation introduces several more
>>             steps. If you compile with the clang front end, clang
>>             does the following:
>>
>>             1. compiles the device source code to PTX
>>             2. assembles the PTX into a cubin using the CUDA ptxas
>>             command
>>             3. bundles the cubin and PTX into a "fat binary" using
>>             the CUDA fatbinary command
>>             4. compiles the host source code and links in the fat
>>             binary
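>>
>>             You can see those stages concretely by asking the driver
>>             to print its jobs with -###. Abridged, and with the exact
>>             flags varying by clang and CUDA version, the output of
>>             "clang++ -### --cuda-gpu-arch=sm_70 vecadd.cu" looks
>>             something like:
>>
>>                 "clang" "-cc1" "-triple" "nvptx64-nvidia-cuda"
>>                     "-fcuda-is-device" "-target-cpu" "sm_70" ...
>>                 "ptxas" "--gpu-name" "sm_70" "--output-file"
>>                     "/tmp/vecadd.cubin" "/tmp/vecadd.s" ...
>>                 "fatbinary" "--create" "/tmp/vecadd.fatbin"
>>                     "--image=profile=sm_70,file=/tmp/vecadd.cubin" ...
>>                 "clang" "-cc1" "-triple" "x86_64-unknown-linux-gnu"
>>                     "-fcuda-include-gpubinary" "/tmp/vecadd.fatbin" ...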
>>
>>             So my question is: how do we replicate that process in
>>             memory, to generate modules that we can add to our JIT?
>>
>>             I am no CUDA expert, and not much of a clang expert
>>             either, so if anyone out there can point me in the right
>>             direction, I would be grateful.
>>
>>             Geoff
>>
>>
>>
> -- 
> https://flowcrypt.com/pub/stefan.graenitz@gmail.com
>

