[llvm-dev] JIT compiling CUDA source code

Mon Nov 23 02:34:26 PST 2020

> My impression is that he actually uses nvcc to compile the CUDA
> kernels, not clang
The constructor here looks very much like it adds the CUDA command line
options to a clang::CompilerInstance. I might be wrong, but you could
try to follow the trace and see where it ends up:

https://github.com/root-project/cling/blob/master/lib/Interpreter/IncrementalCUDADeviceCompiler.cpp

Disclaimer: I am not familiar with the details of Simeon's work or cling,
or even with JITing CUDA :) Maybe Simeon can confirm or deny my guess.

On 22/11/2020 09:09, Vassil Vassilev wrote:
> Adding Simeon in the loop for Cling and CUDA. 
Thanks, hi Simeon!

On 22/11/2020 09:22, Geoff Levner wrote:
> Hi, Stefan.
>
> Yes, when compiling from the command line, clang does all the work for
> you transparently. But behind the scenes it performs two passes: one
> to compile source code for the host, and one to compile CUDA kernels. 
>
> When compiling in memory, as far as I can tell, you have to perform
> those two passes yourself. And the CUDA pass produces a Module that is
> incompatible with the host Module. You cannot simply add it to the
> JIT. I don't know what to do with it. 
>
> And yes, I did watch Simeon's presentation, but he didn't get into
> that level of detail (or if he did, I missed it). My impression is
> that he actually uses nvcc to compile the CUDA kernels, not clang,
> using his own parser to separate and adapt the source code... 
>
> Thanks, 
> Geoff 
>
>
> On Sun, Nov 22, 2020 at 01:03, Stefan Gränitz
> <stefan.graenitz at gmail.com> wrote:
>
>     Hi Geoff
>
>     It looks like clang does that altogether:
>     https://llvm.org/docs/CompileCudaWithLLVM.html
>
>     And, probably related: CUDA support has been added to Cling and
>     there was a presentation for it at the last Dev Meeting
>     https://www.youtube.com/watch?v=XjjZRhiFDVs
>
>     Best,
>     Stefan
>
>     On 20/11/2020 12:09, Geoff Levner via llvm-dev wrote:
>>     Thanks for that, Valentin.
>>
>>     To be sure I understand what you are saying... Assume we are
>>     talking about a single .cu file containing both a C++ function
>>     and a CUDA kernel that it invokes, using <<<>>> syntax. Are you
>>     suggesting that we bypass clang altogether and use the Nvidia API
>>     to compile and install the CUDA kernel? If we do that, how will
>>     the JIT-compiled C++ function find the kernel?
>>
>>     Geoff
>>
>>     On Thu, Nov 19, 2020 at 6:34 PM Valentin Churavy
>>     <v.churavy at gmail.com> wrote:
>>
>>         Sounds like you are currently emitting an LLVM module?
>>         The best strategy is probably to emit a PTX module and
>>         then pass that to the CUDA driver. This is what we do on the
>>         Julia side in CUDA.jl.
>>
>>         Nvidia has a somewhat helpful tutorial on this at
>>         https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/vectorAdd_nvrtc/vectorAdd.cpp
>>         and
>>         https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/simpleDrvRuntime/simpleDrvRuntime.cpp
>>
>>         Hope that helps.
>>         -V
>>
>>
>>         On Thu, Nov 19, 2020 at 12:11 PM Geoff Levner via llvm-dev
>>         <llvm-dev at lists.llvm.org> wrote:
>>
>>             I have made a bit of progress... When compiling CUDA
>>             source code in memory, the Compilation instance returned
>>             by Driver::BuildCompilation() contains two clang
>>             Commands: one for the host and one for the CUDA device. I
>>             can execute both commands using EmitLLVMOnlyActions. I
>>             add the Module from the host compilation to my JIT as
>>             usual, but... what to do with the Module from the device
>>             compilation? If I just add it to the JIT, I get an error
>>             message like this:
>>
>>                 Added modules have incompatible data layouts:
>>             e-i64:64-i128:128-v16:16-v32:32-n16:32:64 (module) vs
>>             e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128
>>             (jit)
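
A side note on the error above: the two strings are LLVM data layout
descriptions, and the one marked (module) looks like the NVPTX layout
rather than an x86-64 one, which is presumably why the host JIT rejects
it. Below is a minimal sketch, using only the standard library, that
splits both strings on '-' and lists the specifications one side has and
the other lacks. The helper names (splitLayout, onlyIn) and the two
constants are made up for illustration; the layout strings are copied
from the message above.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Layout strings copied from the error message (constant names are hypothetical).
const std::string kDeviceLayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64";
const std::string kHostLayout =
    "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128";

// Split a data layout string into its '-'-separated specifications.
std::vector<std::string> splitLayout(const std::string &layout) {
  std::vector<std::string> specs;
  std::stringstream stream(layout);
  std::string spec;
  while (std::getline(stream, spec, '-'))
    if (!spec.empty())
      specs.push_back(spec);
  return specs;
}

// Return the specifications that appear in `a` but not in `b`,
// in the order they appear in `a`.
std::vector<std::string> onlyIn(const std::string &a, const std::string &b) {
  const std::vector<std::string> bSpecs = splitLayout(b);
  const std::set<std::string> other(bSpecs.begin(), bSpecs.end());
  std::vector<std::string> result;
  for (const std::string &spec : splitLayout(a))
    if (other.count(spec) == 0)
      result.push_back(spec);
  return result;
}
```

Calling onlyIn(kDeviceLayout, kHostLayout) gives the four device-only
specifications (i128:128, v16:16, v32:32, n16:32:64), and swapping the
arguments gives the seven host-only ones. The two Modules describe
different targets, which fits Valentin's suggestion of turning the device
Module into PTX instead of adding it to the host JIT.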
>>
>>             Any suggestions as to what to do with the Module
>>             containing CUDA kernel code, so that the host Module can
>>             invoke it?
>>
>>             Geoff
>>
>>             On Tue, Nov 17, 2020 at 6:39 PM Geoff Levner
>>             <glevner at gmail.com> wrote:
>>
>>                 We have an application that allows the user to
>>                 compile and execute C++ code on the fly, using Orc
>>                 JIT v2, via the LLJIT class. And we would like to
>>                 extend it to allow the user to provide CUDA source
>>                 code as well, for GPU programming. But I am having a
>>                 hard time figuring out how to do it.
>>
>>                 To JIT compile C++ code, we do basically as follows:
>>
>>                 1. call Driver::BuildCompilation(), which returns a
>>                 clang Command to execute
>>                 2. create a CompilerInvocation using the arguments
>>                 from the Command
>>                 3. create a CompilerInstance around the
>>                 CompilerInvocation
>>                 4. use the CompilerInstance to execute an
>>                 EmitLLVMOnlyAction
>>                 5. retrieve the resulting Module from the action and
>>                 add it to the JIT
>>
>>                 But compiling C++ requires only a single clang
>>                 command. When you add CUDA to the equation, you add
>>                 several other steps. If you use the clang front end
>>                 to compile, clang does the following:
>>
>>                 1. compiles the device source code to PTX
>>                 2. compiles the resulting PTX code using the CUDA
>>                 ptxas command
>>                 3. builds a "fat binary" using the CUDA fatbinary command
>>                 4. compiles the host source code and links in the fat
>>                 binary
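
To make the four steps above concrete, here is a rough sketch that only
builds the corresponding command lines as strings; it does not run
anything. The file names, the sm_35 architecture used in the usage note
and the helper names (cudaPipeline, join) are assumptions for
illustration, and the exact flags differ between clang and CUDA versions.
Running clang++ -### on a .cu file prints the commands your installation
really uses.

```cpp
#include <string>
#include <vector>

using Command = std::vector<std::string>;

// Hypothetical helper: approximate sub-commands for one .cu file and one
// GPU architecture. The intermediate file names are made up.
std::vector<Command> cudaPipeline(const std::string &src,
                                  const std::string &arch) {
  const std::string ptx = "kernel.ptx";
  const std::string cubin = "kernel.cubin";
  const std::string fatbin = "kernel.fatbin";
  return {
      // 1. Device pass: CUDA source to PTX.
      {"clang++", "--cuda-device-only", "--cuda-gpu-arch=" + arch, "-S", src,
       "-o", ptx},
      // 2. PTX to device machine code.
      {"ptxas", "-m64", "--gpu-name", arch, "--output-file", cubin, ptx},
      // 3. Wrap the device code into a fat binary.
      {"fatbinary", "-64", "--create", fatbin,
       "--image=profile=" + arch + ",file=" + cubin},
      // 4. Host pass, embedding the fat binary into the host object.
      {"clang++", "--cuda-host-only", "-Xclang", "-fcuda-include-gpubinary",
       "-Xclang", fatbin, "-c", src, "-o", "host.o"},
  };
}

// Join the arguments of one command with single spaces for printing.
std::string join(const Command &cmd) {
  std::string line;
  for (const std::string &arg : cmd) {
    if (!line.empty())
      line += ' ';
    line += arg;
  }
  return line;
}
```

For example, join(cudaPipeline("kernel.cu", "sm_35")[1]) produces the
ptxas line for that architecture. For an in-memory JIT, steps 1 and 4
are the two clang passes, while steps 2 and 3 are external tools that
would have to be replaced or invoked separately.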
>>
>>                 So my question is: how do we replicate that process
>>                 in memory, to generate modules that we can add to our
>>                 JIT?
>>
>>                 I am no CUDA expert, and not much of a clang expert
>>                 either, so if anyone out there can point me in the
>>                 right direction, I would be grateful.
>>
>>                 Geoff
>>
>>             _______________________________________________
>>             LLVM Developers mailing list
>>             llvm-dev at lists.llvm.org
>>             https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>
>     -- 
>     https://flowcrypt.com/pub/stefan.graenitz@gmail.com
>
-- 
https://flowcrypt.com/pub/stefan.graenitz@gmail.com
