[llvm-dev] JIT compiling CUDA source code
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Fri Nov 20 06:17:44 PST 2020
This sounds very similar to what I had developed here:
https://github.com/hfinkel/llvm-project-cxxjit/tree/cxxjit/clang --
please look at the code in
https://github.com/hfinkel/llvm-project-cxxjit/blob/cxxjit/clang/lib/CodeGen/JIT.cpp,
etc., for an example of how you can get JIT'd CUDA kernels up and running.
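
Roughly speaking, the key point is that the device-side IR never goes
into the host JIT (hence the data-layout mismatch below); it gets
lowered to PTX and handed to the CUDA driver API, which JIT-compiles
it for the actual GPU. The runtime half looks something like this (a
minimal sketch, not the JIT.cpp code itself; the kernel name
"myKernel", the 1x1x1 launch geometry, and the omitted error checks
are all simplifications):

    #include <cuda.h>

    // Load JIT-generated PTX text and run a kernel from it. All
    // CUresult error checks are omitted for brevity.
    void launchFromPTX(const char *PTX) {
      cuInit(0);
      CUdevice Dev;
      cuDeviceGet(&Dev, 0);
      CUcontext Ctx;
      cuCtxCreate(&Ctx, 0, Dev);

      // The CUDA driver JIT-compiles the PTX for the present GPU here.
      CUmodule Mod;
      cuModuleLoadData(&Mod, PTX);

      CUfunction Fn;
      cuModuleGetFunction(&Fn, Mod, "myKernel");

      // Assumes a kernel taking no parameters; otherwise pass them
      // through the kernelParams array.
      cuLaunchKernel(Fn, /*grid*/ 1, 1, 1, /*block*/ 1, 1, 1,
                     /*sharedMemBytes*/ 0, /*stream*/ nullptr,
                     /*kernelParams*/ nullptr, /*extra*/ nullptr);
      cuCtxSynchronize();
    }
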
-Hal
On 11/19/20 12:10 PM, Geoff Levner via llvm-dev wrote:
> I have made a bit of progress... When compiling CUDA source code in
> memory, the Compilation instance returned by
> Driver::BuildCompilation() contains two clang Commands: one for the
> host and one for the CUDA device. I can execute both commands using
> EmitLLVMOnlyActions. I add the Module from the host compilation to my
> JIT as usual, but... what to do with the Module from the device
> compilation? If I just add it to the JIT, I get an error message like
> this:
>
> Added modules have incompatible data layouts:
> e-i64:64-i128:128-v16:16-v32:32-n16:32:64 (module) vs
> e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128
> (jit)
>
> Any suggestions as to what to do with the Module containing CUDA
> kernel code, so that the host Module can invoke it?
>
> Geoff
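
Regarding the data-layout error above: that is what happens when the
NVPTX module is handed to a JIT configured for the host target. Rather
than adding the device Module to the host JIT, one way to handle it is
to lower it to PTX with LLVM's NVPTX backend and feed the resulting
string to cuModuleLoadData() as in the sketch near the top of this
message. A rough sketch, assuming LLVM 11-era APIs (the sm_35 target
and the thin error handling are simplifications):

    #include "llvm/ADT/SmallString.h"
    #include "llvm/IR/LegacyPassManager.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/TargetRegistry.h"
    #include "llvm/Support/TargetSelect.h"
    #include "llvm/Support/raw_ostream.h"
    #include "llvm/Target/TargetMachine.h"

    // Lower the device-side Module to PTX text instead of JITing it.
    std::string emitPTX(llvm::Module &DeviceModule) {
      LLVMInitializeNVPTXTargetInfo();
      LLVMInitializeNVPTXTarget();
      LLVMInitializeNVPTXTargetMC();
      LLVMInitializeNVPTXAsmPrinter();

      std::string Err;
      const std::string Triple = "nvptx64-nvidia-cuda";
      const llvm::Target *T =
          llvm::TargetRegistry::lookupTarget(Triple, Err);

      // sm_35 is a placeholder; use the compute capability you target.
      llvm::TargetMachine *TM = T->createTargetMachine(
          Triple, "sm_35", "", llvm::TargetOptions(), llvm::None);
      DeviceModule.setDataLayout(TM->createDataLayout());

      llvm::SmallString<1024> PTX;
      llvm::raw_svector_ostream OS(PTX);
      llvm::legacy::PassManager PM;
      // For the NVPTX backend, "assembly" output is PTX text.
      if (TM->addPassesToEmitFile(PM, OS, nullptr,
                                  llvm::CGFT_AssemblyFile))
        return "";
      PM.run(DeviceModule);
      return std::string(PTX.str());
    }
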
>
> On Tue, Nov 17, 2020 at 6:39 PM Geoff Levner <glevner at gmail.com> wrote:
>
> We have an application that allows the user to compile and execute
> C++ code on the fly, using Orc JIT v2, via the LLJIT class. And we
> would like to extend it to allow the user to provide CUDA source
> code as well, for GPU programming. But I am having a hard time
> figuring out how to do it.
>
> To JIT compile C++ code, we do basically the following (a simplified
> code sketch follows the list):
>
> 1. call Driver::BuildCompilation(), which returns a clang Command
> to execute
> 2. create a CompilerInvocation using the arguments from the Command
> 3. create a CompilerInstance around the CompilerInvocation
> 4. use the CompilerInstance to execute an EmitLLVMOnlyAction
> 5. retrieve the resulting Module from the action and add it to the JIT
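>
> In code, that flow looks roughly like the following (simplified:
> diagnostics wiring and error handling are mostly omitted, and
> buildModule() is just an illustrative name, not our real API):
>
>     #include "clang/Basic/Diagnostic.h"
>     #include "clang/CodeGen/CodeGenAction.h"
>     #include "clang/Driver/Compilation.h"
>     #include "clang/Driver/Driver.h"
>     #include "clang/Frontend/CompilerInstance.h"
>     #include "clang/Frontend/CompilerInvocation.h"
>     #include "llvm/IR/LLVMContext.h"
>     #include "llvm/IR/Module.h"
>     #include <memory>
>
>     std::unique_ptr<llvm::Module>
>     buildModule(clang::driver::Driver &D,
>                 llvm::ArrayRef<const char *> Args,
>                 llvm::LLVMContext &Ctx,
>                 clang::DiagnosticsEngine &Diags) {
>       // 1. Ask the driver to plan the compilation.
>       std::unique_ptr<clang::driver::Compilation> C(
>           D.BuildCompilation(Args));
>       const clang::driver::Command &Cmd = *C->getJobs().begin();
>
>       // 2. Build a CompilerInvocation from the -cc1 arguments.
>       auto Inv = std::make_shared<clang::CompilerInvocation>();
>       clang::CompilerInvocation::CreateFromArgs(
>           *Inv, Cmd.getArguments(), Diags);
>
>       // 3. Wrap the invocation in a CompilerInstance.
>       clang::CompilerInstance Clang;
>       Clang.setInvocation(std::move(Inv));
>       Clang.createDiagnostics();
>
>       // 4. Run an EmitLLVMOnlyAction (IR generation only).
>       clang::EmitLLVMOnlyAction Action(&Ctx);
>       if (!Clang.ExecuteAction(Action))
>         return nullptr;
>
>       // 5. Take the Module; the caller adds it to the LLJIT.
>       return Action.takeModule();
>     }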
>
> But compiling C++ requires only a single clang command. When you
> add CUDA to the equation, several other steps appear. If you use
> the clang front end to compile, clang does the following (a way to
> inspect these jobs is sketched after the list):
>
> 1. compiles the device source code to PTX
> 2. assembles the resulting PTX using the CUDA ptxas command
> 3. builds a "fat binary" using the CUDA fatbinary command
> 4. compiles the host source code and links in the fat binary
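>
> (A handy way to see those steps concretely is to print the job list
> the driver builds; a small sketch, where D is a configured
> clang::driver::Driver and kernel.cu is a placeholder file name:)
>
>     // Dump the commands clang would run for a CUDA compile.
>     std::unique_ptr<clang::driver::Compilation> C(
>         D.BuildCompilation({"clang++", "-c", "kernel.cu"}));
>     C->getJobs().Print(llvm::errs(), "\n", /*Quote=*/true);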
>
> So my question is: how do we replicate that process in memory, to
> generate modules that we can add to our JIT?
>
> I am no CUDA expert, and not much of a clang expert either, so if
> anyone out there can point me in the right direction, I would be
> grateful.
>
> Geoff
>
>