[llvm-dev] JIT compiling CUDA source code

Fri Nov 20 03:09:23 PST 2020

Thanks for that, Valentin.

To be sure I understand what you are saying... Assume we are talking about
a single .cu file containing both a C++ function and a CUDA kernel that it
invokes, using <<<>>> syntax. Are you suggesting that we bypass clang
altogether and use the Nvidia API to compile and install the CUDA kernel?
If we do that, how will the JIT-compiled C++ function find the kernel?
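
To make the question concrete, this is roughly what I imagine the driver-API
route would look like. It is only a sketch under my own assumptions:
launchFromPTX() and blocksFor() are hypothetical names, the PTX text is assumed
to come from the device compilation, the kernel is assumed to be declared
extern "C" so that its name in the PTX is not mangled, and cuInit() plus a
current CUDA context are assumed to be set up elsewhere. The CUDA part is only
compiled when <cuda.h> is available.

```cpp
#include <cassert>
#include <string>

#if __has_include(<cuda.h>)
#include <cuda.h>  // CUDA driver API; link with -lcuda
#define HAVE_CUDA_DRIVER 1
#else
#define HAVE_CUDA_DRIVER 0
#endif

// Number of thread blocks needed to cover n elements (ceiling division).
unsigned blocksFor(unsigned n, unsigned threadsPerBlock) {
  assert(threadsPerBlock > 0);
  return n / threadsPerBlock + (n % threadsPerBlock != 0 ? 1u : 0u);
}

#if HAVE_CUDA_DRIVER
// Hypothetical helper (not part of clang, LLVM or CUDA): loads PTX text,
// looks the kernel up by name and launches it over n elements.
// Assumes cuInit() has been called and a context is current on this thread.
// kernelArgs holds one pointer per kernel argument, in declaration order.
inline bool launchFromPTX(const std::string &ptx, const char *kernelName,
                          void **kernelArgs, unsigned n) {
  const unsigned threads = 256;  // assumed block size
  CUmodule mod;
  CUfunction fn;

  // The driver compiles the null-terminated PTX text for the current device.
  if (cuModuleLoadData(&mod, ptx.c_str()) != CUDA_SUCCESS)
    return false;

  // The kernel is found by its (unmangled) entry name in the PTX module.
  bool ok =
      cuModuleGetFunction(&fn, mod, kernelName) == CUDA_SUCCESS &&
      cuLaunchKernel(fn, blocksFor(n, threads), 1, 1,  // grid dimensions
                     threads, 1, 1,                    // block dimensions
                     0, nullptr,                       // shared mem, stream
                     kernelArgs, nullptr) == CUDA_SUCCESS &&
      cuCtxSynchronize() == CUDA_SUCCESS;

  cuModuleUnload(mod);
  return ok;
}
#endif
```

If that is the idea, the host function would have to call something like
launchFromPTX() with the kernel's name instead of using <<<>>>, which is
exactly the part I do not see how to arrange.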

Geoff

On Thu, Nov 19, 2020 at 6:34 PM Valentin Churavy <v.churavy at gmail.com>
wrote:

> It sounds like you are currently emitting an LLVM module?
> The best strategy is probably to emit a PTX module and then pass
> that to the CUDA driver. This is what we do on the Julia side in CUDA.jl.
>
> Nvidia has a somewhat helpful tutorial on this at
> https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/vectorAdd_nvrtc/vectorAdd.cpp
> and
> https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/simpleDrvRuntime/simpleDrvRuntime.cpp
>
> Hope that helps.
> -V
>
>
> On Thu, Nov 19, 2020 at 12:11 PM Geoff Levner via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> I have made a bit of progress... When compiling CUDA source code in
>> memory, the Compilation instance returned by Driver::BuildCompilation()
>> contains two clang Commands: one for the host and one for the CUDA device.
>> I can execute both commands using EmitLLVMOnlyActions. I add the Module
>> from the host compilation to my JIT as usual, but... what to do with the
>> Module from the device compilation? If I just add it to the JIT, I get an
>> error message like this:
>>
>>     Added modules have incompatible data layouts:
>>     e-i64:64-i128:128-v16:16-v32:32-n16:32:64 (module) vs
>>     e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128 (jit)
>>
>> Any suggestions as to what to do with the Module containing CUDA kernel
>> code, so that the host Module can invoke it?
>>
>> Geoff
>>
>> On Tue, Nov 17, 2020 at 6:39 PM Geoff Levner <glevner at gmail.com> wrote:
>>
>>> We have an application that allows the user to compile and execute C++
>>> code on the fly, using Orc JIT v2, via the LLJIT class. And we would like
>>> to extend it to allow the user to provide CUDA source code as well, for GPU
>>> programming. But I am having a hard time figuring out how to do it.
>>>
>>> To JIT compile C++ code, we do basically as follows:
>>>
>>> 1. call Driver::BuildCompilation(), which returns a clang Command to
>>> execute
>>> 2. create a CompilerInvocation using the arguments from the Command
>>> 3. create a CompilerInstance around the CompilerInvocation
>>> 4. use the CompilerInstance to execute an EmitLLVMOnlyAction
>>> 5. retrieve the resulting Module from the action and add it to the JIT
>>>
>>> But to compile C++ requires only a single clang command. When you add
>>> CUDA to the equation, you add several other steps. If you use the clang
>>> front end to compile, clang does the following:
>>>
>>> 1. compiles the device source code to PTX
>>> 2. compiles the resulting PTX code using the CUDA ptxas command
>>> 3. builds a "fat binary" using the CUDA fatbinary command
>>> 4. compiles the host source code and links in the fat binary
>>>
>>> So my question is: how do we replicate that process in memory, to
>>> generate modules that we can add to our JIT?
>>>
>>> I am no CUDA expert, and not much of a clang expert either, so if anyone
>>> out there can point me in the right direction, I would be grateful.
>>>
>>> Geoff
>>>
>>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>