[llvm-dev] JIT compiling CUDA source code

Thu Nov 19 09:33:50 PST 2020

It sounds like right now you are emitting an LLVM module?
The best strategy is probably to use LLVM to emit a PTX module and then pass
that to the CUDA driver. This is what we do on the Julia side in CUDA.jl.
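The data-layout error in the quoted message comes from adding the device module, which targets NVPTX, to a JIT configured for the host. Before the device module can be turned into PTX, the host and device jobs from `Driver::BuildCompilation()` have to be told apart. A minimal sketch under stated assumptions: each cc1 job's arguments are modeled as a plain string vector (in real code they would come from each `clang::driver::Command` in `Compilation::getJobs()` via `Command::getArguments()`; check this against your Clang version), and a job is classified by the value following `-triple`. The names `Args`, `tripleOf` and `isCudaDeviceJob` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the argument list of one cc1 job.
using Args = std::vector<std::string>;

// Return the value following "-triple", or "" if the job has none.
std::string tripleOf(const Args &args) {
    for (std::size_t i = 0; i + 1 < args.size(); ++i)
        if (args[i] == "-triple") return args[i + 1];
    return "";
}

// A CUDA device job targets NVPTX (e.g. nvptx64-nvidia-cuda); its module
// should go to PTX emission rather than to the host JIT.
bool isCudaDeviceJob(const Args &args) {
    return tripleOf(args).rfind("nvptx", 0) == 0;  // "starts with nvptx"
}
```

Only the `-triple` value is inspected, because the host job typically also mentions the NVPTX triple through `-aux-triple`, so a plain substring search over all arguments would misclassify it.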

NVIDIA has a couple of somewhat helpful samples for this at
https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/vectorAdd_nvrtc/vectorAdd.cpp
and
https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/simpleDrvRuntime/simpleDrvRuntime.cpp
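Once the PTX is loaded with the driver's `cuModuleLoadData`, each kernel is looked up by its PTX symbol name with `cuModuleGetFunction`, and for kernels written in C++ that name is mangled unless the kernel is declared `extern "C"`. As a small illustration, here is a standard-library-only sketch (the helper `ptxEntryNames` is hypothetical, not part of LLVM or CUDA) that lists the kernel names declared with `.entry` in a PTX string. It assumes the usual `.visible .entry name(` layout and ignores `.func` device functions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collect the names of kernels declared with ".entry".
std::vector<std::string> ptxEntryNames(const std::string &ptx) {
    std::vector<std::string> names;
    const std::string key = ".entry";
    std::size_t pos = 0;
    while ((pos = ptx.find(key, pos)) != std::string::npos) {
        pos += key.size();
        // Skip blanks between ".entry" and the symbol name.
        while (pos < ptx.size() && (ptx[pos] == ' ' || ptx[pos] == '\t')) ++pos;
        std::size_t start = pos;
        // The name ends at the parameter list "(" or at whitespace.
        while (pos < ptx.size() && ptx[pos] != '(' && ptx[pos] != ' ' &&
               ptx[pos] != '\t' && ptx[pos] != '\n' && ptx[pos] != '\r')
            ++pos;
        if (pos > start) names.push_back(ptx.substr(start, pos - start));
    }
    return names;
}
```

The strings returned are the ones `cuModuleGetFunction` expects, which makes mangled C++ kernel names easy to spot when a lookup fails.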

Hope that helps.
-V

On Thu, Nov 19, 2020 at 12:11 PM Geoff Levner via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> I have made a bit of progress... When compiling CUDA source code in
> memory, the Compilation instance returned by Driver::BuildCompilation()
> contains two clang Commands: one for the host and one for the CUDA device.
> I can execute both commands using EmitLLVMOnlyActions. I add the Module
> from the host compilation to my JIT as usual, but... what to do with the
> Module from the device compilation? If I just add it to the JIT, I get an
> error message like this:
>
>     Added modules have incompatible data layouts:
>     e-i64:64-i128:128-v16:16-v32:32-n16:32:64 (module) vs
>     e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128 (jit)
>
> Any suggestions as to what to do with the Module containing CUDA kernel
> code, so that the host Module can invoke it?
>
> Geoff
>
> On Tue, Nov 17, 2020 at 6:39 PM Geoff Levner <glevner at gmail.com> wrote:
>
>> We have an application that allows the user to compile and execute C++
>> code on the fly, using Orc JIT v2, via the LLJIT class. And we would like
>> to extend it to allow the user to provide CUDA source code as well, for GPU
>> programming. But I am having a hard time figuring out how to do it.
>>
>> To JIT compile C++ code, we do basically as follows:
>>
>> 1. call Driver::BuildCompilation(), which returns a clang Command to
>> execute
>> 2. create a CompilerInvocation using the arguments from the Command
>> 3. create a CompilerInstance around the CompilerInvocation
>> 4. use the CompilerInstance to execute an EmitLLVMOnlyAction
>> 5. retrieve the resulting Module from the action and add it to the JIT
>>
>> But compiling C++ requires only a single clang command. When you add
>> CUDA to the equation, you add several other steps. If you use the clang
>> front end to compile, clang does the following:
>>
>> 1. compiles the device source code to PTX
>> 2. assembles the resulting PTX code using the CUDA ptxas command
>> 3. builds a "fat binary" using the CUDA fatbinary command
>> 4. compiles the host source code and links in the fat binary
>>
>> So my question is: how do we replicate that process in memory, to
>> generate modules that we can add to our JIT?
>>
>> I am no CUDA expert, and not much of a clang expert either, so if anyone
>> out there can point me in the right direction, I would be grateful.
>>
>> Geoff
>>