[llvm-dev] JIT compiling CUDA source code

Geoff Levner via llvm-dev llvm-dev at lists.llvm.org
Sun Nov 22 00:22:09 PST 2020


Hi, Stefan.

Yes, when compiling from the command line, clang does all the work for you
transparently. But behind the scenes it performs two passes: one to compile
source code for the host, and one to compile CUDA kernels.

When compiling in memory, as far as I can tell, you have to perform those
two passes yourself. And the CUDA pass produces a Module that is
incompatible with the host Module. You cannot simply add it to the JIT. I
don't know what to do with it.

And yes, I did watch Simeon's presentation, but he didn't get into that
level of detail (or if he did, I missed it). My impression is that he
actually uses nvcc to compile the CUDA kernels, not clang, using his own
parser to separate and adapt the source code...

Thanks,
Geoff


On Sun, Nov 22, 2020 at 01:03, Stefan Gränitz <stefan.graenitz at gmail.com>
wrote:

> Hi Geoff
>
> It looks like clang can do all of that for you:
> https://llvm.org/docs/CompileCudaWithLLVM.html
>
> And, probably related: CUDA support has been added to Cling and there was
> a presentation for it at the last Dev Meeting
> https://www.youtube.com/watch?v=XjjZRhiFDVs
>
> Best,
> Stefan
>
> On 20/11/2020 12:09, Geoff Levner via llvm-dev wrote:
>
> Thanks for that, Valentin.
>
> To be sure I understand what you are saying... Assume we are talking about
> a single .cu file containing both a C++ function and a CUDA kernel that it
> invokes, using <<<>>> syntax. Are you suggesting that we bypass clang
> altogether and use the Nvidia API to compile and install the CUDA kernel?
> If we do that, how will the JIT-compiled C++ function find the kernel?
>
> Geoff
>
> On Thu, Nov 19, 2020 at 6:34 PM Valentin Churavy <v.churavy at gmail.com>
> wrote:
>
>> It sounds like you are emitting an LLVM module right now?
>> The best strategy is probably to emit a PTX module instead and then pass
>> that to the CUDA driver. This is what we do on the Julia side in CUDA.jl.
>>
>> Nvidia has a somewhat helpful tutorial on this at
>> https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/vectorAdd_nvrtc/vectorAdd.cpp
>> and
>> https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/simpleDrvRuntime/simpleDrvRuntime.cpp
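>>
>> Loading the emitted PTX with the driver API then looks roughly like this
>> (a sketch only: "my_kernel", the kernel arguments and the launch
>> dimensions are placeholders, and error checking is omitted):
>>
>>     #include <cuda.h>
>>
>>     // Load a PTX string produced from the device-side Module and launch
>>     // one kernel from it.
>>     void launchFromPTX(const char *ptx, void **kernelArgs) {
>>       CUdevice dev;
>>       CUcontext ctx;
>>       CUmodule mod;
>>       CUfunction fn;
>>
>>       cuInit(0);
>>       cuDeviceGet(&dev, 0);
>>       cuCtxCreate(&ctx, 0, dev);
>>
>>       cuModuleLoadData(&mod, ptx);                  // JIT-compiles the PTX
>>       cuModuleGetFunction(&fn, mod, "my_kernel");   // placeholder name
>>
>>       cuLaunchKernel(fn, /*grid*/ 1, 1, 1, /*block*/ 128, 1, 1,
>>                      /*sharedMem*/ 0, /*stream*/ nullptr,
>>                      kernelArgs, /*extra*/ nullptr);
>>       cuCtxSynchronize();
>>     }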
>>
>> Hope that helps.
>> -V
>>
>>
>> On Thu, Nov 19, 2020 at 12:11 PM Geoff Levner via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> I have made a bit of progress... When compiling CUDA source code in
>>> memory, the Compilation instance returned by Driver::BuildCompilation()
>>> contains two clang Commands: one for the host and one for the CUDA device.
>>> I can execute both commands using EmitLLVMOnlyActions. I add the Module
>>> from the host compilation to my JIT as usual, but... what to do with the
>>> Module from the device compilation? If I just add it to the JIT, I get an
>>> error message like this:
>>>
>>>     Added modules have incompatible data layouts:
>>>       e-i64:64-i128:128-v16:16-v32:32-n16:32:64 (module) vs
>>>       e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128 (jit)
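>>>
>>> For reference, this is roughly how I drive the two jobs (a sketch: C is
>>> the Compilation, compileJob() stands for the CompilerInvocation /
>>> CompilerInstance / EmitLLVMOnlyAction sequence from my first message
>>> below, and the exact clang APIs vary a little between versions):
>>>
>>>     // Walk the jobs produced by Driver::BuildCompilation() and keep the
>>>     // host and device Modules separate. The device cc1 job carries the
>>>     // -fcuda-is-device flag.
>>>     std::unique_ptr<llvm::Module> HostModule, DeviceModule;
>>>     for (const clang::driver::Command &Job : C->getJobs()) {
>>>       bool IsDevice = llvm::is_contained(Job.getArguments(),
>>>                                          llvm::StringRef("-fcuda-is-device"));
>>>       (IsDevice ? DeviceModule : HostModule) = compileJob(Job);
>>>     }
>>>     // HostModule goes into the LLJIT as usual; DeviceModule is the one in
>>>     // question here.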
>>>
>>> Any suggestions as to what to do with the Module containing CUDA kernel
>>> code, so that the host Module can invoke it?
>>>
>>> Geoff
>>>
>>> On Tue, Nov 17, 2020 at 6:39 PM Geoff Levner <glevner at gmail.com> wrote:
>>>
>>>> We have an application that allows the user to compile and execute C++
>>>> code on the fly, using Orc JIT v2, via the LLJIT class. And we would like
>>>> to extend it to allow the user to provide CUDA source code as well, for GPU
>>>> programming. But I am having a hard time figuring out how to do it.
>>>>
>>>> To JIT compile C++ code, we basically do the following:
>>>>
>>>> 1. call Driver::BuildCompilation(), which returns a Compilation
>>>> containing the clang Command to execute
>>>> 2. create a CompilerInvocation using the arguments from the Command
>>>> 3. create a CompilerInstance around the CompilerInvocation
>>>> 4. use the CompilerInstance to execute an EmitLLVMOnlyAction
>>>> 5. retrieve the resulting Module from the action and add it to the JIT
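>>>>
>>>> In code this looks roughly as follows (a sketch: Diags, Argv, JIT and
>>>> TSCtx stand for our DiagnosticsEngine, the driver command line, the
>>>> LLJIT instance and its ThreadSafeContext, and the clang APIs shift a
>>>> little between versions):
>>>>
>>>>     #include "clang/CodeGen/CodeGenAction.h"
>>>>     #include "clang/Driver/Compilation.h"
>>>>     #include "clang/Driver/Driver.h"
>>>>     #include "clang/Frontend/CompilerInstance.h"
>>>>     #include "clang/Frontend/CompilerInvocation.h"
>>>>     #include "llvm/ExecutionEngine/Orc/LLJIT.h"
>>>>     #include "llvm/Support/Host.h"
>>>>
>>>>     // 1. Ask the driver how it would compile the file.
>>>>     clang::driver::Driver TheDriver(
>>>>         "clang++", llvm::sys::getDefaultTargetTriple(), Diags);
>>>>     std::unique_ptr<clang::driver::Compilation> C(
>>>>         TheDriver.BuildCompilation(Argv));
>>>>     const clang::driver::Command &Cmd = *C->getJobs().begin();
>>>>
>>>>     // 2. + 3. Build a CompilerInvocation/CompilerInstance from the
>>>>     // cc1 arguments of that Command.
>>>>     auto Invocation = std::make_shared<clang::CompilerInvocation>();
>>>>     clang::CompilerInvocation::CreateFromArgs(*Invocation,
>>>>                                               Cmd.getArguments(), Diags);
>>>>     clang::CompilerInstance Clang;
>>>>     Clang.setInvocation(std::move(Invocation));
>>>>     Clang.createDiagnostics();
>>>>
>>>>     // 4. Run the action that emits an llvm::Module in memory.
>>>>     clang::EmitLLVMOnlyAction Action(TSCtx.getContext());
>>>>     if (!Clang.ExecuteAction(Action))
>>>>       return;
>>>>
>>>>     // 5. Hand the Module to the JIT.
>>>>     llvm::cantFail(JIT->addIRModule(
>>>>         llvm::orc::ThreadSafeModule(Action.takeModule(), TSCtx)));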
>>>>
>>>> But to compile C++ requires only a single clang command. When you add
>>>> CUDA to the equation, you add several other steps. If you use the clang
>>>> front end to compile, clang does the following:
>>>>
>>>> 1. compiles the device source code (the CUDA kernels) to PTX
>>>> 2. compiles the resulting PTX code using the CUDA ptxas command
>>>> 3. builds a "fat binary" using the CUDA fatbinary command
>>>> 4. compiles the host source code and links in the fat binary
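>>>>
>>>> (A quick way to see those sub-commands, assuming a CUDA toolkit is
>>>> installed and with illustrative flags, is to ask the driver to print its
>>>> jobs without running them:
>>>>
>>>>     clang++ -### -x cuda --cuda-gpu-arch=sm_70 kernel.cu -o kernel
>>>>
>>>> which lists the device cc1, ptxas, fatbinary, host cc1 and link
>>>> invocations.)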
>>>>
>>>> So my question is: how do we replicate that process in memory, to
>>>> generate modules that we can add to our JIT?
>>>>
>>>> I am no CUDA expert, and not much of a clang expert either, so if
>>>> anyone out there can point me in the right direction, I would be grateful.
>>>>
>>>> Geoff
>>>>
>>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> -- https://flowcrypt.com/pub/stefan.graenitz@gmail.com
>
>