<blockquote type="cite">My impression is that he actually uses nvcc
to compile the CUDA kernels, not clang</blockquote>

The constructor here looks very much like the CUDA command-line options
are added to a clang::CompilerInstance. I might be wrong, but you could
try to follow the trace and see where it ends up:

https://github.com/root-project/cling/blob/master/lib/Interpreter/IncrementalCUDADeviceCompiler.cpp

Disclaimer: I am not familiar with the details of Simeon's work, or with
cling, or even with JITing CUDA :) Maybe Simeon can confirm or deny my
guess.

On 22/11/2020 09:09, Vassil Vassilev wrote:
> Adding Simeon in the loop for Cling and CUDA.

Thanks, hi Simeon!

On 22/11/2020 09:22, Geoff Levner wrote:
<blockquote type="cite"
cite="mid:CAHMBa1sOAaKtLxuHsCQRs_4CA+3DAgdo=d9SWtvr_F5LS8-jZw@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="auto">
<div>Hi, Stefan.
<div dir="auto"><br>
</div>
<div dir="auto">Yes, when compiling from the command line,
clang does all the work for you transparently. But behind
the scenes it performs two passes: one to compile source
code for the host, and one to compile CUDA kernels. </div>
<div dir="auto"><br>
</div>
<div dir="auto">When compiling in memory, as far as I can
tell, you have to perform those two passes yourself. And the
CUDA pass produces a Module that is incompatible with the
host Module. You cannot simply add it to the JIT. I don't
know what to do with it. </div>
<div dir="auto"><br>
</div>
<div dir="auto">And yes, I did watch Simeon's presentation,
but he didn't get into that level of detail (or if he did, I
missed it). My impression is that he actually uses nvcc to
compile the CUDA kernels, not clang, using his own parser to
separate and adapt the source code... </div>
<div dir="auto"><br>
</div>
<div dir="auto">Thanks, </div>
<div dir="auto">Geoff </div>
<br>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">Le dim. 22 nov. 2020 à
01:03, Stefan Gränitz <<a
href="mailto:stefan.graenitz@gmail.com" target="_blank"
rel="noreferrer" moz-do-not-send="true">stefan.graenitz@gmail.com</a>>
a écrit :<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi Geoff

It looks like clang does all of that in one go:
https://llvm.org/docs/CompileCudaWithLLVM.html

And, probably related: CUDA support has been added to Cling, and there
was a presentation about it at the last Dev Meeting:
https://www.youtube.com/watch?v=XjjZRhiFDVs

Best,
Stefan

On 20/11/2020 12:09, Geoff Levner via llvm-dev wrote:
<blockquote type="cite">
<div dir="ltr">
<div>Thanks for that, Valentin.</div>
<div><br>
</div>

To be sure I understand what you are saying... Assume we are talking
about a single .cu file containing both a C++ function and a CUDA
kernel that it invokes, using <<<>>> syntax. Are you suggesting that we
bypass clang altogether and use the Nvidia API to compile and install
the CUDA kernel? If we do that, how will the JIT-compiled C++ function
find the kernel?

Geoff
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Nov 19,
2020 at 6:34 PM Valentin Churavy <<a
href="mailto:v.churavy@gmail.com"
rel="noreferrer noreferrer" target="_blank"
moz-do-not-send="true">v.churavy@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>Sound right now like you are emitting an
LLVM module?<br>
</div>
<div>The best strategy is probably to use to
emit a PTX module and then pass that to the
CUDA driver. This is what we do on the Julia
side in CUDA.jl.</div>

Nvidia has a somewhat helpful tutorial on this at
https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/vectorAdd_nvrtc/vectorAdd.cpp
and
https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/simpleDrvRuntime/simpleDrvRuntime.cpp

Hope that helps.
-V
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Nov
19, 2020 at 12:11 PM Geoff Levner via llvm-dev
<<a href="mailto:llvm-dev@lists.llvm.org"
rel="noreferrer noreferrer" target="_blank"
moz-do-not-send="true">llvm-dev@lists.llvm.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>I have made a bit of progress... When
compiling CUDA source code in memory, the
Compilation instance returned by
Driver::BuildCompilation() contains two
clang Commands: one for the host and one
for the CUDA device. I can execute both
commands using EmitLLVMOnlyActions. I add
the Module from the host compilation to my
JIT as usual, but... what to do with the
Module from the device compilation? If I
just add it to the JIT, I get an error
message like this:</div>

    Added modules have incompatible data layouts:
    e-i64:64-i128:128-v16:16-v32:32-n16:32:64 (module) vs
    e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128 (jit)

Any suggestions as to what to do with the Module containing CUDA kernel
code, so that the host Module can invoke it?

Geoff
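
One direction, sketched under assumptions rather than verified: since
the device Module targets NVPTX, don't add it to the JIT at all.
Instead, lower it to PTX with LLVM's NVPTX backend and load the string
through the CUDA driver API as in the sketch further up. The "sm_52"
CPU string is a placeholder; match it to your GPU:

#include "llvm/ADT/SmallString.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"

// Lower an NVPTX-targeted Module to a PTX string instead of adding it
// to the JIT.
std::string emitPTX(llvm::Module &M) {
  llvm::InitializeAllTargets();
  llvm::InitializeAllTargetMCs();
  llvm::InitializeAllAsmPrinters();

  std::string Err;
  const llvm::Target *T =
      llvm::TargetRegistry::lookupTarget(M.getTargetTriple(), Err);

  llvm::TargetMachine *TM = T->createTargetMachine(
      M.getTargetTriple(), "sm_52", "", llvm::TargetOptions(),
      llvm::Reloc::Static);
  M.setDataLayout(TM->createDataLayout());

  llvm::SmallString<0> PTX;
  llvm::raw_svector_ostream OS(PTX);
  llvm::legacy::PassManager PM;
  if (TM->addPassesToEmitFile(PM, OS, nullptr, llvm::CGFT_AssemblyFile))
    return ""; // target cannot emit assembly
  PM.run(M);
  return std::string(PTX);
}

The host Module still goes into LLJIT as usual; only the device half
takes this detour, which also sidesteps the data layout clash above,
because the NVPTX module never enters the JIT.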

On Tue, Nov 17, 2020 at 6:39 PM Geoff Levner <glevner@gmail.com> wrote:
<div dir="ltr">
<div>We have an application that
allows the user to compile and
execute C++ code on the fly, using
Orc JIT v2, via the LLJIT class. And
we would like to extend it to allow
the user to provide CUDA source code
as well, for GPU programming. But I
am having a hard time figuring out
how to do it.</div>
<div><br>
</div>

To JIT compile C++ code, we do basically as follows (see the sketch
after the list):

1. call Driver::BuildCompilation(), which returns a clang Command to
   execute
2. create a CompilerInvocation using the arguments from the Command
3. create a CompilerInstance around the CompilerInvocation
4. use the CompilerInstance to execute an EmitLLVMOnlyAction
5. retrieve the resulting Module from the action and add it to the JIT
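
A condensed sketch of those five steps, with assumptions: "input.cpp"
is a placeholder, error handling is minimal, and depending on the clang
version you may need to drop the leading "-cc1" from the Command's
arguments before parsing them:

#include "clang/CodeGen/CodeGenAction.h"
#include "clang/Driver/Compilation.h"
#include "clang/Driver/Driver.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/Frontend/CompilerInvocation.h"
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/Support/Error.h"

llvm::Error jitCppFile(llvm::orc::LLJIT &JIT, clang::driver::Driver &D,
                       llvm::orc::ThreadSafeContext TSCtx) {
  // 1. Let the driver plan the compilation; plain C++ yields one Command.
  std::unique_ptr<clang::driver::Compilation> C =
      D.BuildCompilation({"clang++", "-c", "input.cpp"});
  const clang::driver::Command &Cmd = *C->getJobs().begin();

  // 2./3. Parse the Command's cc1 arguments into a CompilerInvocation
  // and hand it to a CompilerInstance.
  clang::CompilerInstance Clang;
  Clang.createDiagnostics();
  auto CI = std::make_shared<clang::CompilerInvocation>();
  clang::CompilerInvocation::CreateFromArgs(*CI, Cmd.getArguments(),
                                            Clang.getDiagnostics());
  Clang.setInvocation(std::move(CI));

  // 4. Emit an in-memory Module rather than an object file.
  clang::EmitLLVMOnlyAction Act(TSCtx.getContext());
  if (!Clang.ExecuteAction(Act))
    return llvm::createStringError(llvm::inconvertibleErrorCode(),
                                   "compilation failed");

  // 5. Move the Module into the JIT.
  return JIT.addIRModule(
      llvm::orc::ThreadSafeModule(Act.takeModule(), std::move(TSCtx)));
}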

But to compile C++ requires only a single clang command. When you add
CUDA to the equation, you add several other steps. If you use the clang
front end to compile, clang does the following (see the sketch after
this list):

1. compiles the device source code to PTX
2. compiles the resulting PTX code using the CUDA ptxas command
3. builds a "fat binary" using the CUDA fatbinary command
4. compiles the host source code and links in the fat binary
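
You can watch this happen from the driver's point of view: ask the
Compilation for its jobs and print them, one Command per step
("kernel.cu" is a placeholder file name):

#include "clang/Driver/Compilation.h"
#include "clang/Driver/Driver.h"
#include "llvm/Support/raw_ostream.h"

// Print the sub-commands clang would run for a CUDA file: roughly
// clang -cc1 (device), ptxas, fatbinary, then clang -cc1 (host).
void printCudaJobs(clang::driver::Driver &D) {
  std::unique_ptr<clang::driver::Compilation> C =
      D.BuildCompilation({"clang++", "-c", "kernel.cu"});
  for (const clang::driver::Command &Job : C->getJobs())
    Job.Print(llvm::errs(), "\n", /*Quote=*/true);
}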

So my question is: how do we replicate that process in memory, to
generate modules that we can add to our JIT?

I am no CUDA expert, and not much of a clang expert either, so if
anyone out there can point me in the right direction, I would be
grateful.

Geoff
<pre cols="72">--
<a href="https://flowcrypt.com/pub/stefan.graenitz@gmail.com" rel="noreferrer noreferrer" target="_blank" moz-do-not-send="true">https://flowcrypt.com/pub/stefan.graenitz@gmail.com</a></pre>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
<pre class="moz-signature" cols="72">--
<a class="moz-txt-link-freetext" href="https://flowcrypt.com/pub/stefan.graenitz@gmail.com">https://flowcrypt.com/pub/stefan.graenitz@gmail.com</a></pre>
</body>
</html>