<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <blockquote type="cite">My impression is that he actually uses nvcc
      to compile the CUDA kernels, not clang</blockquote>
    The constructor here makes it look very much as though the CUDA
    command-line options are added to a clang::CompilerInstance. I might be
    wrong, but you could try to follow the trace and see where it ends up:<br>
    <br>
<a class="moz-txt-link-freetext" href="https://github.com/root-project/cling/blob/master/lib/Interpreter/IncrementalCUDADeviceCompiler.cpp">https://github.com/root-project/cling/blob/master/lib/Interpreter/IncrementalCUDADeviceCompiler.cpp</a><br>
    <br>
    Disclaimer: I am not familiar with the details of Simeon's work, or with
    cling, or even with JITing CUDA :) Maybe Simeon can confirm or deny my
    guess.<br>
    <br>
    <br>
    On 22/11/2020 09:09, Vassil Vassilev wrote:<br>
    <blockquote type="cite"
      cite="mid:0649b677-c765-68ea-3d15-801413e6539d@gmail.com"> Adding
      Simeon in the loop for Cling and CUDA. </blockquote>
    Thanks, hi Simeon!<br>
    <br>
    <br>
    <div class="moz-cite-prefix">On 22/11/2020 09:22, Geoff Levner
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAHMBa1sOAaKtLxuHsCQRs_4CA+3DAgdo=d9SWtvr_F5LS8-jZw@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="auto">
        <div>Hi, Stefan.
          <div dir="auto"><br>
          </div>
          <div dir="auto">Yes, when compiling from the command line,
            clang does all the work for you transparently. But behind
            the scenes it performs two passes: one to compile source
            code for the host, and one to compile CUDA kernels. </div>
          <div dir="auto"><br>
          </div>
          <div dir="auto">When compiling in memory, as far as I can
            tell, you have to perform those two passes yourself. And the
            CUDA pass produces a Module that is incompatible with the
            host Module. You cannot simply add it to the JIT. I don't
            know what to do with it. </div>
          <div dir="auto"><br>
          </div>
          <div dir="auto">And yes, I did watch Simeon's presentation,
            but he didn't get into that level of detail (or if he did, I
            missed it). My impression is that he actually uses nvcc to
            compile the CUDA kernels, not clang, using his own parser to
            separate and adapt the source code... </div>
          <div dir="auto"><br>
          </div>
          <div dir="auto">Thanks, </div>
          <div dir="auto">Geoff </div>
          <br>
          <br>
          <div class="gmail_quote">
            <div dir="ltr" class="gmail_attr">Le dim. 22 nov. 2020 à
              01:03, Stefan Gränitz <<a
                href="mailto:stefan.graenitz@gmail.com" target="_blank"
                rel="noreferrer" moz-do-not-send="true">stefan.graenitz@gmail.com</a>>
              a écrit :<br>
            </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div> Hi Geoff<br>
                <br>
                It looks like clang handles all of that for you: <a
                  href="https://llvm.org/docs/CompileCudaWithLLVM.html"
                  rel="noreferrer noreferrer" target="_blank"
                  moz-do-not-send="true">https://llvm.org/docs/CompileCudaWithLLVM.html</a><br>
                <br>
                And, probably related: CUDA support has been added to
                Cling and there was a presentation for it at the last
                Dev Meeting <a
                  href="https://www.youtube.com/watch?v=XjjZRhiFDVs"
                  rel="noreferrer noreferrer" target="_blank"
                  moz-do-not-send="true">https://www.youtube.com/watch?v=XjjZRhiFDVs</a><br>
                <br>
                Best,<br>
                Stefan<br>
                <br>
                <div>On 20/11/2020 12:09, Geoff Levner via llvm-dev
                  wrote:<br>
                </div>
                <blockquote type="cite">
                  <div dir="ltr">
                    <div>Thanks for that, Valentin.</div>
                    <div><br>
                    </div>
                    <div>To be sure I understand what you are saying...
                      Assume we are talking about a single .cu file
                      containing both a C++ function and a CUDA kernel
                      that it invokes, using <<<>>>
                      syntax. Are you suggesting that we bypass clang
                      altogether and use the Nvidia API to compile and
                      install the CUDA kernel? If we do that, how will
                      the JIT-compiled C++ function find the kernel?</div>
                    <div><br>
                    </div>
                    <div>Geoff<br>
                    </div>
                  </div>
                  <br>
                  <div class="gmail_quote">
                    <div dir="ltr" class="gmail_attr">On Thu, Nov 19,
                      2020 at 6:34 PM Valentin Churavy <<a
                        href="mailto:v.churavy@gmail.com"
                        rel="noreferrer noreferrer" target="_blank"
                        moz-do-not-send="true">v.churavy@gmail.com</a>>
                      wrote:<br>
                    </div>
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">
                      <div dir="ltr">
                        <div>It sounds like right now you are emitting an
                          LLVM module?<br>
                        </div>
                        <div>The best strategy is probably to emit a PTX
                          module and then pass that to the CUDA driver.
                          This is what we do on the Julia side in
                          CUDA.jl.</div>
                        <div><br>
                        </div>
                        <div>Nvidia has a somewhat helpful tutorial on
                          this at <a
href="https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/vectorAdd_nvrtc/vectorAdd.cpp"
                            rel="noreferrer noreferrer" target="_blank"
                            moz-do-not-send="true">https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/vectorAdd_nvrtc/vectorAdd.cpp</a></div>
                        <div>and <a
href="https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/simpleDrvRuntime/simpleDrvRuntime.cpp"
                            rel="noreferrer noreferrer" target="_blank"
                            moz-do-not-send="true">https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/simpleDrvRuntime/simpleDrvRuntime.cpp</a></div>
                        <div><br>
                        </div>
                        <div>Hope that helps.</div>
                        <div>-V<br>
                        </div>
                        <div><br>
                        </div>
                      </div>
                      <br>
                      <div class="gmail_quote">
                        <div dir="ltr" class="gmail_attr">On Thu, Nov
                          19, 2020 at 12:11 PM Geoff Levner via llvm-dev
                          <<a href="mailto:llvm-dev@lists.llvm.org"
                            rel="noreferrer noreferrer" target="_blank"
                            moz-do-not-send="true">llvm-dev@lists.llvm.org</a>>
                          wrote:<br>
                        </div>
                        <blockquote class="gmail_quote"
                          style="margin:0px 0px 0px
                          0.8ex;border-left:1px solid
                          rgb(204,204,204);padding-left:1ex">
                          <div dir="ltr">
                            <div>I have made a bit of progress... When
                              compiling CUDA source code in memory, the
                              Compilation instance returned by
                              Driver::BuildCompilation() contains two
                              clang Commands: one for the host and one
                              for the CUDA device. I can execute both
                              commands using EmitLLVMOnlyActions. I add
                              the Module from the host compilation to my
                              JIT as usual, but... what to do with the
                              Module from the device compilation? If I
                              just add it to the JIT, I get an error
                              message like this:</div>
                            <div><br>
                            </div>
                            <div>    Added modules have incompatible
                              data layouts:
                              e-i64:64-i128:128-v16:16-v32:32-n16:32:64
                              (module) vs
e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128
                              (jit)</div>
                            <div><br>
                            </div>
                            <div>Any suggestions as to what to do with
                              the Module containing CUDA kernel code, so
                              that the host Module can invoke it?</div>
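                            <div><br>
                            </div>
                            <div>One possible route, as a sketch rather
                              than a confirmed answer: don't add the device
                              Module to the (x86-64) JIT at all; lower it
                              to PTX text with an NVPTX TargetMachine and
                              hand that to the CUDA driver
                              (cuModuleLoadData). The triple and sm_70
                              below are illustrative:</div>
                            <pre>
#include "llvm/ADT/SmallString.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"

// Sketch: lower the device-side Module to PTX text ("assembly" for NVPTX).
// Error handling elided; requires the NVPTX backend to be built into LLVM.
std::string deviceModuleToPTX(llvm::Module &M) {
  llvm::InitializeAllTargetInfos();
  llvm::InitializeAllTargets();
  llvm::InitializeAllTargetMCs();
  llvm::InitializeAllAsmPrinters();

  std::string Err;
  const llvm::Target *T =
      llvm::TargetRegistry::lookupTarget("nvptx64-nvidia-cuda", Err);
  llvm::TargetMachine *TM = T->createTargetMachine(
      "nvptx64-nvidia-cuda", "sm_70", "", llvm::TargetOptions(), llvm::None);
  M.setDataLayout(TM->createDataLayout());

  llvm::SmallString&lt;0&gt; PTX;
  llvm::raw_svector_ostream OS(PTX);
  llvm::legacy::PassManager PM;
  TM->addPassesToEmitFile(PM, OS, nullptr, llvm::CGFT_AssemblyFile);
  PM.run(M);
  return PTX.str().str();
}
</pre>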
                            <div><br>
                            </div>
                            <div>Geoff<br>
                            </div>
                            <br>
                            <div class="gmail_quote">
                              <div dir="ltr" class="gmail_attr">On Tue,
                                Nov 17, 2020 at 6:39 PM Geoff Levner
                                <<a href="mailto:glevner@gmail.com"
                                  rel="noreferrer noreferrer"
                                  target="_blank" moz-do-not-send="true">glevner@gmail.com</a>>
                                wrote:<br>
                              </div>
                              <blockquote class="gmail_quote"
                                style="margin:0px 0px 0px
                                0.8ex;border-left:1px solid
                                rgb(204,204,204);padding-left:1ex">
                                <div dir="ltr">
                                  <div>We have an application that
                                    allows the user to compile and
                                    execute C++ code on the fly, using
                                    Orc JIT v2, via the LLJIT class. And
                                    we would like to extend it to allow
                                    the user to provide CUDA source code
                                    as well, for GPU programming. But I
                                    am having a hard time figuring out
                                    how to do it.</div>
                                  <div><br>
                                  </div>
                                  <div>To JIT compile C++ code, we do
                                    basically as follows (a rough code
                                    sketch follows the list):</div>
                                  <div><br>
                                  </div>
                                  <div>1. call
                                    Driver::BuildCompilation(), which
                                    returns a clang Command to execute</div>
                                  <div>2. create a CompilerInvocation
                                    using the arguments from the Command</div>
                                  <div>3. create a CompilerInstance
                                    around the CompilerInvocation</div>
                                  <div>4. use the CompilerInstance to
                                    execute an EmitLLVMOnlyAction</div>
                                  <div>5. retrieve the resulting Module
                                    from the action and add it to the
                                    JIT</div>
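                                  <div><br>
                                  </div>
                                  <div>Rough sketch of those five steps
                                    (clang/LLVM 11-era APIs; the function
                                    name addCXXToJIT and the assumption
                                    that a DiagnosticsEngine, an LLJIT and
                                    a ThreadSafeContext already exist are
                                    mine):</div>
                                  <pre>
#include "clang/CodeGen/CodeGenAction.h"
#include "clang/Driver/Compilation.h"
#include "clang/Driver/Driver.h"
#include "clang/Frontend/CompilerInstance.h"
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/Support/Host.h"

void addCXXToJIT(llvm::orc::LLJIT &JIT, llvm::orc::ThreadSafeContext TSCtx,
                 clang::DiagnosticsEngine &Diags, const char *SourcePath) {
  // 1. Ask the driver what it would run for this file (one cc1 job for C++).
  clang::driver::Driver D("clang++", llvm::sys::getDefaultTargetTriple(), Diags);
  std::unique_ptr&lt;clang::driver::Compilation&gt; C(
      D.BuildCompilation({"clang++", "-fsyntax-only", SourcePath}));
  const clang::driver::Command &Cmd = *C->getJobs().begin();

  // 2./3. Turn the cc1 arguments into a CompilerInvocation and wrap it in a
  //       CompilerInstance.
  auto Inv = std::make_shared&lt;clang::CompilerInvocation&gt;();
  clang::CompilerInvocation::CreateFromArgs(*Inv, Cmd.getArguments(), Diags);
  clang::CompilerInstance CI;
  CI.setInvocation(std::move(Inv));
  CI.createDiagnostics();

  // 4. Run the EmitLLVMOnlyAction (produces a Module, no output file).
  clang::EmitLLVMOnlyAction Act(TSCtx.getContext());
  if (!CI.ExecuteAction(Act))
    return;

  // 5. Hand the resulting Module to the JIT.
  llvm::cantFail(JIT.addIRModule(
      llvm::orc::ThreadSafeModule(Act.takeModule(), TSCtx)));
}
</pre>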
                                  <div><br>
                                  </div>
                                  <div>But compiling C++ requires only a
                                    single clang command. When you add CUDA
                                    to the equation, you add several other
                                    steps. If you use the clang front end
                                    to compile, clang does the following (a
                                    sketch of inspecting these jobs follows
                                    the list):</div>
                                  <div><br>
                                  </div>
                                  <div>1. compiles the device source code,
                                    producing PTX<br>
                                  </div>
                                  <div>2. compiles the resulting PTX
                                    code using the CUDA ptxas command<br>
                                  </div>
                                  <div>3. builds a "fat binary" using
                                    the CUDA fatbinary command</div>
                                  <div>4. compiles the host source code
                                    and links in the fat binary</div>
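                                  <div><br>
                                  </div>
                                  <div>One way to see those steps
                                    concretely (a sketch; the file name,
                                    the sm_70 arch and the Diags variable
                                    are assumptions) is to ask the driver
                                    which jobs it would run for a .cu
                                    file:</div>
                                  <pre>
// With a default CUDA setup this prints something like:
// clang (device compile), ptxas, fatbinary, clang (host compile).
clang::driver::Driver D("clang++", llvm::sys::getDefaultTargetTriple(), Diags);
std::unique_ptr&lt;clang::driver::Compilation&gt; C(D.BuildCompilation(
    {"clang++", "-c", "--cuda-gpu-arch=sm_70", "kernel.cu"}));
for (const clang::driver::Command &Job : C->getJobs())
  llvm::errs() &lt;&lt; Job.getExecutable() &lt;&lt; "\n";
</pre>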
                                  <div><br>
                                  </div>
                                  <div>So my question is: how do we
                                    replicate that process in memory, to
                                    generate modules that we can add to
                                    our JIT?</div>
                                  <div><br>
                                  </div>
                                  <div>I am no CUDA expert, and not much
                                    of a clang expert either, so if
                                    anyone out there can point me in the
                                    right direction, I would be
                                    grateful.</div>
                                  <div><br>
                                  </div>
                                  <div>Geoff</div>
                                  <div><br>
                                  </div>
                                </div>
                              </blockquote>
                            </div>
                          </div>
_______________________________________________<br>
                          LLVM Developers mailing list<br>
                          <a href="mailto:llvm-dev@lists.llvm.org"
                            rel="noreferrer noreferrer" target="_blank"
                            moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>
                          <a
                            href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
                            rel="noreferrer noreferrer noreferrer"
                            target="_blank" moz-do-not-send="true">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
                        </blockquote>
                      </div>
                    </blockquote>
                  </div>
                  <br>
                  <fieldset></fieldset>
                  <pre>_______________________________________________
LLVM Developers mailing list
<a href="mailto:llvm-dev@lists.llvm.org" rel="noreferrer noreferrer" target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer noreferrer" target="_blank" moz-do-not-send="true">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
                </blockquote>
                <pre cols="72">-- 
<a href="https://flowcrypt.com/pub/stefan.graenitz@gmail.com" rel="noreferrer noreferrer" target="_blank" moz-do-not-send="true">https://flowcrypt.com/pub/stefan.graenitz@gmail.com</a></pre>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <pre class="moz-signature" cols="72">-- 
<a class="moz-txt-link-freetext" href="https://flowcrypt.com/pub/stefan.graenitz@gmail.com">https://flowcrypt.com/pub/stefan.graenitz@gmail.com</a></pre>
  </body>
</html>