Adding Simeon to the loop for Cling and CUDA.
    <div class="moz-cite-prefix"><br>
    </div>
    <div class="moz-cite-prefix">On 11/22/20 2:03 AM, Stefan Gränitz via
      llvm-dev wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:be4923ac-8173-03d1-c190-755e0baaaa90@gmail.com">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      Hi Geoff<br>
      <br>
      It looks like clang does that altogether: <a
        class="moz-txt-link-freetext"
        href="https://llvm.org/docs/CompileCudaWithLLVM.html"
        moz-do-not-send="true">https://llvm.org/docs/CompileCudaWithLLVM.html</a><br>
      <br>
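> (From that page, a typical invocation looks something like the
> following; the file name and the sm_35 GPU arch are examples:)
>
>     clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 \
>         -L/usr/local/cuda/lib64 -lcudart_static -ldl -lrt -pthread
>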
> And, probably related: CUDA support has been added to Cling, and
> there was a presentation on it at the last Dev Meeting:
> https://www.youtube.com/watch?v=XjjZRhiFDVs
>
> Best,
> Stefan
>
      <br>
      <div class="moz-cite-prefix">On 20/11/2020 12:09, Geoff Levner via
        llvm-dev wrote:<br>
      </div>
      <blockquote type="cite"
cite="mid:CAHMBa1sPDq479VG+wdzACiaj8j1XD+_Xm_SkfrvmY8_Q+-ANsg@mail.gmail.com">
        <meta http-equiv="content-type" content="text/html;
          charset=UTF-8">
        <div dir="ltr">
          <div>Thanks for that, Valentin.</div>
          <div><br>
          </div>
          <div>To be sure I understand what you are saying... Assume we
            are talking about a single .cu file containing both a C++
            function and a CUDA kernel that it invokes, using
            <<<>>> syntax. Are you suggesting that we
            bypass clang altogether and use the Nvidia API to compile
            and install the CUDA kernel? If we do that, how will the
            JIT-compiled C++ function find the kernel?</div>
          <div><br>
          </div>
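>> (For concreteness, a minimal hypothetical example of such a file:)
>>
>>     // scale.cu: a CUDA kernel plus the C++ function that launches it
>>     __global__ void scale(float *x, float a) {
>>         x[threadIdx.x] *= a;
>>     }
>>
>>     void scaleOnGPU(float *x, float a) {  // x is a device pointer
>>         scale<<<1, 256>>>(x, a);  // triple-chevron launch syntax
>>     }
>>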
>> Geoff
>>
>> On Thu, Nov 19, 2020 at 6:34 PM Valentin Churavy
>> <v.churavy@gmail.com> wrote:
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div dir="ltr">
              <div>Sound right now like you are emitting an LLVM module?<br>
              </div>
              <div>The best strategy is probably to use to emit a PTX
                module and then pass that to the  CUDA driver. This is
                what we do on the Julia side in CUDA.jl.</div>
              <div><br>
              </div>
              <div>Nvidia has a somewhat helpful tutorial on this at <a
href="https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/vectorAdd_nvrtc/vectorAdd.cpp"
                  target="_blank" moz-do-not-send="true">https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/vectorAdd_nvrtc/vectorAdd.cpp</a></div>
              <div>and <a
href="https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/simpleDrvRuntime/simpleDrvRuntime.cpp"
                  target="_blank" moz-do-not-send="true">https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/simpleDrvRuntime/simpleDrvRuntime.cpp</a></div>
>>> Hope that helps.
>>> -V
>>>
>>> On Thu, Nov 19, 2020 at 12:11 PM Geoff Levner via llvm-dev
>>> <llvm-dev@lists.llvm.org> wrote:
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
                0.8ex;border-left:1px solid
                rgb(204,204,204);padding-left:1ex">
                <div dir="ltr">
                  <div>I have made a bit of progress... When compiling
                    CUDA source code in memory, the Compilation instance
                    returned by Driver::BuildCompilation() contains two
                    clang Commands: one for the host and one for the
                    CUDA device. I can execute both commands using
                    EmitLLVMOnlyActions. I add the Module from the host
                    compilation to my JIT as usual, but... what to do
                    with the Module from the device compilation? If I
                    just add it to the JIT, I get an error message like
                    this:</div>
                  <div><br>
                  </div>
                  <div>    Added modules have incompatible data layouts:
                    e-i64:64-i128:128-v16:16-v32:32-n16:32:64 (module)
                    vs
e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128
                    (jit)</div>
                  <div><br>
                  </div>
                  <div>Any suggestions as to what to do with the Module
                    containing CUDA kernel code, so that the host Module
                    can invoke it?</div>
                  <div><br>
                  </div>
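>>>> (The first data layout there is the NVPTX one, so the device
>>>> Module cannot go into an x86-64 JIT. One option, following the
>>>> advice above, is to lower it to PTX with LLVM's NVPTX backend and
>>>> hand the string to the CUDA driver. A rough, untested sketch
>>>> against LLVM 11-era APIs; the sm_35 target CPU is an example:)
>>>>
>>>>     #include "llvm/ADT/SmallString.h"
>>>>     #include "llvm/IR/LegacyPassManager.h"
>>>>     #include "llvm/IR/Module.h"
>>>>     #include "llvm/Support/TargetRegistry.h"
>>>>     #include "llvm/Support/TargetSelect.h"
>>>>     #include "llvm/Support/raw_ostream.h"
>>>>     #include "llvm/Target/TargetMachine.h"
>>>>
>>>>     // Lower an NVPTX Module to PTX assembly text.
>>>>     std::string emitPTX(llvm::Module &M) {
>>>>         LLVMInitializeNVPTXTargetInfo();
>>>>         LLVMInitializeNVPTXTarget();
>>>>         LLVMInitializeNVPTXTargetMC();
>>>>         LLVMInitializeNVPTXAsmPrinter();
>>>>
>>>>         std::string Err;
>>>>         const llvm::Target *T = llvm::TargetRegistry::lookupTarget(
>>>>             M.getTargetTriple(), Err);
>>>>         llvm::TargetMachine *TM = T->createTargetMachine(
>>>>             M.getTargetTriple(), "sm_35", "",
>>>>             llvm::TargetOptions(), llvm::None);
>>>>
>>>>         llvm::SmallString<0> PTX;
>>>>         llvm::raw_svector_ostream OS(PTX);
>>>>         llvm::legacy::PassManager PM;
>>>>         // For NVPTX, "assembly" output is the PTX text itself.
>>>>         TM->addPassesToEmitFile(PM, OS, nullptr,
>>>>                                 llvm::CGFT_AssemblyFile);
>>>>         PM.run(M);
>>>>         return PTX.str().str();
>>>>     }
>>>>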
>>>> Geoff
>>>>
>>>> On Tue, Nov 17, 2020 at 6:39 PM Geoff Levner
>>>> <glevner@gmail.com> wrote:
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">
                      <div dir="ltr">
                        <div>We have an application that allows the user
                          to compile and execute C++ code on the fly,
                          using Orc JIT v2, via the LLJIT class. And we
                          would like to extend it to allow the user to
                          provide CUDA source code as well, for GPU
                          programming. But I am having a hard time
                          figuring out how to do it.</div>
                        <div><br>
                        </div>
                        <div>To JIT compile C++ code, we do basically as
                          follows:</div>
                        <div><br>
                        </div>
                        <div>1. call Driver::BuildCompilation(), which
                          returns a clang Command to execute</div>
                        <div>2. create a CompilerInvocation using the
                          arguments from the Command</div>
                        <div>3. create a CompilerInstance around the
                          CompilerInvocation</div>
                        <div>4. use the CompilerInstance to execute an
                          EmitLLVMOnlyAction</div>
                        <div>5. retrieve the resulting Module from the
                          action and add it to the JIT</div>
                        <div><br>
                        </div>
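>>>>> (Condensed and untested, those five steps look roughly like
>>>>> this; diagnostics setup and error handling are elided, and the
>>>>> file name and triple are placeholders:)
>>>>>
>>>>>     #include "clang/CodeGen/CodeGenAction.h"
>>>>>     #include "clang/Driver/Compilation.h"
>>>>>     #include "clang/Driver/Driver.h"
>>>>>     #include "clang/Frontend/CompilerInstance.h"
>>>>>     #include "llvm/ExecutionEngine/Orc/LLJIT.h"
>>>>>
>>>>>     void compileAndAdd(llvm::orc::LLJIT &JIT,
>>>>>                        clang::DiagnosticsEngine &Diags) {
>>>>>         // 1. build the driver jobs for the input file
>>>>>         clang::driver::Driver D("clang++", "x86_64-linux-gnu",
>>>>>                                 Diags);
>>>>>         std::unique_ptr<clang::driver::Compilation> C(
>>>>>             D.BuildCompilation({"clang++", "-c", "input.cpp"}));
>>>>>         const clang::driver::Command &Cmd =
>>>>>             *C->getJobs().begin();
>>>>>
>>>>>         // 2. turn the cc1 arguments into a CompilerInvocation
>>>>>         auto CI = std::make_shared<clang::CompilerInvocation>();
>>>>>         clang::CompilerInvocation::CreateFromArgs(
>>>>>             *CI, Cmd.getArguments(), Diags);
>>>>>
>>>>>         // 3. wrap it in a CompilerInstance
>>>>>         clang::CompilerInstance Clang;
>>>>>         Clang.setInvocation(std::move(CI));
>>>>>         Clang.createDiagnostics();
>>>>>
>>>>>         // 4. run an EmitLLVMOnlyAction to get an llvm::Module
>>>>>         auto Ctx = std::make_unique<llvm::LLVMContext>();
>>>>>         clang::EmitLLVMOnlyAction Act(Ctx.get());
>>>>>         Clang.ExecuteAction(Act);
>>>>>
>>>>>         // 5. hand the Module to the Orc JIT
>>>>>         llvm::cantFail(JIT.addIRModule(
>>>>>             llvm::orc::ThreadSafeModule(Act.takeModule(),
>>>>>                                         std::move(Ctx))));
>>>>>     }
>>>>>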
>>>>> But compiling C++ requires only a single clang Command. When you
>>>>> add CUDA to the equation, several other steps appear. If you use
>>>>> the clang front end to compile, clang does the following (you can
>>>>> inspect these jobs with the command shown after the list):
>>>>>
>>>>> 1. compiles the device source code to PTX
>>>>> 2. compiles the resulting PTX code using the CUDA ptxas command
>>>>> 3. builds a "fat binary" using the CUDA fatbinary command
>>>>> 4. compiles the host source code and links in the fat binary
>>>>>
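>>>>> (To see the exact sub-commands the driver would run, without
>>>>> executing them, ask clang to print its jobs; the file name and
>>>>> GPU arch are examples:)
>>>>>
>>>>>     clang++ -### --cuda-gpu-arch=sm_35 axpy.cu
>>>>>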
>>>>> So my question is: how do we replicate that process in memory,
>>>>> to generate modules that we can add to our JIT?
>>>>>
>>>>> I am no CUDA expert, and not much of a clang expert either, so if
>>>>> anyone out there can point me in the right direction, I would be
>>>>> grateful.
>>>>>
>>>>> Geoff
>
> --
> https://flowcrypt.com/pub/stefan.graenitz@gmail.com