<div dir="auto"><div>Hi, Stefan.<div dir="auto"><br></div><div dir="auto">Yes, when compiling from the command line, clang does all the work for you transparently. But behind the scenes it performs two passes: one to compile source code for the host, and one to compile CUDA kernels. </div><div dir="auto"><br></div><div dir="auto">When compiling in memory, as far as I can tell, you have to perform those two passes yourself. And the CUDA pass produces a Module that is incompatible with the host Module. You cannot simply add it to the JIT. I don't know what to do with it. </div><div dir="auto"><br></div><div dir="auto">And yes, I did watch Simeon's presentation, but he didn't get into that level of detail (or if he did, I missed it). My impression is that he actually uses nvcc to compile the CUDA kernels, not clang, using his own parser to separate and adapt the source code... </div><div dir="auto"><br></div><div dir="auto">Thanks, </div><div dir="auto">Geoff </div><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Le dim. 22 nov. 2020 à 01:03, Stefan Gränitz <<a href="mailto:stefan.graenitz@gmail.com" target="_blank" rel="noreferrer">stefan.graenitz@gmail.com</a>> a écrit :<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Sun, 22 Nov 2020 at 01:03, Stefan Gränitz <stefan.graenitz@gmail.com> wrote:
Hi Geoff

It looks like clang handles all of that for you: https://llvm.org/docs/CompileCudaWithLLVM.html

And, probably related: CUDA support has been added to Cling, and there was a presentation about it at the last Dev Meeting: https://www.youtube.com/watch?v=XjjZRhiFDVs

Best,
Stefan

On 20/11/2020 12:09, Geoff Levner via llvm-dev wrote:
    <blockquote type="cite">
      
      <div dir="ltr">
        <div>Thanks for that, Valentin.</div>
        <div><br>
        </div>
        <div>To be sure I understand what you are saying... Assume we
          are talking about a single .cu file containing both a C++
          function and a CUDA kernel that it invokes, using
          <<<>>> syntax. Are you suggesting that we
          bypass clang altogether and use the Nvidia API to compile and
          install the CUDA kernel? If we do that, how will the
          JIT-compiled C++ function find the kernel?</div>
        <div><br>
        </div>
        <div>Geoff<br>
        </div>
      </div>
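The only mechanism I can imagine is wrapping the launch in an ordinary host function and injecting it into the JIT as an absolute symbol, something like this (a sketch; launchVecAdd and its signature are invented):

    #include "llvm/ExecutionEngine/JITSymbol.h"
    #include "llvm/ExecutionEngine/Orc/Core.h"
    #include "llvm/ExecutionEngine/Orc/LLJIT.h"

    // Hypothetical host-side wrapper that launches the kernel through the
    // CUDA driver API; JIT-compiled C++ would call this instead of <<<...>>>.
    extern "C" void launchVecAdd(float *A, float *B, float *C, int N);

    void exposeLauncher(llvm::orc::LLJIT &JIT) {
      llvm::orc::SymbolMap Syms;
      Syms[JIT.mangleAndIntern("launchVecAdd")] = llvm::JITEvaluatedSymbol(
          llvm::pointerToJITTargetAddress(&launchVecAdd),
          llvm::JITSymbolFlags::Exported);
      llvm::cantFail(JIT.getMainJITDylib().define(
          llvm::orc::absoluteSymbols(std::move(Syms))));
    }

That would at least let the JIT'd code resolve the symbol; generating a call to it in place of the <<<...>>> launch is another matter.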
On Thu, Nov 19, 2020 at 6:34 PM Valentin Churavy <v.churavy@gmail.com> wrote:
          <div dir="ltr">
            <div>Sound right now like you are emitting an LLVM module?<br>
            </div>
            <div>The best strategy is probably to use to emit a PTX
              module and then pass that to the  CUDA driver. This is
              what we do on the Julia side in CUDA.jl.</div>
            <div><br>
            </div>
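On the LLVM side that boils down to something like this (a sketch, assuming the device Module already carries an nvptx triple and the NVPTX backend is linked in; "sm_60" is just an example arch):

    #include "llvm/ADT/SmallString.h"
    #include "llvm/IR/LegacyPassManager.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/TargetRegistry.h"
    #include "llvm/Support/TargetSelect.h"
    #include "llvm/Support/raw_ostream.h"
    #include "llvm/Target/TargetMachine.h"

    // Lower an already-compiled NVPTX Module to PTX text instead of JITing it.
    std::string emitPTX(llvm::Module &M) {
      LLVMInitializeNVPTXTargetInfo();
      LLVMInitializeNVPTXTarget();
      LLVMInitializeNVPTXTargetMC();
      LLVMInitializeNVPTXAsmPrinter();

      std::string Err;
      const llvm::Target *T =
          llvm::TargetRegistry::lookupTarget(M.getTargetTriple(), Err);
      if (!T)
        return "";
      std::unique_ptr<llvm::TargetMachine> TM(T->createTargetMachine(
          M.getTargetTriple(), /*CPU=*/"sm_60", /*Features=*/"",
          llvm::TargetOptions(), llvm::Reloc::PIC_));
      M.setDataLayout(TM->createDataLayout());

      llvm::SmallString<0> PTX;
      llvm::raw_svector_ostream OS(PTX);
      llvm::legacy::PassManager PM;
      if (TM->addPassesToEmitFile(PM, OS, nullptr, llvm::CGFT_AssemblyFile))
        return "";  // backend cannot emit this file type
      PM.run(M);
      return PTX.str().str();
    }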
Nvidia has a somewhat helpful tutorial on this at
https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/vectorAdd_nvrtc/vectorAdd.cpp
and
https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/simpleDrvRuntime/simpleDrvRuntime.cpp
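The driver-API side then looks roughly like this (a sketch with minimal error handling; "vecAdd" and the launch geometry are placeholders):

    #include <cuda.h>
    #include <string>

    // Load the PTX produced above and launch a kernel from it via the
    // CUDA driver API.
    void loadAndRun(const std::string &PTX, CUdeviceptr A, CUdeviceptr B,
                    CUdeviceptr C, int N) {
      CUdevice Dev;
      CUcontext Ctx;
      CUmodule Mod;
      CUfunction Fn;
      cuInit(0);
      cuDeviceGet(&Dev, 0);
      cuCtxCreate(&Ctx, 0, Dev);
      cuModuleLoadData(&Mod, PTX.c_str());          // PTX text goes straight in
      cuModuleGetFunction(&Fn, Mod, "vecAdd");      // kernel name from the Module
      void *Params[] = {&A, &B, &C, &N};
      cuLaunchKernel(Fn, /*grid=*/(N + 255) / 256, 1, 1,
                     /*block=*/256, 1, 1,
                     /*sharedMem=*/0, /*stream=*/nullptr, Params, nullptr);
      cuCtxSynchronize();
    }

Note the kernel's symbol name comes from the device Module, so declaring the kernel extern "C" avoids name-mangling surprises when you look it up.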
Hope that helps.
-V
On Thu, Nov 19, 2020 at 12:11 PM Geoff Levner via llvm-dev <llvm-dev@lists.llvm.org> wrote:
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
              <div dir="ltr">
                <div>I have made a bit of progress... When compiling
                  CUDA source code in memory, the Compilation instance
                  returned by Driver::BuildCompilation() contains two
                  clang Commands: one for the host and one for the CUDA
                  device. I can execute both commands using
                  EmitLLVMOnlyActions. I add the Module from the host
                  compilation to my JIT as usual, but... what to do with
                  the Module from the device compilation? If I just add
                  it to the JIT, I get an error message like this:</div>
    Added modules have incompatible data layouts:
    e-i64:64-i128:128-v16:16-v32:32-n16:32:64 (module) vs
    e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128 (jit)
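For reference, the in-memory double compilation looks roughly like this (simplified, clang 11-era APIs; buildModule() is a hypothetical helper wrapping the CompilerInvocation / CompilerInstance / EmitLLVMOnlyAction steps from my first message below):

    #include "clang/Driver/Compilation.h"
    #include "clang/Driver/Driver.h"
    #include "llvm/ADT/StringRef.h"
    #include "llvm/IR/Module.h"

    // Hypothetical helper: run one clang Command in-process, return its Module.
    std::unique_ptr<llvm::Module> buildModule(const clang::driver::Command &Cmd);

    void compileCuda(clang::driver::Driver &D) {
      const char *Args[] = {"clang++", "--cuda-gpu-arch=sm_60", "-c", "in.cu"};
      std::unique_ptr<clang::driver::Compilation> C(D.BuildCompilation(Args));
      std::unique_ptr<llvm::Module> HostM, DeviceM;
      for (const clang::driver::Command &Cmd : C->getJobs()) {
        if (llvm::StringRef(Cmd.getCreator().getName()) != "clang")
          continue;  // skip any non-clang jobs (ptxas, fatbinary, ...)
        auto M = buildModule(Cmd);
        if (llvm::StringRef(M->getTargetTriple()).startswith("nvptx"))
          DeviceM = std::move(M);  // the problem child
        else
          HostM = std::move(M);    // goes into the LLJIT as usual
      }
    }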
Any suggestions as to what to do with the Module containing the CUDA kernel code, so that the host Module can invoke it?

Geoff
On Tue, Nov 17, 2020 at 6:39 PM Geoff Levner <glevner@gmail.com> wrote:
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                    <div dir="ltr">
                      <div>We have an application that allows the user
                        to compile and execute C++ code on the fly,
                        using Orc JIT v2, via the LLJIT class. And we
                        would like to extend it to allow the user to
                        provide CUDA source code as well, for GPU
                        programming. But I am having a hard time
                        figuring out how to do it.</div>
                      <div><br>
                      </div>
                      <div>To JIT compile C++ code, we do basically as
                        follows:</div>
                      <div><br>
                      </div>
1. call Driver::BuildCompilation(), which returns a clang Command to execute
2. create a CompilerInvocation using the arguments from the Command
3. create a CompilerInstance around the CompilerInvocation
4. use the CompilerInstance to execute an EmitLLVMOnlyAction
5. retrieve the resulting Module from the action and add it to the JIT
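In code, that flow is roughly this (clang 11-era APIs; error handling mostly elided; D and Diags are a configured clang::driver::Driver and clang::DiagnosticsEngine, and Args is e.g. {"clang++", "-c", "in.cpp"}):

    #include "clang/CodeGen/CodeGenAction.h"
    #include "clang/Driver/Compilation.h"
    #include "clang/Driver/Driver.h"
    #include "clang/Frontend/CompilerInstance.h"
    #include "llvm/ExecutionEngine/Orc/LLJIT.h"
    #include "llvm/Support/Error.h"

    llvm::Error compileAndAdd(llvm::orc::LLJIT &JIT, clang::driver::Driver &D,
                              clang::DiagnosticsEngine &Diags,
                              llvm::ArrayRef<const char *> Args) {
      std::unique_ptr<clang::driver::Compilation> C(
          D.BuildCompilation(Args));                              // step 1
      const clang::driver::Command &Cmd =
          *C->getJobs().begin();                                  // single compile job
      auto Inv = std::make_shared<clang::CompilerInvocation>();
      clang::CompilerInvocation::CreateFromArgs(*Inv, Cmd.getArguments(),
                                                Diags);           // step 2
      clang::CompilerInstance Clang;                              // step 3
      Clang.setInvocation(std::move(Inv));
      Clang.createDiagnostics();
      auto Ctx = std::make_unique<llvm::LLVMContext>();
      clang::EmitLLVMOnlyAction Act(Ctx.get());                   // step 4
      if (!Clang.ExecuteAction(Act))
        return llvm::createStringError(llvm::inconvertibleErrorCode(),
                                       "compilation failed");
      return JIT.addIRModule(llvm::orc::ThreadSafeModule(
          Act.takeModule(), std::move(Ctx)));                     // step 5
    }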
But compiling C++ requires only a single clang command. When you add CUDA to the equation, several other steps appear. If you use the clang front end to compile, clang does the following (see the sketch after this list):
1. compiles the device source code to PTX
2. assembles the resulting PTX into a cubin using the CUDA ptxas command
3. builds a "fat binary" from the PTX and cubin using the CUDA fatbinary command
4. compiles the host source code and links in the fat binary
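You can see those steps for yourself by dumping the driver's job list (a sketch; assumes a configured clang::driver::Driver, and "kernel.cu" / sm_60 are placeholders):

    #include "clang/Driver/Compilation.h"
    #include "clang/Driver/Driver.h"
    #include "llvm/Support/raw_ostream.h"

    // Print the commands clang would run for a .cu file: two clang -cc1
    // invocations plus the ptxas and fatbinary steps.
    void dumpCudaJobs(clang::driver::Driver &D) {
      const char *Args[] = {"clang++", "--cuda-gpu-arch=sm_60", "-c", "kernel.cu"};
      std::unique_ptr<clang::driver::Compilation> C(D.BuildCompilation(Args));
      for (const clang::driver::Command &Job : C->getJobs())
        Job.Print(llvm::errs(), "\n", /*Quote=*/true);
    }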
So my question is: how do we replicate that process in memory, to generate modules that we can add to our JIT?

I am no CUDA expert, and not much of a clang expert either, so if anyone out there can point me in the right direction, I would be grateful.

Geoff
_______________________________________________
LLVM Developers mailing list
llvm-dev@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev