<div dir="ltr"><div>Thanks very much, Hal! At first glance, your code suggests this will take a lot more effort than I was hoping to put into it, but I will study it.</div><div><br></div><div>Geoff<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Nov 20, 2020 at 3:17 PM Hal Finkel <<a href="mailto:hal.finkel.llvm@gmail.com">hal.finkel.llvm@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  
  <div>

    <p>This sounds very similar to what I had developed here:

      <a href="https://github.com/hfinkel/llvm-project-cxxjit/tree/cxxjit/clang" target="_blank">https://github.com/hfinkel/llvm-project-cxxjit/tree/cxxjit/clang</a>

      -- please look at the code in

<a href="https://github.com/hfinkel/llvm-project-cxxjit/blob/cxxjit/clang/lib/CodeGen/JIT.cpp" target="_blank">https://github.com/hfinkel/llvm-project-cxxjit/blob/cxxjit/clang/lib/CodeGen/JIT.cpp</a>,

      etc., for an example of how you can get JIT'd CUDA kernels up and

      running.</p>

    <p> -Hal<br>

    </p>

    <div>On 11/19/20 12:10 PM, Geoff Levner via

      llvm-dev wrote:<br>

    </div>

    <blockquote type="cite">

      
      <div dir="ltr">

        <div>I have made a bit of progress... When compiling CUDA source

          code in memory, the Compilation instance returned by

          Driver::BuildCompilation() contains two clang Commands: one

          for the host and one for the CUDA device. I can execute both

          commands using EmitLLVMOnlyActions. I add the Module from the

          host compilation to my JIT as usual, but... what to do with

          the Module from the device compilation? If I just add it to

          the JIT, I get an error message like this:</div>

        <div><br>

        </div>

        <div>    Added modules have incompatible data layouts:

          e-i64:64-i128:128-v16:16-v32:32-n16:32:64 (module) vs

          e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128

          (jit)</div>
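        <div><br>
        </div>
        <div>For reference, the two layout strings in this message can be compared component by component. The first appears to be the layout LLVM uses for 64-bit NVPTX targets and the second the layout for x86-64 ELF targets, which would mean the device Module was built for a different target than the one the JIT is configured for. Below is a minimal sketch in plain standard C++; the helper names splitLayout and onlyIn are illustrative and are not part of any LLVM API.</div>
        <pre>
```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Split an LLVM data layout string into its '-'-separated components.
std::vector<std::string> splitLayout(const std::string &layout) {
    std::vector<std::string> parts;
    std::istringstream in(layout);
    std::string part;
    while (std::getline(in, part, '-')) {
        if (!part.empty())
            parts.push_back(part);
    }
    return parts;
}

// Return the components of `a` that do not appear in `b`,
// preserving their original order.
std::vector<std::string> onlyIn(const std::vector<std::string> &a,
                                const std::vector<std::string> &b) {
    std::vector<std::string> result;
    for (const std::string &part : a) {
        if (std::find(b.begin(), b.end(), part) == b.end())
            result.push_back(part);
    }
    return result;
}
```
        </pre>
        <div>Called with the two strings from the error above, onlyIn reports i128:128, v16:16, v32:32 and n16:32:64 as unique to the module, and m:e, the p270 to p272 address-space entries, f80:128, n8:16:32:64 and S128 as unique to the JIT. Only e and i64:64 are shared.</div>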

        <div><br>

        </div>

        <div>Any suggestions as to what to do with the Module containing

          CUDA kernel code, so that the host Module can invoke it?</div>

        <div><br>

        </div>

        <div>Geoff<br>

        </div>

        <br>

        <div class="gmail_quote">

          <div dir="ltr" class="gmail_attr">On Tue, Nov 17, 2020 at 6:39

            PM Geoff Levner <<a href="mailto:glevner@gmail.com" target="_blank">glevner@gmail.com</a>> wrote:<br>

          </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            <div dir="ltr">

              <div>We have an application that allows the user to

                compile and execute C++ code on the fly, using Orc JIT

                v2, via the LLJIT class. And we would like to extend it

                to allow the user to provide CUDA source code as well,

                for GPU programming. But I am having a hard time

                figuring out how to do it.</div>

              <div><br>

              </div>

              <div>To JIT compile C++ code, we do basically as follows:</div>

              <div><br>

              </div>

              <div>1. call Driver::BuildCompilation(), which returns a

                clang Command to execute</div>

              <div>2. create a CompilerInvocation using the arguments

                from the Command</div>

              <div>3. create a CompilerInstance around the

                CompilerInvocation</div>

              <div>4. use the CompilerInstance to execute an

                EmitLLVMOnlyAction</div>

              <div>5. retrieve the resulting Module from the action and

                add it to the JIT</div>

              <div><br>

              </div>

              <div>But compiling C++ requires only a single clang
                command. Adding CUDA to the equation adds several
                other steps. If you compile with the clang driver,
                clang does the following:</div>

              <div><br>

              </div>

              <div>1. compiles the device source code to PTX<br>

              </div>

              <div>2. compiles the resulting PTX code using the CUDA

                ptxas command<br>

              </div>

              <div>3. builds a "fat binary" using the CUDA fatbinary

                command</div>

              <div>4. compiles the host source code and links in the fat

                binary</div>

              <div><br>

              </div>

              <div>So my question is: how do we replicate that process

                in memory, to generate modules that we can add to our

                JIT?</div>

              <div><br>

              </div>

              <div>I am no CUDA expert, and not much of a clang expert

                either, so if anyone out there can point me in the right

                direction, I would be grateful.</div>

              <div><br>

              </div>

              <div>Geoff</div>

              <div><br>

              </div>

            </div>

          </blockquote>

        </div>

      </div>

      <br>


    </blockquote>

  </div>


</blockquote></div>