<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Hi Lang,</p>
    <p>Thank you very much. I've used your code, and creating the
      object file works. I think the problem occurs after the object
      file is created. When I link the object file with ld, I get an
      executable that works correctly.</p>
    <p>After switching the clang and llvm libraries from the packaged
      version (.deb) to a self-compiled build with debug
      options, I get an assert() failure.<br>
      It is triggered in<br>
      void RuntimeDyldELF::resolveX86_64Relocation(), in the case
      ELF::R_X86_64_PC32.<br>
      You can find the code in the file
      llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp. I don't
      know exactly what this function does, but after some initial research I
      think it has something to do with linking. Maybe you know more
      about the function?</p>
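    <p>For reference, as far as I understand the x86-64 ELF ABI, an
      R_X86_64_PC32 relocation stores S + A - P (symbol address plus
      addend minus the address of the patched field) in a signed 32-bit
      field, so a failure here would be consistent with the target lying
      more than 2 GiB away from the code that references it. The
      following is only a minimal standalone sketch of that calculation,
      not the code from RuntimeDyldELF.cpp; the function name
      resolvePC32 and the example addresses are hypothetical:</p>

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of LLVM.
// Computes the value an R_X86_64_PC32 relocation would store: S + A - P,
// where S is the symbol address, A the addend and P the address of the
// relocated 32-bit field. Returns false if the result does not fit into a
// signed 32-bit field; in that case `out` is left untouched.
bool resolvePC32(std::uint64_t symbolAddr, std::int64_t addend,
                 std::uint64_t fieldAddr, std::int32_t &out) {
  // Unsigned arithmetic wraps modulo 2^64; reinterpreting the result as
  // signed yields the PC-relative distance.
  const std::int64_t realOffset = static_cast<std::int64_t>(
      symbolAddr + static_cast<std::uint64_t>(addend) - fieldAddr);

  if (realOffset < std::numeric_limits<std::int32_t>::min() ||
      realOffset > std::numeric_limits<std::int32_t>::max())
    return false; // target is out of range for a 32-bit PC-relative field

  out = static_cast<std::int32_t>(realOffset);
  return true;
}
```

    <p>With a symbol at 0x1000, an addend of -4 and a field at 0x2000,
      the sketch yields -4100. A symbol in a distant mapping such as
      0x7f0000000000 referenced from 0x400000 is rejected.</p>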
    <p>Your code also helps me understand better how the interpreter
      library works. I also have some new ideas for how I could track down the
      concrete problem and solve it.</p>
    Cheers,<br>
    Simeon<br>
    <br>
    <div class="moz-cite-prefix">On 09.11.2017 at 00:59, Lang
      Hames wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CALLttgqHaBTinwVqC3PGf01KVa8m-0iEaAoSz7SA00+TKWfkDw@mail.gmail.com">
      <div dir="ltr">Hi Simeon,
        <div><br>
        </div>
        <div>I think the best thing would be to add an
          ObjectTransformLayer between your CompileLayer and
          LinkingLayer so that you can capture the object files as
          they're generated. Then you can inspect the object files being
          generated by the compiler to see what might be wrong with
          them.</div>
        <div><br>
        </div>
        <div>Something like this:</div>
        <div><br>
        </div>
        <div>
          <div><font face="monospace, monospace">class KaleidoscopeJIT {</font></div>
          <div><font face="monospace, monospace">private:</font></div>
          <div><font face="monospace, monospace"><br>
            </font></div>
          <div><font face="monospace, monospace">  using ObjectPtr =
              std::shared_ptr<object::OwningBinary<object::ObjectFile>>;</font></div>
          <div><font face="monospace, monospace"><br>
            </font></div>
          <div><font face="monospace, monospace">  static ObjectPtr
              dumpObject(ObjectPtr Obj) {</font></div>
          <div><font face="monospace, monospace">   
              SmallVector<char, 256> UniqueObjFileName;</font></div>
          <div><font face="monospace, monospace">   
              sys::fs::createUniqueFile("jit-object-%%%.o",
              UniqueObjFileName);</font></div>
          <div><font face="monospace, monospace">    std::error_code EC;</font></div>
          <div><font face="monospace, monospace">    raw_fd_ostream
              ObjFileStream(UniqueObjFileName.data(), EC,
              sys::fs::F_RW);</font></div>
          <div><font face="monospace, monospace">   
              ObjFileStream.write(Obj->getBinary()->getData().data(),</font></div>
          <div><font face="monospace, monospace">                       
              Obj->getBinary()->getData().size());</font></div>
          <div><font face="monospace, monospace">    return Obj;</font></div>
          <div><font face="monospace, monospace">  }</font></div>
          <div><font face="monospace, monospace"><br>
            </font></div>
          <div><font face="monospace, monospace"> 
              std::unique_ptr<TargetMachine> TM;</font></div>
          <div><font face="monospace, monospace">  const DataLayout DL;</font></div>
          <div><font face="monospace, monospace"> 
              RTDyldObjectLinkingLayer ObjectLayer;</font></div>
          <div><font face="monospace, monospace"> 
              ObjectTransformLayer<decltype(ObjectLayer),</font></div>
          <div><font face="monospace, monospace">                     
               decltype(&KaleidoscopeJIT::dumpObject)>
              DumpObjectsLayer;</font></div>
          <div><font face="monospace, monospace"> 
              IRCompileLayer<decltype(DumpObjectsLayer),
              SimpleCompiler> CompileLayer;</font></div>
          <div><font face="monospace, monospace"><br>
            </font></div>
          <div><font face="monospace, monospace">public:</font></div>
          <div><font face="monospace, monospace">  using ModuleHandle =
              decltype(CompileLayer)::ModuleHandleT;</font></div>
          <div><font face="monospace, monospace"><br>
            </font></div>
          <div><font face="monospace, monospace">  KaleidoscopeJIT()</font></div>
          <div><font face="monospace, monospace">      :
              TM(EngineBuilder().selectTarget()),
              DL(TM->createDataLayout()),</font></div>
          <div><font face="monospace, monospace">       
              ObjectLayer([]() { return
              std::make_shared<SectionMemoryManager>(); }),</font></div>
          <div><font face="monospace, monospace">       
              DumpObjectsLayer(ObjectLayer,
              &KaleidoscopeJIT::dumpObject),</font></div>
          <div><font face="monospace, monospace">       
              CompileLayer(DumpObjectsLayer, SimpleCompiler(*TM)) {</font></div>
          <div><font face="monospace, monospace">   
              llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr);</font></div>
          <div><font face="monospace, monospace">  }</font></div>
        </div>
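        <div><br>
        </div>
        <div>To illustrate the idea without the LLVM headers: a
          transform layer simply runs a callback on each object and then
          forwards it to the layer below. The sketch below uses
          hypothetical stand-in types (FakeLinkingLayer, TransformLayer,
          and an ObjectPtr that is just a byte buffer), not the real ORC
          classes:</div>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an object file: just a byte buffer.
using ObjectPtr = std::shared_ptr<std::string>;

// Hypothetical stand-in for the linking layer: records what it receives.
struct FakeLinkingLayer {
  std::vector<ObjectPtr> added;
  void addObject(ObjectPtr obj) { added.push_back(std::move(obj)); }
};

// Pass-through layer: applies `transform` to each object and forwards the
// result to the base layer. A transform that dumps the buffer to disk and
// returns the object unchanged gives the "capture the object files" effect.
template <typename BaseLayerT, typename TransformT>
class TransformLayer {
public:
  TransformLayer(BaseLayerT &base, TransformT transform)
      : base(base), transform(std::move(transform)) {}

  void addObject(ObjectPtr obj) {
    base.addObject(transform(std::move(obj)));
  }

private:
  BaseLayerT &base;     // the layer below (here: the linking layer)
  TransformT transform; // callback run on every object passing through
};
```

        <div>The compile layer would then be constructed on top of the
          transform layer instead of directly on the linking layer, so
          every object passes through the callback before it is
          linked.</div>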
        <div><br>
        </div>
        <div>Hope this helps!</div>
        <div><br>
        </div>
        <div>Cheers,</div>
        <div>Lang.</div>
        <div><br>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Wed, Sep 27, 2017 at 10:32 AM,
          Simeon Ehrig via llvm-dev <span dir="ltr"><<a
              href="mailto:llvm-dev@lists.llvm.org" target="_blank"
              moz-do-not-send="true">llvm-dev@lists.llvm.org</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div text="#000000" bgcolor="#FFFFFF">
              <p>Dear LLVM-Developers and Vinod Grover,<br>
                <br>
                we are trying to extend the cling C++ interpreter (<a
                  class="m_-8219667011232476936moz-txt-link-freetext"
                  href="https://github.com/root-project/cling"
                  target="_blank" moz-do-not-send="true">https://github.com/root-<wbr>project/cling</a>)
                with CUDA functionality for Nvidia GPUs.<br>
                <br>
                I have already developed a prototype based on OrcJIT and am
                looking for feedback. I am currently stuck on a
                runtime issue: my interpreter prototype fails
                to execute kernels and reports a CUDA runtime error.<br>
              </p>
              <p><br>
                === How to use the prototype<br>
                <br>
                This application interprets CUDA runtime code. The
                program needs the whole CUDA program (.cu file) and its
                pre-compiled device code (as a fatbin) as input:<br>
                <br>
                    command: cuda-interpreter [source].cu
                [kernels].fatbin<br>
                <br>
                I also implemented an alternative mode, which
                generates an object file. The object file can be linked
                (ld) to an executable. This mode is only implemented to
                check whether the LLVM module generation works as expected.
                Activate it by changing the define INTERPRET from 1 to
                0.<br>
                <br>
                === Implementation<br>
                <br>
                The prototype is based on the clang example in<br>
                <br>
                <a class="m_-8219667011232476936moz-txt-link-freetext"
href="https://github.com/llvm-mirror/clang/tree/master/examples/clang-interpreter"
                  target="_blank" moz-do-not-send="true">https://github.com/llvm-<wbr>mirror/clang/tree/master/<wbr>examples/clang-interpreter</a><br>
                <br>
                I also pushed the source code to github with the install
                instructions and examples:<br>
                  <a
                  class="m_-8219667011232476936moz-txt-link-freetext"
                  href="https://github.com/SimeonEhrig/CUDA-Runtime-Interpreter"
                  target="_blank" moz-do-not-send="true">https://github.com/<wbr>SimeonEhrig/CUDA-Runtime-<wbr>Interpreter</a><br>
                <br>
                The device code generation can be performed with either
                clang's CUDA frontend or NVCC to ptx.<br>
                <br>
                Here is the workflow in five stages:<br>
              </p>
              <ol>
                <li>generate PTX device code (a kind of Nvidia
                  assembly)</li>
                <li>translate PTX to SASS (the machine code for PTX)</li>
                <li>generate a fatbinary (a kind of wrapper for the
                  device code)</li>
                <li>generate the host code object file (using the fatbinary as
                  input)</li>
                <li>link to an executable</li>
              </ol>
              <p>(The exact commands are stored in the commands.txt in
                the github repo)<br>
                <br>
                The interpreter replaces the 4th and 5th step. It
                interprets the host code with pre-compiled device code
                as fatbinary. The fatbinary (Step 1 to 3) will be
                generated with the clang compiler and the nvidia tools
                ptxas and fatbinary.<br>
                <br>
                === Test Cases and Issues<br>
                <br>
                You can find the test sources on GitHub in the directory
                "example_prog".<br>
                <br>
                Run the tests with cuda-interpreter and the two arguments
                as above:<br>
                <br>
                 [1] path to the source code in "example_prog"<br>
                     - note: even for host-only code, use the
                file-ending .cu<br>
                     <br>
                 [2] path to the runtime .fatbin<br>
                     - note: needs the file ending .fatbin<br>
                     - a fatbin file is necessary, but if the program
                doesn't need a kernel, the content of the file will be
                ignored</p>
              Note: Since this is a prototype, the input handling is static and
              barely validated yet.<br>
              <br>
              1. hello.cu: a simple C++ hello
              world program that calls sqrt() from the cmath library -> works
              without problems<br>
              <br>
              2. pthread_test.cu: a C++ program
              that starts a second thread -> works without problems<br>
              <br>
              3. fat_memory.cu: uses the CUDA
              library to allocate about 191 MB of VRAM. After the
              allocation, the program waits for 3 seconds, so you can
              check the memory usage with nvidia-smi -> works
              without problems<br>
              <br>
              4. runtime.cu: combines the CUDA
              library with a simple CUDA kernel -> generating an
              object file works, and it can be linked (see the 5th call in the commands
              above -> ld ...) to a working executable.<br>
              <br>
              The last example has the following issue: running the
              executable works fine, but interpreting the code
              does not. The CUDA runtime returns error 8 (<span
                class="m_-8219667011232476936enum-member-name-def">cudaErrorInvalidDeviceFunctio<wbr>n</span>),
              and the kernel fails.<br>
              <br>
              Do you have any idea how to proceed?<br>
              <br>
              <br>
              Best regards,<br>
              Simeon Ehrig </div>
            <br>
            ______________________________<wbr>_________________<br>
            LLVM Developers mailing list<br>
            <a href="mailto:llvm-dev@lists.llvm.org"
              moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>
            <a
              href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
              rel="noreferrer" target="_blank" moz-do-not-send="true">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
            <br>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </body>
</html>