<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>Hi LLVM-Developers and Lang,</p>

    <p>I solved the relocation problem and another problem, so I have a

      working cuda-runtime-code-interpreter [1].</p>

    <p>The solution to the relocation problem was to change the
      reloc mode from dynamic relocation to PIC (position independent
      code).</p>

    <p>The second problem was that I got a <span
        class="m_-8219667011232476936enum-member-name-def">cudaErrorInvalidDeviceFunction
        error (error code 8) whenever I tried to run a kernel. After some
        research, I found out that the kernel code is registered
        (__cudaRegisterFatBinary(...)) in a global constructor, which
        is generated by the CUDA backend (</span><span
        class="m_-8219667011232476936enum-member-name-def">lib/CodeGen/CGCUDANV.cpp).
        The ctor should run before the main function, but I called
        main directly. So the error was caused by running main directly
        and skipping the global CUDA ctor and dtor. I wrote a fix that
        runs the ctor before main and the dtor after it, and now
        everything works.</span></p>
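    <p>A minimal sketch of the call order the fix enforces, using only the
      standard library. The names runWithGlobalCtors, InitFn, MainFn and
      demoCallOrder are hypothetical and are not taken from the
      interpreter; in the real code the ctor, dtor and main would be
      JIT-resolved functions rather than std::function objects, and
      running the dtors in the order given is an assumption of this
      sketch.</p>

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for JIT-resolved symbols (module ctor/dtor, main).
using InitFn = std::function<void()>;
using MainFn = std::function<int()>;

// Runs every constructor, then main, then every destructor (in the order given).
int runWithGlobalCtors(const std::vector<InitFn> &ctors, const MainFn &mainFn,
                       const std::vector<InitFn> &dtors) {
  assert(mainFn && "main must be resolved before it can be run");
  for (const InitFn &ctor : ctors)
    ctor();                       // e.g. the ctor that registers the fatbinary
  const int exitCode = mainFn();  // kernels can only be launched after the ctor ran
  for (const InitFn &dtor : dtors)
    dtor();                       // e.g. the matching dtor
  return exitCode;
}

// Records the call order so the sequencing can be inspected.
std::vector<std::string> demoCallOrder() {
  std::vector<std::string> log;
  std::vector<InitFn> ctors;
  std::vector<InitFn> dtors;
  ctors.push_back([&log] { log.push_back("ctor"); });
  dtors.push_back([&log] { log.push_back("dtor"); });
  MainFn mainFn = [&log] {
    log.push_back("main");
    return 0;
  };
  runWithGlobalCtors(ctors, mainFn, dtors);
  return log;
}
```

    <p>Calling main directly corresponds to skipping both loops, which is
      the situation that led to the error described above.</p>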

    <p><span class="m_-8219667011232476936enum-member-name-def">Cheers,<br>

        Simeon<br>

      </span></p>

    <br>

    <div class="moz-cite-prefix">On 14.11.2017 at 22:15, Simeon
      Ehrig wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:ff92bab2-a5b4-ba07-e3ac-e2fe93c4e3e9@tu-dresden.de">

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

      <p>Hi Lang,</p>

      <p>Thank you very much. I've used your code, and creating
        the object file works. I think the problem occurs after the
        object file is created. When I link the object file with ld, I
        get an executable that works correctly.</p>

      <p>After switching the clang and llvm libraries from the packaged
        version (.deb) to a self-compiled build with debug
        options, I get an assert() failure.<br>
        The assertion fires in<br>
        void RuntimeDyldELF::resolveX86_64Relocation(), in the case
        ELF::R_X86_64_PC32.<br>
        You can find the code in the file
        llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp. I
        don't know exactly what this function does, but after some
        initial research I think it has something to do with linking.
        Maybe you know more about the function?</p>

      <p>Your code also helps me understand better how the interpreter
        library works. I also have some new ideas for how I could track
        down the concrete problem and solve it.</p>

      Cheers,<br>

      Simeon<br>

      <br>

      <div class="moz-cite-prefix">On 09.11.2017 at 00:59, Lang
        Hames wrote:<br>

      </div>

      <blockquote type="cite"

cite="mid:CALLttgqHaBTinwVqC3PGf01KVa8m-0iEaAoSz7SA00+TKWfkDw@mail.gmail.com">

        <div dir="ltr">Hi Simeon,

          <div><br>

          </div>

          <div>I think the best thing would be to add an

            ObjectTransformLayer between your CompileLayer and

            LinkingLayer so that you can capture the object files as

            they're generated. Then you can inspect the object files

            being generated by the compiler to see what might be wrong

            with them.</div>

          <div><br>

          </div>

          <div>Something like this:</div>

          <div><br>

          </div>

          <div>

            <div><font face="monospace, monospace">class KaleidoscopeJIT

                {</font></div>

            <div><font face="monospace, monospace">private:</font></div>

            <div><font face="monospace, monospace"><br>

              </font></div>

            <div><font face="monospace, monospace">  using ObjectPtr =

                std::shared_ptr<object::OwningBinary<object::ObjectFile>>;</font></div>

            <div><font face="monospace, monospace"><br>

              </font></div>

            <div><font face="monospace, monospace">  static ObjectPtr

                dumpObject(ObjectPtr Obj) {</font></div>

            <div><font face="monospace, monospace">   

                SmallVector<char, 256> UniqueObjFileName;</font></div>

            <div><font face="monospace, monospace">   

                sys::fs::createUniqueFile("jit-object-%%%.o",

                UniqueObjFileName);</font></div>

            <div><font face="monospace, monospace">    std::error_code

                EC;</font></div>

            <div><font face="monospace, monospace">    raw_fd_ostream

                ObjFileStream(UniqueObjFileName.data(), EC,

                sys::fs::F_RW);</font></div>

            <div><font face="monospace, monospace">   

                ObjFileStream.write(Obj->getBinary()->getData().data(),

                                                                </font></div>

            <div><font face="monospace, monospace">                     

                  Obj->getBinary()->getData().size());            

                                                   </font></div>

            <div><font face="monospace, monospace">    return Obj;</font></div>

            <div><font face="monospace, monospace">  }</font></div>

            <div><font face="monospace, monospace"><br>

              </font></div>

            <div><font face="monospace, monospace"> 

                std::unique_ptr<TargetMachine> TM;</font></div>

            <div><font face="monospace, monospace">  const DataLayout

                DL;</font></div>

            <div><font face="monospace, monospace"> 

                RTDyldObjectLinkingLayer ObjectLayer;</font></div>

            <div><font face="monospace, monospace"> 

                ObjectTransformLayer<decltype(ObjectLayer),</font></div>

            <div><font face="monospace, monospace">                     

                 decltype(&KaleidoscopeJIT::dumpObject)>

                DumpObjectsLayer;</font></div>

            <div><font face="monospace, monospace"> 

                IRCompileLayer<decltype(DumpObjectsLayer),

                SimpleCompiler> CompileLayer;</font></div>

            <div><font face="monospace, monospace"><br>

              </font></div>

            <div><font face="monospace, monospace">public:</font></div>

            <div><font face="monospace, monospace">  using ModuleHandle

                = decltype(CompileLayer)::ModuleHandleT;</font></div>

            <div><font face="monospace, monospace"><br>

              </font></div>

            <div><font face="monospace, monospace">  KaleidoscopeJIT()</font></div>

            <div><font face="monospace, monospace">      :

                TM(EngineBuilder().selectTarget()),

                DL(TM->createDataLayout()),</font></div>

            <div><font face="monospace, monospace">       

                ObjectLayer([]() { return

                std::make_shared<SectionMemoryManager>(); }),</font></div>

            <div><font face="monospace, monospace">       

                DumpObjectsLayer(ObjectLayer,

                &KaleidoscopeJIT::dumpObject),</font></div>

            <div><font face="monospace, monospace">       

                CompileLayer(DumpObjectsLayer, SimpleCompiler(*TM)) {</font></div>

            <div><font face="monospace, monospace">   

                llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr);</font></div>

            <div><font face="monospace, monospace">  }</font></div>

          </div>

          <div><br>

          </div>

          <div>Hope this helps!</div>

          <div><br>

          </div>

          <div>Cheers,</div>

          <div>Lang.</div>

          <div><br>

          </div>

        </div>

        <div class="gmail_extra"><br>

          <div class="gmail_quote">On Wed, Sep 27, 2017 at 10:32 AM,

            Simeon Ehrig via llvm-dev <span dir="ltr">&lt;<a
                href="mailto:llvm-dev@lists.llvm.org" target="_blank"
                moz-do-not-send="true">llvm-dev@lists.llvm.org</a>&gt;</span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div text="#000000" bgcolor="#FFFFFF">

                <p>Dear LLVM-Developers and Vinod Grover,<br>

                  <br>

                  we are trying to extend the cling C++ interpreter (<a

                    class="m_-8219667011232476936moz-txt-link-freetext"

                    href="https://github.com/root-project/cling"

                    target="_blank" moz-do-not-send="true">https://github.com/root-<wbr>project/cling</a>)

                  with CUDA functionality for Nvidia GPUs.<br>

                  <br>

                  I have already developed a prototype based on OrcJIT and am
                  looking for feedback. I am currently stuck on a
                  runtime issue: my interpreter prototype fails
                  to execute kernels and reports a CUDA runtime error.<br>

                </p>

                <p><br>

                  === How to use the prototype<br>

                  <br>

                  This application interprets CUDA runtime code. The
                  program needs the whole CUDA program (.cu file) and
                  its pre-compiled device code (as fatbin) as input:<br>

                  <br>

                      command: cuda-interpreter [source].cu

                  [kernels].fatbin<br>

                  <br>

                  I also implemented an alternative mode, which
                  generates an object file. The object file can be
                  linked (ld) to an executable. This mode exists only
                  to check whether the LLVM module generation
                  works as expected. Activate it by changing the define
                  INTERPRET from 1 to 0.<br>

                  <br>

                  === Implementation<br>

                  <br>

                  The prototype is based on the clang example in<br>

                  <br>

                  <a class="m_-8219667011232476936moz-txt-link-freetext"

href="https://github.com/llvm-mirror/clang/tree/master/examples/clang-interpreter"

                    target="_blank" moz-do-not-send="true">https://github.com/llvm-<wbr>mirror/clang/tree/master/<wbr>examples/clang-interpreter</a><br>

                  <br>

                  I also pushed the source code to github with the

                  install instructions and examples:<br>

                    <a

                    class="m_-8219667011232476936moz-txt-link-freetext"

href="https://github.com/SimeonEhrig/CUDA-Runtime-Interpreter"

                    target="_blank" moz-do-not-send="true">https://github.com/<wbr>SimeonEhrig/CUDA-Runtime-<wbr>Interpreter</a><br>

                  <br>

                  The device code can be generated as PTX with
                  either clang's CUDA frontend or NVCC.<br>

                  <br>

                  Here is the workflow in five stages:<br>

                </p>

                <ol>

                  <li>generate PTX device code (a kind of Nvidia
                    assembly)</li>
                  <li>translate PTX to SASS (the machine code generated from PTX)</li>
                  <li>generate a fatbinary (a kind of wrapper for the
                    device code)</li>
                  <li>generate the host code object file (using the fatbinary as
                    input)</li>
                  <li>link to an executable</li>

                </ol>

                <p>(The exact commands are stored in the commands.txt in

                  the github repo)<br>

                  <br>

                  The interpreter replaces the 4th and 5th steps. It
                  interprets the host code together with the pre-compiled
                  device code in the fatbinary. The fatbinary (steps 1 to 3)
                  is generated with the clang compiler and the Nvidia tools
                  ptxas and fatbinary.<br>

                  <br>

                  === Test Cases and Issues<br>

                  <br>

                  You can find the test sources on GitHub in the directory
                  "example_prog".<br>
                  <br>
                  Run the tests with cuda-interpreter and the two
                  arguments as above:<br>

                  <br>

                   [1] path to the source code in "example_prog"<br>

                       - note: even for host-only code, use the

                  file-ending .cu<br>

                       <br>

                   [2] path to the runtime .fatbin<br>

                       - note: needs the file ending .fatbin<br>

                       - a fatbin file is necessary, but if the program

                  doesn't need a kernel, the content of the file is
                  ignored</p>
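                <p>The two file-ending requirements above could be checked
                  before anything is interpreted. A minimal sketch,
                  assuming two hypothetical helpers, hasSuffix and
                  argumentsLookValid, that are not part of the prototype:</p>

```cpp
#include <cassert>
#include <string>

// Returns true if `path` ends with `suffix` (for example ".cu" or ".fatbin").
bool hasSuffix(const std::string &path, const std::string &suffix) {
  assert(!suffix.empty() && "an empty suffix would match every path");
  return path.size() >= suffix.size() &&
         path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical check of the two command-line arguments:
// [1] the source file must end in .cu, [2] the device code must end in .fatbin.
bool argumentsLookValid(const std::string &source, const std::string &fatbin) {
  return hasSuffix(source, ".cu") && hasSuffix(fatbin, ".fatbin");
}
```

                <p>This only looks at the file names; it does not verify
                  that the fatbin actually contains device code.</p>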

                Note: Since this is a prototype, the input handling is
                static and barely validated so far.<br>

                <br>

                1. <a href="http://hello.cu" target="_blank"

                  moz-do-not-send="true">hello.cu</a>: simple c++ hello

                world program with cmath library call sqrt() -> works

                without problems<br>

                <br>

                2. <a href="http://pthread_test.cu" target="_blank"

                  moz-do-not-send="true">pthread_test.cu</a>: c++

                program, which starts a second thread -> works

                without problems<br>

                <br>

                3. <a href="http://fat_memory.cu" target="_blank"
                  moz-do-not-send="true">fat_memory.cu</a>: uses the CUDA
                library and allocates about 191 MB of VRAM. After the
                allocation, the program waits for 3 seconds, so you can
                check the memory usage with nvidia-smi -> works
                without problems<br>

                <br>

                4. <a href="http://runtime.cu" target="_blank"
                  moz-do-not-send="true">runtime.cu</a>: combines the CUDA
                library with a simple CUDA kernel -> generating an
                object file works, and it can be linked (see the 5th call
                in the commands above -> ld ...) to a working executable.<br>

                <br>

                The last example has the following issue: running the
                executable works fine, but interpreting the code
                does not. The CUDA runtime returns error 8 (<span
                  class="m_-8219667011232476936enum-member-name-def">cudaErrorInvalidDeviceFunction</span>)
                and the kernel fails.<br>

                <br>

                Do you have any idea how to proceed?<br>

                <br>

                <br>

                Best regards,<br>

                Simeon Ehrig </div>

              <br>


            </blockquote>

          </div>

          <br>

        </div>

      </blockquote>

      <br>

    </blockquote>

    <br>

  </body>

</html>