[llvm-dev] OrcJIT + CUDA Prototype for Cling
Simeon Ehrig via llvm-dev
llvm-dev at lists.llvm.org
Tue Nov 14 13:15:58 PST 2017
Hi Lang,
thank you very much. I've used your code, and creating the object file now
works. I think the problem occurs after the object file is created: when I
link the object file with ld, I get an executable that works correctly.

After switching the clang and LLVM libraries from the packaged (.deb)
version to my own build with debug options enabled, I get an assert()
failure.
The assert fires in RuntimeDyldELF::resolveX86_64Relocation(), in the case
for ELF::R_X86_64_PC32. You can find the code in the file
llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp . I don't know
exactly what this function does, but after some initial research I think it
has something to do with linking. Maybe you know more about the function?
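If I read the code correctly, the check that fires is essentially a range
check: R_X86_64_PC32 stores a 32-bit PC-relative displacement, so the
relocation target must lie within +/-2 GB of the patched location. A minimal
sketch of that condition (simplified from RuntimeDyldELF; the function name
here is my own):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Sketch of the R_X86_64_PC32 case in RuntimeDyldELF::resolveX86_64Relocation:
// the symbol address plus the addend, taken relative to the patch site, must
// fit into a signed 32-bit displacement, i.e. lie within +/-2 GB.
bool fitsInPC32(uint64_t SymbolAddr, int64_t Addend, uint64_t PatchAddr) {
  int64_t Delta = static_cast<int64_t>(SymbolAddr + Addend - PatchAddr);
  return Delta >= std::numeric_limits<int32_t>::min() &&
         Delta <= std::numeric_limits<int32_t>::max();
}
```

So the assert would mean the JIT placed a section, or resolved an external
symbol, farther than 2 GB away from the code that references it.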
Your code also helped me understand better how the interpreter library
works. I also have some new ideas for how to pin down the concrete problem
and solve it.
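One of those ideas, written down as a guess: the CUDA host code registers
each kernel with the runtime from a static constructor (via
__cudaRegisterFatBinary / __cudaRegisterFunction), and as far as I know the
JIT does not run static constructors automatically. If those constructors
never run, every launch would fail with cudaErrorInvalidDeviceFunction
(error 8), which matches the runtime.cu failure from my original mail. A toy
model of that failure mode (not the real CUDA runtime; all names here are
made up):

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model (NOT the real CUDA runtime): the runtime keeps a registry that
// maps host-side kernel stubs to device functions. __cudaRegisterFunction
// fills it from a static constructor in the host object. If a JIT loads the
// object but never runs that constructor, the registry stays empty and every
// launch fails with cudaErrorInvalidDeviceFunction (8).
enum FakeCudaError { fakeSuccess = 0, fakeInvalidDeviceFunction = 8 };

struct FakeCudaRuntime {
  std::map<const void *, std::string> Registry;

  // What the host object's static constructor would do.
  void registerFunction(const void *HostStub, const std::string &DeviceName) {
    Registry[HostStub] = DeviceName;
  }

  // What a kernel launch does: look up the stub in the registry.
  FakeCudaError launchKernel(const void *HostStub) const {
    return Registry.count(HostStub) ? fakeSuccess
                                    : fakeInvalidDeviceFunction;
  }
};
```

If this guess is right, explicitly running the loaded object's constructors
(the entries of llvm.global_ctors) before the first kernel launch should
make error 8 disappear.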
Cheers,
Simeon
On 09.11.2017 at 00:59, Lang Hames wrote:
> Hi Simeon,
>
> I think the best thing would be to add an ObjectTransformLayer between
> your CompileLayer and LinkingLayer so that you can capture the object
> files as they're generated. Then you can inspect the object files
> being generated by the compiler to see what might be wrong with them.
>
> Something like this:
>
> class KaleidoscopeJIT {
> private:
>   using ObjectPtr =
>       std::shared_ptr<object::OwningBinary<object::ObjectFile>>;
>
>   static ObjectPtr dumpObject(ObjectPtr Obj) {
>     SmallVector<char, 256> UniqueObjFileName;
>     sys::fs::createUniqueFile("jit-object-%%%.o", UniqueObjFileName);
>     std::error_code EC;
>     raw_fd_ostream ObjFileStream(UniqueObjFileName.data(), EC,
>                                  sys::fs::F_RW);
>     ObjFileStream.write(Obj->getBinary()->getData().data(),
>                         Obj->getBinary()->getData().size());
>     return Obj;
>   }
>
>   std::unique_ptr<TargetMachine> TM;
>   const DataLayout DL;
>   RTDyldObjectLinkingLayer ObjectLayer;
>   ObjectTransformLayer<decltype(ObjectLayer),
>                        decltype(&KaleidoscopeJIT::dumpObject)>
>       DumpObjectsLayer;
>   IRCompileLayer<decltype(DumpObjectsLayer), SimpleCompiler> CompileLayer;
>
> public:
>   using ModuleHandle = decltype(CompileLayer)::ModuleHandleT;
>
>   KaleidoscopeJIT()
>       : TM(EngineBuilder().selectTarget()), DL(TM->createDataLayout()),
>         ObjectLayer([]() { return std::make_shared<SectionMemoryManager>(); }),
>         DumpObjectsLayer(ObjectLayer, &KaleidoscopeJIT::dumpObject),
>         CompileLayer(DumpObjectsLayer, SimpleCompiler(*TM)) {
>     llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr);
>   }
>
> Hope this helps!
>
> Cheers,
> Lang.
>
>
> On Wed, Sep 27, 2017 at 10:32 AM, Simeon Ehrig via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Dear LLVM-Developers and Vinod Grover,
>
> we are trying to extend the cling C++ interpreter
> (https://github.com/root-project/cling) with CUDA functionality
> for Nvidia GPUs.
>
> I have already developed a prototype based on OrcJIT and am seeking
> feedback. I am currently stuck on a runtime issue: my interpreter
> prototype fails to execute kernels and exits with a CUDA runtime
> error.
>
>
> === How to use the prototype
>
> This application interprets CUDA runtime code. The program needs
> the whole CUDA program (.cu file) and its pre-compiled device code
> (as a fatbin) as input:
>
> command: cuda-interpreter [source].cu [kernels].fatbin
>
> I also implemented an alternative mode, which generates an object
> file. The object file can be linked (with ld) into an executable.
> This mode exists only to check whether the LLVM module generation
> works as expected. Activate it by changing the define INTERPRET
> from 1 to 0.
>
> === Implementation
>
> The prototype is based on the clang example in
>
> https://github.com/llvm-mirror/clang/tree/master/examples/clang-interpreter
>
> I also pushed the source code to github with the install
> instructions and examples:
> https://github.com/SimeonEhrig/CUDA-Runtime-Interpreter
>
> The device code can be compiled to PTX with either clang's CUDA
> frontend or NVCC.
>
> Here is the workflow in five stages:
>
> 1. generate PTX device code (a kind of NVIDIA assembly)
> 2. translate PTX to SASS (the machine code corresponding to PTX)
> 3. generate a fatbinary (a kind of wrapper around the device code)
> 4. generate the host-code object file (using the fatbinary as input)
> 5. link into an executable
>
> (The exact commands are stored in commands.txt in the GitHub repo.)
>
> The interpreter replaces the 4th and 5th steps: it interprets the
> host code together with the pre-compiled device code (the
> fatbinary). The fatbinary (steps 1 to 3) is generated with the
> clang compiler and the NVIDIA tools ptxas and fatbinary.
>
> === Test Cases and Issues
>
> You find the test sources on GitHub in the directory "example_prog".
>
> Run the tests with cuda-interpreter and the two arguments as above:
>
> [1] path to the source code in "example_prog"
> - note: even for host-only code, use the file ending .cu
>
> [2] path to the runtime .fatbin
> - note: needs the file ending .fatbin
> - a fatbin file is required, but if the program doesn't launch
> a kernel, the content of the file is ignored
>
> Note: As a prototype, the input handling is still static and barely
> checked.
>
> 1. hello.cu: a simple C++ hello-world program with a cmath library
> call, sqrt() -> works without problems
>
> 2. pthread_test.cu: a C++ program that starts a second thread ->
> works without problems
>
> 3. fat_memory.cu: uses the CUDA library and allocates about 191 MB
> of VRAM. After the allocation, the program waits for 3 seconds, so
> you can check the memory usage with nvidia-smi -> works without
> problems
>
> 4. runtime.cu: combines the CUDA library with a simple CUDA kernel
> -> generates an object file, which can be linked (see the 5th call
> in the commands above -> ld ...) into a working executable.
>
> The last example has the following issue: running the executable
> works fine, but interpreting the code instead does not. The CUDA
> runtime returns error 8 (cudaErrorInvalidDeviceFunction); the
> kernel launch failed.
>
> Do you have any idea how to proceed?
>
>
> Best regards,
> Simeon Ehrig
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>