[llvm-dev] OrcJIT + CUDA Prototype for Cling
Simeon Ehrig via llvm-dev
llvm-dev at lists.llvm.org
Tue Jan 16 05:41:46 PST 2018
Hi LLVM-Developers and Lang,
I solved the relocation problem and another problem, so I now have a
working cuda-runtime-code-interpreter [1].
The solution to the relocation problem was to change the relocation
model from dynamic relocation to PIC (position independent code).
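For reference, here is a minimal sketch of how the relocation model can
be selected when the TargetMachine is created. The EngineBuilder-based
setup is an assumption taken from the Kaleidoscope-style JIT, not a
verbatim excerpt from my repository:

  #include "llvm/ExecutionEngine/ExecutionEngine.h" // EngineBuilder
  #include "llvm/Support/CodeGen.h"                 // Reloc::Model
  #include "llvm/Target/TargetMachine.h"
  #include <memory>

  // Request position independent code before the TargetMachine is
  // selected; external references are then resolved through the
  // GOT/PLT instead of direct 32-bit PC-relative fixups.
  llvm::EngineBuilder EB;
  EB.setRelocationModel(llvm::Reloc::PIC_);
  std::unique_ptr<llvm::TargetMachine> TM(EB.selectTarget());
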
The second problem was that I got a cudaErrorInvalidDeviceFunction
error (error code 8) whenever I tried to run a kernel. After some
research, I found out that the kernel code is registered
(__cudaRegisterFatBinary(...) ) in a global constructor, which is
generated by the CUDA backend (lib/CodeGen/CGCUDANV.cpp). This ctor
would normally run before the main function, but I called main
directly. So the error was caused by running main directly and
skipping the global CUDA ctor and dtor. I wrote a fix that runs the
ctor and dtor before and after main, and now everything works fine.
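A minimal sketch of that fix, assuming a Kaleidoscope-style layer stack
(the identifiers CompileLayer, Resolver, and the module pointer M are
assumptions, not the exact names in my repository):

  #include "llvm/ExecutionEngine/Orc/ExecutionUtils.h"
  #include "llvm/Support/Error.h"

  // Collect the static ctor/dtor names (the CUDA ctor is the one that
  // calls __cudaRegisterFatBinary) before the module is moved into
  // the JIT.
  std::vector<std::string> CtorNames, DtorNames;
  for (auto Ctor : llvm::orc::getConstructors(*M))
    CtorNames.push_back(Ctor.Func->getName().str());
  for (auto Dtor : llvm::orc::getDestructors(*M))
    DtorNames.push_back(Dtor.Func->getName().str());

  auto H = llvm::cantFail(CompileLayer.addModule(std::move(M), Resolver));

  // Run the static ctors, then main, then the static dtors.
  llvm::orc::CtorDtorRunner<decltype(CompileLayer)>
      CtorRunner(std::move(CtorNames), H);
  llvm::cantFail(CtorRunner.runViaLayer(CompileLayer));

  // ... look up "main" in the JIT and call it here ...

  llvm::orc::CtorDtorRunner<decltype(CompileLayer)>
      DtorRunner(std::move(DtorNames), H);
  llvm::cantFail(DtorRunner.runViaLayer(CompileLayer));

[1] https://github.com/SimeonEhrig/CUDA-Runtime-Interpreter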
Cheers,
Simeon
On 14.11.2017 at 22:15, Simeon Ehrig wrote:
>
> Hi Lang,
>
> thank you very much. I used your code, and creating the object file
> works. I think the problem occurs after the object file is created:
> when I link the object file with ld, I get an executable that works
> correctly.
>
> After switching the clang and llvm libraries from the packaged
> version (.deb) to my own build with debug options enabled, I get an
> assert() failure in
> void RuntimeDyldELF::resolveX86_64Relocation(), in the case
> ELF::R_X86_64_PC32.
> You can find the code in the file
> llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp. I don't know
> exactly what this function does, but after some initial research I
> think it has something to do with the linking. Maybe you know more
> about this function?
>
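> For reference, the case in question looks roughly like this (copied
> from a recent LLVM checkout, so the details may differ in your tree):
>
>   case ELF::R_X86_64_PC32: {
>     uint64_t FinalAddress = Section.getLoadAddressWithOffset(Offset);
>     int64_t RealOffset = Value + Addend - FinalAddress;
>     // fires when the computed offset does not fit into a signed
>     // 32-bit value
>     assert(isInt<32>(RealOffset));
>     int32_t TruncOffset = (RealOffset & 0xFFFFFFFF);
>     support::ulittle32_t::ref(Section.getAddressWithOffset(Offset)) =
>         TruncOffset;
>     break;
>   }
>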
> Your code also helps me to understand better how the interpreter
> library works. I also have some new ideas for how to find and solve
> the concrete problem.
>
> Cheers,
> Simeon
>
> On 09.11.2017 at 00:59, Lang Hames wrote:
>> Hi Simeon,
>>
>> I think the best thing would be to add an ObjectTransformLayer
>> between your CompileLayer and LinkingLayer so that you can capture
>> the object files as they're generated. Then you can inspect the
>> object files being generated by the compiler to see what might be
>> wrong with them.
>>
>> Something like this:
>>
>> class KaleidoscopeJIT {
>> private:
>>
>>   using ObjectPtr =
>>       std::shared_ptr<object::OwningBinary<object::ObjectFile>>;
>>
>>   static ObjectPtr dumpObject(ObjectPtr Obj) {
>>     SmallVector<char, 256> UniqueObjFileName;
>>     sys::fs::createUniqueFile("jit-object-%%%.o", UniqueObjFileName);
>>     std::error_code EC;
>>     raw_fd_ostream ObjFileStream(UniqueObjFileName.data(), EC,
>>                                  sys::fs::F_RW);
>>     ObjFileStream.write(Obj->getBinary()->getData().data(),
>>                         Obj->getBinary()->getData().size());
>>     return Obj;
>>   }
>>
>>   std::unique_ptr<TargetMachine> TM;
>>   const DataLayout DL;
>>   RTDyldObjectLinkingLayer ObjectLayer;
>>   ObjectTransformLayer<decltype(ObjectLayer),
>>                        decltype(&KaleidoscopeJIT::dumpObject)>
>>       DumpObjectsLayer;
>>   IRCompileLayer<decltype(DumpObjectsLayer), SimpleCompiler> CompileLayer;
>>
>> public:
>>   using ModuleHandle = decltype(CompileLayer)::ModuleHandleT;
>>
>>   KaleidoscopeJIT()
>>       : TM(EngineBuilder().selectTarget()), DL(TM->createDataLayout()),
>>         ObjectLayer([]() { return std::make_shared<SectionMemoryManager>(); }),
>>         DumpObjectsLayer(ObjectLayer, &KaleidoscopeJIT::dumpObject),
>>         CompileLayer(DumpObjectsLayer, SimpleCompiler(*TM)) {
>>     llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr);
>>   }
>>
>>   // ... the remaining methods (addModule, findSymbol, removeModule)
>>   // are unchanged from the KaleidoscopeJIT tutorial ...
>> };
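>>
>> Each object the JIT emits will then be written to the working
>> directory as jit-object-<random>.o, and you can inspect it with the
>> usual binary tools (e.g. "llvm-objdump -r" to dump its relocations).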
>>
>> Hope this helps!
>>
>> Cheers,
>> Lang.
>>
>>
>> On Wed, Sep 27, 2017 at 10:32 AM, Simeon Ehrig via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> Dear LLVM-Developers and Vinod Grover,
>>
>> we are trying to extend the cling C++ interpreter
>> (https://github.com/root-project/cling) with CUDA functionality
>> for Nvidia GPUs.
>>
>> I have already developed a prototype based on OrcJIT and am seeking
>> feedback. I am currently stuck on a runtime issue: my interpreter
>> prototype fails to execute kernels, aborting with a CUDA runtime
>> error.
>>
>>
>> === How to use the prototype
>>
>> This application interprets CUDA runtime code. The program needs
>> the whole CUDA program (.cu file) and its pre-compiled device
>> code (as a fatbin) as input:
>>
>> command: cuda-interpreter [source].cu [kernels].fatbin
>>
>> I also implemented an alternative mode, which generates an
>> object file instead. The object file can be linked (with ld) into an
>> executable. This mode exists only to check that the LLVM module
>> generation works as expected. Activate it by changing the define
>> INTERPRET from 1 to 0.
>>
>> === Implementation
>>
>> The prototype is based on the clang example in
>>
>> https://github.com/llvm-mirror/clang/tree/master/examples/clang-interpreter
>>
>> I also pushed the source code to GitHub, with install instructions
>> and examples:
>> https://github.com/SimeonEhrig/CUDA-Runtime-Interpreter
>>
>> The device code can be compiled to PTX with either clang's CUDA
>> frontend or NVCC.
>>
>> Here is the workflow in five stages:
>>
>> 1. generate PTX device code (a kind of NVIDIA assembly)
>> 2. translate the PTX to SASS (the machine-code form of the PTX)
>> 3. generate a fatbinary (a kind of wrapper around the device code)
>> 4. generate the host code object file (using the fatbinary as input)
>> 5. link it into an executable
>>
>> (The exact commands are stored in commands.txt in the github repo;
>> a rough sketch follows below.)
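>>
>> As an illustration (sketched with placeholder file names and sm_30
>> as an example GPU architecture; the authoritative commands are the
>> ones in commands.txt), the stages look like this:
>>
>> # 1. device code to PTX, via clang's CUDA frontend
>> clang++ --cuda-device-only --cuda-gpu-arch=sm_30 -S kernel.cu -o kernel.ptx
>> # 2. PTX to SASS machine code (a cubin)
>> ptxas -arch=sm_30 kernel.ptx -o kernel.cubin
>> # 3. wrap the device code in a fatbinary
>> fatbinary --cuda -64 --create=kernel.fatbin --image=profile=sm_30,file=kernel.cubin
>> # 4./5. host-side compile and link, the steps the interpreter replaces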
>>
>> The interpreter replaces the 4th and 5th steps: it interprets the
>> host code directly, using the pre-compiled device code from the
>> fatbinary. The fatbinary (steps 1 to 3) is generated with the clang
>> compiler and the NVIDIA tools ptxas and fatbinary.
>>
>> === Test Cases and Issues
>>
>> You can find the test sources on GitHub in the directory "example_prog".
>>
>> Run the tests with cuda-interpreter and the two arguments described
>> above:
>>
>> [1] path to the source code in "example_prog"
>> - note: even for host-only code, use the file ending .cu
>>
>> [2] path to the runtime .fatbin
>> - note: the file ending .fatbin is required
>> - a fatbin file is necessary, but if the program doesn't
>> launch a kernel, its content is ignored
>>
>> Note: as this is a prototype, the input handling is still static
>> and barely validated.
>>
>> 1. hello.cu: a simple C++ hello-world program with a cmath
>> library call (sqrt()) -> works without problems
>>
>> 2. pthread_test.cu: a C++ program that starts a second thread ->
>> works without problems
>>
>> 3. fat_memory.cu: uses the CUDA library and allocates about
>> 191 MB of VRAM. After the allocation, the program waits for
>> 3 seconds, so you can check the memory usage with nvidia-smi ->
>> works without problems
>>
>> 4. runtime.cu: combines the CUDA library with a simple CUDA
>> kernel -> generating an object file, which can be linked (see the
>> 5th call in the commands above -> ld ...) into a working
>> executable, works.
>>
>> The last example has the following issue: running the linked
>> executable works fine, but interpreting the code does not. The CUDA
>> runtime returns error 8 (cudaErrorInvalidDeviceFunction) and the
>> kernel launch fails.
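>>
>> Schematically, the failure can be observed with a launch check like
>> this (a placeholder kernel, not the actual runtime.cu source):
>>
>> #include <cstdio>
>> #include <cuda_runtime.h>
>>
>> __global__ void dummy_kernel() {} // placeholder kernel
>>
>> int main() {
>>   dummy_kernel<<<1, 1>>>();
>>   // linked executable: prints "no error"; interpreted: prints
>>   // "invalid device function" (error code 8)
>>   std::printf("launch: %s\n", cudaGetErrorString(cudaGetLastError()));
>>   return 0;
>> }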
>>
>> Do you have any idea how to proceed?
>>
>>
>> Best regards,
>> Simeon Ehrig
>>