[llvm-dev] OrcJIT + CUDA Prototype for Cling
Simeon Ehrig via llvm-dev
llvm-dev at lists.llvm.org
Tue Nov 14 13:15:58 PST 2017
Hi Lang,
thank you very much. I've used your code, and creating the object file now
works. I think the problem occurs after the object file is created: when I
link the object file with ld, I get an executable that works correctly.

After switching the clang and LLVM libraries from the packaged (.deb)
version to my own build with debug options enabled, I get an assert()
failure.
The assert fires in RuntimeDyldELF::resolveX86_64Relocation(), in the case
for ELF::R_X86_64_PC32. You can find the code in the file
llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp . I don't know
exactly what this function does, but after some initial research I think it
has something to do with linking. Maybe you know more about the function?
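If I read the code correctly, the check that fires is essentially a range
check: R_X86_64_PC32 stores a 32-bit PC-relative displacement, so the
relocation target must lie within +/-2 GB of the patched location. A minimal
sketch of that condition (simplified from RuntimeDyldELF; the function name
here is my own):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Sketch of the R_X86_64_PC32 case in RuntimeDyldELF::resolveX86_64Relocation:
// the symbol address plus the addend, taken relative to the patch site, must
// fit into a signed 32-bit displacement, i.e. lie within +/-2 GB.
bool fitsInPC32(uint64_t SymbolAddr, int64_t Addend, uint64_t PatchAddr) {
  int64_t Delta = static_cast<int64_t>(SymbolAddr + Addend - PatchAddr);
  return Delta >= std::numeric_limits<int32_t>::min() &&
         Delta <= std::numeric_limits<int32_t>::max();
}
```

So the assert would mean the JIT placed a section, or resolved an external
symbol, farther than 2 GB away from the code that references it.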
Your code also helped me understand better how the interpreter library
works. I also have some new ideas for how to pin down the concrete problem
and solve it.
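One of those ideas, written down as a guess: the CUDA host code registers
each kernel with the runtime from a static constructor (via
__cudaRegisterFatBinary / __cudaRegisterFunction), and as far as I know the
JIT does not run static constructors automatically. If those constructors
never run, every launch would fail with cudaErrorInvalidDeviceFunction
(error 8), which matches the runtime.cu failure from my original mail. A toy
model of that failure mode (not the real CUDA runtime; all names here are
made up):

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model (NOT the real CUDA runtime): the runtime keeps a registry that
// maps host-side kernel stubs to device functions. __cudaRegisterFunction
// fills it from a static constructor in the host object. If a JIT loads the
// object but never runs that constructor, the registry stays empty and every
// launch fails with cudaErrorInvalidDeviceFunction (8).
enum FakeCudaError { fakeSuccess = 0, fakeInvalidDeviceFunction = 8 };

struct FakeCudaRuntime {
  std::map<const void *, std::string> Registry;

  // What the host object's static constructor would do.
  void registerFunction(const void *HostStub, const std::string &DeviceName) {
    Registry[HostStub] = DeviceName;
  }

  // What a kernel launch does: look up the stub in the registry.
  FakeCudaError launchKernel(const void *HostStub) const {
    return Registry.count(HostStub) ? fakeSuccess
                                    : fakeInvalidDeviceFunction;
  }
};
```

If this guess is right, explicitly running the loaded object's constructors
(the entries of llvm.global_ctors) before the first kernel launch should
make error 8 disappear.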
Cheers,
Simeon
On 09.11.2017 at 00:59, Lang Hames wrote:
> Hi Simeon,
>
> I think the best thing would be to add an ObjectTransformLayer between
> your CompileLayer and LinkingLayer so that you can capture the object
> files as they're generated. Then you can inspect the object files
> being generated by the compiler to see what might be wrong with them.
>
> Something like this:
>
> class KaleidoscopeJIT {
> private:
>   using ObjectPtr =
>       std::shared_ptr<object::OwningBinary<object::ObjectFile>>;
>
>   static ObjectPtr dumpObject(ObjectPtr Obj) {
>     SmallVector<char, 256> UniqueObjFileName;
>     sys::fs::createUniqueFile("jit-object-%%%.o", UniqueObjFileName);
>     std::error_code EC;
>     raw_fd_ostream ObjFileStream(UniqueObjFileName.data(), EC,
>                                  sys::fs::F_RW);
>     ObjFileStream.write(Obj->getBinary()->getData().data(),
>                         Obj->getBinary()->getData().size());
>     return Obj;
>   }
>
>   std::unique_ptr<TargetMachine> TM;
>   const DataLayout DL;
>   RTDyldObjectLinkingLayer ObjectLayer;
>   ObjectTransformLayer<decltype(ObjectLayer),
>                        decltype(&KaleidoscopeJIT::dumpObject)>
>       DumpObjectsLayer;
>   IRCompileLayer<decltype(DumpObjectsLayer), SimpleCompiler> CompileLayer;
>
> public:
>   using ModuleHandle = decltype(CompileLayer)::ModuleHandleT;
>
>   KaleidoscopeJIT()
>       : TM(EngineBuilder().selectTarget()), DL(TM->createDataLayout()),
>         ObjectLayer([]() { return std::make_shared<SectionMemoryManager>(); }),
>         DumpObjectsLayer(ObjectLayer, &KaleidoscopeJIT::dumpObject),
>         CompileLayer(DumpObjectsLayer, SimpleCompiler(*TM)) {
>     llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr);
>   }
>
> Hope this helps!
>
> Cheers,
> Lang.
>
>
> On Wed, Sep 27, 2017 at 10:32 AM, Simeon Ehrig via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Dear LLVM-Developers and Vinod Grover,
>
> we are trying to extend the cling C++ interpreter
> (https://github.com/root-project/cling) with CUDA functionality
> for Nvidia GPUs.
>
> I have already developed a prototype based on OrcJIT and am seeking
> feedback. I am currently stuck on a runtime issue: my interpreter
> prototype fails to execute kernels and exits with a CUDA runtime
> error.
>
>
> === How to use the prototype
>
> This application interprets CUDA runtime code. The program needs
> the whole CUDA program (.cu file) and its pre-compiled device code
> (as a fatbin) as input:
>
> command: cuda-interpreter [source].cu [kernels].fatbin
>
> I also implemented an alternative mode, which generates an object
> file. The object file can be linked (with ld) into an executable.
> This mode exists only to check whether the LLVM module generation
> works as expected. Activate it by changing the define INTERPRET
> from 1 to 0.
>
> === Implementation
>
> The prototype is based on the clang example in
>
> https://github.com/llvm-mirror/clang/tree/master/examples/clang-interpreter
>
> I also pushed the source code to github with the install
> instructions and examples:
> https://github.com/SimeonEhrig/CUDA-Runtime-Interpreter
>
> The device code can be compiled to PTX with either clang's CUDA
> frontend or NVCC.
>
> Here is the workflow in five stages:
>
> 1. generate PTX device code (a kind of NVIDIA assembly)
> 2. translate PTX to SASS (the machine code corresponding to PTX)
> 3. generate a fatbinary (a kind of wrapper around the device code)
> 4. generate the host-code object file (using the fatbinary as input)
> 5. link into an executable
>
> (The exact commands are stored in commands.txt in the GitHub repo.)
>
> The interpreter replaces the 4th and 5th steps: it interprets the
> host code together with the pre-compiled device code (the
> fatbinary). The fatbinary (steps 1 to 3) is generated with the
> clang compiler and the NVIDIA tools ptxas and fatbinary.
>
> === Test Cases and Issues
>
> You find the test sources on GitHub in the directory "example_prog".
>
> Run the tests with cuda-interpreter and the two arguments as above:
>
> [1] path to the source code in "example_prog"
> - note: even for host-only code, use the file ending .cu
>
> [2] path to the runtime .fatbin
> - note: needs the file ending .fatbin
> - a fatbin file is required, but if the program doesn't launch
> a kernel, the content of the file is ignored
>
> Note: As a prototype, the input handling is still static and barely
> checked.
>
> 1. hello.cu: a simple C++ hello-world program with a cmath library
> call, sqrt() -> works without problems
>
> 2. pthread_test.cu: a C++ program that starts a second thread ->
> works without problems
>
> 3. fat_memory.cu: uses the CUDA library and allocates about 191 MB
> of VRAM. After the allocation, the program waits for 3 seconds, so
> you can check the memory usage with nvidia-smi -> works without
> problems
>
> 4. runtime.cu: combines the CUDA library with a simple CUDA kernel
> -> generates an object file, which can be linked (see the 5th call
> in the commands above -> ld ...) into a working executable.
>
> The last example has the following issue: running the executable
> works fine, but interpreting the code instead does not. The CUDA
> runtime returns error 8 (cudaErrorInvalidDeviceFunction); the
> kernel launch failed.
>
> Do you have any idea how to proceed?
>
>
> Best regards,
> Simeon Ehrig
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>