[llvm-dev] Linking CUDA bitcode files and generating CUDA executable

Sat Jul 15 11:05:28 PDT 2017

Hi everyone,

Could someone share the recipe for getting bitcode files out of a CUDA 
program and then linking them to generate an executable? I’m following 
these steps, but when I run the executable, no CUDA kernels are executed:

   clang++ -emit-llvm -c program.cu --cuda-path=$(CUDA_PATH) --cuda-gpu-arch=sm_35
   clang++ -c program.bc -o program.o
   llc program-cuda-nvptx64-nvidia-cuda-sm_35.bc -o program-cuda-nvptx64-nvidia-cuda-sm_35.ptx
   nvcc -arch=sm_35 --device-c program-cuda-nvptx64-nvidia-cuda-sm_35.ptx -o program-cuda-nvptx64-nvidia-cuda-sm_35.o
   nvcc -arch=sm_35 -dlink program.o program-cuda-nvptx64-nvidia-cuda-sm_35.o -o linkedcode.o
   clang++ -o program linkedcode.o program.o program-cuda-nvptx64-nvidia-cuda-sm_35.o -L$(CUDA_LIB) -lcudart_static -lcudadevrt -ldl -lrt -pthread

None of these steps reports an error, but the resulting binary does not 
launch any kernels. Running it under cuda-memcheck reports:

“Program hit cudaErrorInvalidDeviceFunction (error 8) due to "invalid 
device function" on CUDA API call to cudaLaunch.”
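
For anyone trying to reproduce this, the same failure can be observed 
from inside the program, without cuda-memcheck, by checking the launch 
status explicitly. The following is only a minimal sketch: the kernel 
`noop` is a hypothetical placeholder standing in for the real kernels in 
program.cu, and the only runtime calls used are cudaGetLastError, 
cudaDeviceSynchronize and cudaGetErrorString.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; it stands in for the kernels in program.cu.
__global__ void noop() {}

int main() {
  noop<<<1, 1>>>();

  // Errors detected at launch time (for example "invalid device function",
  // when no matching device code is found) are returned by cudaGetLastError.
  cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess) {
    std::fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
    return 1;
  }

  // Errors raised while the kernel is running surface at the next
  // synchronizing call.
  err = cudaDeviceSynchronize();
  if (err != cudaSuccess) {
    std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    return 1;
  }

  std::puts("kernel ran");
  return 0;
}
```

Built with the steps above, a program like this would be expected to take 
the "launch failed" branch if the device code is not being registered 
with the host object, which is what the cuda-memcheck message suggests.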

My device is a Tesla K40m (compute capability 3.5), and I'm using 
clang/LLVM 4.0 with CUDA 8.0. Can someone point out what I am doing wrong?

Thank you very much in advance,

Ignacio