[llvm-dev] cuda __shfl_sync problem
George K via llvm-dev
llvm-dev at lists.llvm.org
Thu Sep 24 11:02:27 PDT 2020
Hi,
First of all, I'm not sure whether I should be posting this here or on
cfe-dev, but here goes.
In order to instrument CUDA kernels, I first generate device IR with:
clang++ -x cuda --cuda-device-only -emit-llvm --cuda-gpu-arch=sm_52 -o device.bc
I also have a library that contains the instrumentation stubs. I generate
IR for it in the same way and link it with the device IR
programmatically using Linker::linkModules(..)
Then, after some analysis, I use llc to get PTX:
llc device.bc --march=nvptx64 --mcpu=sm_52 --filetype=asm -o device.ptx
This works fine, but the problem is that the instrumentation code uses
__shfl_sync() and ptxas gives me the following error:
ptxas device.ptx, line 1033; error : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
Now, according to
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions,
__shfl_sync is supported on compute capability >= 3, and according to
https://developer.nvidia.com/cuda-gpus#compute my GTX 950 has Compute
Capability 5.2.
Also, according to
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes,
PTX ISA 6.0 does support sm_52.
However, llc generates:
.version 4.1
.target sm_52, debug
.address_size 64
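The mismatch can be confirmed before ptxas is involved by reading the .version directive back out of the generated file. The following is only a minimal sketch: the scratch file, the variable names and the "too old" wording are illustrative assumptions, and the header is the one shown above.

```shell
# Scratch copy of the header llc produced (taken from the output above).
ptx_file=$(mktemp)
cat > "$ptx_file" <<'EOF'
.version 4.1
.target sm_52, debug
.address_size 64
EOF

# Extract "major.minor" from the first .version directive.
ptx_version=$(awk '$1 == ".version" { print $2; exit }' "$ptx_file")
major=${ptx_version%%.*}

# shfl.sync needs PTX ISA 6.0 or later, per the ptxas error message,
# so comparing the major number against 6 is enough.
if [ "$major" -ge 6 ]; then
    verdict="ok"
else
    verdict="too old"
fi

echo "PTX ISA $ptx_version: $verdict for shfl.sync"
rm -f "$ptx_file"
```

In practice the awk line would be pointed at device.ptx directly instead of a scratch copy.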
Any ideas why this is happening? Or am I doing something wrong?
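One thing that may be worth trying, stated here as an assumption rather than a confirmed fix: llc appears to choose the PTX ISA version from the target's default for the given -mcpu and from the requested subtarget features, not from the installed CUDA toolkit. The NVPTX backend has a ptx60 subtarget feature, so requesting it explicitly with -mattr might raise the emitted .version. The sketch below reuses the llc command from above. The llc_cmd variable and the guard around it are only illustrative, and ptx60 is picked simply because it is the minimum the ptxas error asks for.

```shell
# Same llc command as above, with the PTX ISA version requested explicitly.
# Assumption: +ptx60 is enough for shfl.sync; a newer level may also be valid.
llc_cmd="llc device.bc --march=nvptx64 --mcpu=sm_52 --mattr=+ptx60 --filetype=asm -o device.ptx"

# Only run it when llc and the input bitcode are actually present.
if command -v llc >/dev/null 2>&1 && [ -f device.bc ]; then
    $llc_cmd
else
    echo "would run: $llc_cmd"
fi
```

If this has the intended effect, the first line of device.ptx should read .version 6.0 instead of .version 4.1.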
PS. I'm using CUDA 10, driver 440
~George