[llvm-dev] cuda __shfl_sync problem

Thu Sep 24 11:02:27 PDT 2020

Hi,

First of all, I'm not sure whether I should be posting this here or on 
cfe-dev, but here goes.

To instrument CUDA kernels, I first generate device IR with:

clang++ -x cuda --cuda-device-only -emit-llvm --cuda-gpu-arch=sm_52 -o device.bc

I also have a library containing the instrumentation stubs, for which 
I generate IR in the same way. I link it with the device IR 
programmatically using Linker::linkModules(..).

Then, after some analysis, I use llc to get PTX:

llc device.bc --march=nvptx64 --mcpu=sm_52 --filetype=asm -o device.ptx
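
For concreteness, the steps above can be sketched as a small shell script. This is only an approximation of my setup: the file names kernel.cu, stubs.cu and linked.bc are placeholders, the -c flag and input files are my assumption about the full clang++ command line, llvm-link stands in for the programmatic Linker::linkModules(..) call, and the analysis step is left out. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the pipeline described above. File names are placeholders.
# By default the commands are only printed; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
ARCH=sm_52

# Echo a command and, unless this is a dry run, execute it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# 1. Device-side IR for the kernels and for the instrumentation stubs.
run clang++ -x cuda --cuda-device-only -emit-llvm -c kernel.cu \
    --cuda-gpu-arch="$ARCH" -o device.bc
run clang++ -x cuda --cuda-device-only -emit-llvm -c stubs.cu \
    --cuda-gpu-arch="$ARCH" -o stubs.bc

# 2. Link the two modules (command-line stand-in for Linker::linkModules).
run llvm-link device.bc stubs.bc -o linked.bc

# 3. Lower the linked module to PTX.
run llc linked.bc --march=nvptx64 --mcpu="$ARCH" --filetype=asm -o device.ptx
```

Running it with DRY_RUN=0 requires clang++, llvm-link and llc on the PATH.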

This works fine, but the instrumentation code uses __shfl_sync(), and 
ptxas gives me the following error:

ptxas device.ptx, line 1033; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later

According to 
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions, 
__shfl_sync is supported on compute capability >= 3, and according to 
https://developer.nvidia.com/cuda-gpus#compute my GTX 950 has compute 
capability 5.2.

Also according to 
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes 
PTX ISA 6.0 does support sm_52.

However, llc generates the following header:

.version 4.1
.target sm_52, debug
.address_size 64
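
The mismatch can be checked mechanically before ptxas is invoked. The following is a minimal sketch, not part of my actual tooling: the helper names ptx_version and version_at_least are made up for illustration, and the sample file just reproduces the header above. The 6.0 threshold is the one quoted in the ptxas error.

```shell
#!/bin/sh
# Sketch: compare the .version directive of a PTX file with a required minimum.
# The sample file reproduces the header emitted by llc in this report.
ptx=$(mktemp)
cat > "$ptx" <<'EOF'
.version 4.1
.target sm_52, debug
.address_size 64
EOF

# Print the PTX ISA version declared in the given file, e.g. "4.1".
ptx_version() {
    awk '$1 == ".version" { print $2; exit }' "$1"
}

# Succeed if version $1 (major.minor) is at least version $2.
version_at_least() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        split(have, h, "."); split(need, n, ".")
        if (h[1] + 0 != n[1] + 0) exit !(h[1] + 0 > n[1] + 0)
        exit !(h[2] + 0 >= n[2] + 0)
    }'
}

have=$(ptx_version "$ptx")
if version_at_least "$have" 6.0; then
    echo "PTX $have: meets the 6.0 minimum named in the ptxas error"
else
    echo "PTX $have: below the 6.0 minimum named in the ptxas error"
fi
rm -f "$ptx"
```

In practice the same check would be pointed at device.ptx instead of the temporary sample file.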

Any ideas why this is happening? Or am I doing something wrong?

PS. I'm using CUDA 10, driver 440

~George
