[llvm-dev] cuda __shfl_sync problem

Mon Sep 28 06:16:59 PDT 2020

I guess -target-feature is a cc1 option, so maybe -Xclang -target-feature
-Xclang +ptx60 works.
However, given that you solved the problem, I'd call that a win ;)

~ Johannes
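For readers hitting the same ptxas error: it comes down to the `.version` directive llc writes into the PTX (4.1 in the output quoted below, which appears to be llc's default for sm_52) being older than the 6.0 that ptxas requires for shfl.sync. The block below is a minimal sketch of a check for that condition. `ptx_supports_shfl_sync` is a hypothetical helper written for illustration, not an LLVM or CUDA tool, and it assumes the PTX file contains a `.version X.Y` directive like the llc output in the thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of LLVM or the CUDA toolkit.
# Succeeds (exit status 0) if the PTX file passed as $1 declares
# ".version 6.0" or newer, the minimum ptxas accepts for shfl.sync.
ptx_supports_shfl_sync() {
  # Print the operand of the first ".version" directive, then stop reading.
  ver=$(awk '$1 == ".version" { print $2; exit }' "$1")

  # No .version directive found (or unreadable file): treat as unsupported.
  [ -n "$ver" ] || return 1

  # Keep only the major number, e.g. "4.1" -> "4", "6.0" -> "6".
  major=${ver%%.*}

  # Reject anything that is not a plain non-negative integer.
  case $major in
    ''|*[!0-9]*) return 1 ;;
  esac

  # Every PTX ISA version with major number 6 or higher is new enough.
  [ "$major" -ge 6 ]
}
```

For example, after regenerating device.ptx with -mattr=+ptx60 added to the llc command line, `ptx_supports_shfl_sync device.ptx` should succeed, while on the original `.version 4.1` output it fails. Note that `ver` and `major` are ordinary (global) shell variables, since POSIX sh has no `local`.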

On 9/28/20 7:30 AM, George K wrote:
> I couldn't find `-target-feature`. I am on LLVM 10; has the interface
> changed, maybe?
>
> Fortunately, `-mattr=+ptx60` did the trick.
>
> George
>
> On 25-09-2020 17:05, Johannes Doerfert wrote:
>> Have you tried `-target-feature +ptx60`?
>>
>>
>> On 9/25/20 3:18 AM, George K wrote:
>>> Do you mean in llc? I'm afraid I don't see such an option there.
>>>
>>> ~George
>>>
>>> On 24-09-2020 20:54, Johannes Doerfert wrote:
>>>> I'm not an expert, but it looks like llc defaults to the minimum
>>>> PTX version that supports the target compute capability. You might
>>>> be able to choose PTX 6.0, though.
>>>>
>>>> ~ Johannes
>>>>
>>>>
>>>> On 9/24/20 1:02 PM, George K via llvm-dev wrote:
>>>>> Hi,
>>>>>
>>>>> First of all, I'm not sure whether I should be posting this here or on
>>>>> cfe-dev, but here goes.
>>>>>
>>>>> In order to instrument CUDA kernels, I first generate device IR with:
>>>>>
>>>>> clang++ -x cuda --cuda-device-only -emit-llvm --cuda-gpu-arch=sm_52 -o device.bc
>>>>>
>>>>> I also have a library that contains the instrumentation stubs. I generate
>>>>> its IR the same way and link it with the device IR programmatically using
>>>>> Linker::linkModules(..).
>>>>>
>>>>> Then, after some analysis, I use llc to get PTX:
>>>>>
>>>>> llc device.bc --march=nvptx64 --mcpu=sm_52 --filetype=asm -o device.ptx
>>>>>
>>>>> This works fine, but the instrumentation code uses __shfl_sync(), and
>>>>> ptxas gives me the following error:
>>>>>
>>>>> ptxas device.ptx, line 1033; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
>>>>>
>>>>> Now, according to
>>>>> https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions,
>>>>> __shfl_sync is supported on compute capability >= 3, and according
>>>>> to https://developer.nvidia.com/cuda-gpus#compute my GTX 950 has
>>>>> compute capability 5.2.
>>>>>
>>>>> Also, according to
>>>>> https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes,
>>>>> PTX ISA 6.0 does support sm_52.
>>>>>
>>>>> However llc generates:
>>>>>
>>>>> .version 4.1
>>>>> .target sm_52, debug
>>>>> .address_size 64
>>>>>
>>>>> Any ideas why this is happening? Or am I doing something wrong?
>>>>>
>>>>> PS: I'm using CUDA 10 with driver 440.
>>>>>
>>>>> ~George
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev