<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi,<br>
<br>
First of all, I'm not sure whether I should be posting this here or on
cfe-dev, but here goes.<br>
<br>
In order to instrument CUDA kernels, I first generate device IR
with:<br>
<br>
<font face="Courier New, Courier, monospace">clang++ -x cuda
--cuda-device-only -emit-llvm --cuda-gpu-arch=sm_52 -o device.bc<br>
</font><br>
I also have a library containing the instrumentation stubs, for
which I generate IR in the same way, and I link it with the device IR
programmatically using <font face="Courier New, Courier,
monospace">Linker::linkModules(..)</font>.<br>
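<br>
For anyone who wants to reproduce this without my tool, the linking
step is roughly what <font face="Courier New, Courier, monospace">llvm-link</font>
does on the command line. A sketch only, where
<font face="Courier New, Courier, monospace">stubs.bc</font> and
<font face="Courier New, Courier, monospace">linked.bc</font> are
placeholder names rather than my actual files:<br>
<br>
<font face="Courier New, Courier, monospace">llvm-link device.bc
stubs.bc -o linked.bc</font><br>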
<br>
Then, after some analysis, I use llc to get PTX:<br>
<br>
<font face="Courier New, Courier, monospace">llc device.bc
--march=nvptx64 --mcpu=sm_52 --filetype=asm -o device.ptx</font><br>
<br>
This works fine, but the problem is that the instrumentation code
uses <font face="Courier New, Courier, monospace">__shfl_sync()</font>,
and ptxas gives me the following error:<br>
<br>
<font face="Courier New, Courier, monospace">ptxas device.ptx,
line 1033; error : Feature 'shfl.sync' requires PTX ISA
.version 6.0 or later</font><br>
<br>
Now according to
<a class="moz-txt-link-freetext" href="https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions">https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions</a>,<br>
<br>
__shfl_sync is supported on compute capability 3.0 and above, and
according to <a class="moz-txt-link-freetext" href="https://developer.nvidia.com/cuda-gpus#compute">https://developer.nvidia.com/cuda-gpus#compute</a>, my
GTX 950 has compute capability 5.2.<br>
<br>
Also, according to
<a class="moz-txt-link-freetext" href="https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes">https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes</a>
PTX ISA 6.0 does support sm_52.<br>
<br>
However, llc generates:<br>
<br>
.version 4.1<br>
.target sm_52, debug<br>
.address_size 64<br>
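<br>
One thing I could try next, as a sketch only: this assumes the NVPTX
backend exposes the PTX version as a
<font face="Courier New, Courier, monospace">ptx60</font> subtarget
feature that <font face="Courier New, Courier, monospace">-mattr</font>
can enable, which I have not verified on my setup:<br>
<br>
<font face="Courier New, Courier, monospace">llc device.bc
--march=nvptx64 --mcpu=sm_52 -mattr=+ptx60 --filetype=asm -o device.ptx<br>
grep '^\.version' device.ptx</font><br>
<br>
If the feature is picked up, the second command should report a
version of 6.0 or later instead of 4.1.<br>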
<br>
Any ideas why this is happening? Or am I doing something wrong?<br>
<br>
PS. I'm using CUDA 10, driver 440<br>
<br>
~George<br>
</p>
</body>
</html>