<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/155682>155682</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[CUDA] Incorrect output when using __shfl_sync intrinsic
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
zyd2001
</td>
</tr>
</table>
<pre>
While compiling the SpMV code from https://github.com/poojahira/spmv-cuda/blob/master/code/src/spmv_light.cu with clang++, I found that the PTX generated by clang appears to be incorrect.
The code has lane 0 atomically fetch a base row index and then broadcast it to the rest of the warp with `__shfl_sync`.
```
// Get the row index
if (laneId == 0) {
row = atomicAdd(i, 16);
}
// Broadcast the value to other threads in the same warp and compute the row index of each vector
row = __shfl_sync(0xffffffff,row, 0) + laneId;
```
However, when `__shfl_sync` is present, clang drops the `if` statement guarding `atomicAdd`, so the atomic is executed unconditionally instead of only by lane 0. NVCC keeps the branch.
clang output:
```
$L__func_begin0:
ld.param.b64 %rd1, [kernel(int*)_param_0];
cvta.to.global.u64 %rd2, %rd1;
$L__tmp0:
atom.global.add.u32 %r1, [%rd2], 16;
$L__tmp1:
shfl.sync.idx.b32 %r2, %r1, 0, 31, -1;
$L__tmp2:
ret;
$L__tmp3:
$L__func_end0:
```
nvcc output:
```
ld.param.u64 %rd1, [kernel(int*)_param_0];
mov.u32 %r4, %tid.x;
setp.ne.s32 %p1, %r4, 0;
@%p1 bra $L__BB0_2;
cvta.to.global.u64 %rd2, %rd1;
atom.global.add.u32 %r9, [%rd2], 16;
$L__BB0_2:
mov.u32 %r5, 31;
mov.u32 %r6, 0;
mov.u32 %r7, -1;
shfl.sync.idx.b32 %r8|%p2, %r9, %r6, %r5, %r7;
ret;
```
Here is a minimal example on Compiler Explorer: https://godbolt.org/z/117Tex6eM
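The pattern can also be sketched as a self-contained reproducer. The kernel body below is inferred from the PTX listings above (the `kernel(int*)` signature and the comparison of `%tid.x` with 0) and may differ from the linked example; `run_once` is a hypothetical host helper, added only to make the difference observable at run time.

```cuda
#include "cuda_runtime.h"

// Hypothetical reduction of the spmv_light.cu pattern; inferred from the
// PTX listings, not copied from the linked example.
__global__ void kernel(int *i) {
    int laneId = threadIdx.x;
    int row;
    // Only lane 0 is meant to reserve a chunk of 16 rows.
    if (laneId == 0) {
        row = atomicAdd(i, 16);
    }
    // Broadcast lane 0's value to the whole warp, then add the lane offset.
    row = __shfl_sync(0xffffffff, row, 0) + laneId;
}

// Hypothetical host helper: launches a single warp on a zeroed counter and
// returns the final counter value, or -1 if any CUDA call fails.
int run_once() {
    int *counter = nullptr;
    int result = -1;
    if (cudaMalloc((void **)&counter, sizeof(int)) != cudaSuccess) {
        return -1;
    }
    void *args[] = { (void *)&counter };
    // cudaLaunchKernel with dim3(1), dim3(32) is equivalent to a
    // one-block, 32-thread triple-chevron launch.
    bool ok = cudaMemset(counter, 0, sizeof(int)) == cudaSuccess
        && cudaLaunchKernel((const void *)kernel, dim3(1), dim3(32),
                            args, 0, nullptr) == cudaSuccess
        && cudaDeviceSynchronize() == cudaSuccess
        && cudaMemcpy(&result, counter, sizeof(int),
                      cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(counter);
    return ok ? result : -1;
}
```

If only lane 0 performs the atomic, as in the nvcc listing, `run_once()` should return 16. If all 32 lanes perform it, as the clang listing suggests, it should return 32 × 16 = 512.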
</pre>