<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/155682">155682</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [CUDA] Incorrect output when using __shfl_sync intrinsic
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          zyd2001
      </td>
    </tr>
</table>

<pre>
When I used clang++ to compile the SpMV code from https://github.com/poojahira/spmv-cuda/blob/master/code/src/spmv_light.cu, the PTX generated by clang appears to be incorrect.
The code has lane 0 of each warp atomically fetch a row index and then broadcasts it to the other lanes with `__shfl_sync`.
```
// Get the row index
if (laneId == 0) {
    row = atomicAdd(i, 16);
}
// Broadcast the value to other threads in the same warp and compute the row index of each vector
row = __shfl_sync(0xffffffff, row, 0) + laneId;
```
However, when `__shfl_sync` is present, clang drops the `if` guarding `atomicAdd`, so the atomic is emitted unconditionally. This does not happen with NVCC.

clang output:
```
$L__func_begin0:
        ld.param.b64    %rd1, [kernel(int*)_param_0];
        cvta.to.global.u64      %rd2, %rd1;
$L__tmp0:
        atom.global.add.u32     %r1, [%rd2], 16;
$L__tmp1:
        shfl.sync.idx.b32       %r2, %r1, 0, 31, -1;
$L__tmp2:
        ret;
$L__tmp3:
$L__func_end0:
```
nvcc output:
```
        ld.param.u64    %rd1, [kernel(int*)_param_0];
        mov.u32         %r4, %tid.x;
        setp.ne.s32     %p1, %r4, 0;
        @%p1 bra        $L__BB0_2;
        cvta.to.global.u64      %rd2, %rd1;
        atom.global.add.u32     %r9, [%rd2], 16;
$L__BB0_2:
        mov.u32         %r5, 31;
        mov.u32         %r6, 0;
        mov.u32         %r7, -1;
        shfl.sync.idx.b32       %r8|%p2, %r9, %r6, %r5, %r7;
        ret;
```

Here is a minimal example on Compiler Explorer: https://godbolt.org/z/117Tex6eM
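
To check the effect at runtime rather than by reading PTX, a harness along these lines could be used. This is only a sketch: the `out` buffer, the single-warp launch, and the host code are assumptions for illustration and are not the code behind the link above. `row` is assumed to be declared without an initializer, which the snippet earlier in this report does not show.

```cuda
#include &lt;cstdio&gt;
#include &lt;cuda_runtime.h&gt;

// Hypothetical kernel mirroring the snippet above. The names `counter`
// and `out` are illustrative only.
__global__ void kernel(int *counter, int *out) {
    int laneId = threadIdx.x % 32;  // lane index within the warp
    int row;                        // assumption: declared without an initializer
    if (laneId == 0) {
        row = atomicAdd(counter, 16);  // only lane 0 is meant to reach this
    }
    // Broadcast lane 0's value to the whole warp.
    row = __shfl_sync(0xffffffff, row, 0) + laneId;
    out[threadIdx.x] = row;  // store the result so it stays live
}

int main() {
    int *counter = nullptr;
    int *out = nullptr;
    cudaMalloc(&amp;counter, sizeof(int));
    cudaMalloc(&amp;out, 32 * sizeof(int));
    cudaMemset(counter, 0, sizeof(int));

    kernel&lt;&lt;&lt;1, 32&gt;&gt;&gt;(counter, out);  // one block, one full warp
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int result = 0;
    cudaMemcpy(&amp;result, counter, sizeof(int), cudaMemcpyDeviceToHost);
    // A single guarded atomicAdd adds 16 once; 32 unguarded ones add 16 * 32 = 512.
    std::printf("counter = %d\n", result);

    cudaFree(counter);
    cudaFree(out);
    return 0;
}
```

If only lane 0 performs the atomic, the counter should end at 16. If all 32 lanes perform it, as the clang PTX above suggests, it would end at 512.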
</pre>