<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/58626>58626</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            CUDA __syncthreads() malfunctioning with -O2 optimization or higher (clang 14.0.6 and 15.0.3)
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          cbuchner1
      </td>
    </tr>
</table>

<pre>
    I use CUDA 11.3 on an NVIDIA A100, so I compile for the sm_80 architecture.

__syncthreads() has stopped working for me since I upgraded from clang 12 to clang 14.0.6 for compiling CUDA code. I have a short reproducer; its assertion fails when it is compiled with -O2 or -O3.

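```cuda
// clang++ -O3 --cuda-gpu-arch=sm_80 -x cuda test.cu -o test -L/usr/local/cuda-11.3/lib64 -lcudart
#include <cassert>
#include <iostream>

__global__ void test()
{
   // Every thread zeroes the shared variable (all threads write the same value).
   __shared__ int test;
   test = 0;
   __syncthreads();
   // Only thread (0,0,0) writes the sentinel value.
   if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0)
   {
     test = 1234;
   }
   // After this barrier, every thread in the block should see 1234.
   __syncthreads();
   assert(test == 1234);  // fails at -O2/-O3 with clang 14.0.6 and 15.0.3
}

int main(int argc, char **argv)
{
  dim3 block(16,16,1);  // a single block of 256 threads
  dim3 grid(1,1,1);
  test<<<grid, block>>>();
  cudaDeviceSynchronize();
  // 0 (cudaSuccess) means no device-side assertion fired.
  std::cerr << "CUDA error code: " << cudaGetLastError() << std::endl;
}
```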
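For reference, the behavior the reproducer expects can be modeled on the host in plain C++. This is only an analogy, not CUDA code: it assumes __syncthreads() acts as a block-wide barrier that also orders memory, and the names BlockBarrier and run_reproducer_model() are hypothetical, chosen for illustration. Each host thread plays one CUDA thread, and a mutex-based generation-counting barrier stands in for bar.sync 0. With a correct barrier, every thread observes 1234, so run_reproducer_model(256) (the 16x16 block from the reproducer) returns 256.

```cpp
// Host-side model (hypothetical, not CUDA): a reusable barrier standing in for
// __syncthreads(). The mutex provides the memory ordering the barrier implies.
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

class BlockBarrier {
 public:
  explicit BlockBarrier(std::size_t count) : count_(count) {}

  // Blocks until all count_ participants have arrived, then releases them all.
  void arrive_and_wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    const std::size_t gen = generation_;
    if (++arrived_ == count_) {
      // Last arrival: reset for the next use and wake this generation's waiters.
      arrived_ = 0;
      ++generation_;
      cv_.notify_all();
    } else {
      // The predicate guards against spurious wakeups.
      cv_.wait(lock, [&] { return generation_ != gen; });
    }
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::size_t count_;
  std::size_t arrived_ = 0;
  std::size_t generation_ = 0;
};

// Mirrors the kernel: all threads zero the shared value, thread 0 writes 1234
// between two barriers, then every thread reads it back.
// Returns how many threads observed 1234.
int run_reproducer_model(int num_threads) {
  BlockBarrier barrier(static_cast<std::size_t>(num_threads));
  // Atomic only so the concurrent "test = 0" stores are not a C++ data race.
  std::atomic<int> shared_value{0};  // stands in for `__shared__ int test`
  std::atomic<int> observed_1234{0};

  std::vector<std::thread> threads;
  threads.reserve(static_cast<std::size_t>(num_threads));
  for (int tid = 0; tid < num_threads; ++tid) {
    threads.emplace_back([&, tid] {
      shared_value.store(0, std::memory_order_relaxed);     // test = 0;
      barrier.arrive_and_wait();                            // __syncthreads();
      if (tid == 0) {
        shared_value.store(1234, std::memory_order_relaxed);  // test = 1234;
      }
      barrier.arrive_and_wait();                            // __syncthreads();
      if (shared_value.load(std::memory_order_relaxed) == 1234) {
        observed_1234.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  for (auto& t : threads) t.join();
  return observed_1234.load();
}
```

Because this runs on the CPU, it does not exercise clang's NVPTX code generation; it only pins down the result the kernel above should produce at any optimization level.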

When I call the inline-assembly function barSync() below in place of __syncthreads(), the code works again.

```cuda
// Inline assembly that emits a block-wide barrier equivalent to __syncthreads().
// The "memory" clobber stops the compiler from moving memory accesses across it.
__device__ __forceinline__ void barSync() {
  asm volatile("bar.sync 0;" : : : "memory");
}
```
</pre>