<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/120131>120131</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [amdgpu] Generic scoped fences are miscompiled
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:AMDGPU
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
            arsenm,
            JonChesterfield,
            b-sumner,
            jhuber6,
            dhruvachak
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          JonChesterfield
      </td>
    </tr>
</table>

<pre>
    Joseph wrote https://github.com/llvm/llvm-project/pull/119619 to replace uses of the amdgpu specific fence builtin with __scoped_atomic_thread_fence. That should have been fine (the amdgpu specific __builtin_amdgcn_fence only exists because it predates the generic one), except amdgpu mangles the scope as shown in https://github.com/llvm/llvm-project/pull/120095.

This _might_ be correct behaviour on opencl. I'd rather it was __opencl_atomic_thread_fence if the language wants that sort of logic, but either way, it definitely caused runtime failures on openmp after 119619 landed. I don't think that default makes sense on openmp or hip (but I could be wrong, I don't fully understand the openmp spec and https://rocm.docs.amd.com/projects/HIP/en/latest/reference/cpp_language_extensions.html doesn't mention it).

The "fix" to remove the special casing in 120095 is opposed on the basis that it's probably wrong on opencl (and it'll make things slower even when it's right) which I think means we need to revert 119619 while we work out what to do here.

This issue is created to document the current dynamic (as best understood by someone who wrote neither of those pull requests) and to see if we can reach consensus to revert 119619, as it's using a generic clang intrinsic which doesn't work properly, until such point as we can agree to fix the clang intrinsic or otherwise make things work properly. I.e. I don't think we can leave openmp's runtime with extra race conditions.
</pre>
<img width="1" height="1" alt="" src="http://email.email.llvm.org/o/eJycVU1v2zgQ_TX0ZVDDpr-igw_ZFu6mwAI9dM8CRY5ENhRHyyGt-N8vKMlJ0_aye0pMEW9m3nvzqJhdFxDP4vCHkFJFxtALKYX8KKT8QuGjRU4YW4fevJ43Hzj3AePrwXebG4zH19_GxnxV2qrncnT4tFI5WYrnnwBXDZnb-QsxDhbGSAnBpjSw2D0KeRHy0rlkc7PW1At58f56__NhiPQddRLyMmTvhbxst9VxW0EiiDh4pREyIwO1kCyC6k03ZOABtWudhhaDRmiy88kFGF2yUNesaUBTq0S903WyEZWpp5tr-GZVAraUvQGrrggNYoDWBQQhH35Xoq4X-Lp80WFGAgr-BvjiODE0qFVmBJdgiGhUQp667TBgdBoooJCVkB8BXzQO6V6jV6Hzy92paVBcmhsDuPD_GZSbTXVYi82j2Dx-s46h7l1nUw0NgqYYUSdo0KqroxyBAtCAQfs1PAl5MhBVshjLMKNiqOv56-_oBDer4lXosuoQRhVSGadwTDEV1Tx1TpfRm5wA3QQ9qls5cQkMti64hP4GE4UGYg7J9Qitcj7HovzcXz-AahNGWAziVTBo1vAEhoKQpwTJuvA8FzfYquwT9OoZGRgD4w84FMG6oehdenoCPbmhweLc0JXO3kDb7P0NcjAYOalgpnkXnGIRKGfvlYqk-7UhzWvVm0WwRSQW8vLn01chLxiKfsUpRbmILcbCqJAXPQz1ndAaXxIGdhR4bVPvwRDy3FiPITkK4JKQ1avYxcWydS9CynmFerri7K_iZ-VBK3ahK_6afQKOgYaBCvcUpquNYreoWNBPDEOkRjX-NjP0ZpnCYSFguub9xPekQ8fAnkaMgFcMMFoMd6xYvChkBaN12sLToluPKjCMCAHRzL1fMaa73KN1HsvnkeIzUE4wlv4SgSGwGPFHvzvmjGUwHVGlGc6QzoWzaUKdYyz_m1tQvdPTGGWNOd2lJjLQ3ICpRwoIo6Ul18Li4SmQiBHK1kHEfzJy0beaHJEIGKf9GBG0ChBRaQuaQjFj5l8GLK5TfOcoTxqp1wDRxRDgQoousNMLdW9mmEgZIg0Y_bRaZYk8cNYWBnIhFeylE9VFxFK_dS8zGT-BUwQqI46O8Z2i76qs4WmNv67fUsRjydZ5T2bVl7WeIhpfUlQQS7hrCsalyeArc96ZalepFZ63p91-dzqdjruVPSNuqqppdvsj7reHY9NUhwe9O-016of9_uG0cme5kfut3B631e64ketdY9TBVHtTHffVfrsX-w32yvl1Scw1xW41WeS8lZvtbrvyqkHPy9PZKP2MwYjd4-Nfnz5__Xt5-OJ5Stsmdyz2G19y_w0tueSnp3cOdnH4BJ8X6ebHaH6oGFRE6B1r6gfn0axy9Of_nPNT7zwnfWn_epb_BgAA__87Z8E3">