[llvm] [AMDGPU] Disable atomic optimization of fadd/fsub with result (PR #96479)

Wed Jul 3 08:49:27 PDT 2024

b-sumner wrote:

> The golden rule of compiler development is correctness trumps performance. You're welcome to have a fast but broken implementation downstream. Or we could work together on fixing the fast path so it is also correct :)

Absolutely.  We need both correctness and performance.

> The bug was in the uniform path, which does not need to generate any DPP or "Iterative" code. The fix was to treat uniform inputs the same as divergent inputs, so they _will_ generate some DPP or "Iterative" code.

Makes sense, when the result is needed.
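
To illustrate the distinction, here is a rough Python model of one wavefront combining its per-lane adds into a single atomic. It is only a sketch of the general idea, not the LLVM implementation: the function name `wave_atomic_add` and its parameters are hypothetical, and it does not attempt to reproduce the actual bug. When the result is unused, a plain reduction is enough; when each lane needs its own returned value, an exclusive scan over the lanes is also required.

```python
def wave_atomic_add(old, values, need_result):
    """Toy model (hypothetical, not LLVM code) of a wave-level atomic fadd.

    old         -- value in memory before the wave's atomic
    values      -- one addend per active lane, in lane order
    need_result -- whether each lane uses the value the atomic returns
    """
    if not need_result:
        # Only the total matters: a reduction suffices, no scan needed.
        total = 0.0
        for v in values:
            total += v
        return old + total, None

    # Each lane must see the memory value as if lower lanes went first,
    # so an exclusive scan (prefix sum) is needed on top of the total.
    total = 0.0
    per_lane = []
    for v in values:
        per_lane.append(old + total)  # value this lane's atomic "returns"
        total += v
    return old + total, per_lane


new_memory, results = wave_atomic_add(10.0, [1.0, 2.0, 3.0], need_result=True)
```

In this model the scan is the extra work that the DPP or "Iterative" code would stand in for; the no-result case skips it entirely.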

https://github.com/llvm/llvm-project/pull/96479