[llvm] [AMDGPU] Disable atomic optimization of fadd/fsub with result (PR #96479)

Mon Jun 24 04:50:03 PDT 2024

jayfoad wrote:

> %r = %x + %y * +0.0

We actually calculate `%r = %x + %y * uitofp(MbCnt)` and the problem is in the first active lane where `MbCnt` is 0. I think we could avoid all (?) of these problems by first calculating `uitofp(MbCnt)` and then overwriting the first active lane with -0.0, before multiplying by `%y`.
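To make the first-lane problem concrete, here is a minimal Python sketch of the per-lane arithmetic only. It is not the AMDGPU lowering: `lane_result`, `is_negative_zero` and the `patch_first_lane` flag are hypothetical names for illustration, and Python floats stand in for the IR floating-point type.

```python
import math


def lane_result(x, y, mbcnt, patch_first_lane=False):
    """Model of %r = %x + %y * uitofp(MbCnt) for a single lane."""
    factor = float(mbcnt)  # uitofp(MbCnt); this is +0.0 in the first active lane
    if patch_first_lane and mbcnt == 0:
        factor = -0.0  # assumed form of the overwrite suggested above
    return x + y * factor


def is_negative_zero(v):
    """True only for -0.0, which compares equal to +0.0 under ==."""
    return v == 0.0 and math.copysign(1.0, v) < 0.0


# First active lane (MbCnt == 0) with %x = -0.0 and a positive %y.
unpatched = lane_result(-0.0, 1.0, 0)
patched = lane_result(-0.0, 1.0, 0, patch_first_lane=True)

print(is_negative_zero(unpatched))  # sign of %x is lost: -0.0 + +0.0 is +0.0
print(is_negative_zero(patched))    # sign of %x is kept: -0.0 + -0.0 is -0.0
```

This checks a single input. It does not show that the overwrite is sufficient for every value of `%y`, and lanes with a nonzero `MbCnt` are unaffected by the patch.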

There might be opportunities to simplify this if `%y` is known not to be NaN or infinity. There are definitely opportunities to simplify if we don't care about NaNs or infinities or signed zeroes -- but unfortunately the IR `atomicrmw` instruction does not have fast math flags.
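One reading of why NaN and infinity in `%y` need separate thought: under IEEE 754, infinity times a zero of either sign is NaN, so a zero multiplier in the first active lane does not by itself make that lane return `%x`. The sketch below only demonstrates that arithmetic in Python. `first_lane` and `first_lane_is_nan` are hypothetical helpers, not part of the patch.

```python
import math


def first_lane(x, y, factor):
    """Model of %x + %y * factor, with factor standing in for the lane multiplier."""
    return x + y * factor


def first_lane_is_nan(x, y):
    """Report whether the first-lane result is NaN for a +0.0 and a -0.0 multiplier."""
    return [math.isnan(first_lane(x, y, f)) for f in (0.0, -0.0)]


# A non-finite %y turns the first-lane result into NaN for either zero sign.
print(first_lane_is_nan(1.0, math.inf))
print(first_lane_is_nan(1.0, math.nan))

# A finite %y leaves %x unchanged here for either zero sign.
print(first_lane(1.0, 2.0, 0.0), first_lane(1.0, 2.0, -0.0))
```

When `%y` is known to be finite, the product with either zero is itself a zero, which is presumably where the simplification would come from.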

https://github.com/llvm/llvm-project/pull/96479