[llvm] [AMDGPU] Disable atomic optimization of fadd/fsub with result (PR #96479)

Jay Foad via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 24 04:50:03 PDT 2024


jayfoad wrote:

> %r = %x + %y * +0.0

We actually calculate `%r = %x + %y * uitofp(MbCnt)` and the problem is in the first active lane where `MbCnt` is 0. I think we could avoid all (?) of these problems by first calculating `%y * uitofp(MbCnt)` and then overwriting the first active lane with -0.0, before multiplying by `%y`.
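
To make the first-lane problem concrete, here is a minimal scalar sketch in Python. It is only a model, not the AMDGPU lowering: Python floats are IEEE 754 doubles rather than the types the pass handles, and the names `lane_result`, `same_value`, and `patch_first_lane` are hypothetical helpers invented for illustration. It also assumes one reading of the suggestion above, namely that the value added to `%x` is forced to -0.0 when `MbCnt` is 0.

```python
import math


def lane_result(x, y, mbcnt, patch_first_lane=False):
    """Model of the per-lane result x + y * float(mbcnt).

    With patch_first_lane=True the addend is replaced by -0.0 when
    mbcnt == 0 (assumed reading of the suggestion; hypothetical helper).
    """
    addend = y * float(mbcnt)
    if patch_first_lane and mbcnt == 0:
        addend = -0.0
    return x + addend


def same_value(a, b):
    """Compare floats, distinguishing +0.0 from -0.0 and matching NaN with NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)
```

In this model, `lane_result(-0.0, 1.0, 0)` yields +0.0 because `-0.0 + +0.0` rounds to +0.0, and `lane_result(1.0, math.inf, 0)` yields NaN because `inf * 0.0` is NaN. With `patch_first_lane=True` both calls return `x` unchanged, since adding -0.0 preserves every non-NaN value including its sign. Lanes with a nonzero `MbCnt` are unaffected.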

There might be opportunities to simplify this if `%y` is known not to be NaN or infinity. There are definitely opportunities to simplify if we don't care about NaNs or infinities or signed zeroes -- but unfortunately the IR `atomicrmw` instruction does not have fast math flags.



https://github.com/llvm/llvm-project/pull/96479

