[llvm] [AMDGPU] Fix scan of atomicFSub in AtomicOptimizer. (PR #66082)

Tue Sep 12 06:19:40 PDT 2023

================
@@ -451,7 +451,7 @@ define amdgpu_ps float @global_atomic_fsub_uni_address_div_value_agent_scope_str
 ; IR-ITERATIVE-NEXT:    [[TMP26:%.*]] = bitcast float [[OLDVALUEPHI]] to i32
 ; IR-ITERATIVE-NEXT:    [[TMP27:%.*]] = call i32 @llvm.amdgcn.writelane(i32 [[TMP25]], i32 [[TMP21]], i32 [[TMP26]]) #[[ATTR7]]
 ; IR-ITERATIVE-NEXT:    [[TMP28]] = bitcast i32 [[TMP27]] to float
-; IR-ITERATIVE-NEXT:    [[TMP29]] = call float @llvm.experimental.constrained.fsub.f32(float [[ACCUMULATOR]], float [[TMP24]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP29]] = call float @llvm.experimental.constrained.fadd.f32(float [[ACCUMULATOR]], float [[TMP24]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
----------------
jayfoad wrote:

If you're doing the reduction with fadd then you need to start with the identity for fadd, i.e. -0.0.

(For that reason the code changes might be very slightly simpler if you do the reduction with fsub followed by a final atomic fadd.)

https://github.com/llvm/llvm-project/pull/66082