[llvm] [AMDGPU] Fix scan of atomicFSub in AtomicOptimizer. (PR #66082)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 12 06:19:40 PDT 2023
================
@@ -451,7 +451,7 @@ define amdgpu_ps float @global_atomic_fsub_uni_address_div_value_agent_scope_str
; IR-ITERATIVE-NEXT: [[TMP26:%.*]] = bitcast float [[OLDVALUEPHI]] to i32
; IR-ITERATIVE-NEXT: [[TMP27:%.*]] = call i32 @llvm.amdgcn.writelane(i32 [[TMP25]], i32 [[TMP21]], i32 [[TMP26]]) #[[ATTR7]]
; IR-ITERATIVE-NEXT: [[TMP28]] = bitcast i32 [[TMP27]] to float
-; IR-ITERATIVE-NEXT: [[TMP29]] = call float @llvm.experimental.constrained.fsub.f32(float [[ACCUMULATOR]], float [[TMP24]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT: [[TMP29]] = call float @llvm.experimental.constrained.fadd.f32(float [[ACCUMULATOR]], float [[TMP24]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
----------------
jayfoad wrote:
If you're doing the reduction with fadd then you need to start with the identity for fadd, i.e. -0.0.
(For that reason the code changes might be very slightly simpler if you do the reduction with fsub followed by a final atomic fadd.)
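
To illustrate the point about the fadd identity, here is a minimal sketch in Python rather than LLVM IR. It assumes IEEE 754 round-to-nearest arithmetic, and the helper names (`is_negative_zero`, `reduce_fadd`) are made up for this example, not taken from the AtomicOptimizer. It shows that seeding an fadd reduction with +0.0 loses the sign of an all -0.0 input, while seeding it with -0.0 does not.

```python
import math


def is_negative_zero(x: float) -> bool:
    """Return True only for -0.0 (hypothetical helper for this sketch)."""
    return x == 0.0 and math.copysign(1.0, x) < 0.0


def reduce_fadd(values, init):
    """Sequential fadd reduction seeded with `init` (hypothetical helper)."""
    acc = init
    for v in values:
        acc = acc + v
    return acc


# Every lane contributes -0.0, so the exact sum of the lanes is -0.0.
lanes = [-0.0, -0.0]

# Seeded with +0.0: (+0.0) + (-0.0) rounds to +0.0, so the sign is lost.
with_pos_zero = reduce_fadd(lanes, 0.0)

# Seeded with -0.0: (-0.0) + (-0.0) stays -0.0, so the input is preserved.
with_neg_zero = reduce_fadd(lanes, -0.0)

print(is_negative_zero(with_pos_zero))
print(is_negative_zero(with_neg_zero))
```

For any nonzero or +0.0 input the two seeds give the same result; the difference only shows up when every contribution is -0.0.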
https://github.com/llvm/llvm-project/pull/66082