[PATCH] D156301: [AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer.

Tue Sep 12 04:44:37 PDT 2023

pravinjagtap added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/global_atomics_iterative_scan_fp.ll:172-241
+; IR-ITERATIVE-LABEL: @global_atomic_fsub_div_value(
+; IR-ITERATIVE-NEXT:    [[ID_X:%.*]] = call i32 @llvm.amdgcn.workitem.id.x()
+; IR-ITERATIVE-NEXT:    [[DIVVALUE:%.*]] = bitcast i32 [[ID_X]] to float
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
+; IR-ITERATIVE-NEXT:    [[TMP2:%.*]] = trunc i64 [[TMP1]] to i32
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = lshr i64 [[TMP1]], 32
+; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
----------------
pravinjagtap wrote:
> foad wrote:
> > This fsub code does not look right (both strategies). First you do an fsub-reduction, and then you do an atomic fsub of the reduced value. That is like a double negative - you will end up **adding** the values to the memory location. I think you need to do an **fadd** reduction followed by an atomic fsub, or vice versa.
> > 
> > Have you run any conformance tests that exercise this code?
> This holds true for integer sub also right? I have ran psdb and gfx pipeline which runs some conformance tests. I will take closer look to see test coverage required to exercise this. 
This did not get caught because atomic `fsub` is transformed to `fadd` before we reach atomic-optimizer: https://cuda.godbolt.org/z/56ToP79Pb 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156301/new/

https://reviews.llvm.org/D156301