[PATCH] D156301: [AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer.

Tue Sep 12 05:05:39 PDT 2023

foad added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/global_atomics_iterative_scan_fp.ll:172-241
+; IR-ITERATIVE-LABEL: @global_atomic_fsub_div_value(
+; IR-ITERATIVE-NEXT:    [[ID_X:%.*]] = call i32 @llvm.amdgcn.workitem.id.x()
+; IR-ITERATIVE-NEXT:    [[DIVVALUE:%.*]] = bitcast i32 [[ID_X]] to float
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
+; IR-ITERATIVE-NEXT:    [[TMP2:%.*]] = trunc i64 [[TMP1]] to i32
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = lshr i64 [[TMP1]], 32
+; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
----------------
pravinjagtap wrote:
> pravinjagtap wrote:
> > foad wrote:
> > > This fsub code does not look right (both strategies). First you do an fsub-reduction, and then you do an atomic fsub of the reduced value. That is like a double negative - you will end up **adding** the values to the memory location. I think you need to do an **fadd** reduction followed by an atomic fsub, or vice versa.
> > > 
> > > Have you run any conformance tests that exercise this code?
> > This holds true for integer sub also right? I have ran psdb and gfx pipeline which runs some conformance tests. I will take closer look to see test coverage required to exercise this. 
> This did not get caught because atomic `fsub` is transformed to `fadd` before we reach atomic-optimizer: https://cuda.godbolt.org/z/56ToP79Pb 
For integer sub this is already handled by:
```
    const AtomicRMWInst::BinOp ScanOp =
        Op == AtomicRMWInst::Sub ? AtomicRMWInst::Add : Op;
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156301/new/

https://reviews.llvm.org/D156301