[PATCH] D156301: [AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer.
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 12 05:05:39 PDT 2023
foad added inline comments.
================
Comment at: llvm/test/CodeGen/AMDGPU/global_atomics_iterative_scan_fp.ll:172-241
+; IR-ITERATIVE-LABEL: @global_atomic_fsub_div_value(
+; IR-ITERATIVE-NEXT: [[ID_X:%.*]] = call i32 @llvm.amdgcn.workitem.id.x()
+; IR-ITERATIVE-NEXT: [[DIVVALUE:%.*]] = bitcast i32 [[ID_X]] to float
+; IR-ITERATIVE-NEXT: [[TMP1:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
+; IR-ITERATIVE-NEXT: [[TMP2:%.*]] = trunc i64 [[TMP1]] to i32
+; IR-ITERATIVE-NEXT: [[TMP3:%.*]] = lshr i64 [[TMP1]], 32
+; IR-ITERATIVE-NEXT: [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
----------------
pravinjagtap wrote:
> pravinjagtap wrote:
> > foad wrote:
> > > This fsub code does not look right (both strategies). First you do an fsub-reduction, and then you do an atomic fsub of the reduced value. That is like a double negative - you will end up **adding** the values to the memory location. I think you need to do an **fadd** reduction followed by an atomic fsub, or vice versa.
> > >
> > > Have you run any conformance tests that exercise this code?
> > This holds true for integer sub also right? I have ran psdb and gfx pipeline which runs some conformance tests. I will take closer look to see test coverage required to exercise this.
> This did not get caught because atomic `fsub` is transformed to `fadd` before we reach atomic-optimizer: https://cuda.godbolt.org/z/56ToP79Pb
For integer sub this is already handled by:
```
const AtomicRMWInst::BinOp ScanOp =
Op == AtomicRMWInst::Sub ? AtomicRMWInst::Add : Op;
```
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D156301/new/
https://reviews.llvm.org/D156301
More information about the llvm-commits
mailing list