[PATCH] D156301: [AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer.

Tue Sep 12 04:04:52 PDT 2023

pravinjagtap added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/global_atomics_iterative_scan_fp.ll:172-241
+; IR-ITERATIVE-LABEL: @global_atomic_fsub_div_value(
+; IR-ITERATIVE-NEXT:    [[ID_X:%.*]] = call i32 @llvm.amdgcn.workitem.id.x()
+; IR-ITERATIVE-NEXT:    [[DIVVALUE:%.*]] = bitcast i32 [[ID_X]] to float
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
+; IR-ITERATIVE-NEXT:    [[TMP2:%.*]] = trunc i64 [[TMP1]] to i32
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = lshr i64 [[TMP1]], 32
+; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
----------------
foad wrote:
> This fsub code does not look right (both strategies). First you do an fsub-reduction, and then you do an atomic fsub of the reduced value. That is like a double negative - you will end up **adding** the values to the memory location. I think you need to do an **fadd** reduction followed by an atomic fsub, or vice versa.
> 
> Have you run any conformance tests that exercise this code?
This holds true for integer sub also right? I have ran psdb and gfx pipeline which runs some conformance tests. I will take closer look to see test coverage required to exercise this. 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156301/new/

https://reviews.llvm.org/D156301