[PATCH] D157495: [WIP] Run SimplifyCFG from Atomic-Expand on CAS loop blocks.

Thu Aug 10 07:11:49 PDT 2023

arsenm added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll:2165-2184
-; GFX1164-NEXT:    s_mov_b64 s[2:3], 0
 ; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-NEXT:    v_mul_f32_e32 v2, 4.0, v0
-; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-NEXT:    s_load_b32 s4, s[0:1], 0x0
+; GFX1164-NEXT:    v_mul_f32_e32 v0, 4.0, v0
 ; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s4
----------------
pravinjagtap wrote:
> I am not sure whether this is what we are expecting. None of the existing test-cases need update for this change. I am struggling to demonstrate the actual benefits of running SimplifyCFG of CAS blocks.  
This needs some pure IR tests in test/Transforms/AtomicExpand. You could start by hacking out the aarch64 option and see what breaks for potentially interesting cases.

Does this only do anything if the atomic is in more complex control flow? Does it only do anything if the dominator tree is precomputed? Do you see more changes if you force the dominator tree to be required?

================
Comment at: llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll:50
 ; GFX7LESS-NEXT:    s_cbranch_execnz .LBB0_2
-; GFX7LESS-NEXT:  .LBB0_3:
+; GFX7LESS-NEXT:  .LBB0_3: ; %atomicrmw.end
 ; GFX7LESS-NEXT:    s_endpgm
----------------
Something's not right because all this is doing is starting to preserve a block name. Does this only do anything if the atomic is in more complex control flow? Does it only do anything if the dominator tree is precomputed? Do you see more changes if you force the dominator tree to be required?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157495/new/

https://reviews.llvm.org/D157495