[PATCH] D156679: [WIP][AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already

Juan Manuel Martinez Caamaño via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Aug 7 07:44:42 PDT 2023


jmmartinez updated this revision to Diff 547793.
jmmartinez added a comment.

- Fixed the initialization of the WaitcntBrackets structure for non-kernel functions (the counters were initialized to 0, so the s_waitcnt 0 inserted at function entry was being removed)
- Updated tests to help spot potential issues with the patch:
  - One issue is that some memory-fence tests, whose kernel does nothing but a memory fence, become irrelevant because the wait instructions are removed. Adding some memory accesses to those tests would help.
- Added `CodeGen/AMDGPU/preserve-user-waitcnt.ll` to test how user-inserted waits are handled. Currently, a waitcnt instruction written with `__builtin_amdgcn_s_waitcnt` may be removed.
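
The entry-state fix in the first item can be illustrated with a small standalone model. This is a hedged sketch, not the code in SIInsertWaitcnts.cpp: `PendingBounds`, `Counter`, and their members are hypothetical names, the real pass keeps per-register scores in WaitcntBrackets, and here each counter is reduced to a single upper bound on in-flight operations.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical counter names; a simplified stand-in for the hardware counters
// tracked by SIInsertWaitcnts, not its real data structures.
enum Counter { VM_CNT = 0, LGKM_CNT, EXP_CNT, NUM_COUNTERS };

// Tracks an upper bound on the number of outstanding operations per counter.
class PendingBounds {
public:
  static constexpr uint32_t Unknown = std::numeric_limits<uint32_t>::max();

  // A kernel starts with nothing in flight, so every bound is 0. A non-kernel
  // function may be entered with the caller's operations still outstanding,
  // so its bounds start as Unknown (assumption of this model).
  explicit PendingBounds(bool IsKernel) { Bound.fill(IsKernel ? 0 : Unknown); }

  // Issuing an operation may raise the counter by one (saturating at Unknown).
  void issue(Counter C) {
    if (Bound[C] != Unknown)
      ++Bound[C];
  }

  // "s_waitcnt C(N)" blocks until at most N operations are outstanding, so it
  // is redundant when the bound already guarantees that.
  bool isRedundant(Counter C, uint32_t N) const { return Bound[C] <= N; }

  // After the wait executes, at most N operations can remain.
  void applyWait(Counter C, uint32_t N) { Bound[C] = std::min(Bound[C], N); }

private:
  std::array<uint32_t, NUM_COUNTERS> Bound{};
};
```

In the kernel model a wait before any memory operation is provably redundant; in the non-kernel model it is not. Initializing non-kernel functions like kernels is what made the s_waitcnt 0 at function entry look removable.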

Thanks for the comments @foad & @kerbowa !

I agree that this transformation would need wider agreement (and more testing than our CI pipeline provides!).

I'm currently trying to find a way to distinguish between waitcnt instructions coming from the memory legalizer and all the others.
But I'm far less familiar with the MachineInstr API than with LLVM IR.
Is there something like metadata or flags that we can set to indicate that an instruction comes from the legalizer?
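
If such a marker existed, the pruning rule being aimed at could look like the standalone model below: drop legalizer-inserted waits that the counter bound proves redundant, and always keep user-written ones. This is a hypothetical sketch only; `WaitOrigin`, `Inst`, and `pruneWaits` are invented names, a single counter is modeled for brevity, and it does not use or claim any existing MachineInstr flag or metadata API.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical origin tag; whether a MachineInstr can carry such a marker is
// exactly the open question.
enum class WaitOrigin { User, MemoryLegalizer };

// Hypothetical instruction model: either a memory operation that increments
// the (single) counter, or a wait until at most N operations are outstanding.
struct Inst {
  enum Kind { MemOp, Wait } K;
  WaitOrigin Origin = WaitOrigin::User; // only meaningful for Wait
  uint32_t N = 0;                       // wait threshold, only for Wait
};

// Walks a straight-line sequence starting from an upper bound on outstanding
// operations. Legalizer waits proven redundant are dropped; user waits are
// always kept.
std::vector<Inst> pruneWaits(const std::vector<Inst> &Prog,
                             uint32_t InitialBound) {
  const uint32_t Unknown = std::numeric_limits<uint32_t>::max();
  std::vector<Inst> Kept;
  uint32_t Bound = InitialBound;
  for (const Inst &I : Prog) {
    if (I.K == Inst::MemOp) {
      if (Bound != Unknown)
        ++Bound; // one more operation may be in flight
      Kept.push_back(I);
      continue;
    }
    bool Redundant = Bound <= I.N;
    if (!Redundant || I.Origin == WaitOrigin::User)
      Kept.push_back(I);
    if (Bound > I.N)
      Bound = I.N; // the wait guarantees at most N remain
  }
  return Kept;
}
```

For a kernel (initial bound 0), a legalizer wait before any memory access is dropped, an identical user wait is kept, and a legalizer wait after a memory access is kept.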


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156679/new/

https://reviews.llvm.org/D156679

Files:
  llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
  llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_udec_wrap.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_uinc_wrap.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/fp-atomics-gfx940.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/fp64-atomics-gfx90a.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/mubuf-global.ll
  llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll
  llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll
  llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
  llvm/test/CodeGen/AMDGPU/atomicrmw-nand.ll
  llvm/test/CodeGen/AMDGPU/back-off-barrier-subtarget-feature.ll
  llvm/test/CodeGen/AMDGPU/fence-barrier.ll
  llvm/test/CodeGen/AMDGPU/fence-lds-read2-write2.ll
  llvm/test/CodeGen/AMDGPU/flat_atomics.ll
  llvm/test/CodeGen/AMDGPU/flat_atomics_i64.ll
  llvm/test/CodeGen/AMDGPU/flat_atomics_i64_min_max_system.ll
  llvm/test/CodeGen/AMDGPU/flat_atomics_min_max_system.ll
  llvm/test/CodeGen/AMDGPU/force-store-sc0-sc1.ll
  llvm/test/CodeGen/AMDGPU/fp-atomics-gfx940.ll
  llvm/test/CodeGen/AMDGPU/fp64-atomics-gfx90a.ll
  llvm/test/CodeGen/AMDGPU/gds-allocation.ll
  llvm/test/CodeGen/AMDGPU/global-atomics-fp-wrong-subtarget.ll
  llvm/test/CodeGen/AMDGPU/global-atomics-fp.ll
  llvm/test/CodeGen/AMDGPU/global-saddr-atomics-min-max-system.ll
  llvm/test/CodeGen/AMDGPU/global-saddr-atomics.ll
  llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
  llvm/test/CodeGen/AMDGPU/global-saddr-store.ll
  llvm/test/CodeGen/AMDGPU/global_atomics.ll
  llvm/test/CodeGen/AMDGPU/global_atomics_i64.ll
  llvm/test/CodeGen/AMDGPU/global_atomics_i64_min_max_system.ll
  llvm/test/CodeGen/AMDGPU/global_atomics_min_max_system.ll
  llvm/test/CodeGen/AMDGPU/idemponent-atomics.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.gws.init.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.gws.sema.br.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.gws.sema.p.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.gws.sema.release.all.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.gws.sema.v.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.waitcnt.ll
  llvm/test/CodeGen/AMDGPU/memory-legalizer-fence.ll
  llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-agent.ll
  llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-system.ll
  llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-volatile.ll
  llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-workgroup.ll
  llvm/test/CodeGen/AMDGPU/memory-legalizer-global-agent.ll
  llvm/test/CodeGen/AMDGPU/memory-legalizer-global-system.ll
  llvm/test/CodeGen/AMDGPU/memory-legalizer-global-volatile.ll
  llvm/test/CodeGen/AMDGPU/memory-legalizer-global-workgroup.ll
  llvm/test/CodeGen/AMDGPU/memory-legalizer-local-agent.ll
  llvm/test/CodeGen/AMDGPU/memory-legalizer-local-system.ll
  llvm/test/CodeGen/AMDGPU/memory-legalizer-local-volatile.ll
  llvm/test/CodeGen/AMDGPU/memory-legalizer-local-workgroup.ll
  llvm/test/CodeGen/AMDGPU/move-to-valu-atomicrmw-system.ll
  llvm/test/CodeGen/AMDGPU/move-to-valu-atomicrmw.ll
  llvm/test/CodeGen/AMDGPU/preserve-user-waitcnt.ll
  llvm/test/CodeGen/AMDGPU/vgpr-descriptor-waterfall-loop-idom-update.ll
  llvm/test/CodeGen/AMDGPU/waitcnt-agpr.mir
  llvm/test/CodeGen/AMDGPU/waitcnt-no-redundant.mir
  llvm/test/CodeGen/AMDGPU/waitcnt-preexisting-vscnt.mir
  llvm/test/CodeGen/AMDGPU/waitcnt-preexisting.mir
  llvm/test/CodeGen/AMDGPU/waitcnt.mir
