[PATCH] D115747: [AMDGPU] Hoist waitcnt out of loops when they unnecessarily wait for stores

Baptiste Saleil via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 15 14:36:46 PST 2022


bsaleil updated this revision to Diff 409063.
bsaleil added a comment.

This update refactors the pass. For each outer loop, we compute the brackets for both the flushed and non-flushed versions until we finish visiting the loop or decide that flushing in the preheader is not worthwhile. Instead of generating the waitcnts immediately, we add them to two separate lists, one for the flushed version and one for the non-flushed version. After all the blocks are visited, we generate the waitcnts from one of the two lists, depending on the decision we made.
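The two-list bookkeeping described above can be pictured with a minimal sketch. Everything here is hypothetical and for illustration only: `PendingWait`, `LoopWaitPlan`, and their members are invented names, not types from SIInsertWaitcnts.cpp, and the real pass works on MachineInstrs and waitcnt brackets rather than strings.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a waitcnt that has been decided on but not yet
// emitted. Not a type from the actual pass.
struct PendingWait {
  std::string Block; // name of the block the wait would be inserted in
  unsigned VmCnt;    // counter value the wait would use
};

// Hypothetical per-loop state: one candidate list per version of the loop.
struct LoopWaitPlan {
  std::vector<PendingWait> Flushed;    // assumes a flush in the preheader
  std::vector<PendingWait> NonFlushed; // assumes no flush in the preheader
  bool FlushStillViable = true;        // false once flushing is ruled out

  void addFlushed(PendingWait W) {
    assert(FlushStillViable && "flushed version was already abandoned");
    Flushed.push_back(std::move(W));
  }

  void addNonFlushed(PendingWait W) { NonFlushed.push_back(std::move(W)); }

  // Called when flushing in the preheader turns out not to be worthwhile;
  // the flushed candidates are dropped and no longer tracked.
  void abandonFlush() {
    FlushStillViable = false;
    Flushed.clear();
  }

  // Called once all blocks are visited: only this list would be emitted.
  const std::vector<PendingWait> &chosen() const {
    return FlushStillViable ? Flushed : NonFlushed;
  }
};
```

The point of the design is that nothing is emitted while the loop is still being visited, so abandoning the flushed version costs only a cleared list.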

For now, the only case in which we may decide to use the flushed version is the original case of this patch: a loop on GFX9 that contains no loads and at least one store:

v0 = load(...)
loop {
  ...
  use(v0)
  store(...)
}
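As a rough illustration of the condition stated above (GFX9, no load in the loop, at least one store), the check could look like the following sketch. `MemOp`, `Gen`, and `mayFlushInPreheader` are hypothetical names introduced only for this example; the actual pass inspects machine instructions and subtarget information instead.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of the instructions in a loop body.
enum class MemOp { Load, Store, Other };

// Hypothetical target tag; this sketch only accepts GFX9.
enum class Gen { GFX9, GFX10 };

// Returns true when the flushed version may be chosen: the target is GFX9,
// the loop body contains no load, and it contains at least one store.
bool mayFlushInPreheader(Gen Target, const std::vector<MemOp> &Body) {
  if (Target != Gen::GFX9)
    return false;
  unsigned Loads = 0, Stores = 0;
  for (MemOp Op : Body) {
    if (Op == MemOp::Load)
      ++Loads;
    else if (Op == MemOp::Store)
      ++Stores;
  }
  assert(Loads + Stores <= Body.size() && "counted more ops than instructions");
  return Loads == 0 && Stores > 0;
}
```

For the loop shown above, the body would be classified as {Other, Store} (the use of v0, then the store), so the check succeeds on GFX9.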

With the refactoring done, I'm planning to optimize one GFX10 case that we observed in at least one game: a loop that only loads values that are not used inside the loop:

v0 = load(...)
loop {
  ...
  use(v0)
  v1 = load(...)
}

This will be added in a subsequent patch.
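To make the planned GFX10 condition concrete, here is one way it might be checked. This is only a sketch under assumptions and not code from this or any later patch: `Inst` and `loadsUnusedInLoop` are hypothetical, and registers are modeled as plain strings.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified instruction: an optional defined register and the
// registers it reads.
struct Inst {
  bool IsLoad;
  std::string Def;               // register defined; empty if none
  std::vector<std::string> Uses; // registers read
};

// Returns true when the loop body contains at least one load and no value
// loaded inside the loop is read inside the loop.
bool loadsUnusedInLoop(const std::vector<Inst> &Body) {
  std::set<std::string> LoadedInLoop;
  for (const Inst &I : Body) {
    if (I.IsLoad) {
      assert(!I.Def.empty() && "a load must define a register");
      LoadedInLoop.insert(I.Def);
    }
  }
  if (LoadedInLoop.empty())
    return false;
  // Second pass over the whole body: because the body repeats, a use that
  // appears before the load in program order still reads the loaded value
  // on the next iteration.
  for (const Inst &I : Body)
    for (const std::string &U : I.Uses)
      if (LoadedInLoop.count(U))
        return false;
  return true;
}
```

In the example loop above, v1 is loaded but only v0 (loaded before the loop) is used, so the check returns true.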


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115747/new/

https://reviews.llvm.org/D115747

Files:
  llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
  llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
  llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
  llvm/test/CodeGen/AMDGPU/nested-loop-conditions.ll
  llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D115747.409063.patch
Type: text/x-patch
Size: 27567 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220215/e4fdfeb5/attachment.bin>