[PATCH] D154480: [AMDGPU] Flush vmcnt with any loop extraneous defs

Austin Kerbow via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 5 00:52:00 PDT 2023


kerbowa created this revision.
kerbowa added reviewers: bsaleil, rochauha, foad, nhaehnle.
Herald added subscribers: StephenFan, hiraditya, tpr, dstuttard, yaxunl, jvesely, kzhuravl, arsenm.
Herald added a project: All.
kerbowa requested review of this revision.
Herald added subscribers: llvm-commits, wdng.
Herald added a project: LLVM.

Starts to hoist waitcnt to the preheader of loops that use a value loaded outside of the loop and that also contain any VMEM load defining a value used outside of the loop.

Example:

v0 = load(...)
loop {
  ...
  use(v0)
  v1 = load(...)
  ...
  use(v1)
  v2 = load(...)
}
use(v2)

Previously we would not hoist waitcnt to the preheader of any loop containing a use/def pair in which any subregister was defined and used wholly within the loop. It seems somewhat arbitrary to limit the optimization to loops that only load values but never use them, but I may be missing something. While there is a concern about increased compile time with this change, it is essentially what was already done for FLAT/GLOBAL instructions.
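
The difference between the two conditions can be illustrated with a toy C++ model. This is only a sketch of the description above, not the code in SIInsertWaitcnts.cpp: ToyInst, ToyLoop, shouldFlushOld, and shouldFlushNew are hypothetical names, registers are plain strings rather than register intervals, and the exact shape of the old condition is an assumption based on the wording of this summary.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction; not an LLVM MachineInstr.
struct ToyInst {
  bool IsVMemLoad;                 // true if this models a VMEM load
  std::vector<std::string> Defs;   // registers written
  std::vector<std::string> Uses;   // registers read
};

// Hypothetical toy loop with the facts the heuristic needs.
struct ToyLoop {
  std::vector<ToyInst> Body;
  std::set<std::string> PendingOutsideLoads; // loaded before the loop, not yet waited on
  std::set<std::string> UsedAfterLoop;       // registers read after the loop exits
};

// True if the loop reads a register whose load was issued outside the loop.
static bool usesValueLoadedOutside(const ToyLoop &L) {
  for (const ToyInst &I : L.Body)
    for (const std::string &R : I.Uses)
      if (L.PendingOutsideLoads.count(R))
        return true;
  return false;
}

// Assumed old rule: give up as soon as any register defined by an in-loop
// VMEM load is also used inside the loop.
bool shouldFlushOld(const ToyLoop &L) {
  std::set<std::string> LoadDefs, Uses;
  bool HasVMemLoad = false;
  for (const ToyInst &I : L.Body) {
    Uses.insert(I.Uses.begin(), I.Uses.end());
    if (I.IsVMemLoad) {
      HasVMemLoad = true;
      LoadDefs.insert(I.Defs.begin(), I.Defs.end());
    }
  }
  for (const std::string &R : LoadDefs)
    if (Uses.count(R))
      return false;
  return HasVMemLoad && usesValueLoadedOutside(L);
}

// Rule as described in this summary: hoist when the loop uses a value loaded
// outside and some in-loop VMEM load defines a value used after the loop.
bool shouldFlushNew(const ToyLoop &L) {
  if (!usesValueLoadedOutside(L))
    return false;
  for (const ToyInst &I : L.Body)
    if (I.IsVMemLoad)
      for (const std::string &R : I.Defs)
        if (L.UsedAfterLoop.count(R))
          return true;
  return false;
}

// The loop from the example: v0 loaded outside, v1 loaded and used inside,
// v2 loaded inside and used after the loop.
ToyLoop makeExampleLoop() {
  ToyLoop L;
  L.PendingOutsideLoads = {"v0"};
  L.UsedAfterLoop = {"v2"};
  L.Body = {
      {false, {}, {"v0"}},  // use(v0)
      {true, {"v1"}, {}},   // v1 = load(...)
      {false, {}, {"v1"}},  // use(v1)
      {true, {"v2"}, {}},   // v2 = load(...)
  };
  return L;
}
```

On the example loop, shouldFlushOld returns false because v1 is both loaded and used inside the loop, while shouldFlushNew returns true because v0 is still pending at loop entry and v2 is loaded in the loop and used after it.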

A more thorough approach would try to estimate the minimum number of cycles gained or lost by hoisting the waitcnt, but this would further increase compile time.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D154480

Files:
  llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
  llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D154480.537249.patch
Type: text/x-patch
Size: 6353 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230705/f33361a9/attachment.bin>