[PATCH] D115747: [AMDGPU] Hoist waitcnt out of loops when they unecessarily wait for stores

Mon Dec 20 12:02:09 PST 2021

bsaleil added a comment.

@arsenm the idea is to rotate the block-processing loop in runOnMachineFunction of the SIInsertWaitcnt pass, so that decisions for a block can be made from the state of its predecessors. It is not about rotating the actual machine IR loop that contains the waitcnt:

In D115747#3194595 <https://reviews.llvm.org/D115747#3194595>, @foad wrote:

> Note that the loop over basic blocks in runOnMachineFunction currently looks like:
>
>   for each block:
>     get saved state for this block
>     process block
>     merge new state into saved state for each successor
>
> But if necessary we could change it to:
>
>   for each block:
>     merge saved state from each predecessor
>     process block
>     save state for this block
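
To make the difference between the two loop shapes quoted above concrete, here is a minimal standalone sketch. It is not the pass's code: State, Block, process, runPush, runPull and makeExampleCFG are all hypothetical names, and the per-block state is reduced to a single capped counter, whereas the real pass tracks much more. Both variants are iterated to a fixpoint so the toy handles a back edge.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the per-block state (assumption: one pending-event count).
struct State {
  int Pending = 0;
  // Join: keep the larger count. Returns true if this state changed.
  bool merge(const State &Other) {
    if (Other.Pending <= Pending)
      return false;
    Pending = Other.Pending;
    return true;
  }
};

// Toy basic block (hypothetical, not MachineBasicBlock).
struct Block {
  std::vector<std::size_t> Preds, Succs;
  int Issued = 0;     // events the block adds, e.g. stores it issues
  bool Waits = false; // block starts with a wait that clears pending events
};

// "process block": apply the block's effect to its entry state.
State process(const Block &B, State In) {
  assert(In.Pending >= 0);
  if (B.Waits)
    In.Pending = 0;
  In.Pending = std::min(In.Pending + B.Issued, 8); // cap so loops converge
  return In;
}

// Current shape: the saved state is the ENTRY state, pushed to successors.
std::vector<State> runPush(const std::vector<Block> &CFG) {
  std::vector<State> In(CFG.size());
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (std::size_t I = 0; I < CFG.size(); ++I) {
      State Out = process(CFG[I], In[I]);
      for (std::size_t S : CFG[I].Succs)
        Changed |= In[S].merge(Out);
    }
  }
  return In;
}

// Rotated shape: the saved state is the EXIT state, pulled from predecessors.
std::vector<State> runPull(const std::vector<Block> &CFG) {
  std::vector<State> In(CFG.size()), Out(CFG.size());
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (std::size_t I = 0; I < CFG.size(); ++I) {
      State Entry;
      for (std::size_t P : CFG[I].Preds)
        Entry.merge(Out[P]); // each predecessor's state is visible separately
      In[I] = Entry;
      State NewOut = process(CFG[I], Entry);
      if (NewOut.Pending != Out[I].Pending) {
        Out[I] = NewOut;
        Changed = true;
      }
    }
  }
  return In;
}

// Example CFG: 0 -> 1, 1 -> 2, 2 -> 1 (back edge), 1 -> 3.
std::vector<Block> makeExampleCFG() {
  std::vector<Block> CFG(4);
  CFG[0].Succs = {1};
  CFG[0].Issued = 1;
  CFG[1].Preds = {0, 2};
  CFG[1].Succs = {2, 3};
  CFG[2].Preds = {1};
  CFG[2].Succs = {1};
  CFG[2].Issued = 2;
  CFG[2].Waits = true;
  CFG[3].Preds = {1};
  CFG[3].Waits = true;
  return CFG;
}
```

In this toy, runPush(makeExampleCFG()) and runPull(makeExampleCFG()) compute the same entry states. The point of the rotated form is that, while handling a block, the contribution of each predecessor is still available individually (for example the preheader versus the latch of a loop header), rather than already folded into one saved entry state.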

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115747/new/

https://reviews.llvm.org/D115747