[PATCH] D115747: [AMDGPU] Hoist waitcnt out of loops when they unnecessarily wait for stores
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 15 04:25:34 PST 2021
foad added a comment.
Here's the big question: is it REALLY necessary for shouldFlush to examine every instruction in the loop? (This is quite expensive and could, I think, cause quadratic behaviour if you have lots of nested loops.) Or could we make the same decision just by comparing the WaitcntBrackets at the end of the preheader with the WaitcntBrackets coming from the loop backedge(s)? I think this would be a much cleaner approach if it is possible, even if we have to track a bit more state in WaitcntBrackets to make it work.
Note that the loop over basic blocks in runOnMachineFunction currently looks like:
  for each block:
    get saved state for this block
    process block
    merge new state into saved state for each successor
But if necessary we could change it to:
  for each block:
    merge saved state from each predecessor
    process block
    save state for this block
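
To make the difference between the two loop shapes concrete, here is a minimal standalone sketch. It is not the SIInsertWaitcnts code: State is a toy stand-in for WaitcntBrackets holding a single saturating counter, and Block, Cap, process, runPush and runPull are all hypothetical names invented for this illustration. Both functions iterate to the same fixed point; they differ only in where the merge happens.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for WaitcntBrackets (assumption: one saturating counter).
struct State {
  int Pending = 0;
  // Conservative merge: keep the larger count. Returns true if changed.
  bool merge(const State &Other) {
    int Old = Pending;
    Pending = std::max(Pending, Other.Pending);
    return Pending != Old;
  }
};

// Hypothetical basic block: some issued operations, maybe a full wait first.
struct Block {
  int Issued;             // operations issued by this block
  bool Waits;             // block starts with a wait that resets the counter
  std::vector<int> Succs; // successor block indices
};

constexpr int Cap = 3; // the counter saturates, so iteration terminates

// Transfer function: apply the block's effect to an incoming state.
State process(const Block &B, State In) {
  if (B.Waits)
    In.Pending = 0;
  In.Pending = std::min(In.Pending + B.Issued, Cap);
  return In;
}

// Current shape: process a block, then merge into each successor's saved
// incoming state.
std::vector<State> runPush(const std::vector<Block> &CFG) {
  std::vector<State> In(CFG.size()), Out(CFG.size());
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (std::size_t I = 0; I < CFG.size(); ++I) {
      Out[I] = process(CFG[I], In[I]);
      for (int S : CFG[I].Succs) {
        assert(S >= 0 && static_cast<std::size_t>(S) < CFG.size());
        Changed |= In[S].merge(Out[I]);
      }
    }
  }
  return Out;
}

// Alternative shape: merge the saved outgoing state of each predecessor,
// process the block, then save this block's outgoing state.
std::vector<State> runPull(const std::vector<Block> &CFG) {
  std::vector<std::vector<int>> Preds(CFG.size());
  for (std::size_t I = 0; I < CFG.size(); ++I)
    for (int S : CFG[I].Succs)
      Preds[S].push_back(static_cast<int>(I));

  std::vector<State> Out(CFG.size());
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (std::size_t I = 0; I < CFG.size(); ++I) {
      State In;
      for (int P : Preds[I])
        In.merge(Out[P]); // each predecessor's state is still separate here
      State New = process(CFG[I], In);
      if (New.Pending != Out[I].Pending) {
        Out[I] = New;
        Changed = true;
      }
    }
  }
  return Out;
}
```

In the pull form, the state leaving the preheader and the state arriving over the backedge are both available individually at the point where the loop header is processed, whereas the push form has already folded them into one saved state. That may be one reason the second shape would make a preheader-versus-backedge comparison easier, though this is only an inference from the comment above.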
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D115747/new/
https://reviews.llvm.org/D115747