[llvm] [AMDGPU] Reschedule loads in clauses to improve throughput (RFC) (PR #102595)
Carl Ritson via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 12 20:17:42 PDT 2024
perlfu wrote:
I am not against enabling `ReorderWhileClustering`; however, it can change register allocation (RA), so it could have some unexpected downsides.
The point of my examples is that "reorder after clustering+RA" can take the "default output" and, purely by reordering, achieve results similar to or better than `ReorderWhileClustering`, with no impact on existing RA or schedules.
I say *better* because each successive waitcnt is only waiting for 1-2 more outstanding loads, rather than 7+ in the default case.
This might not seem useful at first, because only a single VALU is issued before potentially another wait, but if we imagine (n) waves in flight, then (n) VALUs can be issued.
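As a rough illustration of the waitcnt argument, here is a toy model (not taken from the PR). It assumes loads return strictly in issue order, so waiting for one load also waits for every load issued before it. The function name, the register names `v0`..`v7`, and the use order are all hypothetical.

```python
def extra_loads_awaited(issue_order, use_order):
    """For each use, count how many not-yet-completed loads must finish first.

    Toy assumption: loads complete in issue order, so waiting on a load
    also waits on everything issued before it.
    """
    position = {load: i for i, load in enumerate(issue_order)}
    completed = 0  # how many loads, in issue order, have already been waited for
    waits = []
    for load in use_order:
        needed = position[load] + 1
        waits.append(max(0, needed - completed))
        completed = max(completed, needed)
    return waits


loads = [f"v{i}" for i in range(8)]
# Hypothetical worst case: the first VALU consumes the last-issued load.
use_order = list(reversed(loads))

default_waits = extra_loads_awaited(loads, use_order)
reordered_waits = extra_loads_awaited(use_order, use_order)

print(default_waits)    # [8, 0, 0, 0, 0, 0, 0, 0]
print(reordered_waits)  # [1, 1, 1, 1, 1, 1, 1, 1]
```

In this model the default order stalls the first VALU behind all eight loads, while issuing the loads in use order spreads the waiting out so that each wait covers only one more load.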
I have not tested extensively, but my impression is that this kind of reordering can, in some cases, give a performance improvement of a few percent.
https://github.com/llvm/llvm-project/pull/102595