[llvm] [AMDGPU] SIInsertHardClause: add configurable clause length limit (PR #142343)

Carl Ritson via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 2 01:58:51 PDT 2025


perlfu wrote:

> Why?

In certain applications I am seeing large clauses have a negative impact on performance.
For these cases, using small clauses (e.g. 4 operations) boosts performance by 1-2%.
(Disabling clausing entirely does not yield this benefit.)
I assume these cases get better cache efficiency because shorter clauses let lock-stepped waves interleave their memory requests, rather than each wave in turn issuing a long, uninterrupted sequence of memory requests.
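
To make the effect of a limit concrete, here is a minimal sketch of how a run of consecutive clausable memory operations would be cut up under a configurable maximum. This is an illustrative model only, not the code in the PR or in SIInsertHardClause; `splitIntoClauses`, `RunLength` and `MaxClauseLength` are hypothetical names, and the model ignores hardware details such as which instruction types may share a clause.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model (not the SIInsertHardClause implementation): given a run
// of RunLength consecutive clausable memory operations, return the sizes of
// the groups produced when no group may exceed MaxClauseLength operations.
std::vector<std::size_t> splitIntoClauses(std::size_t RunLength,
                                          std::size_t MaxClauseLength) {
  std::vector<std::size_t> Clauses;
  // Assumption: a limit of 0 is treated here as "form no clauses at all".
  if (MaxClauseLength == 0)
    return Clauses;
  while (RunLength > 0) {
    // Take as many operations as the limit allows, then start a new group.
    std::size_t Len = std::min(RunLength, MaxClauseLength);
    Clauses.push_back(Len);
    RunLength -= Len;
  }
  return Clauses;
}
```

With a limit of 4, a run of 10 loads becomes groups of 4, 4 and 2, so another wave gets a chance to issue its requests after at most 4 operations instead of after all 10.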

I am gathering data to see whether a limit lower than the hardware maximum might be beneficial overall.


https://github.com/llvm/llvm-project/pull/142343

