[llvm] [AMDGPU] SIInsertHardClause: add configurable clause length limit (PR #142343)
    Jay Foad via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Mon Jun  2 02:50:35 PDT 2025
    
    
  
jayfoad wrote:
> > Why?
> 
> In certain applications I am seeing large clauses have a negative impact on performance. In these cases, using small clauses (e.g. 4 operations) boosts performance by 1-2%. (Disabling clausing entirely does not yield this benefit.) I assume these cases get improved cache efficiency from interleaving lock-stepped waves rather than issuing long sequences of uninterrupted memory requests from each wave in turn.
> 
> I am gathering data to see if a lower (than hardware maximum) limit might be beneficial overall.
Fair enough. Should mention in the commit description that it's for performance tuning.
https://github.com/llvm/llvm-project/pull/142343
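
For readers following the thread, here is a rough sketch of what a configurable clause length limit means for a run of consecutive memory operations. It is not the SIInsertHardClause code from the PR: the function name `splitIntoClauses` and the parameters `limit` and `hwMax` are hypothetical, and the sketch only assumes that the effective clause length is the smaller of the configured limit and the hardware maximum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT the actual SIInsertHardClause implementation.
// Splits a run of `runLength` consecutive clauseable memory operations into
// clause sizes, each no longer than min(limit, hwMax).
//   limit - assumed user-configurable clause length limit
//   hwMax - assumed hardware maximum clause length for the target
std::vector<std::size_t> splitIntoClauses(std::size_t runLength,
                                          std::size_t limit,
                                          std::size_t hwMax) {
  assert(limit >= 1 && hwMax >= 1 && "clause limits must be positive");
  const std::size_t maxLen = std::min(limit, hwMax);

  std::vector<std::size_t> clauses;
  while (runLength > 0) {
    // Take as many operations as the effective limit allows.
    const std::size_t len = std::min(runLength, maxLen);
    clauses.push_back(len);
    runLength -= len;
  }
  return clauses;
}
```

With a limit of 4, as in the example quoted above, `splitIntoClauses(10, 4, 63)` breaks a run of 10 operations into clauses of 4, 4 and 2, giving other waves a chance to issue between clauses. The value 63 is only a placeholder for the hardware maximum here.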
    
    