[PATCH] D145586: [AMDGPU] Tweak PromoteAlloca limits

Thu Mar 9 12:05:30 PST 2023

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp:181
-    MaxVGPRs = ST.getMaxNumVGPRs(ST.getWavesPerEU(F).first);
-    // A non-entry function has only 32 caller preserved registers.
-    // Do not promote alloca which will force spilling.
----------------
rampitec wrote:
> Pierre-vh wrote:
> > rampitec wrote:
> > > arsenm wrote:
> > > > rampitec wrote:
> > > > > Isn't this still true? You might not see alloca, but you will see spills.
> > > > Trading spills inside a function for CSR spills can be profitable, such as if the spilling occurs in a loop.
> > > > 
> > > > IIRC this was to workaround some compile failures 
> > > One reason for the current limits were 'run of registers' errors in some cases. I'd be really careful extending the limits.
> > I would be okay with leaving the limit at 1/4 (or raising it to 1/3 instead of 1/2), as long as the entry function CC exception is removed. That should be enough for the particular case I'm interested in.
> > I would be okay with leaving the limit at 1/4 (or raising it to 1/3 instead of 1/2), as long as the entry function CC exception is removed. That should be enough for the particular case I'm interested in.
> 
> If Matt is OK trading alloca to spilling let it be.
This pass ought to be smarter about the context and whether the vector would be live across calls.

For this particular case, it’s apparently a pass ordering issue. It seems we run this before inlining which seems wrong 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145586/new/

https://reviews.llvm.org/D145586