[PATCH] D123525: [AMDGPU] On gfx908, reserve VGPR for AGPR copy based on register budget.

Mahesha S via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 18 11:56:37 PDT 2022


hsmhsm added inline comments.


================
Comment at: llvm/test/CodeGen/AMDGPU/spill-agpr.ll:72
 
-; Should spill agprs to memory for both gfx908 and gfx90a.
-; GCN-LABEL: {{^}}max_5regs_used_8a:
----------------
rampitec wrote:
> hsmhsm wrote:
> > rampitec wrote:
> > > hsmhsm wrote:
> > > > rampitec wrote:
> > > > > No reason to split this file. You can just add new function here.
> > > > Actually, I am bit confused here - I am having difficulty in understanding about fixing it.
> > > > 
> > > > Basically, with change in this patch (for gfx908), @max_5regs_used_8a() crashes (asserts) while compiling for gfx908, and hence no output will be generated for the file while compiling for gfx908. 
> > > > 
> > > > To fix this crash, we need to increase num vgprs to 6 (from 5) only for gfx908, but, not for gfx90a, which means, we need to keep two versions of the same function - one for gfx90a (existing one - @max_5regs_used_8a), and another for gfx908 (new one - @max_6regs_used_8a).
> > > > 
> > > > But, the problem is - since the function @max_5regs_used_8a still exist in the file, the compilation for gfx908 still crashes, and hence lit still fails for gfx908.
> > > > 
> > > > How to fix this without spilting the file? 
> > > > 
> > > > 
> > > Isn't it regression and bug by itself in the first place?
> > I agree that the current update (from this patch) to this file has to be properly corrected.
> > 
> > But, what I do not understand is - are you saying this patch introducing new bug, and is a regression? If so how?
> > 
> > Without this patch - what actually happening is - we use 5 vgprs from v0 to v4 for vector operations which is the minimum required budget, and v32 for agpr copy. So, we are still using 6 vgprs here, but v32 is not getting accounted here.
> > 
> > With this patch - we reserve v4 (highest available one) for agpr copy, hence we only left with four vgprs from v0 to v3, and hence RegAlloc issue. It requires to **officially** mention num required vgprs to 6 in that case v5 is used for agpr copy.
> OK, it was using v32 violating the budget requested. Now that makes sense to me. Then you probably should just increase the budget to 6.
But, for gfx90a, still 5 vgprs are minimum requirement. Is it OK, if I make it 6 common for both gfx908 and gfx90a, in that case there is NO need to split the file?



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123525/new/

https://reviews.llvm.org/D123525



More information about the llvm-commits mailing list