[PATCH] D123525: [AMDGPU] On gfx908, reserve VGPR for AGPR copy based on register budget.

Mon Apr 18 11:51:20 PDT 2022

rampitec added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/spill-agpr.ll:72

-; Should spill agprs to memory for both gfx908 and gfx90a.
-; GCN-LABEL: {{^}}max_5regs_used_8a:
----------------
hsmhsm wrote:
> rampitec wrote:
> > hsmhsm wrote:
> > > rampitec wrote:
> > > > No reason to split this file. You can just add new function here.
> > > Actually, I am bit confused here - I am having difficulty in understanding about fixing it.
> > > 
> > > Basically, with change in this patch (for gfx908), @max_5regs_used_8a() crashes (asserts) while compiling for gfx908, and hence no output will be generated for the file while compiling for gfx908. 
> > > 
> > > To fix this crash, we need to increase num vgprs to 6 (from 5) only for gfx908, but, not for gfx90a, which means, we need to keep two versions of the same function - one for gfx90a (existing one - @max_5regs_used_8a), and another for gfx908 (new one - @max_6regs_used_8a).
> > > 
> > > But, the problem is - since the function @max_5regs_used_8a still exist in the file, the compilation for gfx908 still crashes, and hence lit still fails for gfx908.
> > > 
> > > How to fix this without spilting the file? 
> > > 
> > > 
> > Isn't it regression and bug by itself in the first place?
> I agree that the current update (from this patch) to this file has to be properly corrected.
> 
> But, what I do not understand is - are you saying this patch introducing new bug, and is a regression? If so how?
> 
> Without this patch - what actually happening is - we use 5 vgprs from v0 to v4 for vector operations which is the minimum required budget, and v32 for agpr copy. So, we are still using 6 vgprs here, but v32 is not getting accounted here.
> 
> With this patch - we reserve v4 (highest available one) for agpr copy, hence we only left with four vgprs from v0 to v3, and hence RegAlloc issue. It requires to **officially** mention num required vgprs to 6 in that case v5 is used for agpr copy.
OK, it was using v32 violating the budget requested. Now that makes sense to me. Then you probably should just increase the budget to 6.

================
Comment at: llvm/test/CodeGen/AMDGPU/spill-vgpr-to-agpr.ll:4

+; GFX908:     register allocation failed: maximum depth for recoloring reached
 ; GFX900:     couldn't allocate input reg for constraint 'a'
----------------
hsmhsm wrote:
> rampitec wrote:
> > hsmhsm wrote:
> > > rampitec wrote:
> > > > What exactly do we test here now? Plus this is an obvious regression.
> > > I am confused here, not getting, what exactly is happening with this lit test.
> > > 
> > > First, forget this change, consider the original file, and focus on the compilation for gfx900. - the expectation here is that the functions @max_10_vgprs_used_9a and @max_10_vgprs_used_1a_partial_spill should fail to compile for gfx900 since there is agpr usage here, which we test in the below check line. 
> > > 
> > > ```
> > > ; GFX900:     couldn't allocate input reg for constraint 'a'
> > > ```
> > > 
> > > But, the quirk here is, - we also test for expected assembly output for other functions (say, max_10_vgprs) while compiling for gfx900. 
> > > 
> > > This is a mystery to me, because, I expected no assembly output when the compiler throws an error. However, llc has a quirk where on error, it will print the asm output if it's defaulting to stdout output from stdin, but won't if you specify -o (courtesy Matt), and we are taking advantage of this quirk nature of llc while testing this file for gfx900.
> > > 
> > > Whether above testing by taking advantage of the quirk nature of llc is right thing to do? Whether this quirk llc behavior is consistent with all types of compiler errors? which I am not sure, for example, with asserts, looks like llc will not exbibit this quirk behavior, meaning stops compilation when asserts.
> > > 
> > > Now, coming to new behavior of testing this file for gfx908 with this patch - basically llc throws below error since 10 vgprs is not enough.
> > > 
> > > ```
> > > register allocation failed: maximum depth for recoloring reached
> > > ```
> > > 
> > > To fix it, we need to increase the num of vgprs from 10 to 11 only for gfx908, but we need to keep 10 while compiling for gfx900. So, we need to keep two versions of same function one for gfx900, and another for gfx908. But, if we keep both the versions in the same file, then compilation for gfx908 still fails because vpgr 10 version is still exist in the file.
> > > 
> > > How to fix it by maintaining everything in a single file without spliting the file? For now, I took advantage of the quirk nature llc as explained above.
> > > 
> > >  
> > Run line for gfx900 can be really just removed and a much simpler and smaller test written to check we are rejecting 'a' constraint before gfx908. It does not require a full copy of the test though.
> > 
> > The second is the real problem, because it can be compiled today w/o your patch.
> for gfx900, I had previously attempted to move the test of rejecting 'a' constraint to a separate file, but then again had to restore it due to review comments. So, I am going to do the same now again, of course by including only required function, not a full copy of the file.
> 
> The second thing is making my job difficult - without splitting the file, it seems not possible to keep both gfx908 and gfx90a test within the same file.
Yes, just a really small separate test for the constraint refusal.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123525/new/

https://reviews.llvm.org/D123525