[PATCH] D109301: [AMDGPU] Enable copy between VGPR and AGPR classes during regalloc

Tue Sep 21 11:25:27 PDT 2021

rampitec added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/agpr-csr.ll:41
+; GCN-NOT: v_accvgpr
+; GFX90A: buffer_
 ; GCN:        s_setpc_b64
----------------
I think Matt it right, the easiest thing is to have no AGPR CSRs. We certainly do not want these scratch spills.

================
Comment at: llvm/test/CodeGen/AMDGPU/spill-to-agpr-partial.mir:19
+    ; GCN: liveins: $vgpr0_vgpr1_vgpr2_vgpr3, $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15, $agpr16_agpr17_agpr18_agpr19_agpr20_agpr21_agpr22_agpr23, $agpr24_agpr25_agpr26_agpr27, $agpr28_agpr29, $agpr30
+    ; GCN: SCRATCH_STORE_DWORDX4_SADDR killed $vgpr0_vgpr1_vgpr2_vgpr3, $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (store (s128) into %stack.0, align 4, addrspace 5)
+    ; GCN: $vgpr0_vgpr1_vgpr2_vgpr3 = SCRATCH_LOAD_DWORDX4_SADDR $sgpr32, 0, 0, implicit $exec, implicit $flat_scr :: (load (s128) from %stack.0, align 4, addrspace 5)
----------------
cdevadas wrote:
> rampitec wrote:
> > arsenm wrote:
> > > rampitec wrote:
> > > > arsenm wrote:
> > > > > rampitec wrote:
> > > > > > This is greedy, not fastra, the same regression.
> > > > > This isn't allocated at all, this is running just PEI. For CSRs I think we should either not have CSR AGPRs, or use splitCSR
> > > > OK, can we see partial cross class copy anywhere?
> > > Not really, but this is just a general problem with the allocator which I hope to look into soon. It doesn't know how to introduce new subranges to avoid conflicts or to spill the minimum set of required lanes
> > Look, this is tolerable unless these are AGPRs with their 32 register tuples.
> It is true, the Spiller during regalloc doesn't support partial/sub-range tuple spills.
> But it's not the case with copy. Greedy did handle partial tuple copy into a superclass.
> 
> `max_256_vgprs_spill_9x32_2bb` in spill-vgpr-to-agpr.ll, for instance, used to get entire 1024 tuple spill stores and restores.
> Now that has become minimal copies.
> 
> I see similar copies inserted during greedy allocator.
> %1200.sub16_sub17_sub18_sub19:av_1024 = COPY %1201.sub16_sub17_sub18_sub19:vreg_1024  //// copy to the super class, AV.
> %1202.sub16_sub17_sub18_sub19:vreg_1024 = COPY %1200.sub16_sub17_sub18_sub19:av_1024  //// later moving it back to V.
> 
> We currently don't have a test that captures it. I can include one.
This test would be super helpful. We need to make sure we do not copy a whole tuple. Moreover, we need it all ways: v to a, a to v, and all tuple sizes. 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109301/new/

https://reviews.llvm.org/D109301