[PATCH] D109301: [AMDGPU] Enable copy between VGPR and AGPR classes during regalloc

Sun Nov 21 23:03:01 PST 2021

cdevadas added a comment.

The combination of Regcoalescer, superclass copy (tryInstructionSplit), and Spiller during regalloc introduce a superclass spill at extreme pressure situations.
Lit test llvm/test/CodeGen/AMDGPU/schedule-xdl-resource.ll demonstrates that.
This mostly happens for gfx908 where we need a copy for AGPR to VGPR. I didn’t see this happening for gfx90a.

================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:1469

+  if (RI.isVectorSuperClass(RC)) {
+    // Convert an AV spill into a VGPR spill. Introduce a copy from AV to an
----------------
rampitec wrote:
> When does this happen? On gfx90a there is no need to copy AGPR to VGPR, it can be stored (and loaded) directly.
Yes, this is mostly needed for gfx908.

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:1054
+    // needs a copy in such cases.
+    auto CopyMIB = BuildMI(MBB, MI, DL, TII->get(AMDGPU::COPY), Dst)
+                       .addReg(Src, getKillRegState(IsKill));
----------------
rampitec wrote:
> We are already spilling. I.e. we ran out of registers of ValueReg's class and we know it. Why cannot we just copy into a different class instead of introducing an ambiguous AV class?
I didn't follow the question. Are you, in the first place, doubting the need for AV class as a superclass?

================
Comment at: llvm/test/CodeGen/AMDGPU/partial-regcopy-and-spill-missed-at-regalloc.ll:13
+  ; REGALLOC: GLOBAL_STORE_DWORDX2 undef %{{[0-9]+}}:vreg_64, [[SI_SPILL_V64_RESTORE]]
+  ; REGALLOC: [[COPY3:%[0-9]+]]:vreg_128 = COPY [[V_MFMA_I32_4X4X4I8]]
+  ; REGALLOC: GLOBAL_STORE_DWORDX4 undef %{{[0-9]+}}:vreg_64, [[COPY3]]
----------------
rampitec wrote:
> Can you add gfx90a run line and/or test? This copy should not happen on gfx90a, it can store AGPRs directly.
Will add a RUN line for gfx90a.

================
Comment at: llvm/test/CodeGen/AMDGPU/spill-agpr.ll:10
+; GCN-DAG:     v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a0
+; GCN-NOT:     buffer_store_dword
+; GCN-NOT:     buffer_load_dword
----------------
rampitec wrote:
> rampitec wrote:
> > Looks like the test does not test AGPR spills anymore, while we need to test it at least for gfx908.
> > Looks like the test does not test AGPR spills anymore, while we need to test it at least for gfx908.
> 
> There shall be a test somewhere to do real spilling into memory on both gfx908 and gfx90a.
Will do.

================
Comment at: llvm/test/CodeGen/AMDGPU/spill-to-agpr-partial.mir:234
+    ; GCN: {{  $}}
+    ; GCN: $vgpr55 = V_ACCVGPR_READ_B32_e64 killed $agpr3, implicit $exec, implicit-def $agpr0_agpr1_agpr2_agpr3, implicit $agpr0_agpr1_agpr2_agpr3
+    ; GCN: SCRATCH_STORE_DWORDX3_SADDR killed $agpr0_agpr1_agpr2, $sgpr32, 0, 0, implicit $exec, implicit $flat_scr, implicit killed $agpr0_agpr1_agpr2_agpr3 :: (store (s96) into %stack.0, align 4, addrspace 5)
----------------
rampitec wrote:
> This should not happen. This is gfx90a and agpr has to be stored directly. This should happen on gfx908 though.
You are right. The original test was for gfx90a. 
The new patterns should be in a separate test for gfx908. I will create one.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109301/new/

https://reviews.llvm.org/D109301