[PATCH] D109301: [AMDGPU] Enable copy between VGPR and AGPR classes during regalloc

Wed Nov 10 11:16:54 PST 2021

rampitec added a comment.

In D109301#3121154 <https://reviews.llvm.org/D109301#3121154>, @cdevadas wrote:

> The recently added lit test `llvm/test/CodeGen/AMDGPU/schedule-xdl-resource.ll` has extreme register-pressure situations, and the register allocator ends up inserting copies between virtual registers of identical register classes.
> This is caused by the allocator's choice to spill the AGPRs and later restore them into their superclass.
>
> **After regalloc:**
> %563:areg_1024 = V_MFMA_F32_32X32X4F16_e64 %136, %136, %568, 1, 1, 1, implicit $mode, implicit $exec
> **SI_SPILL_A1024_SAVE** %563, %stack.2, $sgpr32, 0 // AGPR spill
> ...
> %555:av_1024 = **SI_SPILL_V1024_RESTORE** %stack.2, $sgpr32, 0 // restores to the AV class.
> %461.sub3:vreg_128 = COPY %555.sub31
> %461.sub2:vreg_128 = COPY %555.sub30
> %461.sub1:vreg_128 = COPY %555.sub29
> %461.sub0:vreg_128 = COPY %555.sub28
> GLOBAL_STORE_DWORDX4_SADDR %151, %461, renamable $sgpr6_sgpr7, 112, 0
>
> The superclass is eventually assigned VGPRs.
> **After virtual reg-rewriter:**
> $agpr32_agpr33_..._agpr62_agpr63 = V_MFMA_F32_32X32X4F16_e64 $vgpr9_vgpr10, $vgpr9_vgpr10, killed $agpr32_agpr33_..._agpr62_agpr63, 1, 1, 1
> **SI_SPILL_A1024_SAVE** killed $agpr32_agpr33_..._agpr62_agpr63, %stack.2, $sgpr32, 0  // AGPR spill
> ...
> $vgpr5_vgpr6_..._vgpr35_vgpr36 = **SI_SPILL_V1024_RESTORE** %stack.2, $sgpr32, 0 // restores to VGPRs.
> renamable $vgpr4 = COPY renamable $vgpr36
> renamable $vgpr3 = COPY renamable $vgpr35
> renamable $vgpr2 = COPY renamable $vgpr34
> renamable $vgpr1 = COPY renamable $vgpr33
> GLOBAL_STORE_DWORDX4_SADDR renamable $vgpr0, killed renamable $vgpr1_vgpr2_vgpr3_vgpr4, renamable $sgpr6_sgpr7, 112, 0,
>
> These VGPR copies are redundant here and should have been optimized away, but they remain in the ISA.
> It is possible that identical register class copies are needed to satisfy alignment constraints on gfx90a and above.
> I'm not sure how we should deal with this. Should we optimize them late, in the pre-emit peephole?

You cannot optimize it in the pre-emit peephole, as that would create new hazards that will not be handled.
That is also not what we would want on gfx90a: '$vgpr5_vgpr6_... =' is a misaligned tuple. I am not sure whether the spilling code would handle it correctly.
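A minimal sketch of the register arithmetic behind this exchange, assuming 32-bit subregister lanes (so subN of a tuple maps to base + N) and the gfx90a rule that VGPR/AGPR tuples of 64 bits or wider must start at an even register index. Both helpers are hypothetical illustrations, not LLVM APIs. With the restored tuple based at vgpr5, sub28..sub31 land on vgpr33..vgpr36, matching the copies above, and the base register is odd, hence misaligned on gfx90a.

```cpp
// Hypothetical helper (not an LLVM API): physical register index covered by
// 32-bit subregister index `subIdx` of a tuple that starts at `baseReg`.
constexpr unsigned physRegForSubIdx(unsigned baseReg, unsigned subIdx) {
  return baseReg + subIdx;
}

// Hypothetical helper (not an LLVM API): assumes gfx90a requires any tuple of
// two or more dwords to start at an even register index (64-bit alignment).
constexpr bool isTupleAlignedForGfx90a(unsigned baseReg, unsigned numDwords) {
  return numDwords < 2 || baseReg % 2 == 0;
}

// $vgpr5_..._vgpr36 is a 32-dword tuple based at vgpr5.
static_assert(physRegForSubIdx(5, 28) == 33, "sub28 -> vgpr33");
static_assert(physRegForSubIdx(5, 31) == 36, "sub31 -> vgpr36");
static_assert(!isTupleAlignedForGfx90a(5, 32), "odd base: misaligned on gfx90a");
```

The checks are compile-time only; a real fix would have to come from register class constraints during allocation rather than from a late rewrite.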

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D109301