[all-commits] [llvm/llvm-project] 6b9713: [AMDGPU] Fold more AGPR copies/PHIs in SIFoldOperands

Tue Mar 28 00:33:26 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 6b971325e9eac315b83aa474c01da85b81062d17
      https://github.com/llvm/llvm-project/commit/6b971325e9eac315b83aa474c01da85b81062d17
  Author: pvanhout <pierre.vanhoutryve at amd.com>
  Date:   2023-03-28 (Tue, 28 Mar 2023)

  Changed paths:
    M llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
    A llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir

  Log Message:
  -----------
  [AMDGPU] Fold more AGPR copies/PHIs in SIFoldOperands

Generalize `tryFoldLCSSAPhi` into `tryFoldPhiAGPR`, which works
on any kind of PHI node (not just LCSSA ones) and attempts to
create AGPR PHIs more aggressively.

Also add a GFX908-only "cleanup" function, `tryOptimizeAGPRPhis`, which
tries to minimize AGPR-to-AGPR copies. GFX908 has no ACCVGPR MOV
instruction, so each AGPR-to-AGPR copy becomes 2 or 3 instructions because
it needs a VGPR temporary. This is needed because D143731 plus the new
`tryFoldPhiAGPR` may create many more PHIs (one 32 x float PHI becomes
32 float PHIs). If every PHI reads the same AGPR (as in
`test_mfma_loop_agpr_init`), they are lowered to 32 copies from that AGPR,
each of which becomes 2-3 instructions. Creating a VGPR cache in this case
prevents all those copies from being generated; we get AGPR-VGPR copies
instead, which are trivial.
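
To make the arithmetic concrete, here is a toy cost model of the case
described above. It is not code from the patch: the function names are
hypothetical, and it assumes that an AGPR-VGPR copy in either direction is a
single instruction and that the cache costs one AGPR-to-VGPR read.

```cpp
#include <cassert>

// Toy cost model only; none of these names exist in LLVM.

// Without a cache, every PHI input is a separate AGPR-to-AGPR copy. On GFX908
// each such copy goes through a VGPR temporary and costs 2 or 3 instructions
// (the figure given in the log message), passed here as InstrsPerAGPRCopy.
constexpr unsigned instrsWithoutCache(unsigned NumPhis,
                                      unsigned InstrsPerAGPRCopy) {
  return NumPhis * InstrsPerAGPRCopy;
}

// With a VGPR cache (assumption: single-instruction AGPR-VGPR copies), the
// shared AGPR is read into a VGPR once, then each PHI input is one
// VGPR-to-AGPR copy.
constexpr unsigned instrsWithCache(unsigned NumPhis) {
  return 1 + NumPhis;
}

// 32 float PHIs that all read the same AGPR.
static_assert(instrsWithoutCache(32, 2) == 64, "2 instructions per copy");
static_assert(instrsWithoutCache(32, 3) == 96, "3 instructions per copy");
static_assert(instrsWithCache(32) == 33, "one cached read plus 32 writes");
```

Under these assumptions the cached form needs roughly half to a third of the
copy instructions; the real numbers depend on how the copies are lowered.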

This is a preparatory patch intended to prevent regressions in D143731 when
AGPRs are involved.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144099