[all-commits] [llvm/llvm-project] cf14c7: AMDGPU: Add a pass to rewrite certain undef in PHI

Sun Sep 25 18:56:26 PDT 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: cf14c7caacfc17e893f952dd6d0e31f275302cd6
      https://github.com/llvm/llvm-project/commit/cf14c7caacfc17e893f952dd6d0e31f275302cd6
  Author: Ruiling Song <ruiling.song at amd.com>
  Date:   2022-09-26 (Mon, 26 Sep 2022)

  Changed paths:
    M llvm/lib/Target/AMDGPU/AMDGPU.h
    A llvm/lib/Target/AMDGPU/AMDGPURewriteUndefForPHI.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
    M llvm/lib/Target/AMDGPU/CMakeLists.txt
    M llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
    A llvm/test/CodeGen/AMDGPU/rewrite-undef-for-phi.ll
    A llvm/test/CodeGen/AMDGPU/uniform-phi-with-undef.ll

  Log Message:
  -----------
  AMDGPU: Add a pass to rewrite certain undef in PHI

For the IR pattern below, where %if terminates with a divergent branch,
divergence analysis will report %phi as uniform to help optimal code
generation.
```
  %if
  | \
  | %then
  | /
  %endif: %phi = phi [ %uniform, %if ], [ %undef, %then ]
```
In the backend, %phi and %uniform will be assigned a scalar register.
But the %undef from %then will make the scalar register dead in %then,
so the register is likely to be overwritten in %then. To fix the issue,
we rewrite %undef as %uniform. For details, please refer to the comment
in AMDGPURewriteUndefForPHI.cpp. Currently there are no test changes
shown, but this is mandatory for later changes.
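
As a rough illustration of the rewrite described above, here is a toy
model in Python. It is not the code in AMDGPURewriteUndefForPHI.cpp: the
function name, the list-of-pairs phi representation and the exact
conditions (phi reported uniform, exactly one distinct defined incoming
value) are assumptions made for this sketch.

```python
UNDEF = "undef"  # stand-in for an LLVM undef value in this toy model


def rewrite_undef_incoming(incoming, phi_is_uniform):
    """Return a copy of `incoming` with undef entries rewritten.

    incoming: list of (value, predecessor_block) pairs of one phi.
    phi_is_uniform: what divergence analysis reported for the phi.
    """
    defined = {value for value, _ in incoming if value != UNDEF}
    # Assumed condition for this sketch: rewrite only when the phi is
    # uniform and all defined incoming values are the same value.
    if not phi_is_uniform or len(defined) != 1:
        return list(incoming)
    (uniform,) = defined
    return [(uniform, block) for _, block in incoming]


phi = [("%uniform", "%if"), (UNDEF, "%then")]
rewritten = rewrite_undef_incoming(phi, phi_is_uniform=True)
print(rewritten)  # [('%uniform', '%if'), ('%uniform', '%then')]
```

After the rewrite, %uniform is used on the edge from %then, so its
register stays live across %then instead of being free for reuse there.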

Reviewed by: sameerds

Differential Revision: https://reviews.llvm.org/D133840

  Commit: 66325d9ba19dee10adfe587b6c59fad7dc0882bf
      https://github.com/llvm/llvm-project/commit/66325d9ba19dee10adfe587b6c59fad7dc0882bf
  Author: Ruiling Song <ruiling.song at amd.com>
  Date:   2022-09-26 (Mon, 26 Sep 2022)

  Changed paths:
    A llvm/test/CodeGen/AMDGPU/while-break.ll

  Log Message:
  -----------
  AMDGPU: Add a test to show how later optimization works

Differential Revision: https://reviews.llvm.org/D132448

  Commit: 40e9284f3c4c1643ae48afae0658e32d5d39718f
      https://github.com/llvm/llvm-project/commit/40e9284f3c4c1643ae48afae0658e32d5d39718f
  Author: Ruiling Song <ruiling.song at amd.com>
  Date:   2022-09-26 (Mon, 26 Sep 2022)

  Changed paths:
    M llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
    M llvm/test/CodeGen/AMDGPU/loop_break.ll
    M llvm/test/CodeGen/AMDGPU/multi-divergent-exit-region.ll
    M llvm/test/CodeGen/AMDGPU/multilevel-break.ll
    M llvm/test/CodeGen/AMDGPU/nested-loop-conditions.ll
    M llvm/test/CodeGen/AMDGPU/si-annotate-cf.ll
    M llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll
    M llvm/test/CodeGen/AMDGPU/vgpr-liverange-ir.ll
    M llvm/test/CodeGen/AMDGPU/while-break.ll
    M llvm/test/Transforms/StructurizeCFG/AMDGPU/loop-subregion-misordered.ll
    M llvm/test/Transforms/StructurizeCFG/interleaved-loop-order.ll
    M llvm/test/Transforms/StructurizeCFG/loop-continue-phi.ll
    M llvm/test/Transforms/StructurizeCFG/one-loop-multiple-backedges.ll
    M llvm/test/Transforms/StructurizeCFG/workarounds/needs-fix-reducible.ll
    M llvm/test/Transforms/StructurizeCFG/workarounds/needs-fr-ule.ll
    M llvm/test/Transforms/StructurizeCFG/workarounds/needs-unified-loop-exits.ll

  Log Message:
  -----------
  StructurizeCFG: prefer reduced number of live values

The instruction simplification will try to simplify the affected phis.
In some cases, this might extend the liveness of values. For example:

  BB0:
   | \
   | BB1
   | /
  BB2:phi (BB0, v), (BB1, undef)

The phi in BB2 will be simplified to v as v dominates BB2, but this
increases the number of live values in BB1. By setting CanUseUndef to
false, we no longer simplify the phi in this way, which helps register
pressure. This is mandatory for the later change that helps reduce
VGPR pressure for AMDGPU.
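
The effect of the flag can be sketched with a toy phi simplifier in
Python. This is a simplified model, not LLVM's instruction
simplification: the function, its parameters and the boolean dominance
argument are hypothetical names for this illustration.

```python
UNDEF = "undef"  # stand-in for an LLVM undef value in this toy model


def simplify_phi(incoming, can_use_undef, value_dominates_phi=True):
    """Return the value the phi folds to, or None if the phi is kept.

    incoming: list of (value, predecessor_block) pairs of one phi.
    """
    values = [value for value, _ in incoming]
    if can_use_undef:
        # undef may be read as any value, so those inputs are ignored.
        considered = {v for v in values if v != UNDEF}
        has_undef = UNDEF in values
    else:
        # undef is treated like any other distinct value.
        considered = set(values)
        has_undef = False
    if not considered and has_undef:
        return UNDEF  # every input is undef
    if len(considered) != 1:
        return None
    (common,) = considered
    # Folding past an undef input is only valid if the common value
    # is available at the phi.
    if has_undef and not value_dominates_phi:
        return None
    return common


phi = [("v", "BB0"), (UNDEF, "BB1")]
print(simplify_phi(phi, can_use_undef=True))   # v
print(simplify_phi(phi, can_use_undef=False))  # None
```

With can_use_undef=False the phi survives, so v does not need to stay
live across BB1.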

Reviewed by: foad, sameerds

Differential Revision: https://reviews.llvm.org/D132449

  Commit: a5676a3a7eab3a295ae0482162089a4e366bf9d2
      https://github.com/llvm/llvm-project/commit/a5676a3a7eab3a295ae0482162089a4e366bf9d2
  Author: Ruiling Song <ruiling.song at amd.com>
  Date:   2022-09-26 (Mon, 26 Sep 2022)

  Changed paths:
    M llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
    M llvm/test/CodeGen/AMDGPU/multilevel-break.ll
    M llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll
    M llvm/test/CodeGen/AMDGPU/while-break.ll
    M llvm/test/Transforms/StructurizeCFG/workarounds/needs-fr-ule.ll
    M llvm/test/Transforms/StructurizeCFG/workarounds/needs-unified-loop-exits.ll

  Log Message:
  -----------
  StructurizeCFG: Set Undef for non-predecessors in setPhiValues()

During the structurization process, we may place non-predecessor blocks
between the predecessors of a block in the structurized CFG. Take
the typical while-break case as an example:
```
 /---A(v=...)
 |  / \
 ^ B   C
 |  \ /|
 \---L |
     \ /
      E (r = phi (v:C)...)
```
After structurization, the CFG would look like:
```
 /---A
 |   |\
 |   | C
 |   |/
 |   F1
 ^   |\
 |   | B
 |   |/
 |   F2
 |   |\
 |   | L
 \   |/
  \--F3
     |
     E
```
We can see that block B is placed between the predecessors (C/L) of E.
During phi reconstruction, to achieve the same semantics as before, we
are reconstructing the PHIs as:
  F1: v1 = phi (v:C), (undef:A)
  F3: r = phi (v1:F2), ...
But this also means that `v1` would be live through B, which is not
necessary. The idea of this change is to treat the incoming value
from B as Undef for the PHI in E. With this change, the reconstructed
PHIs would be:
  F1: v1 = phi (v:C), (undef:A)
  F2: v2 = phi (v1:F1), (undef:B)
  F3: r  = phi (v2:F2), ...
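
To check the liveness claim, here is a small Python sketch that walks
the structurized CFG backwards from each phi use. The predecessor table
is transcribed from the diagram above; the helper name is hypothetical,
and only the incoming edges written out in the PHI listings are modeled
(the elided "..." operands are ignored).

```python
# Structurized CFG from the diagram above, as predecessor lists.
PREDS = {
    "A": ["F3"], "C": ["A"], "F1": ["A", "C"], "B": ["F1"],
    "F2": ["F1", "B"], "L": ["F2"], "F3": ["F2", "L"], "E": ["F3"],
}


def live_in_blocks(def_block, phi_use_edges):
    """Return the blocks at whose entry the value is live.

    def_block: block that defines the value.
    phi_use_edges: predecessor blocks whose edge into a phi carries the
    value; a phi use keeps the value live to the end of that block.
    """
    live, worklist = set(), list(phi_use_edges)
    while worklist:
        block = worklist.pop()
        # Stop at the definition; skip blocks already visited.
        if block == def_block or block in live:
            continue
        live.add(block)
        worklist.extend(PREDS[block])
    return live


# Before the change: v1 (defined in F1) is used on the edge from F2.
print(sorted(live_in_blocks("F1", ["F2"])))  # ['B', 'F2']
# After the change: v1 is used on the edge from F1, v2 (defined in F2)
# on the edge from F2.
print(sorted(live_in_blocks("F1", ["F1"])))  # []
print(sorted(live_in_blocks("F2", ["F2"])))  # []
```

In this model B appears in the live-in set of `v1` only for the old
reconstruction, which matches the reasoning above.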

Reviewed by: sameerds

Differential Revision: https://reviews.llvm.org/D132450

Compare: https://github.com/llvm/llvm-project/compare/7a8b9307cad0...a5676a3a7eab