[all-commits] [llvm/llvm-project] c62e2a: StructurizeCFG: Optimize phi insertion during ssa ...

Wed Aug 7 23:48:10 PDT 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: c62e2a2a4ed69d53a3c6ca5c24ee8d2504d6ba2b
      https://github.com/llvm/llvm-project/commit/c62e2a2a4ed69d53a3c6ca5c24ee8d2504d6ba2b
  Author: Ruiling, Song <ruiling.song at amd.com>
  Date:   2024-08-08 (Thu, 08 Aug 2024)

  Changed paths:
    M llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
    M llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
    M llvm/test/CodeGen/AMDGPU/while-break.ll
    M llvm/test/Transforms/StructurizeCFG/AMDGPU/loop-subregion-misordered.ll
    M llvm/test/Transforms/StructurizeCFG/loop-break-phi.ll

  Log Message:
  -----------
  StructurizeCFG: Optimize phi insertion during ssa reconstruction (#101301)

After investigating more while-break cases, I think we should try to
optimize
the way we reconstruct phi nodes. Previously, we reconstruct each phi 
nodes separately, but this is not optimal. For example:

```
header:
  %v.1 = phi float [ %v, %entry ], [ %v.2, %latch ]
  br i1 %cc, label %if, label %latch

if:
  %v.if = fadd float %v.1, 1.0 
  br i1 %cc2, label %latch, label %exit

latch:
  %v.2 = phi float [ %v.if, %if ], [ %v.1, %header ]
  br i1 %cc3, label %exit, label %header

exit:
  %v.3 = phi float [ %v.2, %latch ], [ %v.if, %if ]
```

For this case, we have different copies of value `v`, but there is at
most one copy of value `v` alive at any program point shown above.

The existing ssa reconstruction will use the incoming values from the 
old deleted phi. Below is a possible output after ssa reconstruction.

```
header:
  %v.1 = phi float [ %v, %entry ], [ %v.loop, %Flow1 ]
  br i1 %cc, label %if, label %flow

if:
  %v.if = fadd float %v.1, 1.0 
  br label %flow

flow:
  %v.exit.if = phi float [ %v.if, %if ], [ undef, %header ]
  %v.latch = phi float [ %v.if, %if ], [ %v.1, %header ]

latch:
  br label %flow1

flow1:
  %v.loop = phi float [ %v.latch, %latch ], [ undef, %Flow ]
  %v.exit = phi float [ %v.latch, %latch ], [ %v.exit.if, %Flow ]

exit:
  %v.3 = phi float [ %v.exit, %flow1 ]
```

If we look closely, in order to reconstruct `v.1` `v.2` `v.3`, we are 
having two simultaneous copies of `v` alive at `flow` and `flow1`.
We highly depend on register coalescer to coalesce them together.
But register coalescer may not always be able to coalesce them
because of the complexity in the chain of phi.

On the other side, now that we have only one copy of `v` alive at any 
program point before the transform, why not simplify the phi network
as much as we can? Look at the incoming values of these PHIs:
```
      header    if     latch
v.1:   --       --      v.2 
v.2:   v.1      v.if    --  
v.3:   --       v.if    v.2 
```
If we let them share the same incoming values for these three different
incoming blocks, then we would have only one copy of alive `v` at any 
program point after ssa reconstruction. Something like:

```
header:
  %v.1 = phi float [ %v, %entry ], [ %v.2, %Flow1 ]
  br i1 %cc, label %if, label %flow

if:
  %v.if = fadd float %v.1, 1.0 
  br label %flow

flow:
  %v.2 = phi float [ %v.if, %if ], [ %v.1, %header ]

latch:
  br label %flow1

flow1:
  ...

exit:
  %v.3 = phi float [ %v.2, %flow1 ]
```

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications