[all-commits] [llvm/llvm-project] 62ea79: [AMDGPU] Break Large PHIs: Take whole PHI chains i...

Thu Aug 3 07:41:27 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 62ea799e6c798e84428d77bc84b83fc572b7d75e
      https://github.com/llvm/llvm-project/commit/62ea799e6c798e84428d77bc84b83fc572b7d75e
  Author: pvanhout <pierre.vanhoutryve at amd.com>
  Date:   2023-08-03 (Thu, 03 Aug 2023)

  Changed paths:
    M llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
    M llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-break-large-phis-heuristics.ll

  Log Message:
  -----------
  [AMDGPU] Break Large PHIs: Take whole PHI chains into account

Previous heuristics had a big flaw: they only looked at single PHI at a time, and didn't take into account the whole "chain".
The concept of "chain" is important because if we only break a chain partially, we risk forcing regalloc to reserve twice as many registers for that vector.
We also risk adding a lot of copies that shouldn't be there and can inhibit backend optimizations.

The solution I found is to consider the whole "PHI chain" when looking at PHI.
That is, we recursively look at the PHI's incoming value & users for other PHIs, then make a decision about the chain as a whole.

The currrent threshold requires that at least `ceil(chain size * (2/3))` PHIs have at least one interesting incoming value.
In simple terms, two-thirds (rounded up) of the PHIs should be breakable.

This seems to work well. A lower threshold such as 50% is too aggressive because chains can often have 7 or 9 PHIs, and breaking 3+ or 4+ PHIs in those case often causes performance issue.

Fixes SWDEV-409648, SWDEV-398393, SWDEV-413487

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156414