[all-commits] [llvm/llvm-project] 68a0a3: [AggressiveAntiDepBreaker] Tweak the fix for renam...

Mon Aug 7 07:41:58 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 68a0a3737163d5ee4cb5e8bb10128dc589e40d9a
      https://github.com/llvm/llvm-project/commit/68a0a3737163d5ee4cb5e8bb10128dc589e40d9a
  Author: Jay Foad <jay.foad at amd.com>
  Date:   2023-08-07 (Mon, 07 Aug 2023)

  Changed paths:
    M llvm/lib/CodeGen/AggressiveAntiDepBreaker.cpp
    M llvm/test/CodeGen/Hexagon/autohvx/fp-to-int.ll
    M llvm/test/CodeGen/Hexagon/autohvx/mulh.ll
    M llvm/test/CodeGen/Hexagon/autohvx/vmpy-parts.ll

  Log Message:
  -----------
  [AggressiveAntiDepBreaker] Tweak the fix for renaming a subregister of a live register

This patch tweaks the fix in D20627 "Do not rename registers that do not
start an independent live range" to only consider Data dependencies, not
Output or Anti dependencies. An Output or Anti dependency on a superreg
does not imply that the superreg is live at the current instruction.

This enables breaking anti-dependencies in a few more cases as shown by
the lit test updates.
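
As a rough illustration of the idea (not the actual AggressiveAntiDepBreaker
code), the sketch below models registers as sets of regunits and scans a
node's successor edges for a super-register. The names Dep, DepKind and
feedsLiveSuperReg are hypothetical and exist only for this example. With
DataOnly set, an Anti or Output edge to a superreg no longer blocks renaming.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

enum class DepKind { Data, Anti, Output };

// Hypothetical edge type: a dependency kind plus the regunits of the
// register that carries the dependency.
struct Dep {
  DepKind Kind;
  std::set<int> RegUnits;
};

// Toy Hexagon-like registers: D11 is the pair R23:R22.
const std::set<int> R23 = {23};
const std::set<int> D11 = {22, 23};

// True if Super covers every unit of Sub and at least one more.
bool isStrictSuperReg(const std::set<int> &Super, const std::set<int> &Sub) {
  return Super.size() > Sub.size() &&
         std::includes(Super.begin(), Super.end(), Sub.begin(), Sub.end());
}

// True if some successor edge names a super-register of Reg. When DataOnly
// is set, Anti and Output edges are skipped, because they do not show that
// the super-register is live at this point.
bool feedsLiveSuperReg(const std::vector<Dep> &Succs, const std::set<int> &Reg,
                       bool DataOnly) {
  for (const Dep &D : Succs) {
    if (DataOnly && D.Kind != DepKind::Data)
      continue;
    if (isStrictSuperReg(D.RegUnits, Reg))
      return true;
  }
  return false;
}
```

In this model a def of R23 whose only superreg edge is an Anti edge on D11
is rejected when all edge kinds are considered, but accepted as a renaming
candidate when only Data edges count.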

Differential Revision: https://reviews.llvm.org/D156879

  Commit: 97324f6274184e607fa6d6cffb1aebee317d4644
      https://github.com/llvm/llvm-project/commit/97324f6274184e607fa6d6cffb1aebee317d4644
  Author: Jay Foad <jay.foad at amd.com>
  Date:   2023-08-07 (Mon, 07 Aug 2023)

  Changed paths:
    M llvm/lib/CodeGen/AggressiveAntiDepBreaker.cpp
    M llvm/lib/CodeGen/AggressiveAntiDepBreaker.h

  Log Message:
  -----------
  [AggressiveAntiDepBreaker] Refix renaming a subregister of a live register

This patch reworks the fix from D20627 "Do not rename registers that do
not start an independent live range". That fix depended on the scheduler
dependency graph having redundant edges. Those edges are removed by
D156552 "[MachineScheduler] Track physical register dependencies
per-regunit" with the result that on several Hexagon lit tests, the
post-RA scheduler would schedule the code in a way that fails machine
verification.

Consider this code where D11 is a pair R23:R22:

SU(0): %R2<def> = A2_add %R23, %R17<kill>
    (anti dependency on R23 here)
SU(8): %R23<def> = S2_asr_i_r %R22, 31
    (data dependency on R23->D11 here)
SU(10): %D0<def> = A2_tfrp %D11<kill>

The original fix would detect this situation by examining the dependency
from SU(8) to SU(10) and seeing that D11 is not a subreg of R23.

A slightly more complicated example:

SU(0): %R2<def> = A2_add %R23, %R17<kill>
    (anti dependency on R23 here)
SU(8): %R23<def> = S2_asr_i_r %R22, 31
    (data dependency on R23 here)
SU(9): %R23<def> = S2_asr_i_r %R23, 31
    (data dependency on R23->D11 here)
SU(10): %D0<def> = A2_tfrp %D11<kill>

The original fix also worked on this example, but only because
ScheduleDAGInstrs adds an extra data dependency edge directly from SU(8)
to SU(10). This edge is redundant for two reasons: it can be inferred
transitively from the edges SU(8)->SU(9) and SU(9)->SU(10), and none of
the data that SU(8) writes to R23 is read by SU(10).
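
The transitivity part of that argument can be checked mechanically. The
sketch below is a toy graph model, not ScheduleDAGInstrs; Graph,
isImpliedTransitively and secondExample are hypothetical names. It asks
whether one node is still reachable from another once the direct edge is
ignored. It does not model the second reason (which data is actually read).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// SU number -> set of successor SU numbers.
using Graph = std::map<int, std::set<int>>;

// True if To is reachable from From without using the direct edge From->To.
bool isImpliedTransitively(const Graph &G, int From, int To) {
  std::set<int> Seen = {From};
  std::vector<int> Work = {From};
  while (!Work.empty()) {
    int Cur = Work.back();
    Work.pop_back();
    auto It = G.find(Cur);
    if (It == G.end())
      continue;
    for (int Succ : It->second) {
      if (Cur == From && Succ == To)
        continue; // ignore the direct edge under test
      if (Succ == To)
        return true;
      if (Seen.insert(Succ).second)
        Work.push_back(Succ);
    }
  }
  return false;
}

// Data edges of the second example: 8->9, 9->10 and the extra 8->10.
Graph secondExample() {
  Graph G;
  G[8] = {9, 10};
  G[9] = {10};
  return G;
}
```

Here the edge from 8 to 10 is implied by the path through 9, while the
edges 8->9 and 9->10 are not implied by anything else.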

After D156552 the redundant edge SU(8)->SU(10) will not be present, so
when we examine the successors of SU(8) we will not find any that read
from a superreg of R23.

This patch removes the original fix from D20627, which examined edges in
the dependency graph, and instead extends a check that
FindSuitableFreeRegisters already performs. Previously that check
verified that *some* register is a superreg of all registers in the
rename group. It now verifies that the specific register carrying the
anti-dependency that we want to break is a superreg of all registers in
the rename group.
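
The difference between the two checks can be sketched as follows. This is
a simplified model under stated assumptions, not the code in
FindSuitableFreeRegisters: registers are plain sets of regunits, "superreg"
is treated as "covers the same or more units", and the names covers,
someRegCoversGroup and antiDepRegCoversGroup are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// A register modelled as the set of regunits it covers.
using Reg = std::set<int>;

// Toy Hexagon-like registers: D11 is the pair R23:R22.
const Reg R22 = {22};
const Reg R23 = {23};
const Reg D11 = {22, 23};

// True if Super covers every regunit of Sub (Super may equal Sub).
bool covers(const Reg &Super, const Reg &Sub) {
  return std::includes(Super.begin(), Super.end(), Sub.begin(), Sub.end());
}

// Older style of check: is there *some* register in the group that covers
// every register in the group?
bool someRegCoversGroup(const std::vector<Reg> &Group) {
  return std::any_of(Group.begin(), Group.end(), [&](const Reg &Candidate) {
    return std::all_of(Group.begin(), Group.end(),
                       [&](const Reg &R) { return covers(Candidate, R); });
  });
}

// Extended check: does the register carrying the anti-dependency itself
// cover every register in the group?
bool antiDepRegCoversGroup(const Reg &AntiDepReg,
                           const std::vector<Reg> &Group) {
  return std::all_of(Group.begin(), Group.end(),
                     [&](const Reg &R) { return covers(AntiDepReg, R); });
}
```

For a rename group containing R23 and D11, the older check passes because
D11 covers both. The extended check fails when the anti-dependency is on
R23, since R23 does not cover D11, so the rename would be rejected.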

Differential Revision: https://reviews.llvm.org/D156880

  Commit: 56d92c17583e5f0b5e1e521b5f614be79436fccc
      https://github.com/llvm/llvm-project/commit/56d92c17583e5f0b5e1e521b5f614be79436fccc
  Author: Jay Foad <jay.foad at amd.com>
  Date:   2023-08-07 (Mon, 07 Aug 2023)

  Changed paths:
    M llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h
    M llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
    M llvm/test/CodeGen/AMDGPU/fcopysign.f16.ll
    M llvm/test/CodeGen/AMDGPU/load-global-i16.ll
    M llvm/test/CodeGen/AMDGPU/schedule-physregdeps.mir
    M llvm/test/CodeGen/SystemZ/inline-asm-fp-int-casting-explicit-regs.ll
    M llvm/test/CodeGen/SystemZ/inline-asm-fp-int-casting.ll
    M llvm/test/CodeGen/Thumb2/mve-vldst4.ll
    M llvm/test/CodeGen/Thumb2/mve-vst3.ll

  Log Message:
  -----------
  [MachineScheduler] Track physical register dependencies per-regunit

Change the scheduler's physical register dependency tracking from
registers-and-their-aliases to regunits. This has a couple of advantages
when subregisters are used:

- The dependency tracking is more accurate and creates fewer useless
  edges in the dependency graph. An AMDGPU example, edited for clarity:

    SU(0): $vgpr1 = V_MOV_B32 $sgpr0
    SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
    SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0

  There is a data dependency on $vgpr1 from SU(0) to SU(1) and from
  SU(1) to SU(2). But the old dependency tracking code also added a
  useless edge from SU(0) to SU(2) because it thought that SU(0)'s def
  of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1.

- On targets like AMDGPU that make heavy use of subregisters, each
  register can have a huge number of aliases - it can be quadratic in
  the size of the largest defined register tuple. The number of regunits
  per register has a much smaller upper bound, so iterating over
  regunits is faster than iterating over aliases.
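
The AMDGPU example in the first point can be reproduced with a small
model. This is a deliberately simplified sketch, not the real
ScheduleDAGInstrs logic: it tracks only data dependencies, walks the block
bottom-up, and uses a hypothetical four-register file in which
vgpr0_vgpr1 covers the regunits of vgpr0 and vgpr1. All type and function
names are assumptions made for the illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using RegUnitSet = std::set<int>;
using EdgeSet = std::set<std::pair<int, int>>; // (defining SU, using SU)

struct Instr {
  std::vector<std::string> Defs;
  std::vector<std::string> Uses;
};

// Hypothetical register file: register name -> regunits it covers.
const std::map<std::string, RegUnitSet> Units = {
    {"vgpr0", {0}}, {"vgpr1", {1}}, {"vgpr0_vgpr1", {0, 1}}, {"sgpr0", {2}}};

// Two registers alias if they share at least one regunit.
bool aliases(const std::string &A, const std::string &B) {
  for (int U : Units.at(A))
    if (Units.at(B).count(U))
      return true;
  return false;
}

// Alias-based tracking: uses are recorded per register, and a def adds an
// edge to every recorded use of any aliasing register.
EdgeSet dataDepsByAlias(const std::vector<Instr> &Block) {
  EdgeSet Edges;
  std::map<std::string, std::vector<int>> UsesOf;
  for (int I = static_cast<int>(Block.size()) - 1; I >= 0; --I) {
    for (const std::string &Def : Block[I].Defs) {
      for (const auto &Entry : UsesOf)
        if (aliases(Def, Entry.first))
          for (int User : Entry.second)
            Edges.emplace(I, User);
      UsesOf.erase(Def); // only uses of exactly this register are retired
    }
    for (const std::string &Use : Block[I].Uses)
      UsesOf[Use].push_back(I);
  }
  return Edges;
}

// Regunit-based tracking: uses are recorded per regunit, so a def retires
// exactly the units it overwrites.
EdgeSet dataDepsByRegUnit(const std::vector<Instr> &Block) {
  EdgeSet Edges;
  std::map<int, std::vector<int>> UsesOf;
  for (int I = static_cast<int>(Block.size()) - 1; I >= 0; --I) {
    for (const std::string &Def : Block[I].Defs)
      for (int U : Units.at(Def)) {
        for (int User : UsesOf[U])
          Edges.emplace(I, User);
        UsesOf.erase(U);
      }
    for (const std::string &Use : Block[I].Uses)
      for (int U : Units.at(Use))
        UsesOf[U].push_back(I);
  }
  return Edges;
}

// SU(0), SU(1) and SU(2) from the example above.
std::vector<Instr> exampleBlock() {
  std::vector<Instr> Block(3);
  Block[0].Defs = {"vgpr1"};
  Block[0].Uses = {"sgpr0"};
  Block[1].Defs = {"vgpr1"};
  Block[1].Uses = {"vgpr1"};
  Block[2].Defs = {"vgpr0_vgpr1"};
  Block[2].Uses = {"vgpr0_vgpr1"};
  return Block;
}
```

In this model the alias-based walk yields the edges 0->1, 1->2 and the
extra 0->2, while the regunit-based walk yields only 0->1 and 1->2.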

The LLVM compile-time tracker shows a tiny overall improvement of 0.03%
on X86. I expect a larger compile-time improvement on targets like
AMDGPU.

Recommit after fixing AggressiveAntiDepBreaker in D156880.

Differential Revision: https://reviews.llvm.org/D156552

Compare: https://github.com/llvm/llvm-project/compare/8aeb84c1b63f...56d92c17583e