[all-commits] [llvm/llvm-project] 5a64c8: [MachineScheduler] Test case for physical register...
Jay Foad via All-commits
all-commits at lists.llvm.org
Sat Jul 29 07:35:14 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 5a64c89c8d663d4287e79a6371e6f64bf8ebfe91
https://github.com/llvm/llvm-project/commit/5a64c89c8d663d4287e79a6371e6f64bf8ebfe91
Author: Jay Foad <jay.foad at amd.com>
Date: 2023-07-29 (Sat, 29 Jul 2023)
Changed paths:
A llvm/test/CodeGen/AMDGPU/schedule-physregdeps.mir
Log Message:
-----------
[MachineScheduler] Test case for physical register dependencies
Differential Revision: https://reviews.llvm.org/D156551
Commit: 1a54671d5405a39de362e9692ce963c0638023bc
https://github.com/llvm/llvm-project/commit/1a54671d5405a39de362e9692ce963c0638023bc
Author: Jay Foad <jay.foad at amd.com>
Date: 2023-07-29 (Sat, 29 Jul 2023)
Changed paths:
M llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h
M llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
M llvm/test/CodeGen/AMDGPU/fcopysign.f16.ll
M llvm/test/CodeGen/AMDGPU/load-global-i16.ll
M llvm/test/CodeGen/AMDGPU/schedule-physregdeps.mir
M llvm/test/CodeGen/Hexagon/autohvx/fp-to-int.ll
M llvm/test/CodeGen/Hexagon/autohvx/int-to-fp.ll
M llvm/test/CodeGen/Hexagon/autohvx/vmpy-parts.ll
M llvm/test/CodeGen/SystemZ/inline-asm-fp-int-casting-explicit-regs.ll
M llvm/test/CodeGen/SystemZ/inline-asm-fp-int-casting.ll
M llvm/test/CodeGen/Thumb2/mve-vldst4.ll
M llvm/test/CodeGen/Thumb2/mve-vst3.ll
Log Message:
-----------
[MachineScheduler] Track physical register dependencies per-regunit
Change the scheduler's physical register dependency tracking from
registers-and-their-aliases to regunits. This has a couple of advantages
when subregisters are used:
- The dependency tracking is more accurate and creates fewer useless
  edges in the dependency graph. An AMDGPU example, edited for clarity:

    SU(0): $vgpr1 = V_MOV_B32 $sgpr0
    SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
    SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0

  There is a data dependency on $vgpr1 from SU(0) to SU(1) and from
  SU(1) to SU(2). But the old dependency tracking code also added a
  useless edge from SU(0) to SU(2) because it thought that SU(0)'s def
  of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1.
- On targets like AMDGPU that make heavy use of subregisters, each
  register can have a huge number of aliases; the count can be quadratic
  in the size of the largest defined register tuple. The number of
  regunits per register has a much smaller upper bound, so iterating
  over regunits is faster than iterating over aliases.
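
The extra SU(0) -> SU(2) edge described in the first point can be reproduced with a small toy model. This is only a sketch and not the ScheduleDAGInstrs implementation: the names REGUNITS, PROGRAM, aliases and data_deps are hypothetical, the register file is reduced to the four registers from the example, and only true data dependencies are modelled, walking the block bottom-up. The assumption is that alias-based tracking records readers under the exact register name, so a def of $vgpr1 does not clear the reader recorded under $vgpr0_vgpr1.

```python
from collections import defaultdict

# Hypothetical toy register file: register name -> set of regunits it covers.
REGUNITS = {
    "sgpr0": {"sgpr0"},
    "vgpr0": {"vgpr0"},
    "vgpr1": {"vgpr1"},
    "vgpr0_vgpr1": {"vgpr0", "vgpr1"},
}


def aliases(reg):
    """Registers sharing at least one regunit with reg, including reg itself."""
    return {r for r, units in REGUNITS.items() if units & REGUNITS[reg]}


# (defs, uses) for SU(0), SU(1), SU(2), in program order.
PROGRAM = [
    (["vgpr1"], ["sgpr0"]),
    (["vgpr1"], ["vgpr1"]),
    (["vgpr0_vgpr1"], ["vgpr0_vgpr1"]),
]


def data_deps(program, lookup_keys, tracked_keys):
    """Walk bottom-up and return data-dependency edges as (def SU, use SU)."""
    edges = set()
    readers = defaultdict(list)  # key -> later SUs that read it
    for su in reversed(range(len(program))):
        defs, uses = program[su]
        for reg in defs:
            # Every recorded reader reachable from this def depends on it.
            for key in lookup_keys(reg):
                edges.update((su, user) for user in readers[key])
            # The def hides earlier defs only for the keys it is tracked under.
            for key in tracked_keys(reg):
                readers[key].clear()
        for reg in uses:
            for key in tracked_keys(reg):
                readers[key].append(su)
    return edges


# Alias-based: look up all aliases, but record and clear by exact register.
alias_edges = data_deps(PROGRAM, aliases, lambda reg: {reg})
# Regunit-based: look up, record and clear per regunit.
regunit_edges = data_deps(PROGRAM, REGUNITS.__getitem__, REGUNITS.__getitem__)

print(sorted(alias_edges))    # [(0, 1), (0, 2), (1, 2)]
print(sorted(regunit_edges))  # [(0, 1), (1, 2)]
```

In this model the only difference between the two runs is the key under which readers are recorded and cleared; keying by regunit lets SU(1)'s def of $vgpr1 hide SU(2)'s read of that unit from SU(0).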
The LLVM compile-time tracker shows a tiny overall improvement of 0.03%
on X86. I expect a larger compile-time improvement on targets like
AMDGPU.
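
The quadratic growth in alias count mentioned above can be illustrated by counting overlaps in a simplified register file. This is a sketch under stated assumptions, not AMDGPU's actual register definitions: it assumes a hypothetical bank of num_regs 32-bit registers in which every run of 1 to max_width consecutive registers is defined as a tuple, and it assumes one regunit per 32-bit register. The function names are made up for illustration.

```python
def count_aliases(start, width, num_regs, max_width):
    """Count defined tuples overlapping [start, start + width), itself included."""
    count = 0
    for w in range(1, max_width + 1):
        for s in range(num_regs - w + 1):
            # Half-open intervals [s, s + w) and [start, start + width) overlap.
            if s < start + width and start < s + w:
                count += 1
    return count


def count_regunits(width):
    """One regunit per 32-bit register in this toy model."""
    return width


for max_width in (8, 16):
    n_aliases = count_aliases(100, max_width, 256, max_width)
    print(max_width, n_aliases, count_regunits(max_width))
```

Away from the ends of the bank, the alias count for the widest tuple in this model is (3W^2 - W) / 2 for maximum width W, while the regunit count is W.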
Differential Revision: https://reviews.llvm.org/D156552
Compare: https://github.com/llvm/llvm-project/compare/165841b681c1...1a54671d5405