[all-commits] [llvm/llvm-project] c88585: [AMDGPU] Enable load clustering in the post-RA sch...
Jay Foad via All-commits
all-commits at lists.llvm.org
Wed Oct 13 09:12:41 PDT 2021
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: c885857e9d03dbf2563c43e8a0a99b3f63a4d106
https://github.com/llvm/llvm-project/commit/c885857e9d03dbf2563c43e8a0a99b3f63a4d106
Author: Jay Foad <jay.foad at amd.com>
Date: 2021-10-13 (Wed, 13 Oct 2021)
Changed paths:
M llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
M llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll
M llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll
M llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
M llvm/test/CodeGen/AMDGPU/idiv-licm.ll
M llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll
M llvm/test/CodeGen/AMDGPU/sdiv64.ll
M llvm/test/CodeGen/AMDGPU/srem64.ll
M llvm/test/CodeGen/AMDGPU/udiv64.ll
M llvm/test/CodeGen/AMDGPU/urem64.ll
Log Message:
-----------
[AMDGPU] Enable load clustering in the post-RA scheduler
This has a couple of benefits:
1. It can sometimes fix clusters that got broken apart when the register
allocator inserted a copy.
2. Post-RA scheduling does not have to worry about increasing register
pressure, which in some cases gives it more freedom to reorder
instructions.
Testing on a collection of 10,000 graphics shaders compiled for gfx1010
showed:
- The average length of each run of one or more load instructions
increased by about 1%.
- The number of runs of two or more load instructions increased by
about 4%.
Differential Revision: https://reviews.llvm.org/D111646
More information about the All-commits
mailing list