[all-commits] [llvm/llvm-project] c88585: [AMDGPU] Enable load clustering in the post-RA sch...

Wed Oct 13 09:12:41 PDT 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: c885857e9d03dbf2563c43e8a0a99b3f63a4d106
      https://github.com/llvm/llvm-project/commit/c885857e9d03dbf2563c43e8a0a99b3f63a4d106
  Author: Jay Foad <jay.foad at amd.com>
  Date:   2021-10-13 (Wed, 13 Oct 2021)

  Changed paths:
    M llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
    M llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll
    M llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
    M llvm/test/CodeGen/AMDGPU/idiv-licm.ll
    M llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll
    M llvm/test/CodeGen/AMDGPU/sdiv64.ll
    M llvm/test/CodeGen/AMDGPU/srem64.ll
    M llvm/test/CodeGen/AMDGPU/udiv64.ll
    M llvm/test/CodeGen/AMDGPU/urem64.ll

  Log Message:
  -----------
  [AMDGPU] Enable load clustering in the post-RA scheduler

This has a couple of benefits:
1. It can sometimes fix clusters that got broken apart when the register
   allocator inserted a copy.
2. Post-RA scheduling does not have to worry about increasing register
   pressure, which in some cases gives it more freedom to reorder
   instructions.

Testing on a collection of 10,000 graphics shaders compiled for gfx1010
showed:
- The average length of each run of one or more load instructions
  increased by about 1%.
- The number of runs of two or more load instructions increased by
  about 4%.

Differential Revision: https://reviews.llvm.org/D111646