[all-commits] [llvm/llvm-project] 66e13c: [AMDGPU] Enable load clustering in the post-RA sch...

Tue Oct 12 08:18:33 PDT 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 66e13c7f439cf162d7ed1d25883e71a5755ac7ec
      https://github.com/llvm/llvm-project/commit/66e13c7f439cf162d7ed1d25883e71a5755ac7ec
  Author: Jay Foad <jay.foad at amd.com>
  Date:   2021-10-12 (Tue, 12 Oct 2021)

  Changed paths:
    M llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
    M llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll
    M llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
    M llvm/test/CodeGen/AMDGPU/idiv-licm.ll
    M llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll
    M llvm/test/CodeGen/AMDGPU/sdiv64.ll
    M llvm/test/CodeGen/AMDGPU/srem64.ll
    M llvm/test/CodeGen/AMDGPU/udiv64.ll
    M llvm/test/CodeGen/AMDGPU/urem64.ll

  Log Message:
  -----------
  [AMDGPU] Enable load clustering in the post-RA scheduler

This has a couple of benefits:
1. It can sometimes fix clusters that got broken apart when the register
   allocator inserted a copy.
2. Post-RA scheduling does not have to worry about increasing register
   pressure, which in some cases gives it more freedom to reorder
   instructions.

Testing on a collection of 10,000 graphics shaders compiled for gfx1010
showed:
- The average length of each run of one or more load instructions
  increased by about 1%.
- The number of runs of two or more load instructions increased by
  about 4%.

  Commit: f7ee21aa326fcd07448d5162daf66f3675ffa863
      https://github.com/llvm/llvm-project/commit/f7ee21aa326fcd07448d5162daf66f3675ffa863
  Author: Jay Foad <jay.foad at amd.com>
  Date:   2021-10-12 (Tue, 12 Oct 2021)

  Changed paths:
    M llvm/lib/CodeGen/TwoAddressInstructionPass.cpp

  Log Message:
  -----------
  [TwoAddressInstruction] Remove ad hoc machine verification

With the -early-live-intervals command line flag,
TwoAddressInstructionPass::runOnMachineFunction would call
MachineFunction::verify before returning to check the live intervals.
But there was not much benefit to doing this since -verify-machineinstrs
and LLVM_ENABLE_EXPENSIVE_CHECKS provide a more general way of
scheduling machine verification after every pass.

Also it caused problems on targets like Lanai which are marked as "not
machine verifier clean", since verification would fail for known
target-specific problems which are nothing to do with LiveIntervals.

Differential Revision: https://reviews.llvm.org/D111618

Compare: https://github.com/llvm/llvm-project/compare/838b4a533e68...f7ee21aa326f