[all-commits] [llvm/llvm-project] 69153d: AMDGPU: Use GlobalPriority for largest register tu...

Thu Sep 15 08:45:21 PDT 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 69153d6c0a3f9110bc455b1cca28a5a71e2ac933
      https://github.com/llvm/llvm-project/commit/69153d6c0a3f9110bc455b1cca28a5a71e2ac933
  Author: Matt Arsenault <Matthew.Arsenault at amd.com>
  Date:   2022-09-15 (Thu, 15 Sep 2022)

  Changed paths:
    M llvm/lib/Target/AMDGPU/SIRegisterInfo.td
    M llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement-stack-lower.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.large.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll
    M llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll
    M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.ll
    M llvm/test/CodeGen/AMDGPU/load-constant-i16.ll
    M llvm/test/CodeGen/AMDGPU/mfma-no-register-aliasing.ll
    M llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll

  Log Message:
  -----------
  AMDGPU: Use GlobalPriority for largest register tuples

Only do this for 16 and 32 register tuples, although we might want to
extend to 8 tuples.

It's incredibly expensive to spill these, and doing so majorly
interferes with the ability to allocate anything else in the function.

The lit tests show mostly sizeable improvements with a handful of tiny
regressions with large vectors.