[all-commits] [llvm/llvm-project] a7f60b: AMDGPU: Regenerate mir test checks to include -NEXT

Tue Feb 8 08:15:06 PST 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: a7f60bfdf663e1e4092c885139623ea682c73823
      https://github.com/llvm/llvm-project/commit/a7f60bfdf663e1e4092c885139623ea682c73823
  Author: Matt Arsenault <Matthew.Arsenault at amd.com>
  Date:   2022-02-08 (Tue, 08 Feb 2022)

  Changed paths:
    M llvm/test/CodeGen/AMDGPU/pei-build-spill.mir
    M llvm/test/CodeGen/AMDGPU/spill-agpr-partially-undef.mir
    M llvm/test/CodeGen/AMDGPU/spill-agpr.mir

  Log Message:
  -----------
  AMDGPU: Regenerate mir test checks to include -NEXT

  Commit: 8b2ca766f0e58a2a094a4dffbf5b035d584ef475
      https://github.com/llvm/llvm-project/commit/8b2ca766f0e58a2a094a4dffbf5b035d584ef475
  Author: Matt Arsenault <Matthew.Arsenault at amd.com>
  Date:   2022-02-08 (Tue, 08 Feb 2022)

  Changed paths:
    M llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
    M llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
    M llvm/test/CodeGen/AMDGPU/accvgpr-copy.mir
    A llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
    M llvm/test/CodeGen/AMDGPU/agpr-copy-no-vgprs.mir
    M llvm/test/CodeGen/AMDGPU/agpr-copy-sgpr-no-vgprs.mir
    M llvm/test/CodeGen/AMDGPU/alloc-aligned-tuples-gfx908.mir
    M llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs-packed.ll

  Log Message:
  -----------
  AMDGPU: Reserve v32 if we may need to copy between AGPRs on gfx908

We need to guarantee cheap copies between AGPRs, and unfortunately
gfx908 cannot directly do this. Theoretically we could set the
scavenger up with an emergency spill slot, but it also feels
unreasonable to pay that cost for what was assumed to be a simple and
cheap copy. Pick a register that doesn't conflict with any ABI
registers.
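
As a rough illustration of the two-step copy described above (a sketch
only, not the code in SIInstrInfo.cpp): on gfx908 an AGPR value is read
into a VGPR with v_accvgpr_read_b32 and then written to the destination
AGPR with v_accvgpr_write_b32. The helper name lower_agpr_copy and its
list-of-strings output are hypothetical. The sketch also assumes the
temporary is always v32, whereas the real lowering may try other options
first.

```python
RESERVED_TMP_VGPR = 32  # v32, the register this commit reserves


def lower_agpr_copy(dst_agpr, src_agpr, tmp_vgpr=RESERVED_TMP_VGPR):
    """Hypothetical helper: emit the instruction pair for an AGPR-to-AGPR
    copy on gfx908, routed through a temporary VGPR."""
    return [
        # Step 1: move the source AGPR into the temporary VGPR.
        f"v_accvgpr_read_b32 v{tmp_vgpr}, a{src_agpr}",
        # Step 2: move the temporary VGPR into the destination AGPR.
        f"v_accvgpr_write_b32 a{dst_agpr}, v{tmp_vgpr}",
    ]


for line in lower_agpr_copy(dst_agpr=1, src_agpr=0):
    print(line)
```

Having the temporary known in advance is what keeps the copy at two
instructions with no scavenging or spill slot involved.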

This does not address the same issue when copying from SGPR to AGPR
for gfx90a (this coincidentally fixes it for gfx908), but that's less
interesting since the register allocator shouldn't be proactively
introducing such copies.

One edge case I'm worried about is respecting the VGPR budget implied
by amdgpu-waves-per-eu. If the theoretical upper bound of a function
is 32 VGPRs, this will force the actual count to be 33.
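
The arithmetic behind that concern can be sketched as follows, under the
simplifying assumption that a function's VGPR count is the highest VGPR
index it touches plus one (allocation granularity is ignored). The
function name vgpr_count is hypothetical.

```python
def vgpr_count(used_indices):
    """Simplified model: registers are counted up to the highest index used."""
    return max(used_indices) + 1 if used_indices else 0


budget = 32                      # upper bound implied by amdgpu-waves-per-eu
allocated = set(range(budget))   # the function itself uses v0..v31
RESERVED_TMP = 32                # v32, reserved for AGPR copies

count_without_tmp = vgpr_count(allocated)
count_with_tmp = vgpr_count(allocated | {RESERVED_TMP})

print(count_without_tmp, count_with_tmp)
```

Because v32 sits just past v31, touching it pushes the count one register
over a 32-VGPR budget.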

This is also broken if inline assembly uses/defs something in v32. The
coalescer will eliminate the intermediate vreg between the def and
use, and the introduced copy will clobber the user value.
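
A toy model of that clobber, assuming registers are entries in a
dictionary and that the AGPR copy lands between the inline asm def and
use of v32. The names agpr_copy_via_tmp and regs are hypothetical and the
values are arbitrary.

```python
def agpr_copy_via_tmp(regs, dst, src, tmp="v32"):
    """Model an AGPR copy routed through a temporary VGPR."""
    regs[tmp] = regs[src]   # v_accvgpr_read_b32 tmp, src
    regs[dst] = regs[tmp]   # v_accvgpr_write_b32 dst, tmp


regs = {"a0": 7, "a1": 0, "v32": 0}

regs["v32"] = 42                      # inline asm defines v32 with a user value
agpr_copy_via_tmp(regs, "a1", "a0")   # copy introduced between the def and the use
value_seen_by_asm_use = regs["v32"]   # inline asm use of v32 reads 7, not 42

print(value_seen_by_asm_use)
```

The copy itself is correct (a1 ends up holding the value of a0), but the
user value that the inline asm expected in v32 is lost.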

(cherry picked from commit 3335784ac2d587ff4eac04586e189532ae8b2607)

Compare: https://github.com/llvm/llvm-project/compare/768b50df2969...8b2ca766f0e5