[all-commits] [llvm/llvm-project] a7f60b: AMDGPU: Regenerate mir test checks to include -NEXT
Matt Arsenault via All-commits
all-commits at lists.llvm.org
Tue Feb 8 08:15:06 PST 2022
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: a7f60bfdf663e1e4092c885139623ea682c73823
https://github.com/llvm/llvm-project/commit/a7f60bfdf663e1e4092c885139623ea682c73823
Author: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: 2022-02-08 (Tue, 08 Feb 2022)
Changed paths:
M llvm/test/CodeGen/AMDGPU/pei-build-spill.mir
M llvm/test/CodeGen/AMDGPU/spill-agpr-partially-undef.mir
M llvm/test/CodeGen/AMDGPU/spill-agpr.mir
Log Message:
-----------
AMDGPU: Regenerate mir test checks to include -NEXT
Commit: 8b2ca766f0e58a2a094a4dffbf5b035d584ef475
https://github.com/llvm/llvm-project/commit/8b2ca766f0e58a2a094a4dffbf5b035d584ef475
Author: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: 2022-02-08 (Tue, 08 Feb 2022)
Changed paths:
M llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
M llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
M llvm/test/CodeGen/AMDGPU/accvgpr-copy.mir
A llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
M llvm/test/CodeGen/AMDGPU/agpr-copy-no-vgprs.mir
M llvm/test/CodeGen/AMDGPU/agpr-copy-sgpr-no-vgprs.mir
M llvm/test/CodeGen/AMDGPU/alloc-aligned-tuples-gfx908.mir
M llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs-packed.ll
Log Message:
-----------
AMDGPU: Reserve v32 if we may need to copy between AGPRs on gfx908
We need to guarantee cheap copies between AGPRs, and unfortunately
gfx908 cannot directly do this. Theoretically we could set the
scavenger up with an emergency spill slot, but it also feels
unreasonable to pay that cost for what was assumed to be a simple and
cheap copy. Pick a register that doesn't conflict with any ABI
registers.
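
As an illustration only (this is not code from the patch), the two-step copy that a reserved VGPR makes possible can be modeled as below. The `RegFile` type, the helper names, and `kReservedTmpVGPR` are hypothetical. The helpers stand in for the gfx908 `v_accvgpr_read_b32` and `v_accvgpr_write_b32` instructions, which move data between an AGPR and a VGPR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy register file. This is a model for illustration, not LLVM code.
struct RegFile {
    std::array<std::uint32_t, 256> vgpr{};
    std::array<std::uint32_t, 256> agpr{};
};

// Hypothetical name for the VGPR index reserved by the commit (v32).
constexpr unsigned kReservedTmpVGPR = 32;

// Stands in for v_accvgpr_read_b32: VGPR <- AGPR.
inline void accvgprRead(RegFile &rf, unsigned vdst, unsigned asrc) {
    rf.vgpr[vdst] = rf.agpr[asrc];
}

// Stands in for v_accvgpr_write_b32: AGPR <- VGPR.
inline void accvgprWrite(RegFile &rf, unsigned adst, unsigned vsrc) {
    rf.agpr[adst] = rf.vgpr[vsrc];
}

// An AGPR-to-AGPR copy expressed as a read into the reserved VGPR
// followed by a write back out. The old contents of the temporary are lost.
inline void copyAGPRToAGPR(RegFile &rf, unsigned adst, unsigned asrc) {
    accvgprRead(rf, kReservedTmpVGPR, asrc);
    accvgprWrite(rf, adst, kReservedTmpVGPR);
}
```

In this model, `copyAGPRToAGPR(rf, 5, 0)` leaves `rf.agpr[5]` equal to `rf.agpr[0]` and overwrites whatever was in `rf.vgpr[32]`, which is why the temporary has to be a register nothing else depends on.
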
This does not address the same issue when copying from SGPR to AGPR
for gfx90a (this coincidentally fixes it for gfx908), but that's less
interesting since the register allocator shouldn't be proactively
introducing such copies.
One edge case I'm worried about is respecting the VGPR budget implied
by amdgpu-waves-per-eu. If the theoretical upper bound of a function
is 32 VGPRs, this will force the actual count to be 33.
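
The arithmetic behind that edge case can be sketched as follows. This is a simplified model, not the backend's actual accounting: it assumes a budget of N VGPRs means v0 through v(N-1) are allocatable, that the reported count is the highest used index plus one, and that v32 ends up counted as used. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Assumption: the reported VGPR count is one more than the highest index used.
constexpr unsigned vgprCount(unsigned highestIndexUsed) {
    return highestIndexUsed + 1;
}

// Assumption: a budget of N VGPRs (N >= 1) allows v0..v(N-1), and v32 is
// additionally counted because it is reserved as the copy temporary.
constexpr unsigned countWithReservedV32(unsigned budget) {
    const unsigned highestAllocatable = budget - 1;
    const unsigned highestUsed = std::max(highestAllocatable, 32u);
    return vgprCount(highestUsed);
}
```

Under these assumptions a budget of 32 yields a count of 33, while a budget of 64 is unaffected because v32 already lies inside the allocatable range.
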
This is also broken if inline assembly uses/defs something in v32. The
coalescer will eliminate the intermediate vreg between the def and
use, and the introduced copy will clobber the user value.
(cherry picked from commit 3335784ac2d587ff4eac04586e189532ae8b2607)
Compare: https://github.com/llvm/llvm-project/compare/768b50df2969...8b2ca766f0e5