[PATCH] D96517: [AMDGPU] Optimize SGPR to scratch spilling
Sebastian Neubauer via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 19 01:25:02 PST 2021
sebastian-ne added a comment.
> Shall we optimize the cases where only 1 or 2 SGPRs are to be spilled or reloaded when there's a VGPR scavenged? In this case, we only need one or two loads/stores to spill/reload that SGPR.
The “v_mov and readfirstlane” approach doesn’t work when exec=0.
However, you sparked an idea:
If we can scavenge an SGPR, we can use it to save the VGPR lane that we clobber.
For example, suppose we want to spill s0 to scratch and s5 is currently unused:
v_readlane_b32 s5, v0, 0 ; Save v0
v_writelane_b32 v0, s0, 0 ; Save s0 to v0 and to memory
s_mov_b32 s0, exec
s_mov_b32 exec, 1
buffer_store_dword_offset v0, …
s_mov_b32 exec, s0
v_writelane_b32 v0, s5, 0 ; Restore v0
Restoring:
v_readlane_b32 s5, v0, 0 ; Save v0
v_writelane_b32 v0, s0, 0
s_mov_b32 s0, exec
s_mov_b32 exec, 1
buffer_load_dword_offset v0, … ; Read v0 from memory and into s0
s_mov_b32 exec, s0
v_readlane_b32 s0, v0, 0
v_writelane_b32 v0, s5, 0 ; Restore v0
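To sanity-check the two sequences above, here is a small lane-level toy model in Python. It is only a sketch under stated assumptions: the `Wave` class, the per-lane `scratch` list, and the helper names are hypothetical and not part of the patch. It models v_readlane/v_writelane as ignoring exec and the buffer load/store as touching only lanes enabled in exec.

```python
WAVE_SIZE = 32  # wave32, as the sequences above assume


class Wave:
    """Toy model: one VGPR (v0), two SGPRs, exec, and per-lane scratch."""

    def __init__(self, v0, s0, exec_mask):
        self.v0 = list(v0)                    # one value per lane
        self.sgpr = {"s0": s0, "s5": 0}       # s5 is the scavenged SGPR
        self.exec = exec_mask                 # active-lane bit mask
        self.scratch = [None] * WAVE_SIZE     # assumed: one scratch dword per lane

    def readlane(self, dst, lane):            # v_readlane_b32, ignores exec
        self.sgpr[dst] = self.v0[lane]

    def writelane(self, src, lane):           # v_writelane_b32, ignores exec
        self.v0[lane] = self.sgpr[src]

    def buffer_store(self):                   # only active lanes write memory
        for lane in range(WAVE_SIZE):
            if (self.exec >> lane) & 1:
                self.scratch[lane] = self.v0[lane]

    def buffer_load(self):                    # only active lanes are overwritten
        for lane in range(WAVE_SIZE):
            if (self.exec >> lane) & 1:
                self.v0[lane] = self.scratch[lane]


def spill_s0(w):
    w.readlane("s5", 0)          # save lane 0 of v0
    w.writelane("s0", 0)         # put s0 into lane 0 of v0
    w.sgpr["s0"] = w.exec        # s_mov_b32 s0, exec
    w.exec = 1                   # s_mov_b32 exec, 1
    w.buffer_store()
    w.exec = w.sgpr["s0"]        # s_mov_b32 exec, s0
    w.writelane("s5", 0)         # restore lane 0 of v0


def reload_s0(w):
    w.readlane("s5", 0)          # save lane 0 of v0
    w.writelane("s0", 0)         # as in the sequence above; overwritten by the load
    w.sgpr["s0"] = w.exec        # s_mov_b32 s0, exec
    w.exec = 1                   # s_mov_b32 exec, 1
    w.buffer_load()
    w.exec = w.sgpr["s0"]        # s_mov_b32 exec, s0
    w.readlane("s0", 0)          # move the reloaded value into s0
    w.writelane("s5", 0)         # restore lane 0 of v0


def round_trip(exec_mask):
    original_v0 = list(range(100, 100 + WAVE_SIZE))
    w = Wave(original_v0, s0=42, exec_mask=exec_mask)
    spill_s0(w)
    spilled_ok = w.scratch[0] == 42 and w.v0 == original_v0
    w.sgpr["s0"] = 0xDEAD        # pretend s0 was reused in between
    reload_s0(w)
    return spilled_ok, w.sgpr["s0"], w.v0 == original_v0, w.exec == exec_mask
```

In this model `round_trip(0)` behaves the same as `round_trip(0xFFFFFFFF)`: s0 comes back intact, v0 is unchanged in every lane, and exec is restored, which is the property the exec=0 case needs.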
The downside is that it makes the code even more complicated, especially the restore path, because we need to ensure that exec is exactly 1 so that we do not clobber other lanes. The code above would therefore only work in wave32 mode, not in wave64 mode. The exception is when v0 is a scavenged register, i.e. it is unused in the currently active lanes: then we are allowed to clobber the currently active lanes of v0, so the code above would also work in wave64 mode.
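The wave32/wave64 distinction can be illustrated with a short Python sketch. It assumes that in wave64 mode the 32-bit move in the sequence only writes the low half of the 64-bit exec mask and leaves the high half as it was; the function names are hypothetical and only serve the illustration.

```python
LOW_32 = 0xFFFFFFFF


def exec_after_mov_b32(exec_mask, wave_size):
    """Model `s_mov_b32 exec, 1`: only the low 32 bits are written (assumed)."""
    if wave_size == 32:
        return 1
    return (exec_mask & ~LOW_32) | 1


def clobbered_lanes(exec_mask, wave_size):
    """Lanes other than lane 0 that the reload's buffer load would overwrite."""
    mask = exec_after_mov_b32(exec_mask, wave_size)
    return [lane for lane in range(1, wave_size) if (mask >> lane) & 1]
```

Under this assumption `clobbered_lanes(0xFFFFFFFF, 32)` is empty, while a fully active wave64 gives lanes 32 to 63. Those are lanes that are currently active, so if v0 is a scavenged register they hold no live data and overwriting them is harmless.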
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D96517/new/
https://reviews.llvm.org/D96517