[PATCH] D96517: [AMDGPU] Optimize SGPR to scratch spilling

Mon Feb 22 10:36:25 PST 2021

hliao added a comment.

In D96517#2578930 <https://reviews.llvm.org/D96517#2578930>, @sebastian-ne wrote:

> In D96517#2578884 <https://reviews.llvm.org/D96517#2578884>, @hliao wrote:
>
>> Why is the exec mask = 0 case a valid one? Won't we already branch away once the exec mask goes to zero?
>
> That is the question it comes down to. If it is guaranteed that exec is never 0, i.e. at least one bit is always set, I’m in favor of your patch.
>
> To have some numbers, I saw some functions spilling 30 SGPRs to scratch, so it can be more than just one or two.

The question is whether we have 30 separate SGPR spills or a single spill of an SGPR register sequence of up to 30 SGPRs. For the former, we cannot coalesce them, as the spill/reload points would differ considerably. Only in the latter case do we benefit from packing that sequence of SGPRs into a VGPR. For that case, I suggest we spill SGPR sequences of 1 or 2 SGPRs with this approach and use the packing approach otherwise.
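The suggested split between the two approaches can be sketched as a simple size-based choice. This is a minimal, hypothetical sketch: `SpillStrategy`, `chooseSpillStrategy` and `kMaxDirectSGPRs` are illustrative names, not part of the patch or of the AMDGPU backend, and the threshold of 2 is taken from the suggestion above.

```cpp
#include <cassert>

// Hypothetical strategy enum; not an existing AMDGPU backend type.
enum class SpillStrategy {
  PatchApproach, // spill with the approach in this patch (D96517)
  PackIntoVGPR   // pack the SGPR sequence into VGPR lanes first
};

// Hypothetical threshold: sequences of 1 or 2 SGPRs use this patch's approach.
constexpr unsigned kMaxDirectSGPRs = 2;

// Choose a strategy for one spill of a contiguous SGPR register sequence.
// Separate single-SGPR spills at unrelated points are not coalesced here;
// each would be passed in with NumSGPRsInSequence == 1.
SpillStrategy chooseSpillStrategy(unsigned NumSGPRsInSequence) {
  assert(NumSGPRsInSequence > 0 && "spilling an empty register sequence");
  return NumSGPRsInSequence <= kMaxDirectSGPRs ? SpillStrategy::PatchApproach
                                               : SpillStrategy::PackIntoVGPR;
}
```

For example, a 30-SGPR sequence spilled at one point would take the packing path, while 30 independent single-SGPR spills would each take the patch's path.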

Back to the exec mask = 0 concern: my point is that changes to the exec mask follow the user code's semantics. If the exec mask goes to 0, the user code expects no code in that BB to be executed. If we translate that correctly, no code should be executed when the exec mask is 0. The case where code still executes with a zero mask is when we optimize away the branch because the branch is too short. In that case, we should check whether the code it skips has unwanted side effects, or whether it contains instructions that do not honor the exec mask. If we find any, we should not remove that branch. Following that rule, no code should be executed when the exec mask goes to 0.
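The branch-removal rule described above can be sketched as a predicate over the skipped block. This is a hypothetical, simplified model: `Instr`, its two flags, `canRemoveExecZeroSkip` and the `MaxSkipLen` threshold are illustrative assumptions, not the actual LLVM `MachineInstr` API or the real skip-branch logic in the AMDGPU backend.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of an instruction in the skipped block.
struct Instr {
  bool HasSideEffects; // has an effect that would be unwanted when EXEC == 0
  bool HonorsExecMask; // false if it executes regardless of the EXEC bits
};

// Returns true if the branch that skips Block when EXEC == 0 may be removed.
// MaxSkipLen is a hypothetical "too short" threshold: only short blocks are
// candidates, and removal is allowed only if no instruction has unwanted side
// effects and every instruction honors the exec mask.
bool canRemoveExecZeroSkip(const std::vector<Instr> &Block,
                           std::size_t MaxSkipLen) {
  if (Block.size() > MaxSkipLen)
    return false; // long block: keep the branch
  for (const Instr &I : Block)
    if (I.HasSideEffects || !I.HonorsExecMask)
      return false; // executing this with EXEC == 0 would be observable
  return true;
}
```

Under this rule, any block that still runs with a zero exec mask contains only exec-honoring instructions without unwanted side effects, which is the property the argument relies on.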

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D96517