[PATCH] D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask.

Michael Liao via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 5 07:06:33 PDT 2021


hliao added a comment.

In D99507#2668062 <https://reviews.llvm.org/D99507#2668062>, @nhaehnle wrote:

> In D99507#2667537 <https://reviews.llvm.org/D99507#2667537>, @hliao wrote:
>
>> In D99507#2665302 <https://reviews.llvm.org/D99507#2665302>, @nhaehnle wrote:
>>
>>>> I think this requires a lot more thought.
>>>
>>> +1
>>>
>>> What I'd like to know: why are we reloading a lane mask via V_READFIRSTLANE in the first place? I would expect one of two types of reload:
>>>
>>> 1. Load from a fixed lane of a VGPR using V_READLANE.
>>
>> That depends on whether we spill an SGPR by writing a fixed lane or by writing the active lanes. With the first approach, without saving/restoring the VGPR, we overwrite live values in the inactive lanes. HPC workloads are hit by that issue and do not run correctly.
>
> Let me rephrase to make sure I understood you correctly. You're saying that spilling an SGPR to a fixed lane of a VGPR may cause data of an inactive lane to be overwritten. This is a problem if the spill/reload happens in a called function, because VGPR save/reload doesn't save those inactive lanes. (HPC is irrelevant here.)

We found the inactive-lane overwriting issue in an HPC workload where all functions are inlined, but the CFG is quite complicated and SGPR register pressure is very high due to pointer usage. The overwriting we observed matches the pseudo code illustrated in https://reviews.llvm.org/D96980. That HPC workload runs correctly after reverting that agnostic SGPR spill.
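
To make the failure mode concrete, here is a minimal pseudo-assembly sketch of one way the clobbering can happen (this is not the code from D96980; the registers, lane index, and control flow are made up for illustration):

  ; v0 is the VGPR chosen to hold SGPR spills; lane 0 of v0 still holds a
  ; value that belongs to a lane which is inactive on this path.
  s_and_saveexec_b64 s[10:11], vcc   ; enter a divergent region, some lanes off
  v_writelane_b32    v0, s2, 0       ; spill s2 into lane 0 of v0; v_writelane
                                     ; ignores EXEC, so the old v0[0] is
                                     ; clobbered even though lane 0 is inactive
  ...
  v_readlane_b32     s2, v0, 0       ; reload s2 (this part is fine)
  s_or_b64           exec, exec, s[10:11]  ; rejoin; the lane that owned v0[0]
                                           ; now sees a clobbered value

Without saving and restoring v0 for the inactive lanes, nothing ever puts the original v0[0] back.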

> My understanding is that @sebastian-ne is working on a fix for this, see D96336 <https://reviews.llvm.org/D96336>.
>
>> Instead, writing into the active lanes doesn't require saving/restoring those lanes, since they are actively tracked by the register allocator. That minimizes the overhead when an SGPR has to be spilled.
>
> Wrong, it's way more overhead. Instead of copying the 32-bit value into a single lane of a VGPR, you may be copying it into up to 64 lanes. That's very wasteful. We should fix the root cause instead, which means properly saving/restoring the VGPRs that are used for SGPR-to-VGPR spilling.

The alternative spill approach adds the overhead of saving/restoring a VGPR to both the SGPR spill and the reload. Compared to the wasted VGPR lanes, that newly added memory overhead is significant, especially for the SGPR spill: spills were previously store-only, but now incur extra memory loads as well. Considering that not every spill is actually reloaded later at run time, the extra memory load added to each spill cannot justify the VGPR space saved. But I do agree that SGPR-to-memory spilling needs a more efficient mechanism.
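
For reference, a rough sketch of the two flavors being compared (registers, lane index, and stack offset are made up; the save/restore is shown next to the spill/reload pair purely for illustration, an actual implementation could place it in the prologue/epilogue instead; the EXEC manipulation is only there so that all lanes of the VGPR, including inactive ones, reach memory):

  ; (a) fixed-lane spill with the spill VGPR saved/restored around it
  s_or_saveexec_b64  s[10:11], -1                     ; enable all lanes
  buffer_store_dword v0, off, s[0:3], s32 offset:4    ; save every lane of v0
  s_mov_b64          exec, s[10:11]                   ; restore EXEC
  v_writelane_b32    v0, s2, 0                        ; spill s2 into lane 0
  ...
  v_readlane_b32     s2, v0, 0                        ; reload s2
  s_or_saveexec_b64  s[10:11], -1
  buffer_load_dword  v0, off, s[0:3], s32 offset:4    ; restore every lane of v0
  s_mov_b64          exec, s[10:11]

  ; (b) active-lane spill: no memory traffic, but the value is broadcast
  ;     into every active lane of v0, wasting up to 63 extra lanes
  v_mov_b32           v0, s2                          ; spill into active lanes
  ...
  v_readfirstlane_b32 s2, v0                          ; reload from the first active lane

(a) is roughly what the save/restore fix amounts to for a spill/reload pair; (b) is what I refer to above as writing into the active lanes.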


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99507/new/

https://reviews.llvm.org/D99507


