[PATCH] D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask.

Nicolai Hähnle via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Apr 4 06:35:17 PDT 2021


nhaehnle added a comment.

In D99507#2667537 <https://reviews.llvm.org/D99507#2667537>, @hliao wrote:

> In D99507#2665302 <https://reviews.llvm.org/D99507#2665302>, @nhaehnle wrote:
>
>>> I think this requires a lot more thought.
>>
>> +1
>>
>> What I'd like to know: why are we reloading a lane mask via V_READFIRSTLANE in the first place? I would expect one of two types of reload:
>>
>> 1. Load from a fixed lane of a VGPR using V_READLANE.
>
> That depends on how we spill an SGPR: by writing a fixed lane or by writing the active lanes. With the first approach, if we don't save/restore the spill VGPR, we overwrite live values in its inactive lanes. HPC workloads are hit by that issue and cannot run correctly.

Let me rephrase to make sure I understood you correctly. You're saying that spilling an SGPR to a fixed lane of a VGPR may cause data of an inactive lane to be overwritten. This is a problem if the spill/reload happens in a called function, because VGPR save/reload doesn't save those inactive lanes. (HPC is irrelevant here.)
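
To make the failure mode concrete, here is a schematic of the fixed-lane path; the register number and lane index are illustrative only, not necessarily what the backend picks:

  v_writelane_b32 v40, s30, 0    ; spill: write s30 into lane 0 of v40, ignoring EXEC
  ; ... rest of the callee body ...
  v_readlane_b32  s30, v40, 0    ; reload: read lane 0 of v40 back into s30

V_WRITELANE/V_READLANE touch exactly one lane regardless of EXEC, so if lane 0 is inactive in the callee and v40 is only saved/restored with EXEC-masked memory operations in the prologue/epilogue, the caller's value in that lane is clobbered.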

My understanding is that @sebastian-ne is working on a fix for this, see D96336 <https://reviews.llvm.org/D96336>.

> Instead, writing into the active lanes doesn't require saving/restoring those lanes, since they are already tracked by the register allocator. That minimizes the overhead when you have to spill an SGPR.

Wrong: it's way more overhead. Instead of copying the 32-bit value into a single lane of a VGPR, you may be copying it into up to 64 lanes. That's very wasteful. We should fix the root cause instead, which means properly saving/restoring the VGPRs that are used for SGPR-to-VGPR spilling.
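
To spell out the cost comparison (again purely schematic, with made-up register numbers):

  ; spill to a fixed lane: a single 32-bit write, independent of EXEC
  v_writelane_b32 v40, s4, 3

  ; spill to the active lanes: the scalar is broadcast per-lane under EXEC
  v_mov_b32 v40, s4              ; up to 64 lanes now hold a copy of s4
  ; ...
  v_readfirstlane_b32 s4, v40    ; reload from the first active lane

With the broadcast scheme the same 32-bit value is replicated in up to 64 lanes, and the whole VGPR is consumed by that single spilled SGPR instead of holding one spilled SGPR per lane.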


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99507/new/

https://reviews.llvm.org/D99507
