[PATCH] D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask.

Sat Apr 3 08:16:56 PDT 2021

hliao added a comment.

In D99507#2665302 <https://reviews.llvm.org/D99507#2665302>, @nhaehnle wrote:

>> I think this requires a lot more thought.
>
> +1
>
> What I'd like to know: why are we reloading a lane mask via V_READFIRSTLANE in the first place? I would expect one of two types of reload:
>
> 1. Load from a fixed lane of a VGPR using V_READLANE.

That depends on how we spill an SGPR: by writing a fixed lane or by writing the active lanes. With the first approach, unless we save and restore the VGPR, we overwrite live values in the inactive lanes. HPC workloads hit that issue and cannot run correctly. Writing into the active lanes, by contrast, needs no save/restore of those lanes, as they are actively maintained in RA. That minimizes the overhead when an SGPR has to be spilled. As a result, we need a matching READFIRSTLANE when the SGPR is reloaded. An exec mask of 0 makes that READFIRSTLANE undefined, so we need to ensure a proper exec mask is in use.
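
To make the two spill strategies concrete, here is a toy Python model of a single VGPR. It is only a sketch: the 8-lane wave, the helper names, and the use of None for the exec == 0 case are all assumptions made for illustration. They mirror the argument above and are not a description of the actual hardware or of the LLVM spill code.

```python
from typing import List, Optional

WAVE_SIZE = 8  # toy wave width, assumed for illustration only


def write_fixed_lane(vgpr: List[int], lane: int, value: int) -> List[int]:
    """Spill by writing one fixed lane, regardless of the exec mask."""
    out = list(vgpr)
    out[lane] = value
    return out


def write_active_lanes(vgpr: List[int], exec_mask: int, value: int) -> List[int]:
    """Spill by writing the value into every lane enabled in exec_mask."""
    return [value if (exec_mask >> i) & 1 else old for i, old in enumerate(vgpr)]


def read_first_lane(vgpr: List[int], exec_mask: int) -> Optional[int]:
    """Reload from the lowest active lane.

    Returns None when exec_mask == 0; that is this model's stand-in for
    "no well-defined result", following the comment above.
    """
    if exec_mask == 0:
        return None
    first = (exec_mask & -exec_mask).bit_length() - 1  # index of lowest set bit
    return vgpr[first]


# A VGPR whose lanes all hold live values; only lanes 2 and 3 are active.
live = [100 + i for i in range(WAVE_SIZE)]
exec_mask = 0b00001100
sgpr_value = 42

# Fixed-lane spill into lane 0 clobbers the live value of an inactive lane.
fixed = write_fixed_lane(live, 0, sgpr_value)
clobbered = fixed[0] != live[0]

# Active-lane spill leaves every inactive lane untouched.
active = write_active_lanes(live, exec_mask, sgpr_value)
inactive_intact = all(
    active[i] == live[i] for i in range(WAVE_SIZE) if not (exec_mask >> i) & 1
)

reload_ok = read_first_lane(active, exec_mask)  # read with the spill-time exec
reload_exec0 = read_first_lane(active, 0)       # read after jumping in with exec == 0
```

In this model the reload only returns the spilled value when the exec mask at the reload still has an active lane that was written at the spill, which is the condition the pass tries to guarantee.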

> 2. Load directly from memory using an SMEM load instruction.
>
> Both types of reload should work just fine with exec=0.
>
> Keeping a lane mask in a VGPR is fundamentally a nonsensical thing to do because it clashes with the whole theory of how different types of data (uniform vs. divergent) are represented in AMDGPU's implementation of SIMT. So I'd really rather we fix that instead of adding yet another hack onto the existing pile of hacks. At the very least, we need to understand this better.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99507/new/

https://reviews.llvm.org/D99507