[PATCH] D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask.
Michael Liao via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Apr 3 08:16:56 PDT 2021
hliao added a comment.
In D99507#2665302 <https://reviews.llvm.org/D99507#2665302>, @nhaehnle wrote:
>> I think this requires a lot more thought.
>
> +1
>
> What I'd like to know: why are we reloading a lane mask via V_READFIRSTLANE in the first place? I would expect one of two types of reload:
>
> 1. Load from a fixed lane of a VGPR using V_READLANE.
That depends on how we spill an SGPR: by writing a fixed lane or by writing an active lane. With the first approach, unless we save and restore the VGPR, we overwrite live values in the inactive lanes. HPC workloads are hit by that issue and cannot run correctly. Writing into active lanes, by contrast, needs no save/restore of those lanes, as they are actively maintained by RA. That minimizes the overhead when an SGPR has to be spilled. As a result, we need a matching READFIRSTLANE when the SGPR is reloaded. An exec mask of 0 makes that READFIRSTLANE undefined, so we need to ensure a proper exec mask is in use.
> 2. Load directly from memory using an SMEM load instruction.
>
> Both types of reload should work just fine with exec=0.
>
> Keeping a lane mask in a VGPR is fundamentally a nonsensical thing to do because it clashes with the whole theory of how different types of data (uniform vs. divergent) are represented in AMDGPU's implementation of SIMT. So I'd really rather we fix that instead of adding yet another hack onto the existing pile of hacks. At the very least, we need to understand this better.
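To make the trade-off between the two spill strategies concrete, here is a toy C++ model of a single wave. It is only a sketch and not the actual LLVM spill code: the `Wave` struct and all function names are hypothetical, the 64-lane wave size is an assumption, and the active-lane spill is modeled as a plain move under the current exec mask. Returning `std::nullopt` when exec is 0 is simply this model's way of representing the "undefined" reload described above.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Assumption: a 64-lane wave; all names below are illustrative only.
constexpr int kWaveSize = 64;

struct Wave {
  std::uint64_t exec = 0;                        // one bit per lane, 1 = active
  std::array<std::uint32_t, kWaveSize> vgpr{};   // one VGPR, one value per lane
};

inline bool isActive(const Wave &w, int lane) {
  return ((w.exec >> lane) & 1u) != 0;
}

// Fixed-lane spill: writes the chosen lane regardless of exec, so a live
// value held by an inactive lane is overwritten unless it is saved first.
inline void spillToFixedLane(Wave &w, int lane, std::uint32_t sgprValue) {
  w.vgpr[lane] = sgprValue;
}

// Active-lane spill: only lanes enabled in exec are written, so inactive
// lanes keep whatever they held.
inline void spillToActiveLanes(Wave &w, std::uint32_t sgprValue) {
  for (int lane = 0; lane < kWaveSize; ++lane)
    if (isActive(w, lane))
      w.vgpr[lane] = sgprValue;
}

// Fixed-lane reload: independent of exec.
inline std::uint32_t reloadFromFixedLane(const Wave &w, int lane) {
  return w.vgpr[lane];
}

// First-active-lane reload: needs at least one active lane. With exec == 0
// there is no lane known to hold the spilled value, modeled as nullopt.
inline std::optional<std::uint32_t> reloadFromFirstActiveLane(const Wave &w) {
  for (int lane = 0; lane < kWaveSize; ++lane)
    if (isActive(w, lane))
      return w.vgpr[lane];
  return std::nullopt;
}
```

In this model the fixed-lane pair still round-trips with exec equal to 0 but clobbers whatever the chosen lane held, while the active-lane pair preserves inactive lanes and has no well-defined reload once exec is 0.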
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D99507/new/
https://reviews.llvm.org/D99507