[PATCH] D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask.

Thu Apr 1 15:12:33 PDT 2021

nhaehnle requested changes to this revision.
nhaehnle added a comment.
This revision now requires changes to proceed.

> I think this requires a lot more thought.

+1

What I'd like to know: why are we reloading a lane mask via V_READFIRSTLANE in the first place? I would expect one of two types of reload:

1. Load from a fixed lane of a VGPR using V_READLANE.
2. Load directly from memory using an SMEM load instruction.

Both types of reload should work just fine with exec=0.
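
To make the distinction concrete, here is a toy Python model of a single wavefront. It is only a sketch under stated assumptions, not the backend's spill code and not a precise rendering of the ISA. All helper names (v_writelane, v_readlane, v_readfirstlane, vector_load, s_load, and the two reload_* functions) are hypothetical and chosen for illustration. The failure mode shown for the readfirstlane path, an exec-predicated vector load that writes no lanes when exec=0, is an assumption about where such a reload could go wrong. The review itself does not state it.

```python
# Toy model of one wavefront. Illustrative assumptions only.
WAVE_SIZE = 64


def v_writelane(vgpr, lane, value):
    """Write one fixed lane of a VGPR; modeled as ignoring exec."""
    vgpr[lane] = value


def v_readlane(vgpr, lane):
    """Read one fixed lane of a VGPR; modeled as ignoring exec."""
    return vgpr[lane]


def v_readfirstlane(vgpr, exec_mask):
    """Read the lowest active lane; modeled as lane 0 when exec is zero."""
    if exec_mask == 0:
        return vgpr[0]
    first_active = (exec_mask & -exec_mask).bit_length() - 1
    return vgpr[first_active]


def vector_load(vgpr, memory, exec_mask):
    """Per-lane load; lanes whose exec bit is clear keep their old value."""
    for lane in range(WAVE_SIZE):
        if (exec_mask >> lane) & 1:
            vgpr[lane] = memory[lane]


def s_load(scalar_memory, address):
    """Scalar (SMEM-style) load; not predicated by exec in this model."""
    return scalar_memory[address]


def reload_via_readlane(spill_vgpr, lane):
    """Reload type 1: fixed lane of a VGPR."""
    return v_readlane(spill_vgpr, lane)


def reload_via_vector_load(memory, exec_mask, stale=0xDEAD):
    """Assumed problematic path: vector load into a temporary VGPR,
    then v_readfirstlane. `stale` stands for whatever the temporary
    register held before the load."""
    tmp = [stale] * WAVE_SIZE
    vector_load(tmp, memory, exec_mask)
    return v_readfirstlane(tmp, exec_mask)


saved_mask = 0x0000FFFF  # made-up value standing in for a saved lane mask

# Reload type 1: spill into lane 5 of a VGPR, read it back with exec=0.
spill_vgpr = [0] * WAVE_SIZE
v_writelane(spill_vgpr, 5, saved_mask)
from_readlane = reload_via_readlane(spill_vgpr, 5)

# Reload type 2: scalar load from a made-up address, independent of exec.
from_smem = s_load({0x100: saved_mask}, 0x100)

# Assumed failing path: with exec=0 the vector load writes nothing.
per_lane_memory = [saved_mask] * WAVE_SIZE
from_vector_exec0 = reload_via_vector_load(per_lane_memory, exec_mask=0)
from_vector_lane3 = reload_via_vector_load(per_lane_memory, exec_mask=0b1000)
```

In this model, from_readlane and from_smem both equal saved_mask without consulting exec at all, and from_vector_lane3 equals saved_mask because lane 3 is active. from_vector_exec0 is the stale 0xDEAD, because no lane was loaded before the read.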

Keeping a lane mask in a VGPR is fundamentally a nonsensical thing to do because it clashes with the whole theory of how different types of data (uniform vs. divergent) are represented in AMDGPU's implementation of SIMT. So I'd really rather we fix that instead of adding yet another hack onto the existing pile of hacks. At the very least, we need to understand this better.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99507/new/

https://reviews.llvm.org/D99507