[llvm] AMDGPU: Fix temporal divergence introduced by machine-sink and performance regression introduced by D155343 (PR #67456)

Fri Oct 6 03:00:55 PDT 2023

petar-avramovic wrote:

> > > Does that make sense? Whatever PostRAMachineSink is trying to do, surely it shouldn't need target-specific block prologue logic to understand that a def can't be sunk past its use...
> > 
> > 
> > https://reviews.llvm.org/D121277. Target-prologue instructions are not checked for "def can't be sunk past its use" but skipped as part of SkipPHIsAndLabels. They have to be checked somewhere. Maybe we could remove some checks from blockPrologueInterferes?
> 
> It looks like this new change totally supersedes what D121277 was trying to do. Perhaps it can be reverted in its entirety?

This change is independent of D121277. It supersedes and reverts D155343, which also used blockPrologueInterferes.

> I think we need to move towards fully modeling the uniform and divergent CFGs in MachineBasicBlock,

What do we need for this? We already have pseudo branches for divergent control flow. The only thing I can think of would be to have a dedicated SGPR register class that is used for lane masks only.

> and use dedicated restricted pseudo-copies where there's a potential temporal divergence issue

Where/how would we insert them? I am convinced that we can detect a temporally divergent use at any point in the pipeline.
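
For anyone following along, here is a minimal sketch of what "temporal divergence" refers to. It is a toy Python simulation, not LLVM or AMDGPU code: `run_wave`, the lane model, and the trip counts are all hypothetical and exist only for illustration. The counter `i` is uniform inside the loop (a single scalar shared by the wave), but lanes leave the loop on different iterations, so the value each lane should observe after the loop differs per lane.

```python
def run_wave(trip_counts):
    """Toy model of one wave running: i = 0; do { i += 1 } while (i < trip_count[lane]).

    Hypothetical illustration only; this does not model any real LLVM API.
    """
    num_lanes = len(trip_counts)
    active = [True] * num_lanes          # execution mask for the wave
    value_at_exit = [None] * num_lanes   # what each lane should see after the loop
    i = 0                                # one scalar shared by all lanes (uniform in the loop)
    while any(active):
        i += 1
        for lane in range(num_lanes):
            if active[lane] and i >= trip_counts[lane]:
                value_at_exit[lane] = i  # the lane leaves with the value i had at its exit
                active[lane] = False
    return i, value_at_exit


final_scalar, per_lane = run_wave([1, 3, 2, 3])
print(final_scalar)  # 3
print(per_lane)      # [1, 3, 2, 3]
```

Reading the shared scalar after the loop would hand 3 to every lane, which is wrong for the lanes that exited on iterations 1 and 2. That is why a use outside the loop has to be treated as divergent even though the def is uniform inside it.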

https://github.com/llvm/llvm-project/pull/67456