[llvm] AMDGPU: Fix temporal divergence introduced by machine-sink and performance regression introduced by D155343 (PR #67456)

Thu Oct 5 17:12:18 PDT 2023

nhaehnle wrote:

> > Does that make sense? Whatever PostRAMachineSink is trying to do, surely it shouldn't need target-specific block prologue logic to understand that a def can't be sunk past its use...
> 
> https://reviews.llvm.org/D121277. Target-prologue instructions are not checked for "def can't be sunk past its use" but skipped as part of SkipPHIsAndLabels. They have to be checked somewhere. Maybe we could remove some checks from blockPrologueInterferes?

It looks like this new change totally supersedes what D121277 was trying to do. Perhaps D121277 can be reverted in its entirety?
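
To make the "def can't be sunk past its use" condition concrete, here is a minimal sketch of the kind of interference check under discussion. It is a toy model under stated assumptions, not LLVM's implementation: `ToyInstr` and `toyPrologueInterferes` are hypothetical names, registers are plain integers, and sub-registers, register masks, and implicit operands are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model only: these are NOT LLVM types. Registers are plain ints.
struct ToyInstr {
  std::vector<int> Defs; // registers written by the instruction
  std::vector<int> Uses; // registers read by the instruction
};

static bool contains(const std::vector<int> &Regs, int Reg) {
  return std::find(Regs.begin(), Regs.end(), Reg) != Regs.end();
}

// Hypothetical helper: returns true if inserting MI *after* the given
// prologue instructions of the destination block would be unsafe.
bool toyPrologueInterferes(const std::vector<ToyInstr> &Prologue,
                           const ToyInstr &MI) {
  for (const ToyInstr &PI : Prologue) {
    // A prologue instruction reads or rewrites a register MI defines:
    // MI's def would be sunk past that use (or would clobber that def).
    for (int Reg : MI.Defs)
      if (contains(PI.Uses, Reg) || contains(PI.Defs, Reg))
        return true;
    // A prologue instruction overwrites a register MI reads:
    // MI would observe the wrong value after sinking.
    for (int Reg : MI.Uses)
      if (contains(PI.Defs, Reg))
        return true;
  }
  return false;
}

// Small usage example with made-up register numbers.
void exampleUsage() {
  ToyInstr MI{{5}, {1}};          // defines r5, reads r1
  ToyInstr PrologueUse{{}, {5}};  // prologue instruction reading r5
  assert(toyPrologueInterferes({PrologueUse}, MI));
}
```

In this model a sink candidate is rejected as soon as any prologue instruction touches one of its registers in a conflicting way, which is the property a generic def/use check would be expected to enforce without target-specific knowledge.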

> I think we need to stop adding these extremely specific hooks that just identify problematic situations and hack around them in individual transforms. I think we need to move towards fully modeling the uniform and divergent CFGs in MachineBasicBlock, and use dedicated restricted pseudo-copies where there's a potential temporal divergence issue.

I agree. Do we actually have a proposal for how this should look? I've seen some ideas and had some myself, but haven't seen anything fully coherent yet.

https://github.com/llvm/llvm-project/pull/67456