[llvm] AMDGPU: Fix temporal divergence introduced by machine-sink and performance regression introduced by D155343 (PR #67456)

Thu Oct 5 09:18:10 PDT 2023

petar-avramovic wrote:

> > blockPrologueInterferes in tryToSinkCopy seems to be required,
> > For example see _amdgpu_ps_main2 from llvm/test/CodeGen/AMDGPU/sink-after-control-flow-postra.mir
> > blockPrologueInterferes stops sink of
> > renamable $sgpr0_sgpr1 = COPY $sgpr6_sgpr7
> > past its use $sgpr4_sgpr5 = S_AND_SAVEEXEC_B64 $sgpr0_sgpr1, implicit-def $exec, implicit-def $scc, implicit $exec
> 
> Does that make sense? Whatever PostRAMachineSink is trying to do, surely it shouldn't need target-specific block prologue logic to understand that a def can't be sunk past its use...

See https://reviews.llvm.org/D121277. Target-prologue instructions are never checked for "def can't be sunk past its use"; they are skipped as part of SkipPHIsAndLabels, so the sink insertion point lands after them. That check has to happen somewhere. Maybe we could remove some of the checks from blockPrologueInterferes?
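To make the point concrete, here is a toy model of the situation (not the LLVM implementation). The `Instr` class, the `is_prologue` flag, and the helper names `skip_prologue` and `prologue_interferes` are all hypothetical stand-ins, and register names are treated as opaque strings, whereas the real code has to reason about overlapping registers. It only illustrates that once the insertion point skips prologue instructions, a separate check over the skipped instructions is needed to catch a use of the copy's def.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    defs: frozenset = frozenset()
    uses: frozenset = frozenset()
    # Hypothetical flag standing in for a target hook that marks
    # block-prologue instructions.
    is_prologue: bool = False


def skip_prologue(block):
    """Return the index of the first non-prologue instruction.

    Toy analogue of an insertion point that skips PHIs, labels,
    and target prologue instructions.
    """
    i = 0
    while i < len(block) and block[i].is_prologue:
        i += 1
    return i


def prologue_interferes(block, copy):
    """Return True if sinking `copy` below the prologue would be unsafe.

    Unsafe here means a skipped prologue instruction reads or writes a
    register the copy defines, or writes a register the copy reads.
    """
    for instr in block[:skip_prologue(block)]:
        if copy.defs & (instr.uses | instr.defs):
            return True
        if copy.uses & instr.defs:
            return True
    return False


# Modeled on the example quoted above: the COPY defines $sgpr0_sgpr1,
# and the prologue instruction in the successor block reads it.
copy = Instr("COPY", defs=frozenset({"sgpr0_sgpr1"}),
             uses=frozenset({"sgpr6_sgpr7"}))
save_exec = Instr("S_AND_SAVEEXEC_B64",
                  defs=frozenset({"sgpr4_sgpr5", "exec", "scc"}),
                  uses=frozenset({"sgpr0_sgpr1", "exec"}),
                  is_prologue=True)
successor = [save_exec, Instr("OTHER", uses=frozenset({"sgpr0_sgpr1"}))]

insert_at = skip_prologue(successor)
blocked = prologue_interferes(successor, copy)
```

In this sketch `insert_at` points past `S_AND_SAVEEXEC_B64`, so without the `prologue_interferes` check the copy would be placed after an instruction that reads its result.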

https://github.com/llvm/llvm-project/pull/67456