[llvm] [AMDGPU] Teach SIPreEmitPeephole pass to preserve MachineLoopInfo (PR #178868)

Wed Feb 11 06:27:40 PST 2026

PrasoonMishra wrote:

> > I can implement the conservative approach: disable CFG-changing optimizations (i.e. changing conditional branch to unconditional or removing branch altogether) when the terminator is inside a loop. _Is this approach acceptable?_
> 
> Yes, this seems fine to me.
> 
> I ran a few experiments disabling the branch manipulation and successor removal of `optimizeVcc` when within a loop. Only about 2% of shaders are affected, and in almost all of those cases the only difference is a conditional branch remaining. I want to be naively optimistic that on recent hardware the branch predictor can do something useful with a conditional branch that is always/never taken within a hot loop.
> 
> If for some reason we really need to convert a branch to unconditional and maintain the CFG (i.e. there is a significant perf regression), then we can convert all the other branches of the block into conditionals which will never be taken. For example, SCC values are not maintained over block boundaries, so we could initialize SCC to 1 in the terminators and convert the old branches to `S_CBRANCH_SCC0` (reordering as required).

Sorry for the delay in responding.
I initially tried the conservative approach (i.e. disabling CFG-changing optimizations), but that caused 12 lit test failures, since many tests expect these optimizations to fire:
```
********************
Failed Tests (12):
  LLVM :: CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
  LLVM :: CodeGen/AMDGPU/exec-mask-opt-cannot-create-empty-or-backward-segment.ll
  LLVM :: CodeGen/AMDGPU/fix-sgpr-copies-phi-regression-issue130646-issue130119.ll
  LLVM :: CodeGen/AMDGPU/indirect-addressing-si.ll
  LLVM :: CodeGen/AMDGPU/infinite-loop.ll
  LLVM :: CodeGen/AMDGPU/kill-infinite-loop.ll
  LLVM :: CodeGen/AMDGPU/memcpy-crash-issue63986.ll
  LLVM :: CodeGen/AMDGPU/no-fold-accvgpr-mov.ll
  LLVM :: CodeGen/AMDGPU/optimize-negated-cond.ll
  LLVM :: CodeGen/AMDGPU/si-annotate-nested-control-flows.ll
  LLVM :: CodeGen/AMDGPU/wave32.ll
  LLVM :: CodeGen/AMDGPU/wqm.ll
```
So I implemented an approach that incrementally updates MLI instead: before each `removeSuccessor`, I check whether the edge is the last back-edge to a loop header and, if so, destroy the loop in MLI. The catch is that I don't remove dead code, so the MLI information can be slightly imprecise in rare scenarios (i.e. cascading forward-edge removal could have led to loop removal). For my use case in VNOPs hoisting, this slight imprecision in the loop info has no impact.
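
As a rough illustration of the last-back-edge check, here is a standalone toy model. It is not the code in this PR and does not use LLVM's classes: `ToyCFG`, `ToyLoop`, `isLastBackEdge`, and `removeEdgeAndUpdateLoops` are hypothetical names, block IDs are plain integers rather than `MachineBasicBlock`s, and it assumes at most one edge between any pair of blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy stand-in for a loop record (hypothetical; not llvm::MachineLoop).
struct ToyLoop {
  int Header;           // block ID of the loop header
  std::set<int> Blocks; // all blocks in the loop, including the header
};

// Toy stand-in for a CFG: Succs[B] lists the successors of block B.
struct ToyCFG {
  std::vector<std::vector<int>> Succs;
  explicit ToyCFG(int NumBlocks) : Succs(NumBlocks) {}

  void addEdge(int From, int To) { Succs[From].push_back(To); }

  void removeEdge(int From, int To) {
    auto &S = Succs[From];
    auto It = std::find(S.begin(), S.end(), To);
    assert(It != S.end() && "edge must exist");
    S.erase(It);
  }
};

// Returns true if From->To is a back-edge of L (From is inside the loop and
// To is its header) and no other block inside L branches to the header.
bool isLastBackEdge(const ToyCFG &G, const ToyLoop &L, int From, int To) {
  if (To != L.Header || !L.Blocks.count(From))
    return false;
  for (int B : L.Blocks) {
    if (B == From)
      continue;
    const auto &S = G.Succs[B];
    if (std::find(S.begin(), S.end(), L.Header) != S.end())
      return false; // another back-edge keeps the loop alive
  }
  return true;
}

// Checks the loop records *before* removing the edge, drops any loop whose
// last back-edge is being removed, then removes the edge itself.
// Returns true if at least one loop record was destroyed.
bool removeEdgeAndUpdateLoops(ToyCFG &G, std::vector<ToyLoop> &Loops,
                              int From, int To) {
  bool Destroyed = false;
  for (auto It = Loops.begin(); It != Loops.end();) {
    if (isLastBackEdge(G, *It, From, To)) {
      It = Loops.erase(It);
      Destroyed = true;
    } else {
      ++It;
    }
  }
  G.removeEdge(From, To);
  return Destroyed;
}
```

In this sketch, removing a forward edge into the header never touches the loop records, which mirrors the imprecision mentioned above: a loop that becomes unreachable only through forward-edge removal stays recorded.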

https://github.com/llvm/llvm-project/pull/178868