[llvm] [AMDGPU][TTI] Threshold bonus to loops whose unrolling makes nested loops unrollable (PR #114579)

Thu Dec 19 06:28:01 PST 2024

lucas-rami wrote:

I moved the optimization to the target-independent part of the loop unrolling pass. In the process, I significantly changed how the analysis increases the likelihood of a loop being fully unrolled, so that it is consistent with how the pass generally estimates unroll cost.

Instead of adding an absolute value to the unrolling threshold whenever it encounters an optimizable subloop, my analysis now estimates the per-iteration cost savings that fully unrolling a loop would yield, so that a more precise full-unrolling cost can be compared against the threshold. The general idea is to find instructions in the loop that can be optimized away once the loop IV is known. Subloops that may become unrollable once the outer loop is unrolled yield additional cost savings, as do simple control structures akin to if/then(/else).
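To make the idea concrete, here is a minimal standalone sketch of the comparison described above. It is not the code from the PR and uses no LLVM APIs: `ToyInst`, `ToyLoop`, and all function names are hypothetical, and the strict `<` comparison against the threshold is an assumption. It only models instructions that fold away once the IV is known; savings from subloops and if/then(/else) structures are left out.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical toy instruction: a cost, plus a flag saying whether the
// instruction is expected to fold away once the loop IV is a known constant.
struct ToyInst {
  uint64_t Cost;
  bool FoldsWhenIVKnown;
};

// Hypothetical toy loop: a constant trip count and a flat list of body
// instructions (no subloops or branches in this simplified model).
struct ToyLoop {
  uint64_t TripCount;
  std::vector<ToyInst> Body;
};

// Per-iteration cost that remains after subtracting the estimated savings,
// i.e. the cost of the instructions that do not fold away.
uint64_t perIterationCostAfterSavings(const ToyLoop &L) {
  uint64_t Cost = 0;
  for (const ToyInst &I : L.Body)
    if (!I.FoldsWhenIVKnown)
      Cost += I.Cost;
  return Cost;
}

// Estimated cost of the fully unrolled loop: every iteration is
// materialized, but each one only pays for what survives folding.
uint64_t estimatedFullUnrollCost(const ToyLoop &L) {
  return L.TripCount * perIterationCostAfterSavings(L);
}

// The threshold itself is left untouched; only the cost estimate improves.
// Assumption: full unrolling is accepted when the cost is strictly below it.
bool shouldFullyUnroll(const ToyLoop &L, uint64_t Threshold) {
  return estimatedFullUnrollCost(L) < Threshold;
}
```

For example, a loop with 8 iterations whose body costs 5 per iteration would cost 40 when unrolled naively. If instructions worth 3 of those 5 fold away once the IV is known, the estimate drops to 16, which fits under a threshold of 32 without the threshold being raised.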

https://github.com/llvm/llvm-project/pull/114579