[llvm] [LoopUnroll] Structural cost savings analysis for full loop unrolling (PR #114579)

Thu Dec 19 08:02:31 PST 2024

lucas-rami wrote:

Thanks a lot for adding context here. I see the danger of catastrophic unrolling, and I now see that my patch would make it worse. My original motivation for this patch was in fact an AMDGPU kernel, structurally similar to my motivating example, in which all loops need to be fully unrolled to achieve good performance (see also the Discourse post [here](https://discourse.llvm.org/t/unexpected-peeling-decision-when-trying-to-eliminate-compares-inside-loop/82866/3)).

I think one solution would be, when there are subloops, to severely limit (or even nullify) the cost savings my analysis estimates if those subloops are too big or have large trip counts (what "too big" means would still have to be defined). If a subloop's trip count cannot be determined, we would have to err on the "do not unroll" side. Do you think this would prevent catastrophic unrolling while still leaving the analysis a somewhat useful optimization?
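
To make the idea concrete, here is a rough standalone sketch of the check (not code from the patch). `SubloopInfo`, `adjustSavingsForSubloops`, and both thresholds are hypothetical placeholders, not existing LLVM types or options, and the sketch takes the "nullify" variant rather than a softer limit:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical summary of one subloop; this is not an LLVM type.
struct SubloopInfo {
  uint64_t SizeInInsts;              // estimated size of the subloop body
  std::optional<uint64_t> TripCount; // std::nullopt if not statically known
};

// Placeholder thresholds: what "too big" means is still to be defined.
constexpr uint64_t MaxSubloopSize = 32;
constexpr uint64_t MaxSubloopTripCount = 8;

// Returns the cost savings to credit towards fully unrolling the outer loop.
// The estimate is nullified as soon as one subloop is too big, has a large
// trip count, or has an unknown trip count (the "do not unroll" side).
uint64_t adjustSavingsForSubloops(uint64_t EstimatedSavings,
                                  const std::vector<SubloopInfo> &Subloops) {
  for (const SubloopInfo &SL : Subloops) {
    if (!SL.TripCount)
      return 0; // unknown trip count: be conservative
    if (SL.SizeInInsts > MaxSubloopSize ||
        *SL.TripCount > MaxSubloopTripCount)
      return 0; // subloop too big or iterates too many times
  }
  return EstimatedSavings; // no subloops, or all of them are small
}
```

A softer variant could scale the savings down instead of returning 0, but the conservative version is the easiest to reason about with respect to catastrophic unrolling.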

https://github.com/llvm/llvm-project/pull/114579