[PATCH] D136806: [Pipelines] Introduce SROA after (full) loop unrolling

Thu Oct 27 07:11:20 PDT 2022

lebedev.ri added a reviewer: reames.
lebedev.ri added a comment.

In D136806#3887547 <https://reviews.llvm.org/D136806#3887547>, @nikic wrote:

> Some context here: LLVM performs unrolling in two places. There is a full unroll as part of the function simplification pipeline (i.e. interleaved with inlining) and a runtime unroll at the end of the module simplification pipeline. The general expectation is that if a loop is going to be fully unrolled, then this should happen during the full unroll pass. We run SROA after full unroll, as well as a significant further optimization pipeline, and do this pre-inlining, so the inlining cost model is correct. Conversely, the final runtime unroll is not supposed to expose significant further optimization opportunities -- it is a late pre-codegen pass.
>
> Of course, runtime unrolling can end up performing full unrolling, for example if the trip count could not yet be determined at the time of full unrolling, but can be determined at the time of runtime unrolling, and this patch can help in such situations. I have encountered cases like this quite a few times with Rust iterator adaptors, but my conclusion from analyzing them was generally that the proper way to address this is to make the trip count computable at full unroll, because this integrates properly with the remaining pipeline (and is the reason why we have that separate full unroll pass in the first place). For example, I've had cases where the code after unrolling reduced to the moral equivalent of "a + b", which is something we want to happen before inlining, not after. The patch at https://reviews.llvm.org/D133192 is motivated by one such case (it allows LICM to do more scalar promotion, which allows SCEV to compute the trip count before full unroll, which ends up reducing the loop to something trivial).
>
> As such, I'm not convinced that this is the right way to go about it. I would suggest to at least analyze the pre-full-unroll IR in your motivating cases and check whether there is anything obvious that can be done to already enable unrolling at that stage, rather than in the late pipeline.

I did. This is not a SCEV/LoopUnroll limitation, but the usual genius of our arbitrary cut-offs.
The alternative is to bump `"unroll-max-iteration-count-to-analyze"` to at least `17` and `"unroll-threshold-aggressive"` to at least `337`.
https://godbolt.org/z/PMe6qhaKs
I can't imagine this not having worse implications for compile time. Is that preferred?
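
For context, here is a minimal sketch of the general shape of code where SROA only becomes useful after full unrolling. It is a hypothetical illustration, not the code behind the godbolt link: the function name `sum_scaled`, the constant `kTrip`, and the loop bodies are all made up, and the trip count of 17 is chosen only to echo the number above. Whether LLVM fully unrolls these particular loops, and at which point in the pipeline, depends on the thresholds and is not something this sketch demonstrates.

```cpp
#include <cstddef>

// Hypothetical example; the names and the loop bodies are assumptions.
constexpr std::size_t kTrip = 17;

int sum_scaled(const int *in) {
  // Local aggregate: an alloca in the IR.
  int tmp[kTrip];

  // tmp is indexed by the induction variable, so while this is still a
  // loop SROA cannot split the alloca into independent scalars.
  for (std::size_t i = 0; i < kTrip; ++i)
    tmp[kTrip - 1 - i] = in[i] * 3;

  // After full unrolling every index into tmp is a constant, and SROA
  // can promote the whole array to SSA values.
  int sum = 0;
  for (std::size_t i = 0; i < kTrip; ++i)
    sum += tmp[i];
  return sum;
}
```

To experiment with the alternative, the two options can be passed through clang as `-mllvm -unroll-max-iteration-count-to-analyze=17 -mllvm -unroll-threshold-aggressive=337`. Whether this particular sketch is sensitive to them is untested.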

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136806/new/

https://reviews.llvm.org/D136806