[PATCH] D45374: [LoopUnroll] Limit peeling to conds in BBs executed on every iteration.

Tue Apr 10 12:42:07 PDT 2018

mcrosier added a comment.

In https://reviews.llvm.org/D45374#1061368, @fhahn wrote:

> Thanks for having a look and sorry for not being clearer. Chad discovered a regression in SPEC2006's h264ref with LTO, caused by this change. The problem was that we peeled off an iteration of a big loop before LTO. That caused the function to be too big for the inliner during LTO, whereas it would be inlined before. We based the peeling decision on a nested condition. With this patch I tried to find a balance between increasing code size and benefits of peeling (simplifying nested conditionals are likely to have less positive impact than top-level ones).
>
> @mcrosier did some more digging and found that we might just want to run simple unrolling before LTO and normal unrolling/peeling during LTO, which makes a sense to me. With that, we would not need this patch (or we could only consider top-level conditionals for "simple" peeling) and IMO that is what we should try to do.

Prior to loop peeling the function we'd like to inlined in h264ref has a single use.

Currently, (non-simple) loop peeling will not peel a loop if it includes a function call that is likely to be inlined (i.e., is not marked with a noinline attribute, has internal linkage and has a single use). This is exactly the case we're dealing with in h264ref, except the function to be inlined isn't marked as internal until the LTO phase of compilation. Thus, one possible approach would be to defer peeling until the LTO phase.

In https://reviews.llvm.org/D45374#1061368, @fhahn wrote:

> Thanks for having a look and sorry for not being clearer. Chad discovered a regression in SPEC2006's h264ref with LTO, caused by this change. The problem was that we peeled off an iteration of a big loop before LTO. That caused the function to be too big for the inliner during LTO, whereas it would be inlined before. We based the peeling decision on a nested condition. With this patch I tried to find a balance between increasing code size and benefits of peeling (simplifying nested conditionals are likely to have less positive impact than top-level ones).
>
> @mcrosier did some more digging and found that we might just want to run simple unrolling before LTO and normal unrolling/peeling during LTO, which makes a sense to me. With that, we would not need this patch (or we could only consider top-level conditionals for "simple" peeling) and IMO that is what we should try to do.

Quick update on this Florian.  I think my initial analysis was a little off or rather the regression in h264ref was actually just noise.  I tried reverting r327671 (your change to peeling) on ToT this afternoon to verify the regression and now the revert itself is causing a regression.  Further, after r324557 (which changed the codegen optimization level when compiling with gold) was reverted in r329458, I'm now showing that h264ref is ahead by 3.19% when comparing ToT to Clang 6.0.  Thus, I not sure this change is worth pursuing if the sole purpose is to address the h264ref regression (which doesn't appear to be a real regression after all).

https://reviews.llvm.org/D45374