[PATCH] D43876: [LoopUnroll] Peel off iterations if it makes conditions true/false.

Fri Apr 6 10:42:44 PDT 2018

mcrosier added a comment.
Herald added a subscriber: zzheng.

In https://reviews.llvm.org/D43876#1059839, @fhahn wrote:

> In https://reviews.llvm.org/D43876#1057409, @mcrosier wrote:
>
> > In https://reviews.llvm.org/D43876#1057247, @mcrosier wrote:
> >
> > > In https://reviews.llvm.org/D43876#1050460, @fhahn wrote:
> > >
> > > > @mcrosier I've submitted https://reviews.llvm.org/D44983 for review. It prevents peeling, if we cannot simplify the loop body after peeling. Peeling if we cannot simplify the loop body afterwards is likely not beneficial. It would be great if you could check if that helps in your case. If it's not easy for you to check, I can try and test it myself.
> > > >
> > > > Coming up with some additional heuristics, e.g. based on the number of instructions peeled and not eliminated, should be possible, but without knowing the inlining situation we would probably have to choose a rather arbitrary threshold.
> > >
> > >
> > > Sure, I'll take a look now and let you know!  Sorry I didn't see this sooner.
> >
> >
> > Unfortunately, https://reviews.llvm.org/D44983 does not fix this case. I'm going to dig into this now.  I'll update you once I have some additional findings.
>
>
> I had a closer look at FullPelBlockMotionBiPred and we peeled off an iteration  because we have something like
>
>   if (i % 2) {
>     ...
>     if (i != 0) {...}
>    ...
>   }
>   
>
> Peeling based on those nested conditional is likely to increase the code size too much compared to the benefit. https://reviews.llvm.org/D45374 only considers conditions in blocks that are executed on every iteration.

IIUC, https://reviews.llvm.org/D45374 should fix the regression in h264ref we're seeing (but I haven't tested it yet).  However, I wanted to share with you my findings.  Currently, loop peeling will not peel a loop if it includes a function call that is likely to be inlined (i.e., is not marked with a noinline attribute, has internal linkage and has a single use).  This is exactly the case we're dealing with here except FullPelBlockMotionBiPred isn't marked as internal until the LTO phase of compilation.  Thus, one possible approach would be to defer peeling until the LTO phase.  After r329392, this can be accomplished with a small change to the pass manager:

  diff --git a/lib/Transforms/IPO/PassManagerBuilder.cpp b/lib/Transforms/IPO/PassManagerBuilder.cpp
  index 1def2e8..b06ca2a 100644
  --- a/lib/Transforms/IPO/PassManagerBuilder.cpp
  +++ b/lib/Transforms/IPO/PassManagerBuilder.cpp
  @@ -651,7 +651,10 @@ void PassManagerBuilder::populateModulePassManager(
     addInstructionCombiningPass(MPM);

     if (!DisableUnrollLoops) {
  -    MPM.add(createLoopUnrollPass(OptLevel));    // Unroll small loops
  +    if (PrepareForLTO)
  +      MPM.add(createSimpleLoopUnrollPass(OptLevel));    // Unroll small loops
  +    else
  +      MPM.add(createLoopUnrollPass(OptLevel));    // Unroll small loops

       // LoopUnroll may generate some redundency to cleanup.
       addInstructionCombiningPass(MPM);

However, I haven't run any performance tests (but I do know if fixes the h264ref regression) nor do I know if the community would be interested in such an approach.

A more targeted approach might be to only prevent peeling of the loop (rather than disable peeling entirely) during the first phase of LTO if the loop has any calls.

Repository:
  rL LLVM

https://reviews.llvm.org/D43876