[PATCH] D24790: [LoopUnroll] Use the upper bound of the loop trip count to completely unroll loops

Fri Sep 23 08:58:29 PDT 2016

haicheng marked 3 inline comments as done.
haicheng added a comment.

In https://reviews.llvm.org/D24790#550261, @mzolotukhin wrote:

> Hi,
>
> This makes total sense to me. But I think there is no need to introduce yet another knob in the loop unroller: have you measured the effect of the change on benchmarks (LLVM testsuite/SPEC/others)? I think it should be correct to unroll a loop with a small trip count even if it's not exactly known: the effect of this transformation would be no worse than that of unrolling a similar loop with a constant trip count equal to the upper bound. The benchmark results would help confirm or reject this assumption.
>
> Thanks,
> Michael

Thank you for reviewing my patch, Michael.

My initial design was to completely unroll all loops having small trip count upper bounds whenever the exact trip counts are unknown, but I saw several regressions in spec200x and internal benchmarks (e.g. spec2000/eon -1.8%, spec2000/gcc -1.2%) running on an AArch64 device.  One of the major reasons was that I unrolled many loops containing calls.  As you may already know, the cost model for calls is not very accurate.  BasicTTIImplBase::getUnrollingPreferences() can check for call instructions in the loop, and I need a boolean to pass the result of that check anyway, so I created a new entry in UnrollingPreferences.
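
To illustrate the gating idea, here is a minimal standalone sketch. It does not use the LLVM API: `Opcode`, `ToyUnrollPrefs`, `AllowUpperBound`, and `computePrefs` are hypothetical names chosen for illustration, and the only assumption modeled is "scan the loop body for calls and record the result in a boolean".

```cpp
#include <vector>

// Hypothetical stand-ins for illustration only; these are not LLVM types.
enum class Opcode { Add, Load, Store, Call };

struct ToyUnrollPrefs {
  // Hypothetical flag: may the unroller fall back to the trip count
  // upper bound when the exact trip count is unknown?
  bool AllowUpperBound = false;
};

// Allow upper-bound unrolling only when the loop body contains no calls,
// since calls are the case the cost model estimates poorly.
inline ToyUnrollPrefs computePrefs(const std::vector<Opcode> &LoopBody) {
  ToyUnrollPrefs Prefs;
  Prefs.AllowUpperBound = true;
  for (Opcode Op : LoopBody) {
    if (Op == Opcode::Call) {
      Prefs.AllowUpperBound = false;
      break;
    }
  }
  return Prefs;
}
```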

If the exact trip count is used to unroll, the unrolled loop usually becomes one giant basic block, which is preferable.  However, if the upper bound is used, the unrolled loop usually becomes a sequence of small basic blocks, because it is not safe to merge loop blocks belonging to different iterations.  Some of these blocks may not be executed at run time.  This is another reason I think we need to be more conservative when using the upper bound to unroll loops.
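
A source-level sketch of the difference, written by hand rather than taken from the unroller's output: `sumExact4` and `sumUpTo4` are hypothetical functions, and the bound of 4 is an assumed value for illustration.

```cpp
#include <cassert>

// Exact trip count (always 4): the four iterations can be merged into
// one straight-line block with no exit checks in between.
inline int sumExact4(const int *a) {
  return a[0] + a[1] + a[2] + a[3];
}

// Only an upper bound is known (0 <= n <= 4): every unrolled copy keeps
// its own exit check, so the result is a chain of small blocks, and the
// later ones may never run.
inline int sumUpTo4(const int *a, int n) {
  assert(n >= 0 && n <= 4 && "illustrative precondition: trip count is at most 4");
  int s = 0;
  if (n <= 0) return s;
  s += a[0];
  if (n <= 1) return s;
  s += a[1];
  if (n <= 2) return s;
  s += a[2];
  if (n <= 3) return s;
  s += a[3];
  return s;
}
```

In the second function the back edge is gone, but the per-iteration exits remain, which is why the blocks from different iterations cannot simply be merged.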

I tried several different configurations, and the patch I uploaded was the best I found.  There are no noticeable regressions in the benchmarks I tested, and there are several small wins in spec200x (e.g. spec2000/perlbmk +0.8%, spec2006/xalancbmk +0.7%) and many larger wins in our internal benchmarks, which are much larger than spec2006.

Haicheng

Repository:
  rL LLVM

https://reviews.llvm.org/D24790