[PATCH] D24790: [LoopUnroll] Use the upper bound of the loop trip count to completely unroll loops

Mon Oct 3 11:03:25 PDT 2016

haicheng added a comment.

Hi Michael,

Please see my inline responses.  Thank you.

In https://reviews.llvm.org/D24790#554730, @mzolotukhin wrote:

> Hi Haicheng,
>
> Thanks for working on this, please find my answers below and some more remarks/nit-picks inline.
>
> > One of the major reasons was that I unrolled many loops with calls. As you may already know, the cost model for calls is not that awesome.
>
> Maybe we just fix the cost model for calls instead :-) But I know, it might not be actually possible at all.

It should be fixed, but I have no clue how to do that now.  Can I leave it for the future?

>> If the exact trip count is used to unroll, the unrolled loop usually becomes one giant basic block, which is preferable. However, if the upper bound is used, the unrolled loop usually becomes a sequence of small basic blocks, because it is not safe to merge loop blocks belonging to different iterations. Some of these blocks may not be executed at runtime. This is another reason I think we may need to be more conservative when using the upper bound to unroll loops.
> 
> I see, this makes perfect sense to me. Indeed, having separate thresholds might be reasonable.

I tried a threshold of 100 for upper-bound unrolling instead of the current default threshold of 150, but I saw some regressions.
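
To make the block-structure point quoted above concrete, here is a minimal source-level sketch. It assumes a loop whose trip count n is unknown but bounded by 3; body(), rolled(), unrolledExact3() and unrolledUpperBound3() are hypothetical names used only for illustration, and the real transformation happens on IR, not on C++ source.

```cpp
#include <cassert>
#include <vector>

// Hypothetical loop body: records which iteration ran.
static void body(int i, std::vector<int> &trace) { trace.push_back(i); }

// Original loop. The trip count n is not known exactly; this sketch
// assumes it is known to be at most 3.
std::vector<int> rolled(int n) {
  assert(n >= 0 && n <= 3 && "sketch assumes an upper bound of 3");
  std::vector<int> trace;
  for (int i = 0; i < n; ++i)
    body(i, trace);
  return trace;
}

// Unrolled with an exact trip count of 3: no exit checks remain, so the
// three copies can form one straight-line block.
std::vector<int> unrolledExact3() {
  std::vector<int> trace;
  body(0, trace);
  body(1, trace);
  body(2, trace);
  return trace;
}

// Unrolled using the upper bound 3: every copy keeps its exit check, so
// the copies stay in separate small blocks, and the later ones may never
// execute. Only the back edge after the last copy disappears.
std::vector<int> unrolledUpperBound3(int n) {
  assert(n >= 0 && n <= 3 && "sketch assumes an upper bound of 3");
  std::vector<int> trace;
  if (n < 1) return trace;
  body(0, trace);
  if (n < 2) return trace;
  body(1, trace);
  if (n < 3) return trace;
  body(2, trace);
  return trace;
}
```

For every n from 0 to 3, unrolledUpperBound3(n) runs the same iterations as rolled(n), but it pays for code that is partly dead whenever n is below the bound.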

I have some other random thoughts about the unroll threshold that are not directly related to this patch and have not been tested with any benchmarks yet.  I think we may want to encourage the unroller to unroll loops with larger trip counts, because that removes more loop overhead.  For example, if we have one loop whose size is 75 and can be unrolled twice, and another loop whose size is 21 but can be unrolled 8 times, we may prefer unrolling the latter.  This occurred to me because I noticed that GCC unrolls more small loops with high trip counts than LLVM does, while LLVM unrolls more large loops with small trip counts than GCC does.  Maybe we can use an equation like 100+10*trip_count to calculate the threshold.  What do you think?
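
As a rough sketch of how such a threshold would treat the two example loops, assuming the unrolled size is simply loop size times unroll count and that a loop is unrolled when that size is less than or equal to the threshold (both are simplifications, not the unroller's actual cost model; all names here are hypothetical):

```cpp
#include <cassert>

// Simplified size estimate (assumption): body size times unroll count.
constexpr unsigned unrolledSize(unsigned loopSize, unsigned tripCount) {
  return loopSize * tripCount;
}

// Fixed threshold: the current default of 150 mentioned in this thread.
constexpr unsigned kFixedThreshold = 150;

// Trip-count-scaled threshold floated above: 100 + 10 * trip_count.
constexpr unsigned scaledThreshold(unsigned tripCount) {
  return 100 + 10 * tripCount;
}

// Assumed acceptance rule: unroll when the estimate does not exceed the threshold.
constexpr bool fitsFixed(unsigned loopSize, unsigned tripCount) {
  return unrolledSize(loopSize, tripCount) <= kFixedThreshold;
}

constexpr bool fitsScaled(unsigned loopSize, unsigned tripCount) {
  return unrolledSize(loopSize, tripCount) <= scaledThreshold(tripCount);
}

// Size 75, unrolled twice: estimate 150. Fits the fixed threshold (150),
// but not the scaled one (120).
static_assert(fitsFixed(75, 2), "large loop fits the fixed threshold");
static_assert(!fitsScaled(75, 2), "large loop exceeds the scaled threshold");

// Size 21, unrolled 8 times: estimate 168. Exceeds the fixed threshold (150),
// but fits the scaled one (180).
static_assert(!fitsFixed(21, 8), "small loop exceeds the fixed threshold");
static_assert(fitsScaled(21, 8), "small loop fits the scaled threshold");
```

Under these assumptions the scaled threshold flips the decision for both loops, which is the preference described above.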

> 
> 
>> I tried several different configurations and the patch I uploaded was the best I found.
> 
> Have you tested it on any other architecture except AArch64?

I ran spec2000/2006 on x86 last week and saw no noticeable regressions. There were some small improvements: spec2000/bzip2 +0.7, spec2006/dealII +2.1%, gcc +0.8%.

> Thanks,
> Michael

Repository:
  rL LLVM

https://reviews.llvm.org/D24790