[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

nadav wurembrand nwurembrand at apple.com
Tue Jan 21 06:48:37 PST 2014

Hi,
Please remove me from this thread. I think you meant to send it to Nadav Rotem.

Nadav Wurembrand
 Apple Inc., HDC | Mail: nadav at apple.com | Mobile: +972-54-975-5016

On 21 Jan 2014, at 16:18, Diego Novillo <dnovillo at google.com> wrote:

> On 16/01/2014, 23:47, Andrew Trick wrote:
>> On Jan 15, 2014, at 4:13 PM, Diego Novillo <dnovillo at google.com> wrote:
>>> Chandler also pointed me at the vectorizer, which has its own
>>> unroller. However, the vectorizer only unrolls enough to serve the
>>> target, it's not as general as the runtime-triggered unroller. From
>>> what I've seen, it will get a maximum unroll factor of 2 on x86 (4 on
>>> avx targets). Additionally, the vectorizer only unrolls to aid
>>> reduction variables. When I forced the vectorizer to unroll these
>>> loops, the performance effects were nil.
>> Vectorization and partial unrolling (aka runtime unrolling) for ILP should be the same pass. The profitability analysis required in each case is very closely related, and you never want to do one before or after the other. The analysis makes sense even for targets without vector units. The “vector unroller” has an extra restriction (unlike the LoopUnroll pass) in that it must be able to interleave operations across iterations. This is usually a good thing to check before unrolling, but the compiler’s dependence analysis may be too conservative in some cases.
> In addition to tuning the cost model, I found that the vectorizer does not even choose to get that far into its analysis for some loops that I need unrolled. In this particular case, there are three loops that need to be unrolled to get the performance I'm looking for. Of the three, only one gets far enough in the analysis to decide whether we unroll it or not.
> But I found a bigger issue. The loop optimizers run under the loop pass manager (I am still trying to wrap my head around that. I find it very odd and have not yet convinced myself that there needs to be a separate manager for loops). Inside the loop pass manager, I am not allowed to call the block frequency analysis. Any attempt I make at scheduling BF analysis sends the compiler into an infinite loop during initialization.
> Chandler suggested a way around the problem. I'll work on that first.
>> Currently, the cost model is conservative w.r.t. unrolling because we don't want to increase code size. But minimally, we should unroll until we can saturate the resources/ports. e.g. a loop with a single load should be unrolled x2 so we can do two loads per cycle. If you can come up with improved heuristics without generally impacting code size, that’s great.
> Oh, code size will always go up. That's pretty much unavoidable when you decide to unroll. The trick here is to only unroll select loops. The profiler does not tell you the trip count. What it will do is cause the loop header to be excessively heavy wrt its parent in the block frequency analysis. In this particular case, you get something like:
> ---- Block Freqs ----
>  entry = 1.0
>   entry -> if.else = 0.375
>   entry -> if.then = 0.625
>  if.then = 0.625
>   if.then -> if.end3 = 0.625
>  if.else = 0.375
>   if.else -> for.cond.preheader = 0.37487
>   if.else -> if.end3 = 0.00006
>  for.cond.preheader = 0.37487
>   for.cond.preheader -> for.body.lr.ph = 0.37463
>   for.cond.preheader -> for.end = 0.00018
>  for.body.lr.ph = 0.37463
>   for.body.lr.ph -> for.body = 0.37463
>  for.body = 682.0
>   for.body -> for.body = 681.65466
>   for.body -> for.end = 0.34527
>  for.end = 0.34545
>   for.end -> if.end3 = 0.34545
>  if.end3 = 0.9705
> Notice how the head of the loop has weight 682, which is 682x the weight of its parent (the function entry, since this is an outermost loop).
> With static heuristics, this ratio is significantly lower (about 3x).
> When we see this, we can decide to unroll the loop.
> Diego.
