[PATCH] D20017: Aggressive choosing best loop top

Thu May 12 10:22:51 PDT 2016

On Thu, May 12, 2016 at 8:27 AM, Chuang-Yu Cheng <
cycheng at multicorewareinc.com> wrote:

> cycheng planned changes to this revision.
> cycheng added a comment.
>
> davidxl:
>
> Yes, -force-precise-rotation-cost=true solve this issue, and the result
> looks better then this patch. I hope we can enable it by default.
>
> Test on Power8 with ref data size (largest size), iteration = 1, cpu
> frequency governor is "performance" (maximum speed), bind on physical-cpu 0:
>
> - llvm: r269273 (2016 May 12)
> - -force-precise-rotation-cost=true v.s. false, > 1.0 means true is good.
> - Result:
>
>   400.perlbench         1.00x
>   401.bzip2             1.00x
>   403.gcc               1.07x
>   429.mcf               1.04x
>   445.gobmk             1.02x
>   456.hmmer             1.00x
>   458.sjeng             1.01x
>   462.libquantum        1.00x
>   464.h264ref           1.04x
>   471.omnetpp           1.06x
>   473.astar             0.97x
>   483.xalancbmk         0.99x
>
>
>
> - llvm r267873 (2016 Apr 28) + this patch
> - aggressive-best-top=true v.s. false, > 1.0 means true is good.
> - Result:
>
>   400.perlbench         1.01x
>   401.bzip2             1.01x
>   403.gcc               1.07x
>   429.mcf               1.00x
>   445.gobmk             1.02x
>   456.hmmer             0.98x
>   458.sjeng             0.99x
>   462.libquantum        1.03x
>   464.h264ref           1.06x
>   471.omnetpp           1.03x
>   473.astar             1.00x
>   483.xalancbmk         0.99x
>
> So I thought I should use existing rotation strategy.
>

Thanks for the testing. We will also do more internal testing first.

>
> By the way, please look at this pattern:
>
>             entry
>               |
>   ------> loop.header (body)
>   |97%    /       \
>   |      /50%      \50%
>   --- latch <--- if.then
>          |
>          |3%
>      loop.end
>
> Current strategy generates this order: entry loop.header if.then latch
> loop.end
> But I thought this pattern is also eligible to optimize, right?
>

yes -- it is possible to rotate -- but the current cost model decides it is
not a good thing to do. The main reason is that the estimated loop trip
count is small (~30), so the relative additional cost due to jumps added to
the loop header and the lost fall through to the exit BB from the loop is
high.  Loop rotation will happen if

1) -jump-inst-cost=0 option is used, or
2) increase the loop trip count to larger value such as 100 (with backedge
weights of 100 and 1) or larger.

David

>
> The cost of loop rotation calculated by current rotation strategy:
>
>   BB#1 ('header') to the top: 273
>   BB#4 ('if.then') to the top: 297
>   BB#2 ('latch') to the top: 297
>
>
> ================
> Comment at: lib/CodeGen/AtomicExpandPass.cpp:932-934
> @@ -929,3 +931,5 @@
>
> -  Builder.CreateCondBr(Success, ExitBB, LoopBB);
> +  // Set LoopBB -> ExitBB and LoopBB -> LoopBB with 50% probability, so
> block
> +  // placement won't apply aggressive-best-top on LoopBB
> +  Builder.CreateCondBr(Success, ExitBB, LoopBB, BrWeight);
>
> ----------------
> chandlerc wrote:
> > This comment doesn't really help here. What is "aggressive-best-top" and
> what does it mean?
> >
> > I think what you're actually doing is trying to give probabilities to
> override the default loop probabilities because the spin loop for an atomic
> isn't expected to loop often if at all. I'm actually surprised you would
> have a 50% probability of looping here, I would have expected a relatively
> low probability of spinning to be more appropriate.
> >
> > I think you should:
> >
> > 1) split this out into an independent change with its own description
> and test case
> > 2) improve this comment to talk generically about the fact that the loop
> isn't expected to be taken unless there is contention.
> >
> > Then the rest of this change can be predicated on that. I'm assuming
> that you can test this independently because its actually annotating the IR.
> Exactly! And yes, that loop isn't expected to be taken, I should use lower
> value.
>
> ================
> Comment at: lib/CodeGen/MachineBlockPlacement.cpp:114-117
> @@ -113,1 +113,6 @@
>
> +static cl::opt<bool>
> +    AggressiveBestTop("aggressive-best-top",
> +    cl::desc("Find best top from all latches even with conditional
> exit."),
> +    cl::init(true), cl::Hidden);
> +
> ----------------
> chandlerc wrote:
> > Any particular reason for a flag? Or for calling this aggressive? It
> seems pretty straight forward to try to select the best loop top among
> latches.
> For testing purpose :P
>
> ================
> Comment at: test/CodeGen/X86/code_placement_loop_rotation2.ll:8
> @@ -7,1 +7,3 @@
>  ; CHECK-LABEL: foo:
> +; CHECK: jmp
> +; CHECK: callq h
> ----------------
> davidxl wrote:
> > By the way, this example shows that the new expected layout is not as
> optimal as the one when -force-precise-rotation-cost is on (see below
> CHECK-PROFILE).
> >
> > The total branch cost of the optimal layout [e, f, c, d, h, b, g ] is:
> >
> > C(c->e) + C(f->h) + C(d->f) + C(h->exit) + C(b->c)  + C(g->h)
> >
> > while the total cost of aggressive loop top layout [h, b, g, f, c, d, e]
> is
> >
> > C(c->e) + C(f->h) + C(d->f) + C(h->exit) + C(b->c)  + C(g->h) + C(e->f)
> >
> > Edge e->f is in the inner loop and it is a hot edge , the additional
> cost of C(e->f) can be high.
> Thanks for pointing out this : D
>
>
> http://reviews.llvm.org/D20017
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160512/befd08c4/attachment.html>