[PATCH] Enhance loop rotation with existence of profile data in MachineBlockPlacement pass.

Mon Jun 29 19:42:15 PDT 2015

On Mon, Jun 29, 2015 at 6:52 PM, Xinliang David Li <davidxl at google.com>
wrote:

> On Mon, Jun 29, 2015 at 5:14 PM, Xinliang David Li <davidxl at google.com>
> wrote:
> >>> > The patch here can handle the example you proposed. We consider
> >>> > rotating the chain B1 B2 B3 B4, and the result B3 B4 B1 B2 is best.
> >>> > This is because there is no edge from B2 to B3, so in our "benefit"
> >>> > analysis this won't become a negative contribution to the benefit.
> >>>
> >>> True, but it is unclear whether it works by accident or by design. We
> >>> need to be able to prove that the current benefit analysis is good for
> >>> arbitrary control flow in loop.
> >>>
> >>
> >> When rotating the loop, all impacts in terms of our cost model are
> >> considered, so this patch should not hurt performance if our cost
> >> model is correct.
> >
> > Yes -- that is what I'd like to hear, but I'd like to see it proved
> > analytically that it will be better :)
>
> Thinking about it a little more -- your three-part cost function is
> actually complete, and that can be proved. Imagine the ideal base layout
> where the tail-to-head edge is also a fall-through -- this layout has
> a fixed cost/benefit. To compute the cost or benefit of a rotation, we
> only need to consider the delta when the tail->head edge converts
> from a fall-through to a non-fall-through edge.
>
> There is a minor issue with the part-3 computation. Consider the case
> where tail to head is a conditional branch and the other edge also
> loops back into the loop: in this case, should you also consider the
> additional cost of needing another unconditional jump?
>

This is a good point! In this case we also need to consider the cost of the
additional jump. If we assume the costs of a misfetch and a jmp instruction
are identical, then the penalty here should be the frequency of the tail
node, not the frequency of the edge from tail to head. I will update the
patch to reflect this new conclusion. Thanks!
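
As a rough illustration of the arithmetic above (not the code in the patch),
the penalty for making the tail->head edge a non-fall-through could be
sketched as follows. The function name, its parameters, and the unit costs
are hypothetical, and the sketch assumes the tail block has exactly two
successors, so the other edge's frequency is the tail frequency minus the
tail->head edge frequency.

```python
def rotation_tail_penalty(tail_freq, tail_to_head_freq, other_succ_in_loop,
                          misfetch_cost=1.0, jump_cost=1.0):
    """Penalty when tail->head stops being a fall-through.

    Hypothetical helper for illustration only; it assumes the tail block
    has exactly two successors (head and one other block).
    """
    # The tail->head edge becomes a taken branch: pay a misfetch per traversal.
    penalty = tail_to_head_freq * misfetch_cost
    if other_succ_in_loop:
        # The other successor is also in the loop and is not laid out next,
        # so an extra unconditional jump is needed on that edge.
        other_edge_freq = tail_freq - tail_to_head_freq
        penalty += other_edge_freq * jump_cost
    return penalty
```

With a tail frequency of 100, a tail->head edge frequency of 70, and equal
unit costs, this gives 100 (the tail node frequency) when the other edge
stays in the loop, and 70 (the edge frequency) when it does not.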

Cong

>
> David
>
> >
> >> In some cases, with LLVM's current bb-layout algorithm, rotating the
> >> loop may not give us the best result: this is not the fault of the
> >> loop rotation, but of how we build chains for loops. In other words,
> >> if we improve loop chain building in LLVM in the future, we can still
> >> use the same loop rotation algorithm.
> >
> > Yes, because only one dimension of the search space is explored.
> > Another dimension will be the base layout (e.g., considering splitting
> > etc.).  My comment is not about your algorithm to find the optimal
> > rotation in this dimension (which is solid), but about whether the
> > 3-component cost function is sufficient.
> >
> > David
> >>
> >>
> >>>
> >>> A new test case like this one, or other cases, would be useful in the
> >>> patch too.
> >>
> >>
> >> I will add a new test case with a diamond CFG to this patch.
> >>
> >>
> >> Cong
> >>
> >>
> >>>
> >>>
> >>> David
> >>>
> >>>
> >>>
> >>> >
> >>> >
> >>> >>
> >>> >>
> >>> >> David
> >>> >>
> >>> >>
> >>> >> >
> >>> >> >
> >>> >> > http://reviews.llvm.org/D10717
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >
> >>> >
> >>
> >>
>