[PATCH] Enhance loop rotation in the presence of profile data in the MachineBlockPlacement pass.

Cong Hou congh at google.com
Tue Jul 7 16:33:19 PDT 2015


On Tue, Jul 7, 2015 at 4:13 PM, Xinliang David Li <davidxl at google.com> wrote:

> On Tue, Jul 7, 2015 at 4:00 PM, Cong Hou <congh at google.com> wrote:
> > On Tue, Jul 7, 2015 at 3:35 PM, Xinliang David Li <davidxl at google.com> wrote:
> >>
> >> The already rotated inner loop needs to be treated as a single node
> >> when participating in the parent loop's rotation; otherwise we may end
> >> up wasting compile time and producing a suboptimal solution.
> >>
> >> Consider the loop nest:
> >>
> >> Entry
> >> do {  // outer loop
> >>   B0
> >>   if (...) {
> >>     do {  // inner loop
> >>       B1
> >>       if (...) {
> >>         B2
> >>       } else {
> >>         B3
> >>       }
> >>       B4
> >>     } while (...);  // inner loop
> >>   } else {
> >>     B6
> >>   }
> >>   B5
> >> } while (...);  // outer loop
> >>
> >> B7
> >>
> >>
> >> The optimal inner loop layout is B3 B4 B1 B2.
> >> The original outer loop layout is:   B0 B6 (B3 B4 B1 B2) B5
> >>
> >> The optimal rotation of the outer loop should produce
> >> (B3 B4 B1 B2) B5 B0 B6, with the final layout being:
> >>
> >> Entry ((B3 B4 B1 B2) B5 B0 B6) B7
> >>
> >> However, the current algorithm may produce Entry (B5 B0 B6 (B3 B4 B1 B2)) B7,
> >> because it has the same cost as the optimal one. The problem is that the
> >> cost analysis needs to consider the edge from inside the inner loop to
> >> the top of the outer loop chain: in the bad solution, there is an edge
> >> from B4 to B5 whose cost should be counted as part-3 cost.
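David's point can be illustrated with a simplified rotation cost model. The C++ sketch below rests on stated assumptions: the already-rotated inner loop is collapsed into one node `Inner`, and the cost of a rotation is modeled in three parts: (1) the missed fall-through into the loop header when the chain does not start at the header, (2) loop exits taken from any node other than the chain tail, and (3) the edge from the chain tail back to the chain top, which is presumably what "part-3 cost" refers to. All names (`Node`, `Edge`, `rotationCost`, `bestRotation`, `OuterChain`, `OuterEdges`) and every edge frequency are hypothetical; this is not the code in D10717.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Node ids for the loop nest above. Inner stands for the already-rotated
// inner loop chain (B3 B4 B1 B2) collapsed into a single node.
enum Node { B0, B6, Inner, B5, B7 };

// A profiled CFG edge; the frequencies used below are made up.
struct Edge {
  int From, To;
  uint64_t Freq;
};

static bool inChain(const std::vector<int> &Chain, int N) {
  for (int C : Chain)
    if (C == N)
      return true;
  return false;
}

// Simplified cost of laying out Chain rotated so that Chain[Top] comes first.
uint64_t rotationCost(const std::vector<int> &Chain, size_t Top, int Header,
                      uint64_t HeaderEntryFreq, const std::vector<Edge> &Edges) {
  assert(!Chain.empty() && Top < Chain.size());
  int TopN = Chain[Top];
  int TailN = Chain[(Top + Chain.size() - 1) % Chain.size()];
  uint64_t Cost = 0;
  // Part 1: the edge entering the loop header no longer falls through.
  if (TopN != Header)
    Cost += HeaderEntryFreq;
  for (const Edge &E : Edges) {
    // Part 2: loop exits taken from any node other than the chain tail.
    if (inChain(Chain, E.From) && !inChain(Chain, E.To) && E.From != TailN)
      Cost += E.Freq;
    // Part 3: the tail-to-top edge can never fall through. With Inner as a
    // single node, an edge such as B4 -> B5 is counted here.
    if (E.From == TailN && E.To == TopN)
      Cost += E.Freq;
  }
  return Cost;
}

// Index of the cheapest rotation; the earliest one wins ties.
size_t bestRotation(const std::vector<int> &Chain, int Header,
                    uint64_t HeaderEntryFreq, const std::vector<Edge> &Edges) {
  size_t Best = 0;
  uint64_t BestCost = rotationCost(Chain, 0, Header, HeaderEntryFreq, Edges);
  for (size_t I = 1; I < Chain.size(); ++I) {
    uint64_t Cost = rotationCost(Chain, I, Header, HeaderEntryFreq, Edges);
    if (Cost < BestCost) {
      BestCost = Cost;
      Best = I;
    }
  }
  return Best;
}

// The original outer chain B0 B6 (Inner) B5 with made-up frequencies:
// 10 entries into the outer loop, 90 back edges, 60% of iterations
// entering the inner loop.
const std::vector<int> OuterChain = {B0, B6, Inner, B5};
const std::vector<Edge> OuterEdges = {
    {B0, Inner, 60}, {B0, B6, 40}, {B6, B5, 40},
    {Inner, B5, 60}, {B5, B0, 90}, {B5, B7, 10}};
```

With these hypothetical frequencies the four rotations (starting at B0, B6, Inner, B5) cost 90, 60, 20 and 80, so `bestRotation` picks `Inner B5 B0 B6`, i.e. (B3 B4 B1 B2) B5 B0 B6. The B5 B0 B6 (B3 B4 B1 B2) layout pays for the Inner -> B5 edge in part 3 only because the inner loop is treated as one node; at block granularity its tail is B2, and the B4 -> B5 edge would be missed.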
> >
> >
> > Why is the layout Entry (B5 B0 B6 (B3 B4 B1 B2)) B7 worse than
> > Entry ((B3 B4 B1 B2) B5 B0 B6) B7? In both cases, the inner loop chain
> > is not split and is treated as a single node.
> >
>
> It matters when the outer loop is hot and the inner loop has
> relatively small trip count.
>
> > The edge from B4 to B5 is relatively cold compared to the edges in the
> > inner loop. It won't be a fall-through, as B4 is not the tail of the
> > inner loop chain.
>
> That is true, but treating B4 to B5 as a 'fall through' also enables a
> layout in which both paths of the outer loop are more compact. In the
> suboptimal case, B6 sits in the middle of B5 and B0, which leads to
> lower cache utilization.
>

In the suboptimal case, B6 is not in the middle of B5 and B0. I guess you
mean that B5 B0 B6 are separated from B7, so it is better to put
B5 B0 B6 B7 together? But what about Entry? If we treat Entry the same as
B7, there is still no significant difference between the two layouts
shown above.

If the inner loop is very cold (e.g., < 20%), we need to take it out of
the outer loop chain. I will tackle this issue in an upcoming patch.


Cong


>
> Given that the optimal outer loop layout does not actually reduce the
> branch cost (it only improves icache reuse), I am fine leaving the
> implementation as is, assuming that making it optimal requires a lot of
> effort. I would be more comfortable if a loop nest example were added
> as a test.
>
> David
>
>
> >
> >
> > Cong
> >
> >
> >>
> >>
> >> David
> >>
> >> On Mon, Jul 6, 2015 at 10:22 AM, Cong Hou <congh at google.com> wrote:
> >> > When the outer loop is rotated, the inner loop is already linked to
> >> > other CFG nodes in the outer loop. So I think we don't need to adjust
> >> > the current algorithm, as we have already considered all the costs
> >> > the rotation may add or reduce.
> >> >
> >> >
> >> > thanks,
> >> > Cong
> >> >
> >> > On Mon, Jul 6, 2015 at 9:57 AM, Xinliang David Li <xinliangli at gmail.com>
> >> > wrote:
> >> >>
> >> >> Does the cost analysis work well for loop nests? After the inner
> >> >> loop chain is formed and rotated, it will later be merged into the
> >> >> parent loop chain. The cost analysis for the parent loop may need to
> >> >> be adjusted to consider the inner loops that are already rotated.
> >> >>
> >> >> David
> >> >>
> >> >> On Tue, Jun 30, 2015 at 2:29 PM, Cong Hou <congh at google.com> wrote:
> >> >>>
> >> >>> Updated the patch by adding two opt parameters that define the cost
> >> >>> of a misfetch and of a jump instruction, and using them when
> >> >>> rotating loops.
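The message does not show how the two new parameters enter the cost. One plausible way, sketched below in C++ purely as an assumption, is to scale the frequency of the tail-to-top edge that cannot fall through after rotation: a tail with a single successor needs an unconditional jump (misfetch plus jump cost), while a tail with two successors pays the misfetch on that edge plus a jump cost on the colder of its two edges. `PlacementCosts`, `brokenBackEdgeCost`, the field names and the default values of 1 are all hypothetical, not the actual option names or defaults from D10717.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the two opt parameters added by the patch.
struct PlacementCosts {
  uint64_t MisfetchCost; // cost per execution of a non-fall-through edge
  uint64_t JumpInstCost; // cost per execution of an extra jump instruction
};

// Assumed defaults; not taken from the patch.
const PlacementCosts DefaultCosts = {1, 1};

// Cost of the tail-to-top edge of a rotated loop chain, which can never be a
// fall-through. EdgeFreq is that edge's frequency; OtherFreq is the frequency
// of the tail's other successor edge and is only used when NumSuccs == 2.
uint64_t brokenBackEdgeCost(const PlacementCosts &C, unsigned NumSuccs,
                            uint64_t EdgeFreq, uint64_t OtherFreq) {
  assert(NumSuccs >= 1 && "the tail must branch to the chain top");
  if (NumSuccs == 1)
    // An unconditional jump is required: pay the misfetch and the jump.
    return EdgeFreq * (C.MisfetchCost + C.JumpInstCost);
  if (NumSuccs == 2)
    // A conditional branch covers one target; an extra jump may still be
    // needed, and this sketch charges it at the colder edge's frequency.
    return EdgeFreq * C.MisfetchCost +
           std::min(EdgeFreq, OtherFreq) * C.JumpInstCost;
  // Three or more successors: no extra cost is modeled.
  return 0;
}
```

For example, with both costs set to 1, a single-successor tail whose back edge runs 90 times costs 180, while a two-successor tail with edges of 90 and 10 costs 90 + 10 = 100.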
> >> >>>
> >> >>>
> >> >>> http://reviews.llvm.org/D10717
> >> >>>
> >> >>> Files:
> >> >>>   lib/CodeGen/MachineBlockPlacement.cpp
> >> >>>   test/CodeGen/X86/code_placement_loop_rotation.ll
> >> >>>
> >> >>>
> >> >>
> >> >
> >
> >
>