[PATCH] Enhance loop rotation with existence of profile data in MachineBlockPlacement pass.

Mon Jun 29 19:46:30 PDT 2015

On Mon, Jun 29, 2015 at 7:42 PM, Cong Hou <congh at google.com> wrote:
> On Mon, Jun 29, 2015 at 6:52 PM, Xinliang David Li <davidxl at google.com>
> wrote:
>>
>> On Mon, Jun 29, 2015 at 5:14 PM, Xinliang David Li <davidxl at google.com>
>> wrote:
>> >>> > The patch here can handle the example proposed by you. We consider
>> >>> > to
>> >>> > rotate
>> >>> > the chain B1 B2 B3 B4, and the result B3 B4 B1 B2 is best. This is
>> >>> > because
>> >>> > there is no edge from B2 to B3, so in our "benefit" analysis this
>> >>> > won't
>> >>> > become a negative contribution to the benefit.
>> >>>
>> >>> True, but it is unclear whether it works by accident or by design. We
>> >>> need to be able to prove that the current benefit analysis is good for
>> >>> arbitrary control flow in loop.
>> >>>
>> >>
>> >> When rotating the loop, all impacts in terms of our cost model are
>> >> considered, so this patch should not hurt the performance if our cost
>> >> model
>> >> is correct.
>> >
>> > yes -- that is what I'd like to hear but analytically prove it will be
>> > better :)
>>
>> Thinking about it a little more -- your 3-part cost function is
>> actually complete and can be proved. Imagine the ideal base layout
>> where the tail-to-head edge is also a fall-through -- this layout has
>> a fixed cost/benefit. To compute the cost or benefit of a rotation, we
>> only need to consider the delta when the tail->head edge converts
>> from a fall-through to a non-fall-through edge.
>>
>> There is a minor issue with the part-3 computation. It is the case
>> when tail to head is a conditional branch, and the other edge also
>> loops back into the loop: in this case should you also consider the
>> additional cost of needing another unconditional jump?
>
>
> This is a good point! In this case we also need to consider the cost of the
> additional jump. If we assume the costs of a misfetch and a jmp instruction
> are identical, then the penalty here should be the frequency of the tail
> node, not the frequency of the edge from tail to head. I will update the patch to
> reflect this new conclusion. Thanks!

See my previous comments. I think it is better to split the cost
weights into two tunable parameters, one for the misfetch penalty and
one for the extra jmp instruction. The default settings may be target
dependent. It also makes the code more readable.
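
To make the suggestion concrete, here is a minimal sketch of how the
tail-to-head penalty might look with two separate weights. It is not
code from the patch: RotationCostWeights, MisfetchCost, JumpInstCost and
rotationPenalty are hypothetical names, and plain integers stand in for
block and edge frequencies. It assumes the extra unconditional jump is
executed only on the path that does not go from tail to head.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tunables (illustration only, not existing LLVM options):
// the relative cost of a misfetch on a taken branch and of executing an
// additional unconditional jmp instruction.
struct RotationCostWeights {
  uint64_t MisfetchCost;
  uint64_t JumpInstCost;
};

// Penalty paid when the tail->head edge is not a fall-through.
//   TailFreq        frequency of the tail block
//   TailToHeadFreq  frequency of the tail->head edge
//   NeedsExtraJump  true when the tail ends in a conditional branch whose
//                   other successor is also reached by a jump rather than
//                   by fall-through (an assumption of this sketch)
uint64_t rotationPenalty(const RotationCostWeights &W, uint64_t TailFreq,
                         uint64_t TailToHeadFreq, bool NeedsExtraJump) {
  // An edge cannot be taken more often than its source block executes.
  assert(TailToHeadFreq <= TailFreq && "edge hotter than its source block");

  // The taken branch back to the head is charged a misfetch each time.
  uint64_t Penalty = W.MisfetchCost * TailToHeadFreq;

  // The remaining executions of the tail go through the extra jmp.
  if (NeedsExtraJump)
    Penalty += W.JumpInstCost * (TailFreq - TailToHeadFreq);

  return Penalty;
}
```

With both weights set to the same value, the penalty in the extra-jump
case reduces to that value times the frequency of the tail block, which
matches the conclusion quoted above. With different weights the two
terms can be tuned per target.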

David

>
> Cong
>
>>
>>
>> David
>>
>> >
>> >> In some cases, with LLVM's current bb-layout algorithm,
>> >> rotating the loop may not give us the best result: this is not the fault
>> >> of
>> >> the loop rotation, but how we build chains for loops. In other words,
>> >> in the
>> >> future if we improve the loop chain builds in LLVM, we can still use
>> >> the
>> >> same loop rotation algorithm.
>> >
>> > yes, because only one dimension of the search space is explored.
>> > Another dimension will be the base layout (e.g, considering splitting
>> > etc).  My comment is not about your algorithm to find the optimal
>> > rotation in this dimension (which is solid), but about whether the
>> > 3-component cost function is sufficient.
>> >
>> > David
>> >>
>> >>
>> >>>
>> >>> A new test case like this one or other cases are useful in the patch
>> >>> too.
>> >>
>> >>
>> >> I will add a new test case with diamond CFG to this patch.
>> >>
>> >>
>> >> Cong
>> >>
>> >>
>> >>>
>> >>>
>> >>> David
>> >>>
>> >>>
>> >>>
>> >>> >
>> >>> >
>> >>> >>
>> >>> >>
>> >>> >> David
>> >>> >>
>> >>> >>
>> >>> >> >
>> >>> >> >
>> >>> >> > http://reviews.llvm.org/D10717
>> >>> >> >