[PATCH] D28593: Update loop branch_weight metadata after loop rotation.

Xin Tong via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 12 13:21:27 PST 2017


On Thu, Jan 12, 2017 at 11:58 AM, Michael Kuperstein <mkuper at google.com> wrote:
> I'm not disagreeing with you. We definitely don't want to drop the metadata
> completely.
> It's just a question of which assumptions we want to make.
>
> The problem is that the cases where the "normal" assumptions start to break
> down are exactly the cases we care about. We don't care whether for a loop
> with 1000000 iterations, we estimate the iteration count as 1000000 or
> 1000001, since we're unlikely to take any action based on that. But it is
> probably more important to distinguish loops that, on average, have 0
> iterations from the ones that have 1.
>
> So, the question is - what should happen when the average iteration count of
> the loop, pre-rotation, is < 1?

I break the problem down as follows.

* The loop has a fairly uniform trip-count distribution (i.e. no case
where some executions of the loop run very many iterations and the
rest run close to zero). The patch handles this case.

* The loop has a very polarized distribution, i.e. a few executions
account for almost all of the iterations and the rest run close to
zero. I think we will do fine in this case as well, as I explain below.

In the polarized case, we basically underestimate the weight of the
branch from the guard block to the exit block, because more executions
of the loop exit before even one iteration runs than the uniform
assumption predicts. This propagates into the CFG and makes us
underestimate the average trip count of the loop when it does execute
(fewer executions are now responsible for all of the body weight).
This is acceptable: we become less aggressive in peeling and may lose
some performance, but only when the loop's trip-count distribution is
polarized.
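To make the underestimation concrete, here is a small Python sketch (all
names and the aggregation model are mine for illustration, not LLVM's):
a profile only records aggregate counters, so a uniform and a polarized
distribution with the same totals are indistinguishable, and the uniform
assumption guesses zero guard-to-exit skips for the polarized loop.

```python
def profile_weights(trip_counts):
    """From per-execution trip counts, derive what an aggregate profile
    records: number of loop entries and total iterations executed."""
    return len(trip_counts), sum(trip_counts)

def guessed_guard_skips(entries, total_iterations):
    """Executions that skip the rotated loop body entirely, under the
    uniform assumption: zero once the average trip count reaches 1."""
    avg = total_iterations / entries
    if avg >= 1:
        return 0
    # Below an average of 1, assume each execution runs 0 or 1 iterations.
    return entries - total_iterations

# Uniform case: 4 entries, 5 iterations each.
uniform = [5, 5, 5, 5]
# Polarized case: identical aggregate counters, one entry does all the work.
polarized = [20, 0, 0, 0]

for dist in (uniform, polarized):
    entries, iters = profile_weights(dist)
    guessed = guessed_guard_skips(entries, iters)
    actual = sum(1 for t in dist if t == 0)
    print(dist, "guessed skips:", guessed, "actual skips:", actual)
```

For the polarized list the guess is 0 skips while 3 executions actually
exit immediately, which is exactly the underestimated guard-to-exit
weight described above.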



For loops with *average* trip count < 1, after thinking about it, we should
> And do we have any other interesting cases where this gives us bad
> results? ("Bad" as opposed to "wrong" - that is, probability assignments
> that preserve all the right invariants, but lead unrolling/peeling to make a
> decision we don't want...)
>
> Dehao may have a somewhat different opinion. :-)
>
> On Thu, Jan 12, 2017 at 11:45 AM, Xin Tong <trent.xin.tong at gmail.com> wrote:
>>
>> On Thu, Jan 12, 2017 at 11:31 AM, Michael Kuperstein <mkuper at google.com>
>> wrote:
>> > Dehao and I spent some time staring at this on a whiteboard, and we
>> > don't
>> > have a good conclusion.
>> > What loop rotation does now looks technically wrong, but it's not clear
>> > what
>> > the right fix is. This is basically the same problem as we have inside
>> > loop
>> > peeling - there isn't really enough information to assign probabilities
>> > after rotation without knowing the actual distribution.
>>
>> I have a similar feeling; it's difficult to make this precise without
>> knowing the distribution of the loop trip count. What we have is
>> merely the sum of the trip counts over all executions of the loop.
>>
>> >
>> > We can't just say "If the average trip of the original loop count is X,
>> > the
>> > average trip count of the rotated loop is X-1" and update the edges
>> > accordingly.
>>
>> This is certainly not optimal. I think we have two options here:
>> 1. Drop the metadata completely. This is not what we want, because it
>> would defeat the purpose of PGO entirely.
>> 2. Make the weights as accurate as possible by making assumptions.
>> Right now I feel that what we have in LoopRotation is wrong, as it
>> makes no attempt to adjust the branch weights that should be
>> adjusted. Keeping stale branch_weights can be counter-productive for
>> PGO, since optimizations will make decisions based on that imprecise
>> data.
>>
>> > As Dehao's example showed, this breaks down, for example, when the
>> > average
>> > trip count is below 1. It's even worse for the first iteration edge -
>> > the
>> > probability is different for "a loop that is always taken 100 times" (in
>> > which case it's 100/0), and "a loop that is taken 100 times on average -
>> > half the time it's 0, half the time it's 200" (in which case it's
>> > 50/50)...
>>
>> Yes, I considered this case. As I said, it's not perfect, and I feel
>> that assuming each execution of the loop has a similar number of
>> iterations is a reasonable tradeoff here. If we can make that
>> assumption in loop peeling, I do not see why we cannot make it during
>> loop rotation.
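As a concrete sketch of the "similar number of iterations" assumption
(the function and arithmetic here are illustrative, not what the patch
literally does): if the header branch_weights of the top-tested loop
are (taken, exit), the implied average trip count is X = taken/exit,
and the rotated backedge gets weights implying X - 1.

```python
def rotated_backedge_weights(taken, exit_count):
    """Given header branch_weights (taken, exit) implying an average
    trip count X = taken / exit_count, return backedge weights for the
    rotated loop implying X - 1, under the uniform assumption."""
    return taken - exit_count, exit_count

# Loop entered 10 times, 50 iterations total: X = 5 becomes X - 1 = 4.
print(rotated_backedge_weights(50, 10))   # (40, 10)

# Michael's problem case: X = 0.2 yields a negative "taken" weight,
# i.e. the uniform assumption has broken down for average X < 1.
print(rotated_backedge_weights(2, 10))    # (-8, 10)
```

The second call shows mechanically why the average-trip-count-below-1
case needs special handling rather than the blanket X - 1 adjustment.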
>> >
>> > We'll stare at this some more later.
>> >
>> > On Thu, Jan 12, 2017 at 10:37 AM, Xin Tong <trent.xin.tong at gmail.com>
>> > wrote:
>> >>
>> >> On Thu, Jan 12, 2017 at 10:29 AM, Xin Tong <trent.xin.tong at gmail.com>
>> >> wrote:
>> >> > On Thu, Jan 12, 2017 at 10:26 AM, Dehao Chen <danielcdh at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >>
>> >> >> On Thu, Jan 12, 2017 at 9:26 AM, Michael Kuperstein
>> >> >> <mkuper at google.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> I haven't looked at the patch yet, (and I'm also not at the
>> >> >>> office),
>> >> >>> but:
>> >> >>>
>> >> >>> 1) The new probabilities should be the opposite of the last example
>> >> >>> Dehao
>> >> >>> had. That is, the external if should be 1/7, and the one on the
>> >> >>> backedge
>> >> >>> 7/28.
>> >> >>
>> >> >>
>> >> >> For the 2nd case, the expected trip count is 0.2, so it's not
>> >> >> straightforward to me how to set this. But if the expected trip
>> >> >> count is 5, then it looks to me like the external if should be
>> >> >> 5/1, and the one on the backedge should be 4/1?
>> >> >>
>> >> >> And please make sure that when the input probabilities are the
>> >> >> same, the output probabilities are also the same. E.g., if you
>> >> >> annotate the branch as (5, 1) or as (10, 2), the probabilities
>> >> >> after the transformation should be equivalent.
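That invariant is easy to check mechanically. A quick sketch in Python,
where rotate_weights is just the illustrative taken-minus-exit candidate
adjustment, not necessarily what the patch implements:

```python
from fractions import Fraction

def rotate_weights(taken, exit_count):
    # Illustrative candidate adjustment: drop one expected iteration.
    return taken - exit_count, exit_count

def taken_probability(taken, exit_count):
    # branch_weights encode a probability as taken / (taken + not_taken).
    return Fraction(taken, taken + exit_count)

# (5, 1) and (10, 2) encode the same input probability...
assert taken_probability(5, 1) == taken_probability(10, 2)

# ...so after the transformation they must still agree: (4, 1) and
# (8, 2) both give 4/5.
assert taken_probability(*rotate_weights(5, 1)) == \
       taken_probability(*rotate_weights(10, 2)) == Fraction(4, 5)
```

Any candidate weight transformation should pass this kind of
scale-invariance check, since branch_weights are only meaningful as
ratios.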
>> >> >>
>> >> >>>
>> >> >>> 2) I think this is just conditional probability - cond hasn't
>> >> >>> changed,
>> >> >>> but
>> >> >>> P(cond | cond has been true once) is lower than P(cond). This
>> >> >>> exactly
>> >> >>> corresponds to lowering the number of expected iterations by 1,
>> >> >>> given
>> >> >>> that
>> >> >>> we know we already executed one iteration.
>> >> >>
>> >> >>
>> >> >> It seems that with this patch, when we compute the estimated
>> >> >> trip count, we need to first check whether the loop is rotated
>> >> >> or not: if yes, we use (BE.taken/BE.not_taken) + 1; otherwise,
>> >> >> we use (BE.taken/BE.not_taken). Is that true?
>> >> >
>> >>
>> >> Sorry, Dehao, I think I talked past you; our current way of
>> >> computing the estimated trip count requires the loop to be rotated.
>> >> Yes, in the general case, the trip count of the loop depends on
>> >> whether the loop is rotated or not, as you said.
>> >>
>> >> > The way we compute EstimatedTripCount requires the loop to be
>> >> > rotated, i.e. we only look at the latch block and expect the
>> >> > loop to be bottom-tested.
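For reference, the bottom-tested arithmetic being discussed looks
roughly like this. This is a sketch modeled loosely on LLVM's
getLoopEstimatedTripCount, not the actual implementation:

```python
def estimated_trip_count(backedge_taken, latch_exit):
    """For a rotated (bottom-tested) loop, the latch branch_weights
    count backedge-taken vs. exit events. Every execution that reaches
    the latch ran at least one iteration, hence the +1; the ratio is
    rounded down here for uneven profiles."""
    return backedge_taken // latch_exit + 1

# Latch taken 40 times, exited 10 times: 10 entries, 50 iterations,
# so 5 iterations per entry on average.
print(estimated_trip_count(40, 10))   # 5
```

This is why the computation only makes sense once the loop is rotated:
for a top-tested loop, the header branch already counts the zero-trip
executions and no +1 applies.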
>> >> >>
>> >> >> Dehao
>> >> >>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On Jan 12, 2017 09:21, "Xin Tong via Phabricator"
>> >> >>> <reviews at reviews.llvm.org> wrote:
>> >> >>>
>> >> >>> trentxintong added a comment.
>> >> >>>
>> >> >>> @danielcdh It's true that the cond is not changed, but the
>> >> >>> inputs to the condition have changed. Before, you were testing
>> >> >>> the index variable in the loop header; now you have two
>> >> >>> conditions, and the SSA values used in the conditions have
>> >> >>> changed, so the probabilities need to be adjusted. I will
>> >> >>> explain more when I get to the office.
>> >> >>>
>> >> >>>
>> >> >>> https://reviews.llvm.org/D28593
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>
>> >
>> >
>
>


More information about the llvm-commits mailing list